Title: | Fst-Heterozygosity Smoothed Quantiles |
---|---|
Description: | A program to generate smoothed quantiles for the Fst-heterozygosity distribution. Designed for use with large numbers of loci (e.g., genome-wide SNPs). The best case for analyzing the Fst-heterozygosity distribution is when many populations (>10) have been sampled. See Flanagan & Jones (2017) <doi:10.1093/jhered/esx048>. |
Authors: | Sarah P. Flanagan and Adam G. Jones |
Maintainer: | Sarah P. Flanagan <[email protected]> |
License: | GPL-2 |
Version: | 1.0.1 |
Built: | 2024-10-31 20:31:47 UTC |
Source: | https://github.com/cran/fsthet |
This counts the number of times each allele occurs at a locus from a list of genotypes (the sum of all the counts is 2*number of individuals).
allele.counts(genotypes)
genotypes |
A list of genotypes. |
AlleleCounts |
The number of times each allele is recorded at the locus. |
#create a random sample of genotypes
genotypes<-sample(c("0101","0102","0202"),50,replace=TRUE)
counts<-allele.counts(genotypes)
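The counting rule described above can be sketched in base R. This is only an illustration, assuming diploid genotypes coded as four-character strings with two characters per allele (as in the example). The helper name `count_alleles_sketch` is hypothetical and is not part of fsthet.

```r
# Hypothetical helper (not the package's code): count alleles in diploid
# genotypes coded as four-character strings, two characters per allele.
count_alleles_sketch <- function(genotypes) {
  allele1 <- substr(genotypes, 1, 2)  # first allele of each individual
  allele2 <- substr(genotypes, 3, 4)  # second allele of each individual
  table(c(allele1, allele2))          # occurrences of each allele
}

genotypes <- c("0101", "0102", "0202", "0102")
counts <- count_alleles_sketch(genotypes)
print(counts)

# The counts sum to twice the number of individuals.
stopifnot(sum(counts) == 2 * length(genotypes))
```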
This is a list with a data.frame of bins (the lower and upper bounds for each heterozygosity bin) and a list of fsts that fall into each bin, with the name of each set of Fst values being the upper heterozygosity bound from the data.frame of bins.
bins
list
bins<-make.bins(fsts)
See Flanagan & Jones
This calculates global Fsts from a genepop dataframe. This does not include bootstrapping.
calc.actual.fst(df, fst.choice="fst")
df |
Provide the genepop dataframe (from my.read.genepop). |
fst.choice |
Specify which type of fst calculation should be used. See fst.options.print for the choices. |
fsts |
This returns a dataframe with Locus, Ht, and Fst columns. |
gpop<-data.frame(popinfo=c(rep("POP 1", 20),rep("POP 2", 20)),ind.names=c(1:20,1:20), loc0=sample(c("0101","0102","0202"),40,replace=TRUE))
fsts<-calc.actual.fst(gpop)
## Not run:
gfile<-system.file("extdata", "example.genepop.txt",package = 'fsthet')
gpop<-my.read.genepop(gfile)
fsts<-calc.actual.fst(gpop)
## End(Not run)
This calculates allele frequencies from a list of genotypes.
calc.allele.freq(genotypes)
genotypes |
A list of genotypes. |
obs.af |
A list of observed allele frequencies in the genotypes list. |
#create a random sample of genotypes
genotypes<-sample(c("0101","0102","0202"),50,replace=TRUE)
af<-calc.allele.freq(genotypes)
This calculates Weir & Cockerham (1993)'s beta-hat. Beaumont & Nichols (1996) used this formulation in FDIST2, and it is also implemented in Lositan. See the vignette for details on the calculation of beta.
calc.betahat(df, i)
df |
A dataframe containing the genepop information, where the first column is the population ID. |
i |
Column number containing genotype information. |
ht |
HB (or 1-F1). This is a single numerical value. |
fst |
The calculated betahat value, (F0-F1)/(1-F1), for this locus. |
gpop<-data.frame(popinfo=c(rep("POP 1", 20),rep("POP 2", 20)),ind.names=c(1:20,1:20))
for(i in 1:40){
  gpop[1:20,(i+2)]<-sample(c("0101","0102","0202"),20,replace=TRUE)
  gpop[21:40,(i+2)]<-sample(c("0101","0102","0202"),20,replace=TRUE)
}
bh<-calc.betahat(gpop, 3) #calculate betahat for the SNP
gfile<-system.file("extdata", "example.genepop.txt",package = 'fsthet')
gpop<-my.read.genepop(gfile)
beta1<-calc.betahat(gpop,3) #calculate betahat for the first SNP
This calculates expected heterozygosities from a list of allele frequencies.
calc.exp.het(af)
af |
A list of allele frequencies. |
ht |
The expected heterozygosity under Hardy-Weinberg expectations. This is a single numerical value. |
#create a random sample of genotypes
genotypes<-sample(c("0101","0102","0202"),50,replace=TRUE)
af<-calc.allele.freq(genotypes)
hs<-calc.exp.het(af)
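As a rough illustration of the two steps in this example, the sketch below derives allele frequencies from genotype strings and then applies the standard Hardy-Weinberg expected heterozygosity, one minus the sum of squared allele frequencies. The helper names are hypothetical, the four-character genotype coding is assumed from the examples, and the package's own functions may differ in detail.

```r
# Hypothetical helpers for illustration; not the package's implementation.
allele_freq_sketch <- function(genotypes) {
  alleles <- c(substr(genotypes, 1, 2), substr(genotypes, 3, 4))
  counts <- table(alleles)
  counts / sum(counts)          # relative frequency of each allele
}

exp_het_sketch <- function(af) {
  1 - sum(af^2)                 # expected heterozygosity under Hardy-Weinberg
}

genotypes <- c("0101", "0102", "0202", "0102")
af <- allele_freq_sketch(genotypes)
het <- exp_het_sketch(af)
print(af)
print(het)
```

With two alleles at equal frequency, as in this toy input, the expected heterozygosity reaches its two-allele maximum of 0.5.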
This calculates Fst. The calculation is done as (Ht-Hs)/Ht, where Ht is the expected heterozygosity for all populations and Hs is the expected heterozygosity for each population. This calculation is used in bootstrapping functions.
calc.fst(df, i)
df |
A dataframe containing the genepop information, where the first column is the population ID. |
i |
Column number containing genotype information. |
ht |
The expected heterozygosity under Hardy-Weinberg expectations. This is a single numerical value. |
fst |
The calculated Fst value for this locus. |
gpop<-data.frame(popinfo=c(rep("POP 1", 20),rep("POP 2", 20)),ind.names=c(1:20,1:20), loc0=sample(c("0101","0102","0202"),40,replace=TRUE))
fst1<-calc.fst(gpop,3)
## Not run:
gfile<-system.file("extdata", "example.genepop.txt",package = 'fsthet')
gpop<-my.read.genepop(gfile)
fst1<-calc.fst(gpop,3) #calculate fst for the first SNP
## End(Not run)
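The (Ht-Hs)/Ht formula can be worked through on a toy locus in base R. This is a sketch, not the package's code: the function names are hypothetical, and taking Hs as the unweighted mean of the per-population expected heterozygosities is an assumption, since the averaging rule is not stated here.

```r
# Hypothetical sketch of (Ht - Hs) / Ht for one locus; not the package's code.
exp_het <- function(genotypes) {
  alleles <- c(substr(genotypes, 1, 2), substr(genotypes, 3, 4))
  p <- table(alleles) / length(alleles)
  1 - sum(p^2)
}

fst_sketch <- function(pop, genotypes) {
  ht <- exp_het(genotypes)                     # all populations pooled
  hs <- mean(tapply(genotypes, pop, exp_het))  # ASSUMPTION: unweighted mean
  c(Ht = ht, Fst = (ht - hs) / ht)
}

pop <- rep(c("POP 1", "POP 2"), each = 4)
genotypes <- c("0101", "0101", "0101", "0102",
               "0202", "0202", "0202", "0102")
res <- fst_sketch(pop, genotypes)
print(res)
```

In this toy input each population is nearly fixed for a different allele, so the pooled heterozygosity (0.5) is much larger than the within-population heterozygosity (0.21875), giving an Fst of 0.5625.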
This calculates Weir (1990)'s theta. See the vignette for details on the calculation of theta.
calc.theta(df, i)
df |
A dataframe containing the genepop information, where the first column is the population ID. |
i |
Column number containing genotype information. |
ht |
T2. This is a single numerical value. |
fst |
The calculated theta value (T1/T2) for this locus. |
gpop<-data.frame(popinfo=c(rep("POP 1", 20),rep("POP 2", 20)),ind.names=c(1:20,1:20), loc0=sample(c("0101","0102","0202"),40,replace=TRUE))
theta1<-calc.theta(gpop, 3)
## Not run:
gfile<-system.file("extdata", "example.genepop.txt",package = 'fsthet')
gpop<-my.read.genepop(gfile)
theta1<-calc.theta(gpop,3) #calculate theta for the first SNP
## End(Not run)
This calculates the mean upper and lower confidence intervals from a list of bootstrap CI matrices.
ci.means(boot.out.list)
boot.out.list |
A list of matrices. Each matrix is the CIs from fst.boot (boot.out[[3]]). |
avg.cil |
A list of the average lower CI values |
avg.ciu |
A list of the average upper CI values |
## Not run:
gfile<-system.file("extdata", "example.genepop.txt",package = 'fsthet')
gpop<-my.read.genepop(gfile)
quant.out<-fst.boot(gpop, bootstrap = FALSE)
quant.list<-ci.means(quant.out[[3]])
## End(Not run)
Example list of data.frames with smoothed quantiles from fsthet output from numerical simulations. The data were generated using a numerical analysis with Nm = 10, 75 demes, and 5 population samples taken. No selection was imposed. This is a list with a single data.frame containing values from 95 percent smoothed quantiles.
cis
list
Ninety-five percent smoothed quantiles, using the dataframe gpop.
See Flanagan & Jones
Example list of CI data.frames from fsthet output from numerical simulations. The data were generated using a numerical analysis with Nm = 10, 75 demes, and 5 population samples taken. No selection was imposed. This is a list of data.frames containing values from 99 percent and 95 percent smoothed quantiles.
cis.list
list
From multiple smoothed quantile alpha thresholds, using the dataframe gpop.
See Flanagan & Jones
This calculates global Fsts from a genepop dataframe and then (1) calculates p-values, (2) plots the heterozygosity-Fst relationship with smoothed CIs, and (3) outputs the loci lying outside the confidence intervals. It returns a data frame containing Locus ID, Ht, Fst, P-value, a Benjamini-Hochberg-corrected P-value, and a true/false value of whether it's an outlier.
fhetboot(gpop, fst.choice="fst", alpha=0.05,nreps=10)
gpop |
Provide the genepop dataframe (from my.read.genepop). |
fst.choice |
Specify which type of fst calculation should be used. See fst.options.print for the choices. |
alpha |
The alpha value for the confidence intervals and the p-value adjustment calculations (default is 0.05). |
nreps |
The number of bootstrap replicates to use. The default is 10. |
fsts |
This returns a dataframe with Locus, Ht, Fst, P-value, corrected P-value, and True/False of whether it's an outlier. |
## Not run:
gfile<-system.file("extdata", "example.genepop.txt",package = 'fsthet')
gpop<-my.read.genepop(gfile)
out.dat<-fhetboot(gpop, fst.choice="fst", alpha=0.05,nreps=10)
## End(Not run)
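For readers unfamiliar with the Benjamini-Hochberg correction mentioned in the return value, the sketch below applies it with base R's `p.adjust()`. The p-values are made-up numbers for illustration, and whether fhetboot uses `p.adjust()` internally is not stated here.

```r
# Illustrative p-values only; these are made-up numbers, not package output.
pvals <- c(loc1 = 0.001, loc2 = 0.04, loc3 = 0.03, loc4 = 0.50)
alpha <- 0.05

# Benjamini-Hochberg adjustment using base R's p.adjust().
adj <- p.adjust(pvals, method = "BH")
outlier <- adj < alpha

print(data.frame(Locus = names(pvals), P = pvals, P.adj = adj, Outlier = outlier))
```

Note that loc2 and loc3 fall below 0.05 before correction but not after it, so only loc1 would be flagged at alpha = 0.05 in this toy case.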
This identifies all of the SNPs outside of the smoothed quantiles in the dataset.
find.outliers(df, boot.out, ci.df = NULL, file.name = NULL)
df |
Provide the dataframe with Ht and Fst values. |
boot.out |
Bootstrap output. You must provide this. |
ci.df |
List of confidence intervals. You may provide this in addition to bootstrap output to save a small amount of time. |
file.name |
You may provide a file name to output the outliers to a csv file. Otherwise, the function will only return the outliers. |
out |
A list of the outlier loci |
## Not run:
gfile<-system.file("extdata", "example.genepop.txt",package = 'fsthet')
gpop<-my.read.genepop(gfile)
fsts<-calc.actual.fst(gpop)
boot.out<-as.data.frame(t(replicate(10, fst.boot(gpop))))
outliers<-find.outliers(fsts,boot.out)
## End(Not run)
This takes the output from make.bins and calculates the smoothed quantiles.
find.quantiles(bins,bin.fst,ci=0.05)
bins |
A dataframe containing the upper and lower het and Fst values for each bin (output from make.bins). |
bin.fst |
A list with the Fst values for each bin (output from make.bins). |
ci |
A value for the confidence intervals alpha (default is 0.05). |
fst.CI |
A list of data.frames, one for each ci value with the upper and lower Fst quantiles for each Heterozygosity bin. |
gpop<-data.frame(popinfo=c(rep("POP 1", 20),rep("POP 2", 20)),ind.names=c(1:20,1:20))
for(i in 1:40){
  gpop[1:20,(i+2)]<-sample(c("0101","0102","0202"),20,replace=TRUE)
  gpop[21:40,(i+2)]<-sample(c("0101","0102","0202"),20,replace=TRUE)
}
fsts<-calc.actual.fst(gpop)
nloci<-(ncol(gpop)-2)
boot.out<-as.data.frame(t(replicate(nloci, fst.boot.onecol(gpop,"fst"))))
bins<-make.bins(boot.out,25,Ht.name="V1",Fst.name="V2")
fst.CI<-find.quantiles(bins$bins,bins$bin.fst)
This randomly samples all of the loci, with replacement (so if you have 200 loci, it will choose 200 loci to calculate Fst for, but some may be sampled more than once). It makes use of fst.boot.onecol. To calculate the confidence intervals, this function bins the Fst values based on heterozygosity values. The bins are overlapping and each bin is the width of smooth.rate. The Fst values that separate the top 100*(ci/2) percent and the bottom 100*(ci/2) percent in each bin are the upper and lower CIs. This function can be slow. We recommend running it 10 times to generate confidence intervals for analysis.
fst.boot(df,fst.choice="fst",ci=0.05,num.breaks=25, bootstrap = TRUE,min.per.bin=20)
df |
A dataframe containing the genepop information, where the first column is the population ID. |
fst.choice |
A character defining which fst calculation is to be used. See fst.options.print() for the choices. |
ci |
A value for the confidence intervals alpha (default is 0.05). |
num.breaks |
The number of breaks used to create bins (default is 25) |
bootstrap |
A TRUE/FALSE statement telling the program whether to bootstrap and then determine the bins or to calculate bins and confidence intervals from the empirical dataset without bootstrapping. The default is TRUE, which means bootstrapping occurs. |
min.per.bin |
The minimum number of loci that are required for a bin to be retained. Default is 20. |
Fsts |
The bootstrapped Fst and Ht values |
Bins |
A dataframe containing the bins start and stop Ht values. |
fst.CI |
A list of dataframes containing the lower and upper confidence intervals' Ht values. |
gpop<-data.frame(popinfo=c(rep("POP 1", 20),rep("POP 2", 20)),ind.names=c(1:20,1:20))
for(i in 1:40){
  gpop[1:20,(i+2)]<-sample(c("0101","0102","0202"),20,replace=TRUE)
  gpop[21:40,(i+2)]<-sample(c("0101","0102","0202"),20,replace=TRUE)
}
fsts<-calc.actual.fst(gpop)
quant.out<-as.data.frame(t(replicate(1, fst.boot(gpop,bootstrap=FALSE))))
## Not run:
gfile<-system.file("extdata", "example.genepop.txt",package = 'fsthet')
gpop<-my.read.genepop(gfile)
fsts<-calc.actual.fst(gpop)
quant.out<-as.data.frame(t(replicate(1, fst.boot(gpop,bootstrap=FALSE))))
## End(Not run)
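The resampling and quantile steps described for fst.boot can be illustrated with a small base R sketch. It is not the package's implementation: the Ht and Fst values are simulated stand-ins, and the single bin with limits 0.2 and 0.3 is an arbitrary choice made only to show how the ci/2 and 1-ci/2 quantiles bound a bin.

```r
set.seed(1)
# Made-up per-locus values standing in for Ht and Fst; not package output.
loci <- data.frame(Ht = runif(200, 0, 0.5), Fst = rbeta(200, 1, 20))

# Resample loci with replacement: 200 draws, some loci chosen more than once.
idx <- sample(nrow(loci), nrow(loci), replace = TRUE)
boot <- loci[idx, ]

# Within one heterozygosity bin, the ci/2 and 1 - ci/2 quantiles of Fst
# give the lower and upper bounds. The bin limits here are arbitrary.
ci <- 0.05
in_bin <- boot$Ht >= 0.2 & boot$Ht < 0.3
bounds <- quantile(boot$Fst[in_bin], probs = c(ci / 2, 1 - ci / 2))
print(bounds)
```

Repeating the resampling several times and averaging the bounds per bin is one way to read the recommendation to run the function 10 times.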
This calculates mean heterozygosity and Fst values for each bin used in bootstrapping.
fst.boot.means(boot.out)
boot.out |
The first item in the output lists from fst.boot (i.e., boot.out[[1]]). |
bmu |
A dataframe containing the following columns: heterozygosity, Fst, the number of loci in the bin, the lower Ht value for the bin, and the upper Ht value for the bin. |
gpop<-data.frame(popinfo=c(rep("POP 1", 20),rep("POP 2", 20)),ind.names=c(1:20,1:20))
for(i in 1:40){
  gpop[1:20,(i+2)]<-sample(c("0101","0102","0202"),20,replace=TRUE)
  gpop[21:40,(i+2)]<-sample(c("0101","0102","0202"),20,replace=TRUE)
}
fsts<-calc.actual.fst(gpop)
boot.out<-as.data.frame(t(replicate(1, fst.boot(gpop))))
outliers<-find.outliers(fsts,boot.out)
## Not run:
gfile<-system.file("extdata", "example.genepop.txt",package = 'fsthet')
gpop<-my.read.genepop(gfile)
fsts<-calc.actual.fst(gpop)
boot.out<-as.data.frame(t(replicate(10, fst.boot(gpop))))
outliers<-find.outliers(fsts,boot.out)
## End(Not run)
This calculates Fst using calc.fst. It randomly selects a column containing genotype information for all individuals. It then calculates Fst and Ht for that locus.
fst.boot.onecol(df, fst.choice)
df |
A dataframe containing the genepop information, where the first column is the population ID. |
fst.choice |
A character defining which fst calculation is to be used. The three options are: (1) Wright's Fst (Wright, wright, WRIGHT, W, w); (2) Weir and Cockerham 1993's beta (WeirCockerham, weircockerham, wc, WC); (3) Corrected Weir and Cockerham 1993's beta from Beaumont and Nichols 1996 (WeirCockerhamCorrected, weircockerhamcorrected, corrected, wcc, WCC). |
ht.fst |
A vector containing Ht and Fst. |
gpop<-data.frame(popinfo=c(rep("POP 1", 20),rep("POP 2", 20)),ind.names=c(1:20,1:20))
for(i in 1:40){
  gpop[1:20,(i+2)]<-sample(c("0101","0102","0202"),20,replace=TRUE)
  gpop[21:40,(i+2)]<-sample(c("0101","0102","0202"),20,replace=TRUE)
}
fsts<-calc.actual.fst(gpop)
nloci<-(ncol(gpop)-2)
boot.out<-as.data.frame(t(replicate(nloci, fst.boot.onecol(gpop,"fst"))))
## Not run:
gfile<-system.file("extdata", "example.genepop.txt",package = 'fsthet')
gpop<-my.read.genepop(gfile)
fsts<-calc.actual.fst(gpop)
nloci<-(ncol(gpop)-2)
boot.out<-as.data.frame(t(replicate(nloci, fst.boot.onecol(gpop,"fst"))))
outliers<-find.outliers(fsts,boot.out)
## End(Not run)
This prints the options for choosing an Fst calculation.
fst.options.print()
fst.options.print()
This calculates global Fsts from a genepop dataframe and then (1) calculates smoothed quantiles, (2) plots the heterozygosity-Fst relationship with smoothed quantiles, and (3) outputs the loci lying outside the quantiles. It returns a data frame containing Locus ID, Ht, Fst, and a true/false value of whether it's an outlier.
fsthet(gpop, fst.choice="fst", alpha=0.05)
gpop |
Provide the genepop dataframe (from my.read.genepop). |
fst.choice |
Specify which type of fst calculation should be used. See fst.options.print for the choices. |
alpha |
The alpha value for the quantiles (default is 0.05 to generate 95 percent quantiles). |
fsts |
This returns a dataframe with Locus, Ht, Fst, and True/False of whether it's an outlier. |
## Not run:
gfile<-system.file("extdata", "example.genepop.txt",package = 'fsthet')
gpop<-my.read.genepop(gfile)
out.dat<-fsthet(gpop)
## End(Not run)
Example fst calculations from a genepop file. The original data were generated by using a numerical analysis with Nm = 10, 75 demes, and 5 population samples taken. No selection was imposed. The fsts were calculated using calc.actual.fst(gpop). This file contains a dataframe with 2000 rows and 3 columns. The first column is the Locus ID, the second column is the Ht for that locus, and the third column is the Fst for that locus.
fsts
data.frame
Generated by numerical analysis
See Flanagan & Jones
Example fst calculations using beta (fst.choice="var") from a genepop file. The original data were generated by using a numerical analysis with Nm = 10, 75 demes, and 5 population samples taken. No selection was imposed. The fsts were calculated using calc.actual.fst(gpop,fst.choice="var"). This file contains a dataframe with 2000 rows and 3 columns. The first column is the Locus ID, the second column is the Ht for that locus, and the third column is the Fst for that locus.
fsts.beta
data.frame
Generated by numerical analysis
See Flanagan & Jones
Example fst calculations using betahat (fst.choice="betahat") from a genepop file. The original data were generated by using a numerical analysis with Nm = 10, 75 demes, and 5 population samples taken. No selection was imposed. The fsts were calculated using calc.actual.fst(gpop,fst.choice="betahat"). This file contains a dataframe with 2000 rows and 3 columns. The first column is the Locus ID, the second column is the Ht for that locus, and the third column is the Fst for that locus.
fsts.betahat
data.frame
Generated by numerical analysis
See Flanagan & Jones
Example fst calculations using theta (fst.choice="theta") from a genepop file. The original data were generated by using a numerical analysis with Nm = 10, 75 demes, and 5 population samples taken. No selection was imposed. The fsts were calculated using calc.actual.fst(gpop,fst.choice="theta"). This file contains a dataframe with 2000 rows and 3 columns. The first column is the Locus ID, the second column is the Ht for that locus, and the third column is the Fst for that locus.
fsts.theta
data.frame
Generated by numerical analysis
See Flanagan & Jones
Example genepop file from numerical simulations. It was generated by using a numerical analysis with Nm = 10, 75 demes, and 5 population samples taken. No selection was imposed. This file contains a dataframe with 2002 columns and 250 rows. The first two columns are the population name and the individual name. The remaining columns are genotypes for each locus (one column per locus). Each row is an individual.
gpop
data.frame
Generated by numerical analysis
See Flanagan & Jones
This breaks up Fst values into a designated number of overlapping heterozygosity bins. It returns a list containing a data.frame called bins and a list called bin.fst with the Fst values for each of the Het categories.
make.bins(fsts,num.breaks=25, Ht.name="Ht", Fst.name="Fst",min.per.bin=20)
fsts |
A dataframe containing at least the columns with heterozygosity and Fst values. |
num.breaks |
The number of breaks used to create bins (default is 25) |
Ht.name |
Provide the name of the column with the heterozygosity values, unless the column is named "Ht". |
Fst.name |
Provide the name of the column with the Fst values, unless the column is named "Fst". |
min.per.bin |
If you have a smaller dataset, you can change the minimum number of loci required to be in each bin. Default is 20. |
list(bins, bin.fst) |
A list with a data.frame called bins with the upper and lower Fst and Ht values and a list called bin.fst with the Fst values for each of the Het categories. |
gpop<-data.frame(popinfo=c(rep("POP 1", 20),rep("POP 2", 20)),ind.names=c(1:20,1:20))
for(i in 1:40){
  gpop[1:20,(i+2)]<-sample(c("0101","0102","0202"),20,replace=TRUE)
  gpop[21:40,(i+2)]<-sample(c("0101","0102","0202"),20,replace=TRUE)
}
fsts<-calc.actual.fst(gpop)
nloci<-(ncol(gpop)-2)
boot.out<-as.data.frame(t(replicate(nloci, fst.boot.onecol(gpop,"fst"))))
bins<-make.bins(boot.out,25,Ht.name="V1",Fst.name="V2")
## Not run:
gfile<-system.file("extdata", "example.genepop.txt",package = 'fsthet')
gpop<-my.read.genepop(gfile)
fsts<-calc.actual.fst(gpop)
nloci<-(ncol(gpop)-2)
boot.out<-as.data.frame(t(replicate(nloci, fst.boot.onecol(gpop,"fst"))))
make.bins(boot.out,25,Ht.name="V1",Fst.name="V2")
## End(Not run)
This reads a genepop file into R. It was adapted from a similar function in adegenet.
my.read.genepop(file, ncode = 2L, quiet = FALSE)
file |
The filename of the genepop file. |
quiet |
If quiet = FALSE, status updates will be printed. If quiet = TRUE, status updates will not be printed. |
ncode |
Do not change this argument. |
res |
A dataframe with the Population ID in column 1, the Individual ID in column 2, and the genotypes in columns following that. There is one row per individual. |
http://adegenet.r-forge.r-project.org/
gfile<-system.file("extdata", "example.genepop.txt",package = 'fsthet')
gpop<-my.read.genepop(gfile)
This calculates uncorrected p-values for each locus from the bootstrapping output (or from the bin means produced by fst.boot.means).
p.boot(actual.fsts, boot.out,boot.means=NULL)
actual.fsts |
The first item in the output lists from fst.boot. |
boot.out |
The output from a bootstrapping run. Either supply this or boot.means. |
boot.means |
The output from fst.boot.means. Either supply this or bootstrapping output. |
pvals |
A numeric containing uncorrected p-values for each locus. The names attribute are the locus names. |
## Not run:
gfile<-system.file("extdata", "example.genepop.txt",package = 'fsthet')
gpop<-my.read.genepop(gfile)
fsts<-calc.actual.fst(gpop)
boot.out<-as.data.frame(t(replicate(10, fst.boot(gpop))))
boot.pvals<-p.boot(fsts,boot.out=boot.out)
## End(Not run)
This plots a dataframe of fsts with bootstrapped confidence intervals.
plotting.cis(df,boot.out,ci.df=NULL,sig.list=NULL,Ht.name="Ht",Fst.name="Fst", ci.col="red", pt.pch=1,file.name=NULL,sig.col=ci.col,make.file=TRUE)
df |
A dataframe of Fst and Ht values. It must have at least two columns, one named "Ht" and one named "Fst"; otherwise, you must pass the column names to the function. |
boot.out |
Bootstrap output. You must either provide this or a list of confidence interval values. |
ci.df |
Data frame of confidence intervals. You must either provide this or bootstrap output. |
sig.list |
List of significant locus names (this acts as a way to highlight particular loci). This is optional and colors some of the points using the same shape as pt.pch and the color of sig.col (the default sig.col is the same as ci.col). |
Ht.name |
Provide the name of the column with the heterozygosity values, unless the column is named "Ht". |
Fst.name |
Provide the name of the column with the Fst values, unless the column is named "Fst". |
ci.col |
You can input the colors of the confidence intervals to be plotted. First is the 95 percent CI, second is the 99 percent CI. Defaults are "red" and "gold". |
pt.pch |
You can change the point shape here. Default is 1 (open circles) |
sig.col |
The color of the significant loci, if that option is taken. The default is the same color as the confidence interval. |
file.name |
You can provide the filename. If not provided, default is "OutlierLoci" in the current directory. |
make.file |
A boolean value (TRUE or FALSE). If TRUE, a file will be created with the plot. If FALSE, the plot will be made in R only (and can be further annotated). |
gpop<-data.frame(popinfo=c(rep("POP 1", 20),rep("POP 2", 20)),ind.names=c(1:20,1:20), loc0=sample(c("0101","0102","0202"),40,replace=TRUE), loc1=sample(c("0101","0102","0202"),40,replace=TRUE))
fsts<-calc.actual.fst(gpop)
bins<-make.bins(fsts)
cis<-find.quantiles(bins = bins$bins,bin.fst = bins$bin.fst)
quant.list<-cis$CI0.95
plotting.cis(df=fsts,ci.df=quant.list,make.file=FALSE)
## Not run:
data(fsts)
bins<-make.bins(fsts)
cis<-find.quantiles(bins = bins$bins,bin.fst = bins$bin.fst)
quant.list<-cis$CI0.95
plotting.cis(df=fsts,ci.df=quant.list,make.file=FALSE)
## End(Not run)
Example fsthet output based on numerical simulations. Allelic information was generated by using a numerical analysis with Nm = 10, 75 demes, and 5 population samples taken. No selection was imposed. This is a list of three structures. The first is a data.frame containing the Ht and Fst values. The second is a data.frame of the bins with the lower heterozygosity values and the upper heterozygosity values for each bin. The third is a list of data.frames with the lower (Low) and upper (Upp) Fst values for each bin (the bins are in the "LowHet" and "UppHet" columns).
quant.out
list
Smoothed quantiles generated from the dataframe gpop.
See Flanagan & Jones
This removes spaces before and after words in a character vector. It was adapted from a similar function in adegenet.
remove.spaces(charvec)
charvec |
is a vector of characters containing spaces to be removed. |
charvec |
A vector of characters without spaces |
http://adegenet.r-forge.r-project.org/
charvec<-c("this ", " is"," a"," test")
remove.spaces(charvec)
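For comparison, the same trimming can be sketched with base R alone. The helper name `strip_spaces_sketch` is hypothetical and the regular expression is one possible approach, not necessarily what remove.spaces does internally.

```r
# Hypothetical equivalent using base R; not the package's implementation.
strip_spaces_sketch <- function(charvec) {
  gsub("^\\s+|\\s+$", "", charvec)  # drop leading and trailing whitespace
}

charvec <- c("this ", " is", " a", " test")
cleaned <- strip_spaces_sketch(charvec)
print(cleaned)

# Base R's trimws() gives the same result for these inputs.
stopifnot(identical(cleaned, trimws(charvec)))
```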
This calculates Weir & Cockerham (1993)'s Fst. The calculation is based on variance in allele frequencies. See the vignette for details on the calculation of beta.
var.fst(df, i)
df |
A dataframe containing the genepop information, where the first column is the population ID. |
i |
Column number containing genotype information. |
ht |
2pbar(1-pbar). This is a single numerical value. |
fst |
The calculated beta value for this locus. |
gpop<-data.frame(popinfo=c(rep("POP 1", 20),rep("POP 2", 20)),ind.names=c(1:20,1:20), loc0=sample(c("0101","0102","0202"),40,replace=TRUE), loc1=sample(c("0101","0102","0202"),40,replace=TRUE))
var1<-var.fst(gpop,3)
## Not run:
gfile<-system.file("extdata", "example.genepop.txt",package = 'fsthet')
gpop<-my.read.genepop(gfile)
var1<-var.fst(gpop,3) #calculate variance-based Fst for the first SNP
## End(Not run)