adegenet-package: The adegenet package
Description
This package is devoted to the multivariate analysis of
genetic marker data. These data can be codominant markers (e.g. microsatellites) or
presence/absence data (e.g. AFLP), and can have any level of ploidy.
'adegenet' defines two formal (S4) classes:
- genind: a class for data of individuals ("genind" stands for genotypes-individuals).
- genpop: a class for data of groups of individuals ("genpop" stands for genotypes-populations)
For more information about these classes, type "class ? genind" or
"class ? genpop".
Both types of objects store information from molecular markers in a matrix ($tab slot),
that can be directly analyzed using multivariate methods such as
Principal Component Analysis, Correspondence Analysis, etc. See the
"dudi.[...]" methods in the ade4 package. Moreover, this
package offers methods for manipulating and analyzing information
coming from genetic markers (see below).
=== IMPORTING DATA ===
adegenet imports data into genind objects from the
following software:
- STRUCTURE: see read.structure
- GENETIX: see read.genetix
- FSTAT: see read.fstat
- Genepop: see read.genepop
To import data from any of these formats, you can also use the general
function import2genind.
- DNA files: use read.dna from the ape package,
and then extract SNPs from DNA alignments using
DNAbin2genind.
- protein sequence alignments: polymorphic sites can be extracted from
protein sequence alignments in alignment format (package
seqinr, see as.alignment) using the
function alignment2genind.
It is also possible to read genotypes coded by character strings from
a data.frame in which genotypes are in rows, markers in columns. For
this, use df2genind. Note that df2genind
can be used for any level of ploidy.
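As a minimal sketch of the df2genind route described above, the following builds a small genind object from a data.frame of character-coded genotypes. The data frame `geno`, its locus names and the "/" separator are made up for illustration, and argument defaults may differ between adegenet versions.

```r
library(adegenet)

# Hypothetical toy data: 4 diploid individuals (rows), 2 markers (columns),
# with the two alleles of each genotype separated by "/".
geno <- data.frame(
  locA = c("11/11", "11/12", "12/12", "11/12"),
  locB = c("20/21", "20/20", "21/21", "20/21"),
  row.names = c("ind1", "ind2", "ind3", "ind4"),
  stringsAsFactors = FALSE
)

# Convert to a genind object; 'sep' tells df2genind how alleles are separated.
x <- df2genind(geno, sep = "/", ploidy = 2)
x
```

Printing `x` gives a summary of the object; the allele table itself is stored in the tab slot.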
=== EXPORTING DATA ===
adegenet exports data from genind objects to
formats recognized by other R packages:
- the genetics package: see genind2genotype
- the hierfstat package: see genind2hierfstat
Genotypes can also be recoded from a genind object into
a data.frame of character strings, using any separator between
alleles. This covers formats from many programs such as GENETIX or
STRUCTURE. For this, see genind2df.
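The recoding to character strings might look like the sketch below, which uses the nancycats dataset shipped with the package. The "/" separator and the object name `cats_df` are arbitrary choices for illustration.

```r
library(adegenet)

data(nancycats)   # example genind object distributed with adegenet

# One column per marker; the alleles of each genotype are pasted
# together using "/" as separator.
cats_df <- genind2df(nancycats, sep = "/")

# Look at the first few rows and columns of the result.
head(cats_df[, 1:4])
```

The resulting data.frame can then be written to disk with the usual R tools (e.g. write.table) in whatever layout the target program expects.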
=== MANIPULATING DATA ===
Several functions allow one to manipulate genind or
genpop objects:
- genind2genpop: convert a genind object
to a genpop
- seploc: creates one object per marker
- seppop: creates one object per population
- na.replace: replaces missing data (NA) in an
appropriate way
- truenames: restores true names of an object
(genind and genpop use generic labels)
- x[i,j]: creates a new object keeping only the genotypes (or populations)
indexed by 'i' and the alleles indexed by 'j'.
- makefreq: returns a table of allelic frequencies from
a genpop object.
- repool: merges genotypes from different
gene pools into one single genind object.
- propTyped: returns the proportion of available (typed)
data, by individual, population, and/or locus.
- selPopSize: subsets data, retaining only genotypes
from a population whose sample size is above a given level.
- pop: sets the population of a set of genotypes.
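A few of the manipulations listed above can be sketched as follows, using the nancycats dataset (237 cats from 17 colonies, per the dataset list in this page). The object names are illustrative only, and the number of loci is not asserted here.

```r
library(adegenet)

data(nancycats)                           # genind object: one row per cat

# Aggregate individual genotypes into allele counts per colony.
cats_pop <- genind2genpop(nancycats)

# Split the data: one genind object per colony, or one per marker.
by_pop <- seppop(nancycats)
by_loc <- seploc(nancycats)

# Keep only the first 10 genotypes (all alleles retained).
first10 <- nancycats[1:10, ]

length(by_pop)    # number of colonies
length(by_loc)    # number of markers
```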
=== ANALYZING DATA ===
Several functions implement both common and less common analyses:
- HWE.test.genind: performs HWE test for all
populations and loci combinations
- pairwise.fst: computes simple pairwise Fst between populations
- gstat.randtest: performs a Monte Carlo test of Goudet's G statistic, measuring
population structure (based on g.stats.glob from the hierfstat package).
- dist.genpop: computes 5 genetic distances among populations.
- monmonier: implementation of the Monmonier algorithm,
used to seek genetic boundaries among individuals or
populations. Optimized boundaries can be obtained using
optimize.monmonier. Objects of class
monmonier can be plotted and printed using the corresponding
methods.
- spca: implements Jombart et al. (2008) spatial
Principal Component Analysis
- global.rtest: implements Jombart et al. (2008)
test for global spatial structures
- local.rtest: implements Jombart et al. (2008)
test for local spatial structures
- propShared: computes the proportion of shared
alleles in a set of genotypes (i.e. from a genind object)
- propTyped: function to investigate missing data in
several ways
- scaleGen: generic method to scale
genind or genpop before a principal
component analysis
- Hs: computes the average expected heterozygosity by
population in a genpop. Classically used as a measure
of genetic diversity.
- find.clusters and dapc: implement the
Discriminant Analysis of Principal Components (DAPC, Jombart et al.,
2010).
- seqTrack: implements the SeqTrack algorithm for
reconstructing transmission trees of pathogens (Jombart et al.,
2010).
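As one illustration of the analyses listed above, a DAPC could be run along these lines. This is only a sketch: the numbers of retained principal components (`n.pca = 20`) and discriminant functions (`n.da = 2`) are arbitrary values chosen to avoid the interactive prompts, not recommendations, and the components of the returned object are assumed to be named as in recent adegenet versions.

```r
library(adegenet)

data(nancycats)

# Groups are taken from the population factor stored in the object.
# n.pca and n.da are fixed here (arbitrary values) so that dapc does
# not ask for them interactively.
res <- dapc(nancycats, n.pca = 20, n.da = 2)

# Proportion of cats reassigned to the colony they were sampled from.
mean(as.character(res$assign) == as.character(res$grp))
```

In practice, the number of retained components should be chosen with care, since keeping too many can overfit the group structure.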
=== GRAPHICS ===
- colorplot: plots points with associated values for up
to three variables represented by colors using the RGB system;
useful for spatial mapping of principal components.
- loadingplot: plots loadings of variables. Useful for
representing the contribution of alleles to a given principal
component in a multivariate method.
=== SIMULATING DATA ===
- hybridize: implements hybridization between two populations.
- haploGen: simulates genealogies of haplotypes,
storing full genomes.
- haploPop: simulates populations of haplotypes, using
different population dynamics, storing SNPs (under development).
=== DATASETS ===
- H3N2: Seasonal influenza (H3N2) HA segment data.
- dapcIllus: Simulated data illustrating the DAPC.
- eHGDP: Extended HGDP-CEPH dataset.
- microbov: Microsatellite genotypes of 15 cattle breeds.
- nancycats: Microsatellite genotypes of 237 cats from 17 colonies of Nancy (France).
- rupica: Microsatellite genotypes of 335 chamois
(Rupicapra rupicapra) from the Bauges mountains (France).
- sim2pop: Simulated genotypes of two georeferenced populations.
- spcaIllus: Simulated data illustrating the sPCA.
For more information, visit the adegenet website by typing
adegenetWeb().
Tutorials are available on the adegenet website, or by typing
adegenetTutorial().
To cite adegenet, please use the reference given by
citation("adegenet") (or see the references below).
Details
Package: adegenet
Type: Package
Version: 1.2-7
Date: 2010-10-28
License: GPL (>=2)
References
Jombart T (2008) adegenet: a R package for the multivariate analysis
of genetic markers. Bioinformatics 24: 1403-1405.
doi:10.1093/bioinformatics/btn129
Jombart T, Devillard S, Balloux F (2010) Discriminant analysis of
principal components: a new method for the analysis of genetically
structured populations. BMC Genetics 11:94.
doi:10.1186/1471-2156-11-94
Jombart T, Eggo R, Dodd P, Balloux F (2010) Reconstructing disease
outbreaks from genetic data: a graph approach. Heredity.
doi:10.1038/hdy.2010.78
Jombart T, Devillard S, Dufour A-B, Pontier D (2008) Revealing
cryptic spatial patterns in genetic variability by a new multivariate
method. Heredity 101: 92-103.
See the adegenet website: http://adegenet.r-forge.r-project.org/
Please post your questions on the adegenet forum: adegenet-forum@lists.r-forge.r-project.org
See Also
adegenet is related to several packages, in particular:
- ade4 for multivariate analysis
- ape for phylogenetics and DNA data handling
- pegas for population genetics tools
- seqinr for handling nucleic and proteic sequences