adegenet-package: The adegenet package
Description
This package is devoted to the multivariate analysis of
genetic marker data. These data can be codominant markers (e.g. microsatellites) or
presence/absence data (e.g. AFLP), and can have any level of ploidy.
'adegenet' defines two formal (S4) classes:
- genind: a class for data of individuals ("genind" stands for genotypes-individuals).
- genpop: a class for data of groups of individuals ("genpop" stands for genotypes-populations)
For more information about these classes, type "class ? genind" or
"class ? genpop".
Both types of objects store information from molecular markers in a matrix ($tab slot),
which can be directly analyzed using multivariate methods such as
Principal Component Analysis, Correspondence Analysis, etc. See the
"dudi.[...]" methods in the ade4 package. Moreover, this
package offers methods for manipulating and analyzing information
coming from genetic markers (see below).
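As a rough illustration of the point above, the following sketch runs a PCA on the allele table of a genind object using dudi.pca from ade4. It assumes the sim2pop dataset shipped with adegenet; the mean imputation of missing values is done in base R and is only one possible choice, and the object names (X, pca1) and settings (no scaling, two retained axes) are illustrative.

```r
library(ade4)
library(adegenet)

data(sim2pop)   # simulated genotypes shipped with adegenet

# Allele table: individuals in rows, alleles in columns
X <- sim2pop@tab

# Assumption: replace any missing value by the mean of its column,
# since dudi.pca does not accept NAs
col_means <- colMeans(X, na.rm = TRUE)
is_missing <- is.na(X)
X[is_missing] <- col_means[col(X)[is_missing]]

# Centred, unscaled PCA keeping two axes; scannf = FALSE skips the
# interactive screeplot prompt
pca1 <- dudi.pca(X, center = TRUE, scale = FALSE, scannf = FALSE, nf = 2)

# Coordinates of the individuals on the retained axes
head(pca1$li)
```

The coordinates in pca1$li can then be plotted with the usual ade4 graphics, for instance s.class with the population factor.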
=== IMPORTING DATA ===
adegenet imports data into genind objects from the
following software:
- STRUCTURE: see read.structure
- GENETIX: see read.genetix
- FSTAT: see read.fstat
- Genepop: see read.genepop
To import data from any of these formats, you can also use the general
function import2genind.
- DNA files: use read.dna from the ape package,
and then extract SNPs from DNA alignments using DNAbin2genind.
- protein sequence alignments: polymorphic sites can be extracted from
protein sequence alignments in alignment format (package seqinr,
see as.alignment) using the function alignment2genind.
It is also possible to read genotypes coded by character strings from
a data.frame in which genotypes are in rows, markers in columns. For
this, use df2genind. Note that df2genind
can be used for any level of ploidy.
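A minimal sketch of reading genotypes from a data.frame is given below. The toy data are hypothetical (four diploid individuals, two markers), and the use of "/" as allele separator through the sep argument is an assumption about how the genotypes are coded; adapt it to the actual coding of your data.

```r
library(adegenet)

# Hypothetical toy data: 4 diploid individuals (rows) typed at
# 2 markers (columns), alleles separated by "/"
df <- data.frame(
  locusA = c("1/1", "1/2", "2/2", "1/2"),
  locusB = c("3/4", "3/3", "4/4", "3/4"),
  row.names = c("ind1", "ind2", "ind3", "ind4"),
  stringsAsFactors = FALSE
)

# Convert to a genind object; ploidy = 2 states that each genotype
# carries two alleles
x <- df2genind(df, sep = "/", ploidy = 2)

x            # summary of the object
dim(x@tab)   # individuals x alleles
```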
=== EXPORTING DATA ===
adegenet exports data from genind objects to
formats recognized by other R packages:
- the genetics package: see genind2genotype
- the hierfstat package: see genind2hierfstat
Genotypes can also be recoded from a genind object into
a data.frame of character strings, using any separator between
alleles. This covers formats used by many programs, such as GENETIX or
STRUCTURE. For this, see genind2df.
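The recoding described above can be sketched as follows, assuming the nancycats dataset shipped with adegenet and "/" as an arbitrary choice of separator. The name cats_df is illustrative.

```r
library(adegenet)

data(nancycats)   # microsatellite genotypes shipped with adegenet

# Recode genotypes as character strings, alleles separated by "/"
cats_df <- genind2df(nancycats, sep = "/")

dim(cats_df)             # one row per genotype
head(cats_df[, 1:4])     # first columns of the recoded table
```

The resulting data.frame can be written to a text file with write.table and adjusted to the layout expected by the target program.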
=== MANIPULATING DATA ===
Several functions allow one to manipulate genind or
genpop objects:
- genind2genpop: converts a genind object to a genpop
- seploc: creates one object per marker
- seppop: creates one object per population
- na.replace: replaces missing data (NA) in an
appropriate way
- truenames: restores true names of an object
(genind and genpop use generic labels)
- x[i,j]: create a new object keeping only genotypes (or populations)
indexed by 'i' and the alleles indexed by 'j'.
- makefreq: returns a table of allelic frequencies from
a genpop object.
- repool: merges genotypes from different
gene pools into one single genind object.
- propTyped: returns the proportion of available (typed)
data, by individual, population, and/or locus.
- selPopSize: subsets data, retaining only genotypes
from a population whose sample size is above a given level.
- pop: sets the population of a set of genotypes.
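A few of the manipulations listed above can be sketched as follows, assuming the nancycats dataset shipped with adegenet. All object names are illustrative, and default arguments are used throughout.

```r
library(adegenet)

data(nancycats)   # genind object: cats sampled in colonies of Nancy

# Aggregate genotypes into one row of allele counts per colony
cats_pop <- genind2genpop(nancycats)

# Split the data: one genind object per colony, or per marker
by_colony <- seppop(nancycats)
by_locus <- seploc(nancycats)

# Subset with x[i, j]: keep only the first ten genotypes
first_ten <- nancycats[1:10, ]

# Proportion of typed (non-missing) data for each locus
typed <- propTyped(nancycats, by = "loc")
typed
```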
=== ANALYZING DATA ===
Several functions implement both common and less common analyses:
- HWE.test.genind: performs HWE test for all
populations and loci combinations
- pairwise.fst: computes simple pairwise Fst between populations
- gstat.randtest: performs a Monte Carlo test of Goudet's G statistic, measuring
population structure (based on g.stats.glob from the hierfstat package).
- dist.genpop: computes 5 genetic distances among populations.
- monmonier: implementation of the Monmonier algorithm,
used to seek genetic boundaries among individuals or
populations. Optimized boundaries can be obtained using
optimize.monmonier. Objects of the class monmonier
can be plotted and printed using the corresponding methods.
- spca: implements Jombart et al. (2008) spatial
Principal Component Analysis
- global.rtest: implements Jombart et al. (2008)
test for global spatial structures
- local.rtest: implements Jombart et al. (2008)
test for local spatial structures
- propShared: computes the proportion of shared
alleles in a set of genotypes (i.e. from a genind object)
- propTyped: function to investigate missing data in
several ways
- scaleGen: generic method to scale
genind or genpop before a principal
component analysis
- Hs: computes the average expected heterozygosity by
population in a genpop. Classically used as a measure
of genetic diversity.
- find.clusters and dapc: implement the
Discriminant Analysis of Principal Components (DAPC, Jombart et al.,
2010).
- seqTrack: implements the SeqTrack algorithm for
reconstructing transmission trees of pathogens (Jombart et al.,
2010).
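As one example of the analyses listed above, the sketch below computes a genetic distance among populations with dist.genpop. It assumes the microbov dataset shipped with adegenet; method = 1 selects Nei's distance, and the names breeds and d are illustrative.

```r
library(adegenet)

data(microbov)   # microsatellite genotypes of cattle breeds

# dist.genpop works on genpop objects, so aggregate by breed first
breeds <- genind2genpop(microbov)

# Pairwise genetic distances among breeds (method = 1: Nei's distance)
d <- dist.genpop(breeds, method = 1)

class(d)   # a "dist" object, usable with hclust, ade4::dudi.pco, etc.
```

Because the result is a standard dist object, it can be passed directly to clustering or ordination functions.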
=== GRAPHICS ===
- colorplot: plots points with associated values for up
to three variables represented by colors using the RGB system;
useful for spatial mapping of principal components.
- loadingplot: plots loadings of variables. Useful for
representing the contribution of alleles to a given principal
component in a multivariate method.
=== SIMULATING DATA ===
- hybridize: implements hybridization between two populations.
- haploGen: simulates genealogies of haplotypes,
storing full genomes.
- haploPop: simulates populations of haplotypes, using
different population dynamics, storing SNPs (under development).
=== DATASETS ===
- H3N2: Seasonal influenza (H3N2) HA segment data.
- dapcIllus: Simulated data illustrating the DAPC.
- eHGDP: Extended HGDP-CEPH dataset.
- microbov: Microsatellite genotypes of 15 cattle breeds.
- nancycats: Microsatellite genotypes of 237 cats from 17 colonies of Nancy (France).
- rupica: Microsatellite genotypes of 335 chamois
(Rupicapra rupicapra) from the Bauges mountains (France).
- sim2pop: Simulated genotypes of two georeferenced populations.
- spcaIllus: Simulated data illustrating the sPCA.
For more information, visit the adegenet website by typing
adegenetWeb().
Tutorials are available on the adegenet website, or by typing
adegenetTutorial().
To cite adegenet, please use the reference given by
citation("adegenet") (or see reference below).
Details
Package: adegenet
Type: Package
Version: 1.2-7
Date: 2010-10-28
License: GPL (>=2)
References
Jombart T (2008) adegenet: a R package for the multivariate analysis
of genetic markers. Bioinformatics 24: 1403-1405.
doi: 10.1093/bioinformatics/btn129
Jombart T, Devillard S and Balloux F (2010) Discriminant analysis of
principal components: a new method for the analysis of genetically
structured populations. BMC Genetics 11: 94.
doi: 10.1186/1471-2156-11-94
Jombart T, Eggo R, Dodd P and Balloux F (2010) Reconstructing disease
outbreaks from genetic data: a graph approach. Heredity.
doi: 10.1038/hdy.2010.78
Jombart T, Devillard S, Dufour A-B and Pontier D (2008) Revealing
cryptic spatial patterns in genetic variability by a new multivariate
method. Heredity 101: 92-103.
See adegenet website: http://adegenet.r-forge.r-project.org/
Please post your questions on 'the adegenet forum': adegenet-forum@lists.r-forge.r-project.org
See Also
adegenet is related to several packages, in particular:
- ade4 for multivariate analysis
- ape for phylogenetics and DNA data handling
- pegas for population genetics tools
- seqinr for handling nucleic and protein sequences