dapc: Discriminant Analysis of Principal Components (DAPC)

Description

These functions implement the Discriminant Analysis of Principal Components (DAPC). See the 'Details' section for a succinct description of the method. The implementation relies on dudi.pca from the ade4 package and lda from the MASS package.

dapc performs the DAPC on a data.frame, a matrix, or a genind object, and returns an object of class dapc. Data stored in a data.frame or a matrix must be quantitative (numeric or integer values), not characters or factors.

Other functions are:
- print.dapc: prints the content of a dapc object.
- summary.dapc: extracts useful information from a dapc object.
- scatter.dapc: produces scatterplots of principal components (or 'discriminant functions'), with a screeplot of eigenvalues as inset.
- assignplot: plot showing the probabilities of assignment of individuals to the different clusters.

Usage

## S3 method for class 'data.frame':
dapc(x, grp, n.pca=NULL, n.da=NULL, center=TRUE,
     scale=FALSE, var.contrib=FALSE, pca.select=c("nbEig","percVar"),
     perc.pca=NULL, ..., dudi=NULL)
## S3 method for class 'matrix':
dapc(x, ...)
## S3 method for class 'genind':
dapc(x, pop=NULL, n.pca=NULL, n.da=NULL, scale=FALSE,
     scale.method=c("sigma", "binom"), truenames=TRUE, all.contrib=FALSE,
     pca.select=c("nbEig","percVar"), perc.pca=NULL, ...)
## S3 method for class 'dudi':
dapc(x, grp, ...)
## S3 method for class 'dapc':
print(x, ...)
## S3 method for class 'dapc':
summary(object, ...)
## S3 method for class 'dapc':
scatter(x, xax=1, yax=2,
        col=rainbow(length(levels(x$grp))), posi="bottomleft", bg="grey",
        ratio=0.3, csub=1.2, ...)
assignplot(x, only.grp=NULL, subset=NULL, cex.lab=.75, pch=3)

Arguments

x

a data.frame, matrix, or genind object. For data.frame and matrix input, only quantitative variables should be provided.

grp,pop

a factor indicating the group membership of individuals.

n.pca

an integer indicating the number of axes retained in the Principal Component Analysis (PCA) step. If NULL, interactive selection is triggered.

n.da

an integer indicating the number of axes retained in the Discriminant Analysis step. If NULL, interactive selection is triggered.
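The second step is a linear discriminant analysis, so the number of useful discriminant axes has an upper bound. With K groups, LDA produces at most K - 1 discriminant functions, and never more than the number of input variables (here, the retained PCs). Values of n.da above that bound therefore add nothing. The short sketch below checks this with lda() from MASS on simulated data. The toy data are an illustrative assumption, and the code does not call dapc itself.

```r
library(MASS)  # provides lda(); MASS is a recommended package shipped with R

set.seed(1)
# Toy data (illustrative assumption): 3 groups of 30 individuals, 5 variables
grp <- factor(rep(c("A", "B", "C"), each = 30))
X <- matrix(rnorm(90 * 5), nrow = 90)
X[grp == "B", 1] <- X[grp == "B", 1] + 3  # shift group B on variable 1
X[grp == "C", 2] <- X[grp == "C", 2] + 3  # shift group C on variable 2

fit <- lda(X, grouping = grp)
n_da_max <- ncol(fit$scaling)  # number of discriminant functions returned
n_da_max                       # at most nlevels(grp) - 1 = 2
```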

center

a logical indicating whether variables should be centred to mean 0 (TRUE, default) or not (FALSE). Always TRUE for genind objects.

scale

a logical indicating whether variables should be scaled (TRUE) or not (FALSE, default). Scaling consists of dividing variables by their (estimated) standard deviation to account for trivial differences in variances. Further scaling options are available for genind objects (see argument scale.method).
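Centring and scaling are column-wise operations on the individuals-by-variables table. The base R sketch below spells them out with sweep() and checks the result against scale(). Note that sd() uses the n - 1 denominator. We have not verified that the PCA step inside dapc uses the same variance estimator, so treat this as an illustration of the idea only.

```r
set.seed(2)
# Toy table (assumption): 20 individuals x 4 variables with very different spreads
X <- matrix(rnorm(20 * 4, sd = rep(c(1, 5, 10, 50), each = 20)), nrow = 20)

Xc <- sweep(X, 2, colMeans(X))              # centring: subtract each column mean
Xs <- sweep(Xc, 2, apply(X, 2, sd), "/")    # scaling: divide by each column SD

# scale() performs both steps in one call (it also stores them as attributes)
stopifnot(isTRUE(all.equal(Xs, scale(X), check.attributes = FALSE)))
apply(Xs, 2, sd)  # every variable now has unit standard deviation
```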

var.contrib,all.contrib

a logical indicating whether the contribution of original variables (alleles, for genind objects) should be provided (TRUE) or not (FALSE, default). Such output can be useful, but can also create huge matrices.
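The discriminant functions are linear combinations of the retained PCs, and each PC is itself a linear combination of the original variables. Multiplying the two coefficient matrices therefore gives coefficients for the original variables directly. The sketch below uses prcomp() and MASS::lda() on simulated data to illustrate that idea. It first checks that both routes give the same discriminant scores, then converts the coefficients to contributions. The normalisation used (squared coefficients rescaled to sum to 1 per axis) is an assumption about how such contributions are usually reported, not a confirmed description of var.contr.

```r
library(MASS)  # lda()

set.seed(3)
# Toy data (assumption): 3 groups, 8 named variables
grp <- factor(rep(c("A", "B", "C"), each = 30))
X <- matrix(rnorm(90 * 8), nrow = 90,
            dimnames = list(NULL, paste0("var", 1:8)))
X[grp == "B", 1] <- X[grp == "B", 1] + 3
X[grp == "C", 2] <- X[grp == "C", 2] + 3

n.pca <- 4
pca <- prcomp(X, center = TRUE, scale. = FALSE)
tab <- pca$x[, 1:n.pca]                 # retained principal components
fit <- lda(tab, grouping = grp)         # discriminant analysis on the PCs

# (variables x PCs) %*% (PCs x discriminant functions)
raw <- pca$rotation[, 1:n.pca] %*% fit$scaling
# Same discriminant scores, computed from the centred original variables
Xc <- scale(X, center = TRUE, scale = FALSE)
stopifnot(isTRUE(all.equal(Xc %*% raw, tab %*% fit$scaling,
                           check.attributes = FALSE)))

# Assumed normalisation: squared coefficients rescaled to sum to 1 per axis
contrib <- sweep(raw^2, 2, colSums(raw^2), "/")
round(contrib, 3)
```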

pca.select

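a character indicating the mode of selection of PCA axes, matching either "nbEig" or "percVar". For "nbEig", the user has to specify the number of axes retained (interactively, or via n.pca). For "percVar", the user has to specify the minimal percentage of the total variance to be expressed by the retained axes (interactively, or via perc.pca).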

perc.pca

a numeric value between 0 and 100 indicating the minimal percentage of the total variance of the data to be expressed by the retained axes of PCA.
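Under the "percVar" mode, the number of retained axes follows from the cumulative share of variance carried by the PCA eigenvalues. The base R sketch below illustrates that rule. It assumes that "at least perc.pca percent" means the smallest number of axes whose cumulative percentage reaches the threshold. We have not checked adegenet's exact handling of the boundary, and the toy data are invented.

```r
set.seed(4)
# Toy table (assumption): 50 individuals x 10 variables, two of them correlated
X <- matrix(rnorm(50 * 10), nrow = 50)
X[, 2] <- X[, 1] + rnorm(50, sd = 0.1)

pca <- prcomp(X, center = TRUE, scale. = FALSE)
eig <- pca$sdev^2                           # PCA eigenvalues
cum_perc <- 100 * cumsum(eig) / sum(eig)    # cumulative % of total variance

perc.pca <- 80
n.pca <- which(cum_perc >= perc.pca)[1]     # fewest axes reaching the threshold
n.pca
```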

...

further arguments to be passed to other functions. For dapc.matrix, arguments should match those of dapc.data.frame.

object

a dapc object.

scale.method

a character specifying the scaling method to be used for allele frequencies, which must match "sigma" (usual estimate of standard deviation) or "binom" (based on binomial distribution). See scaleGen for further details.

truenames

a logical indicating whether true (i.e., user-specified) labels should be used in object outputs (TRUE, default) or not (FALSE).

xax,yax

integers specifying which principal components of DAPC should be shown on the x and y axes.

col

a vector of colors to be used for groups. Its length should match the number of groups, not the number of individuals.

posi,bg,ratio,csub

arguments used to customize the inset in scatterplots of DAPC results. See add.scatter documentation in the ade4 package for more details.

only.grp

a character vector indicating which groups should be displayed. Values should match values of x$grp. If NULL, all results are displayed.

subset

integer or logical vector indicating which individuals should be displayed. If NULL, all results are displayed.

cex.lab

a numeric indicating the size of labels.

pch

a numeric indicating the type of point to be used to indicate the prior group of individuals (see points documentation for more details).

dudi

optionally, a multivariate analysis with the class dudi (from the ade4 package). If provided, the prior PCA step is skipped and this object is used instead as the prior step for variable orthogonalisation.

Value

=== dapc objects ===
The class dapc is a list with the following components:
- call: the matched call.
- n.pca: number of PCA axes retained.
- n.da: number of DA axes retained.
- var: proportion of variance conserved by PCA principal components.
- eig: a numeric vector of eigenvalues.
- grp: a factor giving prior group assignment.
- prior: a numeric vector giving prior group probabilities.
- assign: a factor giving posterior group assignment.
- tab: matrix of retained principal components of PCA.
- loadings: principal axes of DAPC, giving coefficients of the linear combination of retained PCA axes.
- ind.coord: principal components of DAPC, giving the coordinates of individuals onto principal axes of DAPC; also called the discriminant functions.
- grp.coord: coordinates of the groups onto the principal axes of DAPC.
- posterior: a data.frame giving posterior membership probabilities for all individuals and all clusters.
- var.contr: (optional) a data.frame giving the contributions of original variables (alleles in the case of genetic data) to the principal components of DAPC.
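Several components of a dapc object are tied together. The individual coordinates come from the LDA fitted on the retained PCs, and each posterior row is a probability vector. A natural posterior assignment is the group with the highest posterior probability. The R sketch below reproduces these relations with MASS::lda() on simulated data. Computing grp.coord as the per-group mean of ind.coord is our assumption about what "coordinates of the groups" means here. The variable names only mirror the component names and are not produced by dapc.

```r
library(MASS)  # lda() and its predict() method

set.seed(7)
# Toy data (assumption): 3 groups of 25 individuals, 6 variables
grp <- factor(rep(c("A", "B", "C"), each = 25))
X <- matrix(rnorm(75 * 6), nrow = 75)
X[grp == "B", 1] <- X[grp == "B", 1] + 3
X[grp == "C", 2] <- X[grp == "C", 2] + 3

tab <- prcomp(X)$x[, 1:4]                   # retained PCs (cf. the 'tab' component)
fit <- lda(tab, grouping = grp)
pred <- predict(fit, newdata = tab)

ind.coord <- pred$x                         # individuals on the discriminant axes
# Assumption: group coordinates taken as the mean coordinate of each prior group
grp.coord <- apply(ind.coord, 2, tapply, grp, mean)
posterior <- pred$posterior                 # one row per individual, rows sum to 1
# Posterior assignment: the group with the highest posterior probability
assign <- factor(colnames(posterior)[max.col(posterior)], levels = levels(grp))
```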
=== other outputs ===
Other functions have different outputs:
- summary.dapc returns a list with 6 components: n.dim (number of retained DAPC axes), n.pop (number of groups/populations), assign.prop (proportion of overall correct assignment), assign.per.pop (proportion of correct assignment per group), prior.grp.size (prior group sizes), and post.grp.size (posterior group sizes).
- scatter.dapc and assignplot return the matched call.
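The assignment summaries in summary.dapc compare prior groups (grp) with posterior assignments (assign). The base R sketch below computes equivalent quantities on a hand-made toy example. It assumes they are simple proportions of matching labels and plain group counts, which we have not checked against the package source. The factor values are invented for illustration.

```r
# Toy prior groups and posterior assignments (invented for illustration)
grp    <- factor(c("A", "A", "A", "B", "B", "C", "C", "C"))
assign <- factor(c("A", "A", "B", "B", "B", "C", "A", "C"), levels = levels(grp))

correct <- assign == grp                        # TRUE where assignment matches prior group
assign.prop <- mean(correct)                    # overall proportion correctly assigned
assign.per.pop <- tapply(correct, grp, mean)    # proportion correct within each prior group
prior.grp.size <- table(grp)                    # prior group sizes
post.grp.size <- table(assign)                  # posterior group sizes
assign.prop                                     # 6 of 8 correct: 0.75
```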


Details

The Discriminant Analysis of Principal Components (DAPC) is designed to investigate the genetic structure of biological populations. This multivariate method is a two-step procedure. First, genetic data are transformed (centred, possibly scaled) and submitted to a Principal Component Analysis (PCA). Second, the principal components of the PCA are submitted to a Linear Discriminant Analysis (LDA). A simple matrix product then expresses the discriminant functions as linear combinations of alleles, which makes it possible to compute allele contributions. More details about the computation of DAPC are given in the reference below.
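The two-step procedure can be sketched in a few lines of base R and MASS. The helper dapc_sketch() below is hypothetical and written only for illustration. It centres the data, keeps n.pca principal components from prcomp(), runs lda() on them, and returns components loosely named after those of a dapc object. It is not adegenet's implementation: it ignores scaling options, priors and interactive axis selection, for instance. The simulated data are also an assumption.

```r
library(MASS)  # lda(), a recommended package shipped with R

# Hypothetical helper (not part of adegenet): PCA step followed by LDA step
dapc_sketch <- function(X, grp, n.pca, n.da) {
  pca <- prcomp(X, center = TRUE, scale. = FALSE)  # step 1: PCA on centred data
  tab <- pca$x[, seq_len(n.pca), drop = FALSE]      # retained principal components
  fit <- lda(tab, grouping = grp)                   # step 2: LDA on retained PCs
  n.da <- min(n.da, ncol(fit$scaling))              # cannot exceed available axes
  pred <- predict(fit, newdata = tab, dimen = n.da)
  list(tab       = tab,
       loadings  = fit$scaling[, seq_len(n.da), drop = FALSE],
       ind.coord = pred$x,          # individuals on the discriminant functions
       posterior = pred$posterior,  # membership probabilities
       assign    = pred$class)      # posterior group assignment
}

set.seed(6)
# Toy data (assumption): 3 groups of 30 individuals, 20 variables
grp <- factor(rep(c("A", "B", "C"), each = 30))
X <- matrix(rnorm(90 * 20), nrow = 90)
X[grp == "B", 1:3] <- X[grp == "B", 1:3] + 2
X[grp == "C", 4:6] <- X[grp == "C", 4:6] + 2

res <- dapc_sketch(X, grp, n.pca = 10, n.da = 2)
table(prior = grp, posterior = res$assign)
```

Cross-tabulating prior groups against posterior assignments, as in the last line, is a quick way to see how well the retained axes separate the groups.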

DAPC does not infer genetic clusters ex nihilo; for this, see the find.clusters function.

References

Jombart T, Devillard S and Balloux F (2010) Discriminant analysis of principal components: a new method for the analysis of genetically structured populations. BMC Genetics 11:94. doi:10.1186/1471-2156-11-94

Examples


## data(dapcIllus), data(eHGDP), and data(H3N2) illustrate the use of dapc;
## see ?dapcIllus, ?eHGDP, ?H3N2

example(dapcIllus)
example(eHGDP)
example(H3N2)
