eHGDP: Extended HGDP-CEPH dataset

Description

This dataset consists of 1350 individuals from native human populations distributed worldwide, typed at 678 microsatellite loci. The original HGDP-CEPH panel [1-3] has been extended with several Native American populations [4]. This dataset was used to illustrate the Discriminant Analysis of Principal Components (DAPC, [5]).

Format

eHGDP is a genind object with a data frame named popInfo as supplementary component (eHGDP@other$popInfo), which contains the following variables:

Population: a character vector indicating populations.
Region: a character vector indicating the geographic region of each population.
Label: a character vector indicating the correspondence with population labels used in the genind object (i.e., as output by pop(eHGDP)).
Latitude, Longitude: geographic coordinates of the populations, in degrees north and east.
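
As a quick illustration of the structure described above, the following sketch loads the dataset and inspects the popInfo component. It assumes the adegenet package is installed; the variable name popInfo is introduced here only for convenience, and no particular output is asserted beyond what the description states.

```r
library(adegenet)  # assumed to be installed; provides eHGDP and the genind class

data(eHGDP)

# Basic dimensions of the genind object
# (the description reports 1350 individuals and 678 loci)
nInd(eHGDP)
nLoc(eHGDP)

# Supplementary data frame stored in the 'other' slot
popInfo <- eHGDP@other$popInfo
names(popInfo)   # variables listed in the Format section
head(popInfo)

# Compare the Label column with the population labels of the genind object
head(popInfo$Label)
head(levels(pop(eHGDP)))
```

Printing the two sets of labels side by side is a simple way to check how popInfo rows line up with pop(eHGDP) before using the coordinates for mapping.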

Source

Original panel by the Human Genome Diversity Project (HGDP) and the Centre d'Etude du Polymorphisme Humain (CEPH). See reference [4] for the Native American populations. This copy of the dataset was prepared by Francois Balloux.

References

[1] Rosenberg NA, Pritchard JK, Weber JL, Cann HM, Kidd KK, et al. (2002) Genetic structure of human populations. Science 298: 2381-2385.

[2] Ramachandran S, Deshpande O, Roseman CC, Rosenberg NA, Feldman MW, et al. (2005) Support from the relationship of genetic and geographic distance in human populations for a serial founder effect originating in Africa. Proc Natl Acad Sci U S A 102: 15942-15947.

[3] Cann HM, de Toma C, Cazes L, Legrand MF, Morel V, et al. (2002) A human genome diversity cell line panel. Science 296: 261-262.

[4] Wang S, Lewis CM, Jakobsson M, Ramachandran S, Ray N, et al. (2007) Genetic Variation and Population Structure in Native Americans. PLoS Genetics 3: e185.

[5] Jombart T, Devillard S, Balloux F (2010) Discriminant analysis of principal components: a new method for the analysis of genetically structured populations. BMC Genetics 11: 94.

Examples


## Not run: 
# ## LOAD DATA
# data(eHGDP)
# eHGDP
# 
# 
# ## PERFORM DAPC - USE POPULATIONS AS CLUSTERS
# ## to exactly reproduce the analyses from the paper, use "n.pca=1000"
# dapc1 <- dapc(eHGDP, all.contrib=TRUE, scale=FALSE,
#               n.pca=200, n.da=80) # takes about 2 minutes
# dapc1
# 
# ## (see ?dapc for details about the output)
# 
# 
# 
# ## SCREEPLOT OF EIGENVALUES
# barplot(dapc1$eig, main="eHGDP - DAPC eigenvalues",
#         col=c("red","green","blue", rep("grey", 1000)))
# 
# 
# 
# ## SCATTERPLOTS
# ## (!) Note: colors may be inverted with respect to [5]
# ## as signs of principal components are arbitrary
# ## and change from one computer to another
# ##
# ## axes 1-2
# s.label(dapc1$grp.coord[,1:2], clab=0, sub="Axes 1-2")
# par(xpd=TRUE)
# colorplot(dapc1$grp.coord[,1:2], dapc1$grp.coord, cex=3, add=TRUE)
# add.scatter.eig(dapc1$eig,10,1,2, posi="bottomright", ratio=.3, csub=1.25)
# 
# ## axes 2-3
# s.label(dapc1$grp.coord[,2:3], clab=0, sub="Axes 2-3")
# par(xpd=TRUE)
# colorplot(dapc1$grp.coord[,2:3], dapc1$grp.coord, cex=3, add=TRUE)
# ## highlight eigenvalues 2 and 3, matching the plotted axes
# add.scatter.eig(dapc1$eig,10,2,3, posi="bottomright", ratio=.3, csub=1.25)
# 
# 
# 
# ## MAP DAPC1 RESULTS
# if(require(maps)){
# 
#   xy <- cbind(eHGDP$other$popInfo$Longitude, eHGDP$other$popInfo$Latitude)
# 
#   par(mar=rep(.1,4))
#   map(fill=TRUE, col="lightgrey")
#   colorplot(xy, -dapc1$grp.coord, cex=3, add=TRUE, trans=FALSE)
# }
# 
# 
# 
# ## LOOK FOR OTHER CLUSTERS
# ## to reproduce the results of the reference paper, use:
# ## grp <- find.clusters(eHGDP, max.n=50, n.pca=200, scale=FALSE)
# ## and then
# ## plot(grp$Kstat, type="b", col="blue")
# 
# grp <- find.clusters(eHGDP, max.n=30, n.pca=200,
#                      scale=FALSE, n.clust=4) # takes about 2 minutes
# names(grp)
# 
# ## (see ?find.clusters for details about the output)
# 
# 
# 
# ## PERFORM DAPC - USE INFERRED CLUSTERS (grp$grp) AS GROUPS
# ## to exactly reproduce the analyses from the paper, use "n.pca=1000"
# dapc2 <- dapc(eHGDP, pop=grp$grp, all.contrib=TRUE,
#               scale=FALSE, n.pca=200, n.da=80) # takes around 1 minute
# dapc2
# 
# 
# ## PRODUCE SCATTERPLOT
# scatter(dapc2) # axes 1-2
# scatter(dapc2,2,3) # axes 2-3
# 
# 
# ## MAP DAPC2 RESULTS
# if(require(maps)){
#   xy <- cbind(eHGDP$other$popInfo$Longitude,
#               eHGDP$other$popInfo$Latitude)
# 
#   ## average individual coordinates within each population
#   myCoords <- apply(dapc2$ind.coord, 2, tapply, pop(eHGDP), mean)
# 
#   par(mar=rep(.1,4))
#   map(fill=TRUE, col="lightgrey")
#   colorplot(xy, myCoords, cex=3, add=TRUE, trans=FALSE)
# }
# 
# ## End(Not run)