LEA (version 1.4.0)

pca: Principal Component Analysis

Description

The function pca performs a Principal Component Analysis of a genotypic matrix stored in the lfmm, geno, ancestrymap, ped or vcf format. It computes the eigenvalue, eigenvector, and standard deviation of each principal component, and the projection of each individual onto each component. It returns an object of class "pcaProject" containing the output data and the input parameters.

Usage

pca (input.file, K, center = TRUE, scale = FALSE)

Arguments

input.file
A character string containing the path to the genotype input file, a genotypic matrix in the lfmm format.
K
An integer corresponding to the number of principal components calculated. By default, all principal components are calculated.
center
A boolean option. If true, the data matrix is centered (default: TRUE).
scale
A boolean option. If true, the data matrix is centered and scaled (default: FALSE).
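The effect of the center and scale options can be illustrated with base R alone. The sketch below is not LEA code: it applies the base function scale() to a small made-up genotype matrix (the name geno and its values are assumptions for illustration) to show what centering and scaling do to the columns before a PCA.

```r
# Toy genotype matrix: 5 individuals (rows) x 4 SNPs (columns), coded 0/1/2.
# The values are made up for illustration only.
geno <- matrix(c(0, 1, 2, 1,
                 1, 1, 0, 2,
                 2, 0, 1, 1,
                 0, 2, 2, 0,
                 1, 1, 1, 2),
               nrow = 5, byrow = TRUE)

# Analogue of center = TRUE, scale = FALSE: subtract each column mean.
centered <- scale(geno, center = TRUE, scale = FALSE)

# Analogue of scale = TRUE: subtract each column mean, then divide
# by the column standard deviation.
standardized <- scale(geno, center = TRUE, scale = TRUE)

colMeans(centered)          # numerically zero for every SNP
apply(standardized, 2, sd)  # one for every SNP
```

Scaling gives every SNP the same weight in the analysis, whereas centering alone lets SNPs with larger variance contribute more to the leading components.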

Value

pca returns an object of class pcaProject containing the following components:
eigenvalues
The vector of eigenvalues.
eigenvectors
The matrix of eigenvectors (one column for each eigenvector).
sdev
The vector of standard deviations.
projections
The matrix of projections (one column for each projection).
The following methods can be applied to the object of class pcaProject returned by pca:
plot
Plot the eigenvalues.
show
Display information about the analysis.
summary
Summarize the analysis.
tracy.widom
Perform Tracy-Widom tests on the eigenvalues.
load.pcaProject(file.pcaProject)
Load the file containing a pcaProject object and return the pcaProject object.
remove.pcaProject(file.pcaProject)
Erase a pcaProject object. Caution: All the files associated with the object will be removed.
export.pcaProject(file.pcaProject)
Create a zip file containing the full pcaProject object. This makes it possible to move the project to a new directory or a new computer (using import). To overwrite an existing export, use the option force = TRUE.
import.pcaProject(file.pcaProject)
Import and load a pcaProject object from a zip file (made with the export function) into the chosen directory. To overwrite an existing project, use the option force = TRUE.
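To make the four components listed above more concrete, here is a minimal base R sketch of a textbook PCA on simulated genotypes. It is not LEA's implementation, and LEA may normalize or orient its outputs differently. The simulated matrix and the local variable names are assumptions chosen to mirror the component names.

```r
set.seed(1)

# Simulated genotypes: 20 individuals x 50 SNPs, coded 0/1/2 (made-up data).
n <- 20
p <- 50
geno <- matrix(rbinom(n * p, size = 2, prob = 0.3), nrow = n)

# Center the columns (analogue of center = TRUE, scale = FALSE).
X <- scale(geno, center = TRUE, scale = FALSE)

# Eigen-decomposition of the SNP covariance matrix.
eig <- eigen(cov(X), symmetric = TRUE)

K <- 4                                  # number of components kept
eigenvalues  <- eig$values[1:K]         # variance along each component
eigenvectors <- eig$vectors[, 1:K]      # one column per eigenvector
sdev         <- sqrt(eigenvalues)       # standard deviation per component
projections  <- X %*% eigenvectors      # one row per individual, one column per PC

# Cross-check against base R's prcomp(), which uses the same convention.
ref <- prcomp(geno, center = TRUE, scale. = FALSE)
all.equal(sdev, ref$sdev[1:K])
```

In this convention, the standard deviation of each column of projections equals the corresponding entry of sdev, which is the square root of the matching eigenvalue.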

See Also

lfmm.data, snmf, lfmm, tutorial

Examples

# Load the LEA package.
library(LEA)

# Create the genotype file "genotypes.lfmm"
# with 1000 SNPs for 165 individuals.
data("tutorial")
write.lfmm(tutorial.R, "genotypes.lfmm")

#################
# Perform a PCA #
#################

# Run the PCA.
# Available options: K (the number of PCs calculated),
#                    center and scale.
# Creates: genotypes.pcaProject - the pcaProject object,
#          a directory genotypes.pca containing the files:
#            genotypes.eigenvalues - eigenvalues,
#            genotypes.eigenvectors - eigenvectors,
#            genotypes.sdev - standard deviations,
#            genotypes.projections - projections.
# Returns a pcaProject object: pc.
pc = pca("genotypes.lfmm", scale = TRUE)

#######################
# Display Information #
#######################

# Display information about the analysis.
show(pc)

# Summarize the analysis.
summary(pc)

#####################
# Graphical outputs #
#####################

par(mfrow=c(2,2))

# Plot eigenvalues.
plot(pc, lwd = 5, col = "red", xlab = "PCs", ylab = "eigen")

# PC1-PC2 plot.
plot(pc$projections)
# PC3-PC4 plot.
plot(pc$projections[,3:4])

# Plot standard deviations.
plot(pc$sdev)

#############################
# Perform Tracy-Widom tests #
#############################

# Perform Tracy-Widom tests on all eigenvalues.
# Creates the file genotypes.tracyWidom (Tracy-Widom test information)
# in the directory genotypes.pca/.
tw = tracy.widom(pc)

# Plot the percentage of variance explained by each component.
plot(tw$percentage)

# Display the p-values for the Tracy-Widom tests. 
tw$pvalues

########################
# Manage a pca project #
########################

# All the files produced by pca for a given input file are
# automatically saved into a pca project directory and a project file.
# The pcaProject file has the same name as
# the input file, with a .pcaProject extension
# ("genotypes.pcaProject").
# The pcaProject directory has the same name as
# the input file, with a .pca extension ("genotypes.pca/").
# There is only one pca project for each input file, including all the runs.

# A pcaProject can be loaded in a different session.
project = load.pcaProject("genotypes.pcaProject") 

# A pcaProject can be exported and then imported into another directory
# or onto another computer.
export.pcaProject("genotypes.pcaProject")

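# Platform-specific variant (marked "windows" in the source):
# import the project into a new directory.
dir.create("test", showWarnings = TRUE)
# Import the project into the directory "test".
newProject = import.pcaProject("genotypes_pcaProject.zip", "test")
# Remove the imported copy.
remove.pcaProject("test/genotypes.pcaProject")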

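# Platform-specific variant (marked "windows" in the source):
# re-import the project into the current directory.
# Remove the existing project first.
remove.pcaProject("genotypes.pcaProject")

# Import the project from the zip file.
newProject = import.pcaProject("genotypes_pcaProject.zip")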

# A pcaProject can be erased.
# Caution: All the files associated with the project will be removed.
remove.pcaProject("genotypes.pcaProject")
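
The Tracy-Widom step in the examples plots the percentage of variance explained by each component. As a rough illustration of that quantity, the base R sketch below uses the usual definition, each eigenvalue divided by the sum of all eigenvalues. The eigenvalues are hypothetical rather than LEA output, and it is an assumption that tw$percentage follows exactly this definition.

```r
# Hypothetical eigenvalues, sorted in decreasing order (not LEA output).
eigenvalues <- c(4.0, 2.0, 1.0, 0.5, 0.5)

# Share of the total variance carried by each component.
percentage <- eigenvalues / sum(eigenvalues)

# Cumulative share of variance for the first 1, 2, ... components.
cumulative <- cumsum(percentage)

# Smallest number of components explaining at least 80% of the variance.
k80 <- which(cumulative >= 0.80)[1]

percentage
cumulative
k80
```

A cumulative threshold like this is only a descriptive rule of thumb. The Tracy-Widom p-values shown in the examples provide a formal test of which eigenvalues are significant.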
