popkin

The popkin ("population kinship") R package estimates the kinship matrix of individuals and FST from their biallelic genotypes. Our estimation framework is the first to be practically unbiased under arbitrary population structures.

Installation

The stable version of the package is on CRAN and can be installed using:

install.packages("popkin")

The current development version can be installed from the GitHub repository using devtools:

install.packages("devtools") # if needed
library(devtools)
install_github('StoreyLab/popkin', build_vignettes = TRUE)

You can see the package vignette, which has more detailed documentation, by typing this into your R session:

vignette('popkin')

Examples

Input data

The examples below assume the following R data variables are present for n individuals and m loci:

  • The m-by-n genotype matrix X, containing only unphased biallelic variants encoded as 0,1,2 counting a given reference allele per locus.
  • The length-n vector subpops that assigns each individual to a subpopulation.

The subpops vector is not required, but its use is recommended to improve estimation of the baseline kinship value treated as zero.
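
For readers without data at hand, the inputs described above can be mocked up in base R. This is only a sketch: the sizes, allele frequencies, and labels are made up, and the genotypes are drawn independently, so they carry no real population structure.

```r
# Toy inputs (hypothetical values, for illustration only)
set.seed(1)
n <- 6    # individuals (columns of X)
m <- 100  # loci (rows of X)

# one reference allele frequency per locus
p <- runif(m, min = 0.1, max = 0.9)

# m-by-n genotype matrix of reference allele counts in {0, 1, 2};
# the matrix is filled column by column, so p recycles once per individual
# and row i always uses p[i]
X <- matrix(rbinom(m * n, size = 2, prob = p), nrow = m, ncol = n)

# one subpopulation label per individual, in the column order of X
subpops <- c("A", "A", "B", "B", "C", "C")
```

The orientation matters: loci are rows and individuals are columns, so length(subpops) must equal ncol(X).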

If your data is in BED format, popkin will process it efficiently using BEDMatrix. If file is the path to the BED file (excluding .bed extension):

library(BEDMatrix)
X <- BEDMatrix(file) # load genotype matrix object

popkin functions

This is a quick overview of every popkin function, covering estimation and visualization of kinship and FST from a genotype matrix.

First, estimate the kinship matrix from the genotypes X. All downstream analyses require kinship; none use X after this step.

library(popkin)
kinship <- popkin(X, subpops) # calculate kinship from X and optional subpop labels

Plot the kinship matrix, marking the subpopulations. Note that inbr_diag replaces the diagonal of kinship with inbreeding coefficients.

plot_popkin( inbr_diag(kinship), labs = subpops )

Extract inbreeding coefficients from kinship:

inbreeding <- inbr(kinship)
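
The relation behind these two helpers can be sketched in base R. Under the standard definition, the self-kinship of individual j is (1 + f_j) / 2, where f_j is its inbreeding coefficient, so f_j = 2 * kinship[j, j] - 1. The toy matrix below is hypothetical, and the code illustrates the relation rather than reproducing the package's implementation.

```r
# Hypothetical 3-individual kinship matrix (made-up values)
kinship_toy <- matrix(
    c(0.60, 0.10, 0.00,
      0.10, 0.55, 0.05,
      0.00, 0.05, 0.50),
    nrow = 3, byrow = TRUE
)

# inbreeding coefficients from self-kinship: f_j = 2 * phi_jj - 1
inbreeding_toy <- 2 * diag(kinship_toy) - 1

# same matrix with the diagonal replaced by inbreeding coefficients,
# which puts the diagonal on the same scale as the off-diagonal values
kinship_inbr_diag_toy <- kinship_toy
diag(kinship_inbr_diag_toy) <- inbreeding_toy
```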

Estimate FST:

weights <- weights_subpops(subpops) # weigh individuals so subpopulations are balanced
Fst <- fst(kinship, weights) # use kinship matrix and weights to calculate FST
Fst <- fst(inbreeding, weights) # estimate more directly from inbreeding vector (same result)
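
To see why the weights matter, the calculation can be sketched by hand in base R, assuming that FST is the weighted mean of the inbreeding coefficients and that balanced weights give every subpopulation the same total weight. The labels and inbreeding values are hypothetical, and the names ending in _toy are not part of popkin.

```r
# Hypothetical labels: subpopulation A is oversampled relative to B
subpops_toy <- c("A", "A", "A", "B")
inbreeding_toy <- c(0.10, 0.12, 0.08, 0.30)

K <- length(unique(subpops_toy))  # number of subpopulations
n_k <- table(subpops_toy)         # number of individuals in each subpopulation

# each subpopulation gets total weight 1/K, split evenly among its members
weights_toy <- as.numeric(1 / (K * n_k[subpops_toy]))

# FST as a weighted mean of inbreeding coefficients
fst_balanced <- sum(weights_toy * inbreeding_toy)

# for comparison: uniform weights let the oversampled subpopulation dominate
fst_uniform <- mean(inbreeding_toy)
```

In this toy case the balanced estimate is higher than the uniform one, because the single individual from B counts as much as the three individuals from A combined.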

Estimate and visualize the pairwise FST matrix:

pairwise_fst <- pwfst(kinship) # estimated matrix
leg_title <- expression(paste('Pairwise ', F[ST])) # fancy legend label
plot_popkin(pairwise_fst, labs = subpops, leg_title = leg_title) # NOTE no need for inbr_diag() here!
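
As a rough illustration of what an individual-level pairwise FST can look like, the sketch below treats each pair (j, k) as its own two-member population: the pair's kinship is taken as the new zero, both inbreeding coefficients are rescaled accordingly, and the two are averaged. This gives ((f_j + f_k) / 2 - kinship[j, k]) / (1 - kinship[j, k]). The formula is an assumption of this sketch, not a statement of how pwfst is implemented, and the toy matrix is hypothetical.

```r
# Hypothetical 3-individual kinship matrix (made-up values)
kinship_toy <- matrix(
    c(0.60, 0.10, 0.00,
      0.10, 0.55, 0.05,
      0.00, 0.05, 0.50),
    nrow = 3, byrow = TRUE
)

# inbreeding coefficients from self-kinship
f <- 2 * diag(kinship_toy) - 1

# matrix whose (j, k) entry is the mean inbreeding of the pair
f_pair_mean <- outer(f, f, "+") / 2

# pairwise FST under the assumed formula
pwfst_toy <- (f_pair_mean - kinship_toy) / (1 - kinship_toy)

# the formula is not meaningful for an individual paired with itself
diag(pwfst_toy) <- 0
```

Because the diagonal is already zero and meaningful, there is no self-kinship to convert, which is consistent with the note that inbr_diag() is not needed when plotting this matrix.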

Rescale the kinship matrix using a different subpopulation assignment, subpops2 (a length-n vector like subpops). This implicitly changes the most recent common ancestor population used as reference:

kinship2 <- rescale_popkin(kinship, subpops2)
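
The effect of rescaling can be sketched in base R, assuming the usual change-of-reference formula (kinship - min_kinship) / (1 - min_kinship), where min_kinship is the kinship value to be treated as zero. Here min_kinship is simply the smallest entry of a hypothetical toy matrix; how the baseline is derived from subpopulation labels is not shown.

```r
# Hypothetical kinship matrix whose smallest value is above zero
kinship_toy <- matrix(
    c(0.60, 0.20, 0.10,
      0.20, 0.55, 0.15,
      0.10, 0.15, 0.50),
    nrow = 3, byrow = TRUE
)

# baseline kinship to treat as zero (assumption: overall minimum)
min_kinship <- min(kinship_toy)

# rescale so the baseline maps to 0 while a kinship of 1 stays at 1
kinship_rescaled_toy <- (kinship_toy - min_kinship) / (1 - min_kinship)
```

Every value shrinks toward the new zero, so the least related pair ends up with kinship exactly 0 relative to the new reference population.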

Estimate the coancestry matrix from a matrix of allele frequencies P (useful when P comes from an admixture inference model):

coancestry <- popkin_af( P )

Please see the popkin R vignette for a description of the key parameters and more detailed examples, including complex plots with multiple kinship matrices and multi-level subpopulation labeling.

Citations

Alejandro Ochoa, John D Storey. 2021. "Estimating FST and kinship for arbitrary population structures." PLoS Genet 17(1): e1009241. PubMed ID 33465078. doi:10.1371/journal.pgen.1009241. bioRxiv doi:10.1101/083923 2016-10-27.

Alejandro Ochoa, John D Storey. 2019. "New kinship and FST estimates reveal higher levels of differentiation in the global human population." bioRxiv doi:10.1101/653279.

Alejandro Ochoa, John D Storey. 2016. "FST And Kinship for Arbitrary Population Structures I: Generalized Definitions." bioRxiv doi:10.1101/083915.

Version: 1.3.23
License: GPL-3
Last published: January 7th, 2023

Functions in popkin (1.3.23)

  • admix_order_cols: Reorder admixture matrix columns
  • hgdp_subset: HGDP subset
  • inbr: Extract inbreeding coefficients from a kinship matrix
  • plot_admix: Make a structure/admixture plot
  • fst: Calculate FST from a population-level kinship matrix or vector of inbreeding coefficients
  • popkin-package: A package for estimating kinship and FST under arbitrary population structure
  • popkin_A: Compute popkin's A and M matrices from genotypes
  • popkin_A_min_subpops: Estimate the minimum expected value of a matrix A using subpopulations
  • weights_subpops: Get weights for individuals that balance subpopulations
  • validate_kinship: Validate a kinship matrix
  • popkin: Estimate kinship from a genotype matrix and subpopulation assignments
  • plot_popkin: Visualize one or more kinship matrices and other related objects
  • rescale_popkin: Rescale kinship matrix to set a given kinship value to zero
  • popkin_af: Estimate coancestry from an allele frequency matrix and subpopulation assignments
  • pwfst: Estimate the individual-level pairwise FST matrix
  • plot_phylo: Plot a phylo tree object
  • mean_kinship: Calculate the weighted mean kinship
  • inbr_diag: Replace kinship diagonal with inbreeding coefficients
  • avg_kinship_subpops: Calculate a kinship matrix between subpopulations by averaging individual data
  • n_eff: Calculate the effective sample size of the data
  • admix_label_cols: Label ancestries based on best match to individual labels