Admixture Graph Manipulation and Fitting

The package provides functionality to analyse and test admixture graphs against the f statistics described in the paper Ancient Admixture in Human History, Patterson et al., Genetics, Vol. 192, 1065--1093, 2012.

The f statistics (f2, f3, and f4) extract information about correlations between gene frequencies in different populations (or single diploid genome samples), which can be informative about patterns of gene flow between these populations in the form of admixture events. If a graph is constructed as a hypothesis for the relationship between the populations, equations for the expected values of the f statistics can be extracted as functions of the edge lengths, which represent genetic drift, and the admixture proportions.
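
To make the definitions concrete, here is a minimal sketch of the three statistics as plain averages over SNPs, following the definitions in Patterson et al. The allele frequencies and the helper names f2_hat, f3_hat and f4_hat are made up for illustration and are not part of this package. The estimators are naive: they ignore the finite-sample bias corrections that a real analysis needs.

```r
# Toy allele frequencies at five SNPs for four hypothetical populations.
w <- c(0.10, 0.50, 0.90, 0.30, 0.70)
x <- c(0.20, 0.40, 0.80, 0.30, 0.60)
y <- c(0.15, 0.55, 0.85, 0.25, 0.75)
z <- c(0.25, 0.45, 0.75, 0.35, 0.65)

# Naive estimators (hypothetical helpers, no bias correction).
f2_hat <- function(a, b) mean((a - b)^2)                      # f2(A, B)
f3_hat <- function(target, a, b) mean((target - a) * (target - b))  # f3(C; A, B)
f4_hat <- function(a, b, c, d) mean((a - b) * (c - d))        # f4(A, B; C, D)

f2_wx <- f2_hat(w, x)
f3_ywx <- f3_hat(y, w, x)
f4_wxyz <- f4_hat(w, x, y, z)

# Swapping the first pair flips the sign of f4.
f4_xwyz <- f4_hat(x, w, y, z)
```

The sign flip under swapping W and X is worth keeping in mind when reading tables of f4 values: the order of the populations matters.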

This package provides functions for extracting these equations and for fitting them against computed f statistics. It does not currently provide functions for computing the f statistics; for that we refer to the ADMIXTOOLS software package.

Example

Below is a quick example of how the package can be used. The example uses data from polar bears and brown bears with a black bear as outgroup and is taken from the paper "Genomic evidence of geographically widespread effect of gene flow from polar bears into brown bears".

The BLK sample is the black bear, the PB sample is a polar bear, and the rest are brown bears.

I have taken the f statistics from Table 1 in the paper:

data(bears)
bears
#>      W  X      Y      Z      D Z.value
#> 1  BLK PB Sweden   Adm1 0.1258    12.8
#> 2  BLK PB  Kenai   Adm1 0.0685     5.9
#> 3  BLK PB Denali   Adm1 0.0160     1.3
#> 4  BLK PB Sweden   Adm2 0.1231    12.2
#> 5  BLK PB  Kenai   Adm2 0.0669     6.1
#> 6  BLK PB Denali   Adm2 0.0139     1.1
#> 7  BLK PB Sweden    Bar 0.1613    14.7
#> 8  BLK PB  Kenai    Bar 0.1091     8.9
#> 9  BLK PB Denali    Bar 0.0573     4.3
#> 10 BLK PB Sweden   Chi1 0.1786    17.7
#> 11 BLK PB  Kenai   Chi1 0.1278    11.3
#> 12 BLK PB Denali   Chi1 0.0777     6.4
#> 13 BLK PB Sweden   Chi2 0.1819    18.3
#> 14 BLK PB  Kenai   Chi2 0.1323    12.1
#> 15 BLK PB Denali   Chi2 0.0819     6.7
#> 16 BLK PB Sweden Denali 0.1267    14.3
#> 17 BLK PB  Kenai Denali 0.0571     5.6
#> 18 BLK PB Sweden  Kenai 0.0719     9.6

The D column is the f4(W,X;Y,Z) statistic and the Z.value column holds the Z-values obtained from a blocked jackknife (see Patterson et al. for details).
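
Since a Z-value is the statistic divided by its standard error, the standard errors can be recovered from the two columns. The sketch below copies the first three rows of the table into a small data frame so that it runs without the package. The SE and significant columns and the cutoff of |Z| > 3 are assumptions added for illustration (the cutoff is a common convention, not something this package prescribes).

```r
# First three rows of the bears table, typed in by hand.
bears_subset <- data.frame(
  W = "BLK",
  X = "PB",
  Y = c("Sweden", "Kenai", "Denali"),
  Z = "Adm1",
  D = c(0.1258, 0.0685, 0.0160),
  Z.value = c(12.8, 5.9, 1.3),
  stringsAsFactors = FALSE
)

# Z = D / SE, so the jackknife standard error is D / Z.
bears_subset$SE <- bears_subset$D / bears_subset$Z.value

# Flag statistics that differ from zero by more than three standard errors.
bears_subset$significant <- abs(bears_subset$Z.value) > 3
```

In this subset the Sweden and Kenai rows are flagged while the Denali row is not.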

From the statistics we can see that the ABC bears (Adm, Bar and Chi) are more closely related to the polar bears than the other brown bears are. The paper explains this with gene flow from polar bears into the ABC bears that then spread further from there, but we can also explain it by several waves of admixture from ancestral polar bears into brown bears:

leaves <- c("BLK", "PB",
            "Bar", "Chi1", "Chi2", "Adm1", "Adm2",
            "Denali", "Kenai", "Sweden") 
inner_nodes <- c("R", "PBBB",
                 "Adm", "Chi", "BC", "ABC",
                 "x", "y", "z",
                 "pb_a1", "pb_a2", "pb_a3", "pb_a4",
                 "bc_a1", "abc_a2", "x_a3", "y_a4")

edges <- parent_edges(c(edge("BLK", "R"),
                        edge("PB", "pb_a1"),
                        edge("pb_a1", "pb_a2"),
                        edge("pb_a2", "pb_a3"),
                        edge("pb_a3", "pb_a4"),
                        edge("pb_a4", "PBBB"),
                        
                        edge("Chi1", "Chi"),
                        edge("Chi2", "Chi"),
                        edge("Chi", "BC"),
                        edge("Bar", "BC"),
                        edge("BC", "bc_a1"),
                        
                        edge("Adm1", "Adm"),
                        edge("Adm2", "Adm"),
                        
                        admixture_edge("bc_a1", "pb_a1", "ABC", "a"),
                        edge("Adm", "ABC"),
                        
                        edge("ABC", "abc_a2"),
                        admixture_edge("abc_a2", "pb_a2", "x", "b"),
                        
                        edge("Denali", "x"),
                        edge("x", "x_a3"),
                        admixture_edge("x_a3", "pb_a3", "y", "c"),
                      
                        edge("Kenai", "y"),
                        edge("y", "y_a4"),                        
                        admixture_edge("y_a4", "pb_a4", "z", "d"),
                        
                        edge("Sweden", "z"),
                        
                        edge("z", "PBBB"),
                        edge("PBBB", "R")))

bears_graph <- agraph(leaves, inner_nodes, edges)
plot(bears_graph, show_admixture_labels = TRUE)
#> fminbnd:  Exiting: Maximum number of function evaluations has been exceeded
#>          - increase MaxFunEvals option.
#>          Current function value: 3027.37262644651
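
The bears graph is fairly large, so a smaller graph may make the construction pattern easier to see. The sketch below uses the same constructors as above (edge, admixture_edge, parent_edges and agraph) with the same argument order: child first, then parent, and for admixture edges the child, its two parents, and the name of the admixture proportion. All node names and the parameter name alpha are hypothetical, and the code assumes the admixturegraph package is installed.

```r
library(admixturegraph)

# Four leaves: an outgroup, two source populations, and one admixed population.
toy_leaves <- c("Out", "A", "B", "C")
toy_inner_nodes <- c("R", "AC", "a_side", "c_side", "B_adm")

toy_edges <- parent_edges(c(edge("Out", "R"),
                            edge("AC", "R"),
                            edge("a_side", "AC"),
                            edge("c_side", "AC"),
                            edge("A", "a_side"),
                            edge("C", "c_side"),
                            # B_adm is a mixture of the two sides.
                            admixture_edge("B_adm", "a_side", "c_side", "alpha"),
                            edge("B", "B_adm")))

toy_graph <- agraph(toy_leaves, toy_inner_nodes, toy_edges)
```

As in the bears graph, the admixed node is an inner node and the sampled population hangs off it as a leaf. The result can be passed to plot in the same way as bears_graph.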

Fitting a graph to data

The graph makes predictions on how the f4 statistics should look. The graph parameters can be fit to observed statistics using the fit_graph function:

fit <- fit_graph(bears, bears_graph)
fit
#> 
#> Call: inner_fit_graph(data, graph, point, Z.value, concentration, optimisation_options, 
#>     parameters, iteration_multiplier, qr_tol)
#> 
#> None of the admixture proportions are properly fitted!
#> Not all of the admixture proportions are properly fitted!
#> See summary.agraph_fit for a more detailed analysis.
#> 
#> Minimal error: 12.98523
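
To give some intuition for what fitting involves: once the admixture proportions are fixed, the predicted f4 values are linear in the edge lengths, so the edge lengths can be found by a least squares fit. The following is a toy sketch of that idea in base R and not the package's implementation. The tree, the observed values and all variable names are made up, and the error here is a plain sum of squared differences.

```r
# Toy tree ((A,B),(C,D)) with a single internal edge of unknown length e.
# Rows correspond to f4(A,B;C,D), f4(A,C;B,D) and f4(A,D;B,C):
# the first is predicted to be 0, the other two to equal e.
design <- matrix(c(0, 1, 1), ncol = 1, dimnames = list(NULL, "e"))
observed <- c(0.002, 0.051, 0.047)  # made-up f4 values

# Least squares estimate of the edge length, truncated at zero
# because a drift length cannot be negative.
e_hat <- max(0, qr.solve(design, observed))

# Predictions from the fitted edge length and the remaining error.
predicted <- as.vector(design %*% e_hat)
sse <- sum((observed - predicted)^2)
```

Edges that never appear in any of the observed statistics correspond to all-zero columns in such a matrix, and their lengths cannot be determined from the data.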

You can get details about the fit by calling the summary.agraph_fit function:

summary(fit)
#> 
#> Call: inner_fit_graph(data, graph, point, Z.value, concentration, optimisation_options, 
#>     parameters, iteration_multiplier, qr_tol)
#> 
#> None of the proportions {a, b, c, d} affect the quality of the fit!
#> 
#> Optimal admixture proportions:
#>         a         b         c         d 
#> 0.3666992 0.4977105 0.9565926 0.7986799 
#> 
#> Optimal edge lengths:
#>        edge_R_BLK       edge_R_PBBB       edge_PBBB_z   edge_PBBB_pb_a4 
#>        0.00000000        0.00000000        0.00000000        0.07852837 
#>     edge_Adm_Adm1     edge_Adm_Adm2     edge_Chi_Chi1     edge_Chi_Chi2 
#>        0.00000000        0.00000000        0.00000000        0.00000000 
#>       edge_BC_Bar       edge_BC_Chi      edge_ABC_Adm    edge_ABC_bc_a1 
#>        0.00000000        0.00000000        0.00000000        0.00000000 
#>     edge_x_Denali     edge_x_abc_a2      edge_y_Kenai       edge_y_x_a3 
#>        0.00000000        0.00000000        0.00000000        0.00000000 
#>     edge_z_Sweden       edge_z_y_a4     edge_pb_a1_PB  edge_pb_a1_bc_a1 
#>        0.00000000        0.00000000        0.00000000        0.00000000 
#>  edge_pb_a2_pb_a1 edge_pb_a2_abc_a2  edge_pb_a3_pb_a2   edge_pb_a3_x_a3 
#>        0.13643125        0.00000000        0.02156832        0.00000000 
#>  edge_pb_a4_pb_a3   edge_pb_a4_y_a4     edge_bc_a1_BC   edge_abc_a2_ABC 
#>        0.04010857        0.00000000        0.00000000        0.00000000 
#>       edge_x_a3_x       edge_y_a4_y 
#>        0.00000000        0.00000000 
#> 
#> Solution to a homogeneous system of edge lengths with the optimal admixture proportions:
#> Adding any such solution to the optimal one will not affect the error.
#> 
#> Free edge lengths:
#> edge_R_BLK
#> edge_R_PBBB
#> edge_PBBB_z
#> edge_Adm_Adm1
#> edge_Adm_Adm2
#> edge_Chi_Chi1
#> edge_Chi_Chi2
#> edge_BC_Bar
#> edge_BC_Chi
#> edge_ABC_Adm
#> edge_ABC_bc_a1
#> edge_x_Denali
#> edge_x_abc_a2
#> edge_y_Kenai
#> edge_y_x_a3
#> edge_z_Sweden
#> edge_z_y_a4
#> edge_pb_a1_PB
#> edge_pb_a1_bc_a1
#> edge_pb_a2_abc_a2
#> edge_pb_a3_x_a3
#> edge_pb_a4_y_a4
#> edge_bc_a1_BC
#> edge_abc_a2_ABC
#> edge_x_a3_x
#> edge_y_a4_y
#> 
#> Bounded edge lengths:
#> edge_PBBB_pb_a4 = 0
#> edge_pb_a2_pb_a1 = 0
#> edge_pb_a3_pb_a2 = 0
#> edge_pb_a4_pb_a3 = 0
#> 
#> Minimal error:
#> 12.98523

You can make a plot of the fit against the data by calling the plot.agraph_fit function:

plot(fit)

The plot shows the observed f4 statistics with error bars (in black) plus the predicted values from the graph.

The result of this is a ggplot2 object that you can modify by adding ggplot2 commands in the usual way.

Read the vignette admixturegraph for more examples.

Install

install.packages('admixturegraph')

Version

1.0.2

License

GPL-2

Maintainer

Thomas Mailund

Last Published

December 13th, 2016

Functions in admixturegraph (1.0.2)

add_an_admixture

Adds a new admixture event to a graph.
admix_props

Specify the proportions in an admixture event.
add_graph_f4

Evaluates the f_4 statistics for all rows in a data frame and extends the data frame with the graph f_4.
add_a_leaf

Adds a new leaf to a graph.
admixturegraph-package

admixturegraph: Visualising and analysing admixture graphs.
add_graph_f4_sign

Extend a data frame with f_4 statistics predicted by a graph.
add_an_admixture2

Adds a new admixture event to a graph.
agraph_children

Build the child incidence matrix from a parent edge list.
admixture_proportions

Create the list of admixture proportions for an admixture graph.
admixture_edge

Create an admixture edge from a child to two parents.
agraph

Create an admixture graph object.
calculate_concentration

Building a proxy concentration matrix.
all_graphs

All graphs.
burn_in

Removes the first k rows from a trace.
all_path_overlaps

Get the list of overlaps of all paths.
all_paths

Compute all paths from one leaf to another.
agraph_parents

Build the parent incidence matrix from an edge list.
agraph_weights

Build the matrix of admixture proportions from an edge list.
build_edge_optimisation_matrix

Build a matrix coding the linear system of edges once the admix variables have been fixed.
bears

Statistics for populations of bears
canonise_expression

Used to recognize similar expressions and to possibly simplify them.
edge_optimisation_function

More detailed edge fitting than mere cost_function.
canonise_graph

Canonise graph.
eight_leaves_trees

Eight leaves trees.
evaluate_f4

Evaluates an f_4 statistic in a given environment.
edge

Create an edge from a child to a parent.
extract_admixture_proportion_parameters

Extract the admixture proportion parameter from edge specifications.
examine_edge_optimisation_matrix

Examine the edge optimisation matrix to detect unfitted admix variables.
filter_on_leaves

Filter data so all W, X, Y and Z are leaves in the graph.
fit_graph_list

Fit lots of graphs to data.
f2

Calculate the f_2(A, B) statistics.
f3

Calculate the f_3(A; B, C) statistics.
graph_to_vector

Graph to vector.
graphs_2_0

Admixture graphs of 2 leaves and 0 admixture events compressed into vectors
graphs_6_0

Admixture graphs of 6 leaves and 0 admixture events compressed into vectors
graphs_6_1

Admixture graphs of 6 leaves and 1 admixture event compressed into vectors
fitted.agraph_fit

Predicted f statistics for the fitted graph.
fast_fit

A fast version of graph fitting.
fast_plot

Fast version of graph plotting.
five_leaves_graphs

Five leaves graphs.
graphs_3_0

Admixture graphs of 3 leaves and 0 admixture events compressed into vectors
coef.agraph_fit

Parameters for the fitted graph.
graphs_3_1

Admixture graphs of 3 leaves and 1 admixture event compressed into vectors
extract_trees

Extract trees
extract_graph_parameters

Extract all the parameters a graph contains.
cost_function

The cost function fed to Nelder-Mead.
format_path

Create a path data frame from a list of nodes.
four_leaves_graphs

Four leaves graphs.
graphs_5_1

Admixture graphs of 5 leaves and 1 admixture event compressed into vectors
graphs_5_2

Admixture graphs of 5 leaves and 2 admixture events compressed into vectors
graphs_7_1

Admixture graphs of 7 leaves and 1 admixture event compressed into vectors
graphs_8_0

Admixture graphs of 8 leaves and 0 admixture events compressed into vectors
is_descendant_of

Is descendant of.
is_negative

All overlaps are either empty or have a negative weight.
plot_fit_1

A plot of the cost function or number of fitted statistics.
model_likelihood

Computes the likelihood of a model from samples from its posterior distribution.
mynonneg

Non negative least square solution.
plot_fit_2

A contour plot of the cost function.
remove_duplicates

Remove duplicate graphs from a list.
rename_nodes

Rename nodes.
make_mcmc_model

Collect the information about a graph and a data set needed to run an MCMC on it.
make_permutations

List of permutations.
residuals.agraph_fit

Errors of prediction in the fitted graph
plot.agraph

Plot an admixture graph.
overlaps_sign

Get the sign of overlapping paths.
plot.agraph_fit

Plot the fit of a graph to data.
no_poor_fits

Get the number of tests in the fit where the predictions fall outside of the error bars.
fit_graph

Fit the graph parameters to a data set.
run_metropolis_hasting

Run a Metropolis-Hastings MCMC to sample graph parameters.
vector_to_graph

Vector to graph.
fit_permutations_and_graphs

Fit lots of graphs to data.
graphs_4_0

Admixture graphs of 4 leaves and 0 admixture events compressed into vectors
is_unknown

Overlapping edges have both positive and negative contributions.
is_positive

All overlaps are either empty or have a positive weight.
graphs_4_1

Admixture graphs of 4 leaves and 1 admixture event compressed into vectors
no_admixture_events.agraph_fit

Get the number of admixture events in a fitted graph.
no_admixture_events.agraph_fit_list

Get the number of admixture events in a list of fitted graphs.
project_to_population

Map sample names to population names.
print.agraph_fit

Print function for the fitted graph.
f4

Calculate the f_4(W, X; Y, Z) statistics.
f4stats

Make a data frame an f_4 statistics object.
graphs_4_2

Admixture graphs of 4 leaves and 2 admixture events compressed into vectors
graphs_5_0

Admixture graphs of 5 leaves and 0 admixture events compressed into vectors
graph_environment

Build an environment in which f statistics can be evaluated.
get_graph_f4_sign

Extracts the sign for the f_4 statistics predicted by the graph.
graphs_6_2

Admixture graphs of 6 leaves and 2 admixture events compressed into vectors
graphs_7_0

Admixture graphs of 7 leaves and 0 admixture events compressed into vectors
parent_edges

Create the list of edges for an admixture graph.
path_overlap

Collect the positive and negative overlap between two paths.
log_sum_of_logs

Computes the log of a sum of numbers all given in log-space.
make_an_outgroup

Make an outgroup.
no_admixture_events.agraph

Get the number of admixture events in a graph.
no_admixture_events

Get the number of admixture events in a graph.
seven_leaves_trees

Seven leaves trees.
sf2

Calculate the f_2(A, B) statistics.
plot.f4stats

Plot the fit of a graph to data.
seven_leaves_graphs

Seven leaves graphs.
poor_fits.agraph_fit_list

Get the tests in the fit where the predictions fall outside of the error bars.
summary.agraph_fit

Summary for the fitted graph.
sf3

Calculate the f_3(A; B, C) statistics.
thinning

Thins out an MCMC trace.
split_population.agraph_fit

Reverse a projection of samples to populations.
split_population.data.frame

Reverse a projection of samples to populations.
is_zero

All overlaps are empty.
split_population

Reverse a projection of samples to populations.
log_likelihood

Calculate (essentially) the log likelihood of a graph with parameters, given the observation.
sum_of_squared_errors.agraph_fit_list

Get the sum of squared errors for a list of fitted graphs.
model_bayes_factor_n

Computes the Bayes factor between two models from samples from their posterior distributions.
no_poor_fits.agraph_fit_list

Get the number of tests in the fit where the predictions fall outside of the error bars.
model_likelihood_n

Computes the likelihood of a model from samples from its posterior distribution.
no_poor_fits.agraph_fit

Get the number of tests in the fit where the predictions fall outside of the error bars.
poor_fits.agraph_fit

Get the tests in the fit where the predictions fall outside of the error bars.
poor_fits

Get the tests in the fit where the predictions fall outside of the error bars.
sf4

Calculate the f_4(W, X; Y, Z) statistics.
six_leaves_graphs

Six leaves graphs.
sum_of_squared_errors.agraph_fit

Get the sum of squared errors for a fitted graph.
sum_of_squared_errors

Get the sum of squared errors for a fitted graph.