
admixturegraph (version 1.0.2)

fit_graph: Fit the graph parameters to a data set.

Description

Given a table of observed $f$ statistics and a graph, uses the Nelder-Mead algorithm to find the graph parameters (edge lengths and admixture proportions) that minimize the value of cost_function, i.e. that maximize the likelihood of the graph with those parameters given the observed data. Like fast_fit, but outputs a more detailed analysis of the results.
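
As a rough illustration of the optimisation step described above, the following base R sketch minimizes a toy least-squares cost with the Nelder-Mead method from optim. It is not the package's cost_function: the matrix M, the vector true_par and the function toy_cost are hypothetical stand-ins invented for this example, and the real cost also involves the graph structure and the concentration matrix.

```r
# Toy stand-in for a cost function; NOT admixturegraph's cost_function.
# M is a hypothetical matrix mapping two parameters to three predicted statistics.
M <- matrix(c(0.4, 0.0,
              0.0, 0.2,
              0.2, 0.1), ncol = 2, byrow = TRUE)

# Invented "true" parameters used to simulate noise-free observations.
true_par <- c(0.3, 0.7)
observed <- as.vector(M %*% true_par)

# Sum of squared differences between observed and predicted statistics.
toy_cost <- function(par) {
  predicted <- as.vector(M %*% par)
  sum((observed - predicted)^2)
}

# Nelder-Mead is a derivative-free simplex search; base R exposes it via optim().
result <- optim(par = c(0.5, 0.5), fn = toy_cost, method = "Nelder-Mead")

print(result$par)    # fitted parameters, close to true_par
print(result$value)  # minimum cost found
```

Because Nelder-Mead needs no derivatives, it can cope with cost functions that are not smooth everywhere, which may be part of why a simplex search is used here.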

Usage

fit_graph(data, graph,
          point = list(rep(1e-05, length(extract_graph_parameters(graph)$admix_prop)),
                       rep(1 - 1e-05, length(extract_graph_parameters(graph)$admix_prop))),
          Z.value = TRUE,
          concentration = calculate_concentration(data, Z.value),
          optimisation_options = NULL,
          parameters = extract_graph_parameters(graph),
          iteration_multiplier = 3,
          qr_tol = 1e-08)

Arguments

data
The data table. Must contain columns W, X, Y, Z for sample names and D for the observed $f_4(W, X; Y, Z)$. May contain an optional column Z.value for the $Z$ scores (the $f$ statistics divided by their standard deviations).
graph
The admixture graph (an agraph object).
point
Restricts the admixture proportions, for example to fix some of them. A list of two vectors: the lower and the upper bounds. By default the bounds are just a little more than zero and a little less than one; this is because the infimum of the cost function is sometimes at a point of discontinuity, and zero and one tend to be problematic values in this respect.
Z.value
Whether we calculate the default concentration from $Z$ scores (the default option TRUE) or just use the identity matrix.
concentration
The Cholesky decomposition of the inverted covariance matrix. Default matrix determined by the parameter Z.value.
optimisation_options
Options to the Nelder-Mead algorithm.
parameters
In case one wants to tweak something in the graph.
iteration_multiplier
Given to mynonneg.
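
To make the shapes of data, concentration and point more concrete, here is a small base R sketch. All numbers are invented, and the names toy_data, toy_concentration and toy_point are hypothetical. The concentration is computed under the assumption that the statistics are independent, so the covariance matrix is diagonal; this may differ from what calculate_concentration does internally. Setting equal lower and upper bounds to fix a proportion is likewise an assumption about how point is meant to be used.

```r
# Hypothetical f4 table with the columns the data argument expects.
# All numbers are invented for illustration only.
toy_data <- data.frame(
  W = c("BLK", "BLK", "BLK"),
  X = c("PB", "PB", "PB"),
  Y = c("Sweden", "Kenai", "Denali"),
  Z = c("Adm1", "Adm1", "Adm1"),
  D = c(0.10, 0.12, 0.15),
  Z.value = c(20, 24, 25),
  stringsAsFactors = FALSE
)

# A Z score is the statistic divided by its standard deviation,
# so the standard deviation can be recovered as D / Z.value.
std_dev <- abs(toy_data$D / toy_data$Z.value)

# Assumption: independent statistics, hence a diagonal covariance matrix.
covariance <- diag(std_dev^2)

# Cholesky decomposition of the inverted covariance matrix.
toy_concentration <- chol(solve(covariance))

# Bounds for a graph with two admixture proportions: pin the first one
# at 0.5 (assumed usage) and keep the default range for the second.
toy_point <- list(c(0.5, 1e-05), c(0.5, 1 - 1e-05))

print(toy_concentration)
```

With a diagonal covariance the Cholesky factor is itself diagonal, with the reciprocal standard deviations on the diagonal, so each residual is simply scaled by how precisely it was measured.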

Value

A list of class agraph_fit containing detailed information about the fit: data is the input data; graph is the input graph; matrix is the output of build_edge_optimisation_matrix, containing the full matrix, the column_reduced matrix without zero columns, and the graph parameters; complaint encodes which subsets of admixture proportions are truly fitted; best_fit is the optimal admixture proportions (might not be unique if they are not truly fitted); best_edge_fit is an example of optimal edge lengths; homogeneous is the reduced row echelon form of the matrix describing when a vector of edge lengths has no effect on the predicted statistics $F$; free_edges is one way to choose a subset of edge lengths in such a vector as free variables; bounded_edges is how the remaining edge lengths are calculated from the free ones; best_error is the minimum value of cost_function; approximation is the predicted statistics $F$ with the optimal graph parameters; parameters is just a shortcut for the graph parameters. See summary.agraph_fit for the interpretation of some of these results.

See Also

cost_function

agraph

calculate_concentration

optimset

fast_fit

Examples


# For example, let's fit the following admixture graph to example data on bears:

data(bears)
print(bears)

leaves <- c("BLK", "PB", "Bar", "Chi1", "Chi2", "Adm1", "Adm2", "Denali", "Kenai", "Sweden") 
inner_nodes <- c("R", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z", "M", "N")
edges <- parent_edges(c(edge("BLK", "R"),
                        edge("PB", "v"),
                        edge("Bar", "x"),
                        edge("Chi1", "y"),
                        edge("Chi2", "y"),
                        edge("Adm1", "z"),
                        edge("Adm2", "z"),
                        edge("Denali", "t"),
                        edge("Kenai", "s"),
                        edge("Sweden", "r"),
                        edge("q", "R"),
                        edge("r", "q"),
                        edge("s", "r"),
                        edge("t", "s"),
                        edge("u", "q"),
                        edge("v", "u"),
                        edge("w", "M"),
                        edge("x", "N"),
                        edge("y", "x"),
                        edge("z", "w"),
                        admixture_edge("M", "u", "t"),
                        admixture_edge("N", "v", "w")))
admixtures <- admixture_proportions(c(admix_props("M", "u", "t", "a"),
                                      admix_props("N", "v", "w", "b")))
bears_graph <- agraph(leaves, inner_nodes, edges, admixtures)
plot(bears_graph, show_admixture_labels = TRUE)

fit <- fit_graph(bears, bears_graph)
summary(fit)

# It turns out that the values of the admixture proportions have no effect on the cost
# function. This is not too surprising, because the huge graph contains a lot of edge
# variables compared to the tiny amount of data we used! Note, however, that the mere
# existence of an admixture event with a non-trivial (not zero or one) admixture
# proportion might still decrease the cost function.

