NB.estimator: Maximum likelihood estimation of effective population size from temporally-spaced genetic data

Description

The concept of effective population size is imporatnt in population genetics. This function estimates the effective population size $N_e$ from temporally-sapced genetic data using maximum likelihood method with continuous approximation.

Usage

NB.estimator(infile, alleles, sample.interval, bound = c(50, 1e+07), 
	profile.likelihood=FALSE)

Arguments

infile

The input dataset. It should be a plain text file (*.txt) containing the number of alleles of each allele, at each locus, from each temporal sample. For example, Suppose we have $i$ temporal samples from the focal population, $j$ loci, and $K_j$ alleles at locus $j$. Denoting the number of copies of allele $k$ at locus $j$ from sample $i$ as $n_{k,j,i}$, then the input format of data is as follows: $n_{1,1,1}~n_{2,1,1}~...~n_{K_1,1,1}$ $n_{1,2,1}~n_{2,2,1}~...~n_{K_2,2,1}$ ... $n_{1,j,1}~n_{2,j,1}~...~n_{K_j, j,1}$ $n_{1,1,2}~n_{2,1,2}~...~n_{K_1,1,2}$ $n_{1,2,2}~n_{2,2,2}~...~n_{K_2,2,2}$ ... $n_{1,j,2}~n_{2,j,2}~...~n_{K_j, j,2}$ ... Note: A space is required to separate the allele counts.

alleles

A vector containing the number of alleles at each locus. For example, c(4, 4, 4) would mean that 3 loci are sampled, with 4 alleles each.

sample.interval

A vector stating at which generations the samples were taken. For example, c(0, 8) would indicate that two samples were collected from the 0th and 8th generation.

bound

Lower and upper bound for searching for the effective population size. Default values are c(50, 1e7).

profile.likelihood

TRUE if you would like to plot the log-likelihood function, or to look at the details of the log-likelihood values between your confidence interval. FALSE will return you the lower and upper 95% confidence interval only.

Value

N: The point estimate of the effective population size $N_e$.
CI: The approximate 95% confidence interval calculated from the log-likelihood. They are the region where the log-likelihood is 2 units below the maxima.
log.like: The value of the maximised log-likelihood.
profile.CI: A list of log-likelihood values as a function of $N_e$ within the 95% confidence interval.

Details

The input arguments above largely follow the input format of MLNE (Wang, 2001; Wang and Whitlock, 2003) to allow users to switch between platforms with the minimal effort. The infile should be in a plain text (*.txt) file format. It contains the same information as the input data as MLNE does. The built-in optim optimiser is wrapped within this function.

Examples

Run this code

## CREATE SAMPLE DATASET
NB.example.dataset()

## RUN THE FUNCTION
NB.estimator(infile='sample_data.txt', alleles=rep(4, 50), 
	sample.interval=c(0, 8), profile.likelihood=FALSE)

	#####
	## NUMERICAL RESULTS
	#$N
	# [1] 1241.079
	#
	#$CI
	#[1]  594.195 6375.933
	#
	#$log.like
	#[1] -543.9159
	#####