strataG (version 2.0.2)

phase: PHASE

Description

Run PHASE to estimate the phase of loci in diploid data.

Usage

phase(g, loci, positions = NULL, type = NULL, num.iter = 1e+05,
  thinning = 100, burnin = 1e+05, model = "new", ran.seed = NULL,
  final.run.factor = NULL, save.posterior = FALSE, in.file = "phase_in",
  out.file = "phase_out", delete.files = TRUE)

phaseReadSample(out.file, type)

phaseReadPair(out.file)

phaseWrite(g, loci, positions = NULL, type = rep("S", length(loci)), in.file = "phase_in")

phasePosterior(ph.res, keep.missing = TRUE)

phaseFilter(ph.res, thresh = 0.5, keep.missing = TRUE)

Arguments

g

a gtypes object.

loci

vector or data.frame of loci in 'g' that are to be phased. If a data.frame, it should have columns named locus (name of locus in 'g'), group (number identifying loci in same linkage group), and position (integer identifying location of each locus in a linkage group).

positions

position along chromosome of each locus.

type

type of each locus.

num.iter

number of PHASE MCMC iterations.

thinning

number of PHASE MCMC iterations to thin by.

burnin

number of PHASE MCMC iterations for burnin.

model

PHASE model type.

ran.seed

PHASE random number seed.

final.run.factor

optional.

save.posterior

logical. Save posterior sample in output list?

in.file

name to use for PHASE input file.

out.file

name to use for PHASE output files.

delete.files

logical. Delete PHASE input and output files when done?

ph.res

result from phase.

keep.missing

logical. TRUE = keep missing data from the original data set. FALSE = use estimated genotypes from PHASE.

thresh

minimum probability for a genotype to be selected (0.5 - 1).
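
As an illustration of the data.frame form of the loci argument, the sketch below builds a table with the three documented columns (locus, group, position) in base R. The locus names, group numbers, and positions are made up for the example and do not come from any strataG data set.

```r
# Hypothetical locus names and positions, for illustration only.
loci.df <- data.frame(
  locus = c("snpB_3", "snpA_1", "snpB_1", "snpA_2", "snpB_2"),
  group = c(2, 1, 2, 1, 2),                   # same number = same linkage group
  position = c(890L, 120L, 30L, 455L, 210L),  # integer location within the group
  stringsAsFactors = FALSE
)

# Sort by group and position so the table is easy to check by eye.
loci.df <- loci.df[order(loci.df$group, loci.df$position), ]
rownames(loci.df) <- NULL

# Count the loci in each linkage group.
table(loci.df$group)
```

A data.frame laid out this way could then be passed as loci, provided the locus column matches locus names present in g.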

Value

phase

a list containing:

locus.name new locus name, which is a combination of loci in group.
gtype.probs a data.frame listing the estimated genotype for every sample along with its probability.
orig.gtypes the original gtypes object for the composite loci.
posterior a list of num.iter data.frames representing posterior sample of genotypes for each sample.

phaseWrite

a list with the input filename and the gtypes object used.

phaseReadPair

a data.frame of genotype probabilities.

phaseReadSample

a list of data.frames representing the posterior sample of genotypes for one set of loci for each sample.

phaseFilter

a matrix of genotypes for each sample.

phasePosterior

a list of data.frames representing the posterior sample of all genotypes for each sample.
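
To illustrate the idea behind the thresh argument of phaseFilter, the following base R sketch keeps only the genotype calls whose probability meets a minimum threshold. The data.frame and its column names (id, a1, a2, pr) are hypothetical and are not claimed to match the actual layout of gtype.probs. This is a toy version of the concept, not the package's implementation.

```r
# Toy genotype calls with an assigned probability (hypothetical layout).
calls <- data.frame(
  id = c("ind1", "ind2", "ind3", "ind4"),
  a1 = c("A", "A", "G", "G"),
  a2 = c("G", "A", "G", "A"),
  pr = c(0.98, 0.51, 0.49, 0.75),
  stringsAsFactors = FALSE
)

thresh <- 0.5

# Keep alleles for calls at or above the threshold; mark the rest as missing.
keep <- calls$pr >= thresh
filtered <- calls
filtered[!keep, c("a1", "a2")] <- NA

filtered
```

Raising thresh toward 1 trades more missing genotypes for higher confidence in the calls that remain.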

Details

phase runs PHASE assuming that the executable is installed properly and available on the command line.
phaseWrite writes a PHASE formatted file.
phaseReadPair reads the '_pair' output file.
phaseReadSample reads the '_sample' output file.
phaseFilter filters the result from phase to extract one genotype for each sample.
phasePosterior creates a data.frame of all genotypes for each posterior sample.
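
Because phase relies on the PHASE executable being reachable from the command line, it can help to check for it before starting a long run. The sketch below uses base R's Sys.which for that check. The executable name "PHASE" is an assumption and may differ by platform or installation.

```r
# Name of the executable as typed on the command line (assumed; adjust if needed).
phase.exe <- "PHASE"

# Sys.which() returns the full path, or an empty string when the program is not found.
phase.path <- unname(Sys.which(phase.exe))
phase.available <- nzchar(phase.path)

if (phase.available) {
  message("Found PHASE at: ", phase.path)
} else {
  message("PHASE was not found on the PATH; phase() is unlikely to run.")
}
```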

References

Stephens, M., and Donnelly, P. (2003). A comparison of Bayesian methods for haplotype reconstruction from population genotype data. American Journal of Human Genetics 73:1162-1169. Available at: http://stephenslab.uchicago.edu/software.html#phase

Examples

# NOT RUN {
data(bowhead.snps)
data(bowhead.snp.position)
snps <- df2gtypes(bowhead.snps, ploidy = 2, description = "Bowhead SNPS")
summary(snps)

# Run PHASE on all data
phase.results <- phase(snps, bowhead.snp.position, num.iter = 100, 
  save.posterior = FALSE)

# Filter phase results
filtered.results <- phaseFilter(phase.results, thresh = 0.5)

# Convert phased genotypes to gtypes
ids <- rownames(filtered.results)
strata <- bowhead.snps$Stock[match(ids, bowhead.snps$LABID)]
filtered.df <- cbind(id = ids, strata = strata, filtered.results)
phased.snps <- df2gtypes(filtered.df, ploidy = 2, description = "Bowhead phased SNPs")
summary(phased.snps)
# }