MVP.FarmCPU: Perform GWAS using FarmCPU method

Description

Build date: February 24, 2013
Last update: May 25, 2017
Requirement: Y (phe), GD (geno), and CV must list taxa in the same order. GD (geno) and GM (map) must list SNPs in the same order.
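The ordering requirement can be checked before calling the function. The following is a minimal sketch on toy data, not part of rMVP. All object names are hypothetical, and it assumes that individual and SNP identifiers are available (here as dimnames on a small ordinary matrix; with a real genotype matrix that has no taxa column, the IDs would be tracked in separate vectors).

```r
# Toy data; every name below is hypothetical and only for illustration.
phe <- data.frame(Taxa = c("ind1", "ind2", "ind3"),
                  Trait = c(1.2, 0.7, 1.9),
                  stringsAsFactors = FALSE)
cv_ids <- c("ind1", "ind2", "ind3")    # row order of the covariate matrix
geno_ids <- c("ind3", "ind1", "ind2")  # genotype column order (mismatched on purpose)

# m by n genotype matrix: rows are SNPs, columns are individuals.
geno <- matrix(c(0, 1, 2,
                 2, 0, 1),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("snp1", "snp2"), geno_ids))
map <- data.frame(SNP = c("snp1", "snp2"), Chr = c(1, 1), Pos = c(100, 200),
                  stringsAsFactors = FALSE)

# Reorder genotype columns so they follow the phenotype order.
ord <- match(phe$Taxa, colnames(geno))
stopifnot(!anyNA(ord))  # every phenotyped individual must be genotyped
geno_aligned <- geno[, ord, drop = FALSE]

# Taxa order: phenotype, genotype columns, and covariates must agree.
# SNP order: genotype rows and map rows must agree.
stopifnot(identical(colnames(geno_aligned), phe$Taxa),
          identical(cv_ids, phe$Taxa),
          identical(rownames(geno_aligned), map$SNP))
```

If any of the checks fails, `stopifnot()` raises an error, which is usually preferable to running an analysis on silently misaligned data.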

Usage

MVP.FarmCPU(
  phe,
  geno,
  map,
  CV = NULL,
  geno_ind_idx = NULL,
  P = NULL,
  method.sub = "reward",
  method.sub.final = "reward",
  method.bin = c("EMMA", "static", "FaST-LMM"),
  bin.size = c(5e+05, 5e+06, 5e+07),
  bin.selection = seq(10, 100, 10),
  memo = "MVP.FarmCPU",
  Prior = NULL,
  ncpus = 2,
  maxLoop = 10,
  threshold.output = 0.01,
  converge = 1,
  iteration.output = FALSE,
  p.threshold = NA,
  QTN.threshold = 0.01,
  bound = NULL,
  verbose = TRUE
)

Value

An m by 4 results matrix, where m is the number of markers; the four columns are SNP_ID, Chr, Pos, and p-value.
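As an illustration of working with a result of this shape, the sketch below filters a toy table by a Bonferroni-corrected threshold. The data frame `res`, its column names, and the significance level `alpha` are assumptions made for this example; they are not taken from the actual return value.

```r
# Toy stand-in for a results table with the four columns described above.
# Column names and values are hypothetical.
res <- data.frame(
  SNP_ID = c("snp1", "snp2", "snp3", "snp4"),
  Chr = c(1, 1, 2, 2),
  Pos = c(1000, 52000, 3000, 91000),
  P = c(0.2, 1e-6, 0.03, 4e-4),
  stringsAsFactors = FALSE
)

alpha <- 0.05
bonf <- alpha / nrow(res)  # Bonferroni threshold: alpha divided by marker count

# Keep markers below the threshold, most significant first.
hits <- res[!is.na(res$P) & res$P < bonf, ]
hits <- hits[order(hits$P), ]
print(hits)
```

Bonferroni correction is only one common choice; other multiple-testing corrections can be substituted by changing how the threshold is computed.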

Arguments

phe: phenotype, n by t matrix, n is sample size, t is number of phenotypes
geno: genotype, m by n matrix, m is marker size, n is sample size. This is the pure genotype data matrix (GD); it must not contain a column for taxa.
map: SNP map information, m by 3 matrix, m is marker size, the three columns are SNP_ID, Chr, and Pos
CV: covariates, n by c matrix, n is sample size, c is number of covariates
geno_ind_idx: the index of effective genotyped individuals
P: start p values for all SNPs
method.sub: method used in substitution process, five options: 'penalty', 'reward', 'mean', 'median', or 'onsite'
method.sub.final: method used in the final substitution process, five options: 'penalty', 'reward', 'mean', 'median', or 'onsite'
method.bin: method for selecting the most appropriate bins, three options: 'static', 'EMMA' or 'FaST-LMM'
bin.size: bin sizes for all iterations, a vector, the bin size is always from large to small
bin.selection: number of selected bins in each iteration, a vector
memo: a marker on output file name
Prior: prior information, four columns, which are SNP_ID, Chr, Pos, P-value
ncpus: number of threads used for parallel computation
maxLoop: maximum number of iterations
threshold.output: only the GWAS results with p-values lower than threshold.output will be output
converge: a number from 0 to 1; if the pseudo QTNs selected in the last and second-to-last iterations overlap with a certain probability (that probability is converge), the loop stops
iteration.output: whether to output results of all iterations
p.threshold: if all p values generated in the first iteration are bigger than p.threshold, FarmCPU stops
QTN.threshold: in second and later iterations, only SNPs with lower p-values than QTN.threshold have chances to be selected as pseudo QTNs
bound: maximum number of SNPs selected as pseudo QTNs in each iteration
verbose: whether to print detailed progress information
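To make the roles of bin.size, bin.selection, and converge more concrete, here is a minimal sketch of the two ideas on toy data. It is not the rMVP implementation. Both helper functions are hypothetical, and it assumes that bins are formed per chromosome by integer division of position by the bin size, and that overlap is measured as intersection over union of the two pseudo QTN sets.

```r
# Hypothetical helper: keep the most significant SNP in each bin, then
# return the indices of the n_bins best bin representatives.
# Assumption: bin = chromosome plus floor(position / bin_size).
select_bin_representatives <- function(map, p, bin_size, n_bins) {
  bin_id <- paste(map$Chr, floor(map$Pos / bin_size), sep = ":")
  ord <- order(p)                            # most significant SNP first
  first_in_bin <- !duplicated(bin_id[ord])   # best SNP of each bin
  head(ord[first_in_bin], n_bins)
}

# Hypothetical helper: overlap between the pseudo QTN sets of two
# consecutive iterations, as intersection over union (an assumption).
qtn_overlap <- function(current, previous) {
  all_qtn <- union(current, previous)
  if (length(all_qtn) == 0) return(0)
  length(intersect(current, previous)) / length(all_qtn)
}

# Toy map and p-values.
map <- data.frame(Chr = c(1, 1, 1, 2, 2),
                  Pos = c(100, 400, 900000, 200, 700))
p <- c(0.01, 0.001, 0.2, 0.5, 0.03)

reps <- select_bin_representatives(map, p, bin_size = 5e5, n_bins = 2)
print(reps)

# With converge = 1, the loop would stop only when the two sets are identical.
converge <- 1
stop_loop <- qtn_overlap(reps, c(2, 5)) >= converge
print(stop_loop)
```

Selecting one representative per bin keeps nearby markers in strong linkage from all entering the model as pseudo QTNs at once, which is the motivation for binning in the first place.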

Author

Xiaolei Liu and Zhiwu Zhang

Examples

# \donttest{
library(rMVP)
library(bigmemory)  # provides attach.big.matrix() and deepcopy()
phePath <- system.file("extdata", "07_other", "mvp.phe", package = "rMVP")
phenotype <- read.table(phePath, header=TRUE)
idx <- !is.na(phenotype[, 2])
phenotype <- phenotype[idx, ]
print(dim(phenotype))
genoPath <- system.file("extdata", "06_mvp-impute", "mvp.imp.geno.desc", package = "rMVP")
genotype <- attach.big.matrix(genoPath)
genotype <- deepcopy(genotype, cols=idx)
print(dim(genotype))
mapPath <- system.file("extdata", "06_mvp-impute", "mvp.imp.geno.map", package = "rMVP")
map <- read.table(mapPath, header = TRUE)

farmcpu <- MVP.FarmCPU(phe = phenotype, geno = genotype, map = map,
                       maxLoop = 2, method.bin = "static")
str(farmcpu)
# }
