
CGEN (version 3.8.0)

GxE.scan.partition: Creates GxE.scan job files for a computing cluster

Description

Creates job files for running GxE.scan on a parallel processing system.

Usage

GxE.scan.partition(snp.list, pheno.list, op=NULL)

Arguments

snp.list
See snp.list and the Details below. No default.
pheno.list
See pheno.list. No default.
op
List of options; see the Details for the available names. The default is NULL.

Value

The name of the file containing the names of the job files to be submitted. See Details.
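The returned file name can drive a small submission loop. The sketch below is hypothetical: submit_jobs() is not part of CGEN, and it assumes the "Rjobs_" file lists one job file name per line, which this page does not confirm. If the file already contains complete submission commands, run it as a shell script instead. With dry.run = TRUE it only returns the commands it would run.

```r
# Hypothetical helper (not part of CGEN): submit every job file listed in
# the "Rjobs_" file. ASSUMES one job file name per line.
submit_jobs <- function(rjobs.file, qsub.cmd = "qsub", dry.run = TRUE) {
  # Read job file names and drop blank lines
  jobs <- trimws(readLines(rjobs.file))
  jobs <- jobs[nzchar(jobs)]
  # One submission command per job file, with POSIX shell quoting
  cmds <- paste(qsub.cmd, shQuote(jobs, type = "sh"))
  if (!dry.run) {
    for (cmd in cmds) system(cmd)
  }
  invisible(cmds)
}

# Usage (after running GxE.scan.partition on the cluster's login node):
# rjobs <- GxE.scan.partition(snp.list, pheno.list, op = op)
# print(submit_jobs(rjobs))                  # inspect the commands first
# submit_jobs(rjobs, dry.run = FALSE)        # then submit
```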

Details

This function creates the files needed to run a GWAS scan on a computing cluster. The user must know how to submit jobs on their particular cluster; on many clusters the submission command is "qsub". The scan is partitioned into smaller jobs either by setting the values of snp.list$start.vec and snp.list$stop.vec or by setting snp.list$include.snps. The partitioning is done so that each job processes an equal number of SNPs.

Three types of files are created in the output directory (see option out.dir):

  • R program files, with the ".R" extension, containing the R statements that define snp.list, pheno.list and op for the GxE.scan function.
  • Job files, each of which calls an R program file. These files are named paste(op$out.dir, "job_", op$id.str, 1:op$n.jobs, sep="").
  • A single file with the prefix "Rjobs_", containing the names of all the job files.

The name of the output file created by GxE.scan is set automatically to a file in the op$out.dir directory with the prefix "GxEout_".
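To make the partitioning and naming concrete, here is a minimal base-R sketch. partition_snps() is a hypothetical helper written for illustration, not CGEN's internal code: it splits SNPs 1..nsnps into at most n.jobs contiguous blocks whose sizes differ by at most one, assuming start.vec and stop.vec hold the first and last SNP index of each job. The last lines rebuild the job file names from the pattern given above, with out.dir set to "" to stand for the working directory.

```r
# Hypothetical helper: split SNPs 1..nsnps into at most n.jobs contiguous,
# near-equal blocks (start/stop indices for each job).
partition_snps <- function(nsnps, n.jobs) {
  n.jobs <- min(n.jobs, nsnps)                  # avoid empty jobs
  sizes  <- rep(nsnps %/% n.jobs, n.jobs)       # base block size
  extra  <- nsnps %% n.jobs                     # leftover SNPs
  if (extra > 0) sizes[seq_len(extra)] <- sizes[seq_len(extra)] + 1
  stop.vec  <- cumsum(sizes)
  start.vec <- stop.vec - sizes + 1
  data.frame(start = start.vec, stop = stop.vec)
}

# 50 SNPs over 5 jobs gives blocks 1-10, 11-20, ..., 41-50
blocks <- partition_snps(50, 5)

# Job file names built with the documented pattern ("" = working directory)
op <- list(out.dir = "", id.str = "", n.jobs = 5)
job.files <- paste(op$out.dir, "job_", op$id.str, 1:op$n.jobs, sep = "")
```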

Options list op: The options list op can contain the following names. Any option that is not specified takes its default value.

  • n.jobs The (maximum) number of jobs to run. The default is 100.
  • out.dir Directory to save all files. If NULL, then the files will be created in the working directory, getwd().
  • GxE.scan.op List of options for the GxE.scan function. The default is NULL.
  • R.cmd Character string for calling R. The default is "R --vanilla".
  • begin.commands.R Character vector of R statements to be placed at the top of each R program file. For example, begin.commands.R=c("rm(list=ls(all=TRUE))", "gc()", 'library(CGEN, lib.loc="/home/Rlibs/")'). The default is "library(CGEN)".
  • qsub.cmd Character string for the command to submit a single job. The default is "qsub".
  • begin.commands.qsub Character vector of statements to be placed at the top of each job file. For example, begin.commands.qsub="module load R". The default is NULL.
  • id.str A character string to be appended to the file names. The default is "".
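A filled-in options list might look like the sketch below. The directory, id.str value and module name are hypothetical placeholders to adapt to your cluster. Because the job file names are built with paste(..., sep=""), the sketch assumes out.dir should end with a path separator so that the directory and file name do not run together.

```r
# Example options list for GxE.scan.partition (placeholder values)
op <- list(
  n.jobs              = 20,
  out.dir             = "/scratch/myuser/gxe_jobs/",  # hypothetical; note trailing "/"
  id.str              = "chr1_",                      # appended to file names
  R.cmd               = "R --vanilla",
  qsub.cmd            = "qsub",
  begin.commands.R    = c("rm(list=ls(all=TRUE))", "gc()", "library(CGEN)"),
  begin.commands.qsub = "module load R",              # hypothetical module name
  GxE.scan.op         = list(model = 1)               # passed on to GxE.scan
)

# First job file name under the documented naming pattern
first.job <- paste(op$out.dir, "job_", op$id.str, 1, sep = "")
```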

snp.list: The objects start.vec and stop.vec in snp.list are set automatically, so the user does not need to set them. In general, it is more efficient in terms of memory usage and speed to have the genotype data partitioned into many files. Thus, snp.list$file can be set either to a single file or to a character vector of the partitioned files. In the latter case, the number of jobs to create (op$n.jobs) must be greater than or equal to the number of partitioned files. An object in snp.list that is unique to the GxE.scan.partition function is nsnps.vec: each element of snp.list$nsnps.vec is the number of SNPs in the corresponding file of snp.list$file. If nsnps.vec is not specified and snp.list$file contains more than one file, then each job processes an entire file in snp.list$file.
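For genotype data split across several files, snp.list could be set up as in this sketch. The per-chromosome file names, subject file and SNP counts are hypothetical placeholders; the stopifnot() line simply checks the two requirements stated above (one nsnps.vec entry per file, and at least as many jobs as files) before calling GxE.scan.partition.

```r
# Hypothetical per-chromosome TPED files (placeholder paths)
snp.list <- list(format = "tped")
snp.list$file         <- sprintf("/data/geno/chr%d.tped.gz", 1:3)
snp.list$nsnps.vec    <- c(12000, 9500, 8000)  # SNPs in each file, same order
snp.list$subject.list <- "/data/geno/subjects.tfam"

op <- list(n.jobs = 30)

# Requirements from the description of snp.list$file and nsnps.vec
stopifnot(length(snp.list$nsnps.vec) == length(snp.list$file),
          op$n.jobs >= length(snp.list$file))
```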

When the genotype data must be transformed and are contained in a single file, snp.list$include.snps should also be set. The function then creates a separate list of SNPs for each job to process.
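The sketch below assumes include.snps accepts a character vector of SNP IDs (see snp.list in GxE.scan); the file path and rs numbers are hypothetical. The chunking step is only an illustration of what "a separate list of SNPs for each job" means, using consecutive groups of at most ceiling(length/n.jobs) SNPs; it is not necessarily how CGEN divides the list internally.

```r
# Hypothetical SNP IDs stored in a single genotype file
all.snps <- sprintf("rs%d", 1001:1050)
snp.list <- list(format = "tped")
snp.list$file         <- "/data/geno/all_chr.tped.gz"  # placeholder path
snp.list$include.snps <- all.snps

# Illustration only: split the SNP IDs into at most n.jobs consecutive groups
n.jobs     <- 5
chunk.size <- ceiling(length(all.snps) / n.jobs)
job.snps   <- split(all.snps, ceiling(seq_along(all.snps) / chunk.size))
```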

See Also

GxE.scan, GxE.scan.combine

Examples


 library(CGEN)

 # Define the list for the genotype data. There are 50 SNPs in the TPED file.
 snp.list <- list(nsnps.vec=50, format="tped")
 snp.list$file <- system.file("sampleData", "geno_data.tped.gz", package="CGEN")
 snp.list$subject.list <- system.file("sampleData", "geno_data.tfam", package="CGEN")
 
 # Define pheno.list
 pheno.list <- list(id.var=c("Family", "Subject"), delimiter="\t", header=1,
                    response.var="CaseControl")
 pheno.list$file <- system.file("sampleData", "pheno.txt", package="CGEN")
 pheno.list$main.vars <- ~Gender + Exposure
 pheno.list$int.vars <- ~Exposure
 pheno.list$strata.var <- "Study"

 # Define the list of options. 
 # Specifying n.jobs=5 will let each job process 10 SNPs.
 op <- list(n.jobs=5, GxE.scan.op=list(model=1))

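 # Uncomment to create the R program files, job files and the "Rjobs_" file
 # in op$out.dir (the working directory by default).
 # GxE.scan.partition(snp.list, pheno.list, op=op)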
