rnaseqWrapper (version 1.0-1)

readVariantFiles: Read in variant files for RNAseq

Description

Reads in the variant files from each sample of an RNAseq experiment and then combines the files into a single data.frame, useful for several downstream applications.

Usage

readVariantFiles(fileDir, sepSymbol = "_", fileID = "*_variants.txt", firstColName = "SEQ_ID", fileSep = "\t", idCols = 5, refPosCol = "Reference.Position", colToSort = "Coverage", removeDups = TRUE, returnMerged = TRUE, returnSing = FALSE, limitGenes = NULL, omitRefMatches = TRUE, refAlleleCol = "Reference$", varAlleleCol = "Allele")

Arguments

fileDir
The path to the directory containing all of the variant files.
sepSymbol
The symbol that separates the sample names from other info in the file name. Used to pull names for columns in the combined file. Set to "" if the full file name should be used.
fileID
Character string used to limit which files are imported; regular expressions are allowed.
firstColName
What should the first column be renamed to? Set to NULL or "" to leave the column as is. Intended to standardize the name and to match the column names used in other parts of the analysis pipeline.
fileSep
The column delimiter used in the file (e.g. "," or "\t")
idCols
How many columns of position information are there? Avoids including duplicated information in the combined output.
refPosCol
Which column has the reference position? Can be numeric or character.
colToSort
Which column should be used to keep one line per position, if removeDups == TRUE? Can be numeric or character.
removeDups
Logical, should duplicates at a position be removed? This is necessary to avoid massive over-merging.
returnMerged
Logical, should the merged variants be returned?
returnSing
Logical, should each of the separate variant files be returned?
limitGenes
A character vector listing the genes to include. This can be useful if your variant files include genes that you are not interested in analyzing (e.g. genes without a BLAST hit).
omitRefMatches
Logical, should 'variants' which match the reference be excluded? This is useful if your variant file includes rows for reads aligning to the reference allele, which may be accidentally set as the main 'variant' in this function. Defaults to TRUE.
refAlleleCol
Which column has the reference allele? Can be numeric or character.
varAlleleCol
Which column has the variable alleles? Can be numeric or character.
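
As a rough illustration of what sepSymbol, omitRefMatches, and removeDups/colToSort describe, the base R sketch below works on a small made-up table. It is not the package's implementation: the file name, the data, and the choice to keep the row with the highest colToSort value are assumptions made for illustration only.

```r
# Hypothetical file name; keep the part before the first sepSymbol,
# or the full name when sepSymbol is ""
fileName <- "sampleA_variants.txt"
sepSymbol <- "_"
sampleName <- if (nzchar(sepSymbol)) {
  strsplit(fileName, sepSymbol, fixed = TRUE)[[1]][1]
} else {
  fileName
}

# Made-up variant table using the documented default column names
variants <- data.frame(
  SEQ_ID = c("gene1", "gene1", "gene1", "gene1", "gene2"),
  Reference.Position = c(10, 10, 10, 25, 7),
  Reference = c("A", "A", "A", "C", "G"),
  Allele = c("A", "T", "G", "G", "G"),
  Coverage = c(30, 12, 20, 8, 40),
  stringsAsFactors = FALSE
)

# omitRefMatches-style step: drop rows whose allele equals the reference
nonRef <- variants[variants$Allele != variants$Reference, ]

# removeDups-style step (assumption: the highest Coverage wins):
# sort by Coverage, then keep the first row per gene and position
sorted <- nonRef[order(nonRef$Coverage, decreasing = TRUE), ]
posKey <- paste(sorted$SEQ_ID, sorted$Reference.Position)
deduped <- sorted[!duplicated(posKey), ]
```

In this toy table, the row matching the reference at gene1 position 10 is dropped first, so it cannot be picked as the main 'variant' even though it has the highest coverage.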

Value

Output depends on returnMerged and returnSing:
If returnMerged: a data.frame with the merged variants
If returnSing: a list of the singVariants (cleaned if removeDups=TRUE)
If both TRUE: a list with both of the above

Details

Reads in the variant files from fileDir and merges by gene and position.
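
Merging by gene and position can be pictured with base R's merge(). The sketch below uses two made-up per-sample tables. The sample names, the values, the column suffixes, and the use of all = TRUE (keeping positions seen in only one sample) are assumptions, not a description of how readVariantFiles behaves internally.

```r
# Two made-up per-sample tables (hypothetical sample names and values)
sampleA <- data.frame(
  SEQ_ID = c("gene1", "gene2"),
  Reference.Position = c(10, 7),
  Allele = c("T", "C"),
  Coverage = c(20, 15),
  stringsAsFactors = FALSE
)
sampleB <- data.frame(
  SEQ_ID = c("gene1", "gene3"),
  Reference.Position = c(10, 42),
  Allele = c("T", "A"),
  Coverage = c(11, 9),
  stringsAsFactors = FALSE
)

# Merge on the shared gene and position columns; the remaining
# columns get a per-sample suffix so they stay distinguishable
idColumns <- c("SEQ_ID", "Reference.Position")
merged <- merge(
  sampleA, sampleB,
  by = idColumns,
  all = TRUE,
  suffixes = c("_sampleA", "_sampleB")
)
```

With all = TRUE, a position found in only one sample is kept, and the other sample's columns are filled with NA for that row.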

Examples

## Not run: 
# 
# mergedVariants <- readVariantFiles (
#       fileDir="path/to/variant/directory",
#       fileID = "*_variants.txt",
#       firstColName = "SEQ_ID",
#       idCols = 4, 
#       refPosCol = "Region"
#       ) 
# 
# ## End(Not run) 
