
tcR (version 2.2.4)

parse.folder: Parse input table files with immune receptor repertoire data.

Description

Load TCR data from the file with the given filename into a data frame, or load all files from the given folder into a list of data frames. The folder must contain only files in the specified format. Input files can be plain text or compressed with gzip ("filename.txt.gz") or bzip2 ("filename.txt.bz2"). For a general parser, see parse.cloneset.

Parsers are available for: MiTCR ("mitcr"), MiTCR w/ UMIs ("mitcrbc"), MiGEC ("migec"), VDJtools ("vdjtools"), ImmunoSEQ ("immunoseq" or "immunoseq2" for the old and new formats respectively), MiXCR ("mixcr"), IMSEQ ("imseq") and tcR ("tcr", data frames saved with the `repSave()` function).

Output of MiXCR should contain either all hits or best hits for each gene segment.

Output of IMSEQ should be generated with the parameter "-on". In this case the output data frame will contain no positions of aligned gene segments, because the IMSEQ output does not provide them.

tcR's data frames should be saved with the `repSave()` function.
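The note on compressed input can be illustrated with base R alone. The following is a minimal sketch, not tcR's own implementation: it writes a made-up two-row table to a gzip-compressed, tab-separated file and reads it back through a `gzfile()` connection. The table contents and the temporary file name are assumptions made for the example.

```r
# Toy cloneset table; column names follow the Value section, values are made up.
toy <- data.frame(
  Read.count = c(10L, 5L),
  CDR3.nucleotide.sequence = c("TGTGCCAGC", "TGTGCCTGG"),
  stringsAsFactors = FALSE
)

# Write the table as a gzip-compressed, tab-separated text file.
path <- tempfile(fileext = ".txt.gz")
con <- gzfile(path, open = "wt")
write.table(toy, con, sep = "\t", quote = FALSE, row.names = FALSE)
close(con)

# Read it back; read.table() opens and closes the connection itself.
back <- read.table(gzfile(path), header = TRUE, sep = "\t",
                   stringsAsFactors = FALSE)
```

The same pattern works with `bzfile()` for files compressed with bzip2.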

Usage

parse.file(.filename, .format = c('mitcr', 'mitcrbc', 'migec', 'vdjtools', 'immunoseq', 'mixcr', 'imseq', 'tcr'), ...)

parse.file.list(.filenames, .format = c('mitcr', 'mitcrbc', 'migec', 'vdjtools', 'immunoseq', 'mixcr', 'imseq', 'tcr'), .namelist = NA)

parse.folder(.folderpath, .format = c('mitcr', 'mitcrbc', 'migec', 'vdjtools', 'immunoseq', 'mixcr', 'imseq', 'tcr'), ...)

parse.mitcr(.filename)

parse.mitcrbc(.filename)

parse.migec(.filename)

parse.vdjtools(.filename)

parse.immunoseq(.filename)

parse.immunoseq2(.filename)

parse.immunoseq3(.filename)

parse.mixcr(.filename)

parse.imseq(.filename)

parse.tcr(.filename)

parse.migmap(.filename)

Arguments

.folderpath

Path to the folder with text cloneset files.

.format

String that specifies the input format.

...

Parameters passed to parse.cloneset.

.filename

Path to the input file with cloneset data.

.filenames

Vector or list with paths to files with cloneset data.

.namelist

Either NA or a character vector of the same length as .filenames with names for the output data frames.
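To make the relationship between a vector of file paths and a name list concrete, here is a minimal base R sketch. The helper `read_named_list()` is hypothetical and not part of tcR. The fallback to file base names when the name list is NA is an assumption made for illustration, not documented tcR behaviour.

```r
# Hypothetical helper, not part of tcR: read tab-separated files into a
# named list of data frames.
read_named_list <- function(paths, namelist = NA) {
  tables <- lapply(paths, read.delim, stringsAsFactors = FALSE)
  if (length(namelist) == 1L && is.na(namelist)) {
    # Assumption: fall back to file names without directory and extension.
    namelist <- tools::file_path_sans_ext(basename(paths))
  }
  # One name per input file is required.
  stopifnot(length(namelist) == length(paths))
  names(tables) <- namelist
  tables
}

# Create two small made-up input files in a temporary folder.
tmp_dir <- tempfile("clonesets")
dir.create(tmp_dir)
paths <- file.path(tmp_dir, c("immdata1.txt", "immdata2.txt"))
for (p in paths) {
  write.table(data.frame(Read.count = 1:2), p, sep = "\t",
              quote = FALSE, row.names = FALSE)
}

named <- read_named_list(paths, namelist = c("donor_A", "donor_B"))
default_named <- read_named_list(paths)

# All files in a folder can be collected with list.files().
folder_paths <- list.files(tmp_dir, full.names = TRUE)
```

Supplying one name per file keeps the list elements addressable by sample, for example `named$donor_A`.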

Value

Data frame with immune receptor repertoire data. Each row in this data frame corresponds to a clonotype. The data frame has the following columns:

- "Umi.count" - number of barcodes (events, UMIs);

- "Umi.proportion" - proportion of barcodes (events, UMIs);

- "Read.count" - number of reads;

- "Read.proportion" - proportion of reads;

- "CDR3.nucleotide.sequence" - CDR3 nucleotide sequence;

- "CDR3.amino.acid.sequence" - CDR3 amino acid sequence;

- "V.gene" - names of aligned Variable gene segments;

- "J.gene" - names of aligned Joining gene segments;

- "D.gene" - names of aligned Diversity gene segments;

- "V.end" - last positions of aligned V gene segments (1-based);

- "J.start" - first positions of aligned J gene segments (1-based);

- "D5.end" - positions of the 5' end of aligned D gene segments (1-based);

- "D3.end" - positions of the 3' end of aligned D gene segments (1-based);

- "VD.insertions" - number of inserted nucleotides (N-nucleotides) at V-D junction (-1 for receptors with VJ recombination);

- "DJ.insertions" - number of inserted nucleotides (N-nucleotides) at D-J junction (-1 for receptors with VJ recombination);

- "Total.insertions" - total number of inserted nucleotides (number of N-nucleotides at V-J junction for receptors with VJ recombination).
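The column conventions above can be exercised on a small hand-made data frame. This is a sketch with invented values, not output of any parser. It assumes that "Read.proportion" is the read count divided by the total read count, and that for VDJ rows "Total.insertions" is the sum of "VD.insertions" and "DJ.insertions".

```r
# Made-up clonotypes; the second row mimics a receptor with VJ recombination.
toy <- data.frame(
  Read.count = c(60L, 30L, 10L),
  CDR3.nucleotide.sequence = c("TGTGCCAGCAGTTTC", "TGTGCCAGCAGTTTC",
                               "TGTGCCAGCAGTTTC"),
  V.end = c(9L, 9L, 9L),
  J.start = c(12L, 12L, 12L),
  VD.insertions = c(2L, -1L, 0L),
  DJ.insertions = c(3L, -1L, 4L),
  Total.insertions = c(5L, 4L, 4L),
  stringsAsFactors = FALSE
)

# Assumption: proportions are counts normalised by the total count.
toy$Read.proportion <- toy$Read.count / sum(toy$Read.count)

# A value of -1 in VD.insertions marks rows with VJ recombination.
is_vj <- toy$VD.insertions == -1
vdj <- toy[!is_vj, ]

# Positions are 1-based, so they can be passed to substr() unchanged.
v_part <- substr(toy$CDR3.nucleotide.sequence, 1, toy$V.end)
j_part <- substr(toy$CDR3.nucleotide.sequence, toy$J.start,
                 nchar(toy$CDR3.nucleotide.sequence))
```

Filtering on the -1 marker before summing insertion counts avoids mixing the sentinel value into statistics for VDJ receptors.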

See Also

parse.cloneset, repSave, repLoad

Examples

# NOT RUN {
# Parse file in "~/mitcr/immdata1.txt" as a MiTCR file.
immdata1 <- parse.file("~/mitcr/immdata1.txt", 'mitcr')
# Parse a gzip-compressed VDJtools file.
immdata3 <- parse.file("~/mitcr/immdata3.txt.gz", 'vdjtools')
# Parse files "~/data/immdata1.txt" and "~/data/immdata2.txt" as MiGEC files.
immdata12 <- parse.file.list(c("~/data/immdata1.txt",
                             "~/data/immdata2.txt"), 'migec')
# Parse all files in "~/data/" as MiGEC files.
immdata <- parse.folder("~/data/", 'migec')
# }
