Loads genotype, phenotype, and genetic map data files into the R environment as a population object.
read.population(offspring = "offspring", founders = "founders", map = "map",
foundersGroups, populationType = c("riself", "f2", "bc", "risib"),
readMode = c("normal","HT"), threshold=0.05, verbose = FALSE, debugMode = 0,
n.cluster=1, ...)
Core name used to build the names of the offspring (children) phenotype ("core_phenotypes.txt"), genotype ("core_genotypes.txt"), and annotation ("core_annotations.txt") files.
Core name used to build the name of the founders (parental) phenotype ("core_phenotypes.txt") file.
Core name used to build the names of the genetic ("map_genetic.txt") and physical ("map_physical.txt") map files.
Specifies the groups of individuals in the founders data; see the description below and RP for more details.
Type of population the data was obtained from:
riself - RILs by selfing.
f2 - F2 cross.
bc - backcross.
risib - RILs by sibling mating.
HT, or High-Throughput, mode should be used when a very large dataset is processed (at least 10000 probes). Files are then read in chunks instead of all at once. To stay within R memory limits, only probes showing differential expression between the parents are selected. The chunk size and the threshold for assessing significance can be specified (see the description of the ... parameter).
Threshold for assessing which probes are differentially expressed between the parents. 0.05 by default.
Be verbose
1: Print out checks, 2: print additional time information
Number of cores used for calculations.
Parameters passed to the high-throughput function:
transformations - how the data should be transformed (see transformation)
sliceSize - number of lines to be read at once by the HT function. 5000 by default.
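The chunked reading that HT mode relies on can be illustrated with base R alone. The sketch below is only an assumption about the general technique, not the package's actual implementation: the toy file, the tiny `sliceSize` value, and the variable names are all hypothetical, and the probe-selection step is left as a placeholder comment.

```r
# Hypothetical founders file: 12 probes (rows) x 4 individuals (columns).
htFile <- file.path(tempdir(), "founders_phenotypes.txt")
toy <- matrix(seq_len(48) / 10, nrow = 12,
              dimnames = list(paste0("probe", 1:12), paste0("ind", 1:4)))
write.table(toy, htFile, sep = "\t", quote = FALSE)

sliceSize <- 5  # lines per chunk; illustrative only (the documented default is 5000)
con <- file(htFile, open = "r")

# The header holds only the individual names (no field for the rownames).
header <- strsplit(readLines(con, n = 1), "\t")[[1]]

rowsRead <- 0
repeat {
  lines <- readLines(con, n = sliceSize)  # continues where the last read stopped
  if (length(lines) == 0) break
  chunk <- read.table(text = lines, sep = "\t", row.names = 1,
                      col.names = c("probe", header))
  # ... probes of interest would be selected from `chunk` here ...
  rowsRead <- rowsRead + nrow(chunk)
}
close(con)
```

Only one chunk is held in memory at a time, which is the point of reading in slices.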
An object of class population.
The function works on tab-delimited files. Phenotype files, for both founders and offspring, should have a header containing the column names (that is, the names of the individuals). Every other row should start with a unique rowname. Rownames and column names are the only values allowed to be non-numeric. After a file is read into R, a check is performed, and rows and columns containing values that are not numeric and cannot be converted to numeric are removed from the dataset. Rownames should match between founders and offspring; after the founders file is loaded, all non-matching rows are removed. Example of phenotype file structure:
"individual1" | "individual2" | "individual3" | "individual4" | "individual5" | |
"marker" | 8.84494695336781 | 9.06939381429179 | 9.06939381429179 | 7.72431126650435 | 6.04480152688572 |
"marker2" | 9.06939381429179 | 7.85859536346299 | 8.84494695336781 | 6.04480152688572 | 7.72431126650435 |
"marker3" | 6.04480152688572 | 6.04480152688572 | 7.85859536346299 | 7.72431126650435 | 7.85859536346299 |
"marker4" | 6.04480152688572 | 7.85859536346299 | 6.04480152688572 | 8.84494695336781 | 7.85859536346299 |
"marker5" | 7.72431126650435 | 7.72431126650435 | 17.85859536346299 | 7.85859536346299 | 7.85859536346299 |
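As a minimal sketch of this layout, the base R code below writes a small tab-delimited phenotype file and reads it back. The values, the temporary file location, and the variable names are made up for illustration, and `read.table` is only a stand-in here; it is not necessarily how read.population parses the file.

```r
# Hypothetical toy data: 3 probes (rows) x 4 individuals (columns).
pheno <- matrix(c(8.8, 9.1, 9.1, 7.7,
                  9.1, 7.9, 8.8, 6.0,
                  6.0, 6.0, 7.9, 7.7),
                nrow = 3, byrow = TRUE,
                dimnames = list(paste0("marker", 1:3),
                                paste0("individual", 1:4)))

# Write a tab-delimited file whose header holds only the individual names
# and whose remaining rows each start with a unique rowname.
phenoFile <- file.path(tempdir(), "offspring_phenotypes.txt")
write.table(pheno, file = phenoFile, sep = "\t", quote = TRUE,
            row.names = TRUE, col.names = TRUE)

# Read it back. Because the header has one field fewer than the data rows,
# read.table uses the first column as rownames.
phenoBack <- as.matrix(read.table(phenoFile, sep = "\t", header = TRUE))
```

Inspecting `rownames(phenoBack)` and `colnames(phenoBack)` is a quick way to confirm that the marker and individual names survived the round trip.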
The genotype file should have essentially the same structure as the phenotype file. The genotype codes are exactly the same as in R/qtl. For F2 populations: AA - 1, AB - 2, BB - 3, not BB - 4, not AA - 5, missing - NA. For BC and RILs: AA - 1, BB - 2, missing - NA (see read.cross for details).
Example of genotype file structure:
"individual1" | "individual2" | "individual3" | "individual4" | "individual5" | |
"marker" | 1 | 1 | 2 | 1 | 2 |
"marker2" | NA | 1 | 2 | 1 | 2 |
"marker3" | 1 | 1 | 1 | 1 | 2 |
"marker4" | 1 | NA | 1 | 1 | 2 |
"marker5" | NA | 1 | 1 | 1 | 2 |
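If genotype calls are stored as letters, they need to be recoded to the numeric codes listed above before being written to the genotype file. The following is a small sketch for the three basic F2 codes; the `calls` matrix and all variable names are hypothetical.

```r
# Codes for F2 populations as listed above (AA = 1, AB = 2, BB = 3).
f2Codes <- c(AA = 1, AB = 2, BB = 3)

# Hypothetical genotype calls: markers in rows, individuals in columns.
calls <- matrix(c("AA", "AB", "BB",
                  NA,   "AA", "AB"),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("marker1", "marker2"),
                                paste0("individual", 1:3)))

# Look each call up in the code table; missing calls stay NA.
geno <- matrix(unname(f2Codes[as.vector(calls)]), nrow = nrow(calls),
               dimnames = dimnames(calls))
```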
Map files should have a simple structure: always three columns and no header. The first column contains rownames, the second the chromosome number, and the third the position on the chromosome (in cM for a genetic map or Mbp for a physical map). The second and third columns can contain only numbers (any NA, Inf, etc. will cause the file to be dropped). Rownames should match either those from the genotype file or those from the phenotype file, depending on which one you want to use the map with (see generate.biomarkers for more information). Example of map file structure:
"marker" | 1 | 0 |
"marker2" | 1 | 1.2 |
"marker3" | 1 | 1.2 |
"marker4" | 1 | 2 |
"marker5" | 1 | 3 |
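Before loading, a map file can be checked against these rules with a few lines of base R. This is a sketch under assumptions: the file contents, the column labels `chr` and `pos`, and the `mapOk` flag are illustrative and not part of the package.

```r
# Hypothetical genetic map: rowname, chromosome number, position (cM).
mapFile <- file.path(tempdir(), "map_genetic.txt")
writeLines(c("marker1\t1\t0", "marker2\t1\t1.2", "marker3\t2\t0.5"), mapFile)

# No header; the first column holds the rownames.
map <- read.table(mapFile, sep = "\t", header = FALSE, row.names = 1,
                  col.names = c("marker", "chr", "pos"))

# The two remaining columns must be numeric and free of NA, Inf, etc.
mapOk <- ncol(map) == 2 &&
  all(vapply(map, is.numeric, logical(1))) &&
  all(is.finite(as.matrix(map)))
```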
You also have to specify the groups in the founders file, that is, which columns come from which parent. Suppose you have measured both parents in triplicate, with data for the first parent in columns 1, 3 and 5 and for the second parent in columns 2, 4 and 6. foundersGroups should then be c(0,1,0,1,0,1). Always use only 0 and 1 to specify groups.
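For files with many columns, the group vector can be built from the column positions rather than typed by hand. A minimal sketch of the case just described, in which `parentAcols` and `nFounderCols` are hypothetical helper names:

```r
# Hypothetical layout: first parent in columns 1, 3, 5; second in 2, 4, 6.
parentAcols <- c(1, 3, 5)
nFounderCols <- 6

# Start with every column assigned to the second parent (1),
# then mark the first parent's columns with 0.
foundersGroups <- rep(1, nFounderCols)
foundersGroups[parentAcols] <- 0
# foundersGroups is now c(0, 1, 0, 1, 0, 1)
```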
add.to.population - Adding data to an existing population object.
create.population - Create a new object of class population.
# NOT RUN {
### simplest call possible
population <- read.population(foundersGroups=c(0,0,0,1,1,1))
### more informative one
population <- read.population(foundersGroups=c(0,0,0,1,1,1),verbose=TRUE,debugMode=1)
### imagine you prefer parents and children instead of founders and offspring:
population <- read.population(offspring="children",founders="parents",
foundersGroups=c(0,0,0,1,1,1),verbose=TRUE,debugMode=1)
### etc.. when you load it, you may want to inspect it:
population$founders$phenotypes[1:10,]
# }