read.population: Loading genotype and phenotype data

Description

Loads genotype, phenotype, and genetic map data files into R as a population object.

Usage

read.population(offspring = "offspring", founders = "founders", map = "map", 
  foundersGroups, populationType = c("riself", "f2", "bc", "risib"), 
  readMode = c("normal","HT"), threshold=0.05, verbose = FALSE, debugMode = 0, 
  n.cluster=1, ...)

Arguments

offspring

Core name used to build the names of the offspring phenotype ("core_phenotypes.txt"), genotype ("core_genotypes.txt"), and annotation ("core_annotations.txt") files.

founders

Core name used to build the name of the founders (parental) phenotype file ("core_phenotypes.txt").

map

Core name used to build the names of the genetic ("map_genetic.txt") and physical ("map_physical.txt") map files.

foundersGroups

Specifies which group (parent) each individual in the founders data belongs to; see Details below and RP for more information.

populationType

Type of population the data were obtained from:

riself - RILs by selfing.
f2 - F2 cross.
bc - backcross.
risib - RILs by sibling mating.

readMode

HT, or high-throughput, mode should be used when a very large dataset is processed (at least 10000 probes). Files are then read in chunks instead of all at once. To avoid R memory limits, only probes showing differential expression between the parents are selected. The chunk size and the significance threshold can be specified (see the descriptions of the threshold and ... parameters).

threshold

Significance threshold for deciding whether a probe is differentially expressed between the parents. 0.05 by default.

verbose

Be verbose

debugMode

1: print out checks; 2: print additional timing information.

n.cluster

Number of cores used for calculations.

...

Parameters passed to the high-throughput function:

transformations - how the data should be transformed (see transformation)
sliceSize - number of lines to be read at once by the HT function. 5000 by default.
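
The chunked reading and filtering described for HT mode can be illustrated with base R alone. The following is only a sketch of the general idea and is not the package's internal code: the simulated file, the use of a Welch t-test as the test for differential expression, and all variable names are assumptions made for this example.

```r
# Simulated founders phenotype file: 20 probes, 3 replicates per parent.
set.seed(42)
groups <- c(0, 0, 0, 1, 1, 1)
founders <- matrix(rnorm(20 * 6, mean = 8), nrow = 20,
                   dimnames = list(paste0("probe", 1:20), paste0("founder", 1:6)))
# Make the first five probes clearly different between the two parents.
founders[1:5, groups == 1] <- founders[1:5, groups == 1] + 50
foundersFile <- file.path(tempdir(), "founders_phenotypes.txt")
write.table(founders, foundersFile, sep = "\t", quote = FALSE, col.names = NA)

sliceSize <- 8     # lines read per chunk (kept tiny here for illustration)
threshold <- 0.05  # significance threshold

con <- file(foundersFile, open = "r")
# The first line is the header; drop the empty field above the row names.
header <- strsplit(readLines(con, n = 1), "\t")[[1]][-1]
selected <- list()
repeat {
  lines <- readLines(con, n = sliceSize)
  if (length(lines) == 0) break
  chunk <- read.table(text = lines, sep = "\t", row.names = 1,
                      col.names = c("probe", header))
  # Assumption: a Welch t-test between the two parental groups, probe by probe.
  pvals <- apply(as.matrix(chunk), 1, function(x) {
    t.test(x[groups == 0], x[groups == 1])$p.value
  })
  # Keep only the probes below the threshold; the rest of the chunk is discarded.
  selected[[length(selected) + 1]] <- chunk[pvals < threshold, , drop = FALSE]
}
close(con)
selectedProbes <- do.call(rbind, selected)
rownames(selectedProbes)
```

Only the rows that pass the filter are kept in memory between chunks, which is what makes this approach usable on files too large to load at once.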

Value

An object of class population.

Details

The function works on tab-delimited files. Phenotype files, for both founders and offspring, should have a header containing the column names (that is, the names of the individuals). Every other row should start with a unique row name. Row names and column names are the only values allowed to be non-numeric. After a file is read into R, a check is performed, and rows and columns containing values that are not numeric and cannot be converted to numeric are removed from the dataset. Row names should match between founders and offspring: after the founders file is loaded, all non-matching rows are removed. Example of phenotype file structure:

	"individual1"	"individual2"	"individual3"	"individual4"	"individual5"
"marker"	8.84494695336781	9.06939381429179	9.06939381429179	7.72431126650435	6.04480152688572
"marker2"	9.06939381429179	7.85859536346299	8.84494695336781	6.04480152688572	7.72431126650435
"marker3"	6.04480152688572	6.04480152688572	7.85859536346299	7.72431126650435	7.85859536346299
"marker4"	6.04480152688572	7.85859536346299	6.04480152688572	8.84494695336781	7.85859536346299
"marker5"	7.72431126650435	7.72431126650435	17.85859536346299	7.85859536346299	7.85859536346299
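
A file with this layout can be produced and checked with base R. This is a minimal sketch that assumes the default core name "offspring" and writes to a temporary directory; the matrix values and variable names are made up for illustration.

```r
# Small phenotype matrix: rows are markers, columns are individuals.
set.seed(1)
pheno <- matrix(round(rnorm(15, mean = 8), 3), nrow = 3,
                dimnames = list(paste0("marker", 1:3), paste0("individual", 1:5)))

# Tab-delimited output; col.names = NA writes an empty name for the row-name
# column, so the header has as many fields as the data rows.
phenoFile <- file.path(tempdir(), "offspring_phenotypes.txt")
write.table(pheno, file = phenoFile, sep = "\t", quote = TRUE, col.names = NA)

# Read the file back and confirm that everything except the names is numeric.
phenoBack <- as.matrix(read.table(phenoFile, sep = "\t", header = TRUE, row.names = 1))
is.numeric(phenoBack)
```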

A genotype file should have essentially the same structure as a phenotype file. The genotype codes are exactly the same as in R/qtl. For F2 populations: AA - 1, AB - 2, BB - 3, not BB - 4, not AA - 5, missing - NA. For BC and RILs: AA - 1, BB - 2, missing - NA (see read.cross for details). Example of genotype file structure:

	"individual1"	"individual2"	"individual3"	"individual4"	"individual5"
"marker"	1	1	2	1	2
"marker2"	NA	1	2	1	2
"marker3"	1	1	1	1	2
"marker4"	1	NA	1	1	2
"marker5"	NA	1	1	1	2
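
Before loading, it can be useful to confirm that a genotype matrix contains only the codes listed above for its population type. The helper below, checkGenotypeCodes, is hypothetical and not part of the package; it simply encodes the coding rules from the text.

```r
# Allowed genotype codes per population type, following the coding above.
allowedCodes <- list(f2 = 1:5, bc = 1:2, riself = 1:2, risib = 1:2)

# Hypothetical helper: TRUE when every non-missing genotype is an allowed code.
checkGenotypeCodes <- function(geno, populationType) {
  all(is.na(geno) | geno %in% allowedCodes[[populationType]])
}

# The first three rows of the example genotype file.
geno <- matrix(c( 1, 1, 2, 1, 2,
                 NA, 1, 2, 1, 2,
                  1, 1, 1, 1, 2),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("marker", "marker2", "marker3"),
                               paste0("individual", 1:5)))

checkGenotypeCodes(geno, "riself")
```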

Map files should have a simple structure: always three columns and no header. The first column contains the row names, the second the chromosome number, and the third the position on the chromosome (in cM for a genetic map or Mbp for a physical map). The second and third columns can contain only numbers (any NA, Inf, etc. will cause the file to be dropped). Row names should match those from either the genotype file or the phenotype file, depending on which one you want to use the map with (see generate.biomarkers for more information). Example of map file structure:

"marker"	1	0
"marker2"	1	1.2
"marker3"	1	1.2
"marker4"	1	2
"marker5"	1	3
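
The constraints on map files (three columns, no header, finite numbers only in the last two) can be checked up front with base R. This is a sketch under the assumption that the default core name "map" is used; the column labels and the mapIsValid flag are illustrative only.

```r
# Write the example map shown above to a temporary file.
mapLines <- c('"marker"\t1\t0',
              '"marker2"\t1\t1.2',
              '"marker3"\t1\t1.2',
              '"marker4"\t1\t2',
              '"marker5"\t1\t3')
mapFile <- file.path(tempdir(), "map_genetic.txt")
writeLines(mapLines, mapFile)

# No header; the first column holds the row names.
map <- read.table(mapFile, sep = "\t", header = FALSE, row.names = 1,
                  col.names = c("marker", "chr", "pos"))

# Both remaining columns must be numeric and free of NA, Inf and similar values.
mapIsValid <- ncol(map) == 2 &&
  all(vapply(map, is.numeric, logical(1))) &&
  all(is.finite(as.matrix(map)))
mapIsValid
```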

You also have to specify the groups in the founders file, that is, which columns come from which parent. Suppose you have measured both parents in triplicate, and the data for the first parent are in columns 1, 3 and 5 and for the second parent in columns 2, 4 and 6. foundersGroups should then be c(0,1,0,1,0,1). Always use only 0 and 1 to specify groups.
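
When the column layout is less regular, the grouping vector can be built from the column indices rather than typed by hand. A minimal sketch, with parent1Cols and parent2Cols as illustrative names:

```r
# Columns of the founders file that belong to each parent (example from the text).
parent1Cols <- c(1, 3, 5)
parent2Cols <- c(2, 4, 6)

# Start with all zeros (first parent), then mark the second parent's columns with 1.
foundersGroups <- integer(length(parent1Cols) + length(parent2Cols))
foundersGroups[parent2Cols] <- 1L
foundersGroups
# [1] 0 1 0 1 0 1
```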

Examples

# NOT RUN {
  ### simplest call possible
  population <- read.population(foundersGroups=c(0,0,0,1,1,1))
  ### more informative one
  population <- read.population(foundersGroups=c(0,0,0,1,1,1),verbose=TRUE,debugMode=1)
  ### imagine you prefer parents and children instead of founders and offspring:
  population <- read.population(offspring="children",founders="parents",
    foundersGroups=c(0,0,0,1,1,1),verbose=TRUE,debugMode=1)
  ### etc.. when you load it, you may want to inspect it:
  population$founders$phenotypes[1:10,]
  
# }