genDataRead: Reading the genetic data from a file

Description

This function reads genetic data from a PED- or haplin-formatted file.

Usage

genDataRead(
  file.in = stop("Filename must be given!", call. = FALSE),
  file.out = NULL,
  dir.out = ".",
  format = stop("Format parameter is required!"),
  header = FALSE,
  n.vars,
  cov.file.in,
  cov.header,
  map.file,
  map.header = FALSE,
  allele.sep = ";",
  na.strings = "NA",
  col.sep = "",
  overwrite = NULL
)

Value

A list object with three elements:

cov.data - a data.frame with covariate data (if available in the input file)
gen.data - a list with chunks of the genetic data; the data is divided column-wise, using 10,000 columns per chunk; each element of this list is an ff matrix
aux - a list with meta-data and important parameters.
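
The column-wise division described for 'gen.data' can be pictured with a small base-R sketch. This is only an illustration of splitting column indices into chunks of 10,000; it is not Haplin's internal code, and the total number of columns ('n.cols') is an assumed value.

```r
# Illustration only: how column indices fall into chunks of 10,000.
# 'n.cols' is a made-up number of genotype columns, not taken from any real file.
n.cols <- 25000
chunk.size <- 10000   # chunk size stated in the documentation above

col.idx <- seq_len(n.cols)
# Columns 1-10000 get chunk id 1, 10001-20000 get id 2, and so on.
chunk.id <- ceiling(col.idx / chunk.size)
chunks <- split(col.idx, chunk.id)

length(chunks)          # number of chunks
sapply(chunks, length)  # number of columns in each chunk
```

With these assumed numbers, the last chunk is smaller than the others because the column count is not a multiple of the chunk size.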

Arguments

file.in

The name of the main input file with genotype information.

file.out

The base for the output filename (by default, constructed from the input file name).

dir.out

The path to the directory where the output files will be saved.

format

Format of data (will influence how data is processed) - choose from:

haplin - data already in one row per family,
ped - data from .ped file, each row represents an individual.

header

Whether the first line of the main input file contains column names; default: FALSE; NB: this is useful only for 'haplin'-formatted files!

n.vars

The number of columns with covariate data (if any) in the main file; NB: if the main file is in PED format, it is assumed that the first 6 columns contain the standard PED-covariates (i.e., family ID, ID of the child, father and mother, sex and case-control status), so in this case setting 'n.vars' is useful only if the PED file contains more than 6 covariate columns.

cov.file.in

Name of the file containing additional covariate data, if any. Caution: unless the 'cov.header' argument is used, it is assumed that the first line of this file contains the header (i.e., the column names of the additional data).

cov.header

The character vector containing the names of covariate columns (in the file with additional covariate data if given by the 'cov.file.in' argument; or in the main file, if it's a "haplin"-formatted file).

map.file

Filename (with path if the file is not in the current directory) of the .map file holding the SNP names, if available (see Details).

map.header

Logical: does the map.file contain a header in the first row? Default: FALSE.

allele.sep

Character: separator between two alleles (default: ";").

na.strings

Character or NA: how the missing data is coded (default: "NA").

col.sep

Character: separator between the columns (i.e., markers; default: any whitespace character).

overwrite

Whether to overwrite the output files: if NULL (default), the user is prompted for an answer; if TRUE, any existing files are overwritten automatically; if FALSE, the function stops if the output files already exist.

Usage note

When reading in a covariate file together with the genotype information, it is advised to include a header in the covariate file, so that there is no doubt about the naming of the data columns.
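
As a rough illustration of what a covariate file with a header might look like, the sketch below writes a tiny whitespace-separated file and reads it back with base R's 'read.table'. The column names and values are invented for the example, and 'read.table' is used only to show the layout; it is not a statement about how 'genDataRead' parses the file internally.

```r
# Hypothetical covariate file: the first line holds the column names.
# Column names and values are made up for illustration.
cov.file <- tempfile(fileext = ".dat")
writeLines(c("smoking sex",
             "0 1",
             "1 2",
             "NA 1"), cov.file)

# Read it back with base R to check that the header is picked up as column names.
cov.data <- read.table(cov.file, header = TRUE, na.strings = "NA")
colnames(cov.data)
cov.data
```

A file laid out like this could then be passed through the 'cov.file.in' argument, leaving 'cov.header' unset so that the names come from the first line.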

Details

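The .map file should contain at least two columns, the second of which holds the SNP names. Columns should be separated by whitespace; any columns beyond the second are ignored. The file should contain a header; note that the 'map.header' argument defaults to FALSE, so set 'map.header = TRUE' when the first row of the file is a header.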
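
To make the expected layout concrete, here is a minimal base-R sketch that writes a small .map-like file with a header and pulls the SNP names from the second column. The file contents, column names and SNP identifiers are hypothetical, and 'read.table' stands in for whatever 'genDataRead' does internally.

```r
# Hypothetical .map file: whitespace-separated, header in the first row,
# SNP names in the second column. All values are invented for illustration.
map.file <- tempfile(fileext = ".map")
writeLines(c("chr snp pos",
             "1 rs0001 1000",
             "1 rs0002 2000"), map.file)

# Read with base R; only the second column (SNP names) matters here.
map <- read.table(map.file, header = TRUE, stringsAsFactors = FALSE)
snp.names <- map[[2]]
snp.names
```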

Examples

  # The argument 'overwrite' is set to TRUE!
  examples.dir <- system.file( "extdata", package = "Haplin" )
  # ped format:
  example.file2 <- file.path( examples.dir, "exmpl_data.ped" )
  ped.data.read <- genDataRead( example.file2, file.out = "exmpl_ped_data", 
   dir.out = tempdir( check = TRUE ), format = "ped", overwrite = TRUE )
  ped.data.read
  # haplin format:
  example.file1 <- file.path( examples.dir, "HAPLIN.trialdata2.txt" )
  haplin.data.read <- genDataRead( file.in = example.file1,
   file.out = "exmpl_haplin_data", format = "haplin", allele.sep = "", n.vars = 2, 
   cov.header = c( "smoking", "sex" ), overwrite = TRUE,
   dir.out = tempdir( check = TRUE ) )
  haplin.data.read
