genDataGetPart: Extracting part of genetic data.

Description

This function enables to extract (and save for later use) part of genetic data read in with genDataRead.

Usage

genDataGetPart(
  data.in = stop("No data given!", call. = FALSE),
  design = stop("Design type must be given!"),
  markers,
  indiv.ids,
  rows,
  cc,
  sex,
  file.out = "my_data_part",
  dir.out = ".",
  overwrite = NULL,
  ...
)

Value

A list object with three elements:

cov.data - a data.frame with covariate data (if available in the input file)
gen.data - a list with chunks of the genetic data; the data is divided column-wise, using 10,000 columns per chunk; each element of this list is a ff matrix
aux - a list with meta-data and important parameters.

This now contains only the selected subset of data.

Arguments

data.in

The data object (in format as the output of genDataRead).

design

The design used in the study - choose from:

triad - (default), data includes genotypes of mother, father and child;
cc - classical case-control;
cc.triad - hybrid design: triads with cases and controls;

Any of the following can be given to narrow down the dataset:

markers

Vector with numbers or names indicating which markers to choose.

indiv.ids

Character vector giving IDs of individuals. CAUTION: in a standard PED file, individual IDs are not unique, so this will select all individuals with given IDs.

rows

Numeric vector giving the positions - this will select only these rows.

cc

One or more values to choose based on case-control status ('cc' column).

sex

One or more values to choose based on the 'sex' column.

file.out

The base for the output filename (default: "my_data_part").

dir.out

The path to the directory where the output files will be saved.

overwrite

Whether to overwrite the output files: if NULL (default), will prompt the user to give answer; set to TRUE, will automatically overwrite any existing files; and set to FALSE, will stop if the output files exist.

...

If any additional covariate data are available in data.in, the user can choose based on values of these (see the Examples section).

Warning

No checks are performed when choosing a subset of the data - it is the user's obligation to check whether the data subset contains correct number of individuals (especially important when using the triad design study) and/or markers!

Details

The genetic data from GWAS studies can be quite large, and thus the analysis is time-consuming. If a user knows where they want to focus the analysis, they can use this function to extract part of the entire dataset and use only this part in subsequent Haplin analysis.

Examples

Run this code

  # The argument 'overwrite' is set to TRUE!
  # Read the data:
  examples.dir <- system.file( "extdata", package = "Haplin" )
  example.file <- file.path( examples.dir, "HAPLIN.trialdata2.txt" )
  my.gen.data.read <- genDataRead( file.in = example.file, file.out = "trial_data",
   dir.out = tempdir( check = TRUE ), format = "haplin", allele.sep = "", n.vars = 2,
   cov.header = c( "smoking", "sex" ), overwrite = TRUE )
  my.gen.data.read
  # Extract part with only men:
  men.subset <- genDataGetPart( my.gen.data.read, design = "triad", sex = 1,
    dir.out = tempdir( check = TRUE ), file.out = "gen_data_men_only", overwrite = TRUE )
  men.subset
  # Extract the part with only smoking women:
  women.smoke.subset <- genDataGetPart( my.gen.data.read, design = "triad",
    dir.out = tempdir( check = TRUE ), sex = 0, smoking = c( 1,2 ), overwrite = TRUE )
  women.smoke.subset

Run the code above in your browser using DataLab