pedToHaplin: Convert from ped format data to Haplin format

Description

Converts an ASCII file from the standard ped format to the Haplin format.

Usage

pedToHaplin(indata, outdata, merge = F, na.strings = "0", sep, 
colnames.out = F)

Value

The outdata file is written to disk. pedToHaplin returns (invisibly) the converted data file.

Arguments

indata: A character string giving the name and path of the ASCII data file to be converted.
outdata: A character string giving the name and path for saving the converted file.
merge: If the alleles of each genotype are in two separate columns in the indata file, they must be merged (with the ";" separator) in the outdata file. This is done by setting merge = TRUE. Otherwise, it must be set to FALSE.
na.strings: The symbol used to denote missing data in indata. It is passed directly to R's read.table.
sep: Column separator in indata. If unspecified, any white space will be used, as in read.table.
colnames.out: Provided only for checking the data. If TRUE, adds column names to the outdata file to make it more readable. NOTE: Haplin currently does not use column names, so this should be set to FALSE when producing the file that Haplin will be run on.
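
As a rough illustration of what na.strings and the default whitespace separator mean for read.table, the base R sketch below reads a few made-up ped-style lines. It shows read.table's behaviour only, not the internals of pedToHaplin; the data values are invented, and colClasses = "character" is an assumption added here so that the allele "T" is not read as a logical value.

```r
# Made-up ped-style lines: 6 leading columns followed by two allele columns.
txt <- "1 1 0 0 1 0 C T
1 2 0 0 2 0 0 0
1 3 1 2 1 1 C 0"

# sep is left unspecified, so any white space separates the columns.
# na.strings = "0" makes read.table treat every "0" field as missing.
d <- read.table(text = txt, header = FALSE, na.strings = "0",
                colClasses = "character")

d[1, 7]         # the allele "C" is kept as a character string
is.na(d[2, 7])  # the allele coded "0" is read as missing
is.na(d[1, 3])  # a "0" in an id column is also read as missing here
```

Note that in this sketch read.table applies na.strings to every column, including the id columns, which is one reason to inspect the converted file before using it.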

Author

Hakon K. Gjessing
Professor of Biostatistics
Division of Epidemiology
Norwegian Institute of Public Health
hakon.gjessing@uib.no

Warning

Data files come in many shapes and formats, so you should always check the output from pedToHaplin before using it.

Details

Important: The first 6 columns should always be family id, individual id, father's id, mother's id, sex and casetype, in that order, then followed by the genetic data columns. If the genetic data columns are separated into two individual alleles, one should use the option merge = TRUE to merge them in the output file. If they are already joined in single columns, for instance as CT or C;T, merge should be set to FALSE (default).
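
To make the two genotype layouts concrete, the base R sketch below builds a small, made-up ped-style table with two allele columns per marker and pastes each pair into a single column with the ";" separator. It only illustrates what merging means; it is not the actual implementation of pedToHaplin, and all data values and column names are invented for the example.

```r
# Hypothetical ped-style data: 6 leading columns, then two allele columns
# per marker (m1.a/m1.b and m2.a/m2.b are made-up names).
ped <- data.frame(
  fam    = c(1, 1, 1),
  id     = c(1, 2, 3),
  father = c(0, 0, 1),
  mother = c(0, 0, 2),
  sex    = c(1, 2, 1),
  case   = c(0, 0, 1),
  m1.a   = c("C", "T", "C"),
  m1.b   = c("T", "T", "T"),
  m2.a   = c("A", "A", "G"),
  m2.b   = c("G", "A", "A"),
  stringsAsFactors = FALSE
)

# Positions of the first and second allele of each marker.
allele_cols <- 7:ncol(ped)
first  <- allele_cols[seq(1, length(allele_cols), by = 2)]
second <- allele_cols[seq(2, length(allele_cols), by = 2)]

# Paste each allele pair into one ";"-separated genotype column.
merged <- mapply(function(a, b) paste(ped[[a]], ped[[b]], sep = ";"),
                 first, second)
colnames(merged) <- c("m1", "m2")
merged
```

Data that already look like the merged matrix (one column per marker) correspond to the merge = FALSE case described above.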

Additional covariates can be included in the input file. If so, they should be placed after the 6 leading columns but before the genetic data. In this case, make sure the genetic data columns are already merged, and set merge = FALSE. (Support for merge = TRUE when covariates are present may be added in a future version.)

Note that the output file usually has three columns to the left of the columns containing genetic data: family id, sex, and casetype. When running haplin on the output file, specify the argument 'n.vars = 3'. If the data are from the X chromosome, the haplin arguments should also include 'sex = 2' and 'xchrom = TRUE'. Similarly, if the casetype variable is a case-control indicator, use the argument 'ccvar = 3'. To run haplin on the cases only, save the case triads separately in a new file before running haplin on it.
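
Saving the case triads to a separate file can be done with base R alone. The sketch below is a minimal example under stated assumptions: the three input lines are an invented stand-in for a pedToHaplin output file (family id, sex, casetype, then genotype columns), and cases are assumed to be coded "1" in the third column. Check the actual coding and layout of your own file first.

```r
# Made-up stand-in for a converted file; real output will differ.
lines <- c(
  "1 1 1 C;T T;T C;T",
  "2 2 0 C;C C;T C;T",
  "3 1 1 T;T C;T T;T"
)
all_file <- tempfile(fileext = ".txt")
writeLines(lines, all_file)

# Read everything as character so genotypes and codes are left untouched.
triads <- read.table(all_file, header = FALSE, colClasses = "character")

# ASSUMPTION: cases are coded "1" in the third (casetype) column.
cases <- triads[triads[[3]] == "1", ]

# Write the case triads to a new file without quotes or column names.
case_file <- tempfile(fileext = ".txt")
write.table(cases, case_file, quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```

The new file keeps the same three leading columns, so 'n.vars = 3' still applies when running haplin on it.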

References

Gjessing HK and Lie RT. Case-parent triads: Estimating single- and double-dose effects of fetal and maternal disease gene haplotypes. Annals of Human Genetics (2006) 70, pp. 382-396.

Web Site: https://haplin.bitbucket.io

Examples


if (FALSE) {

# Standard run on supplied test file:
pedToHaplin("test_pedToHaplin.ped", outdata = "test_pedToHaplin_result.txt",
            colnames.out = FALSE, merge = TRUE)

}
