NOTE: This function is probably less useful now that GenABEL is no longer used by Haplin. The function is used when preparing a ped file for loading into GenABEL. However, GenABEL requires individual IDs that are unique across the whole file, not merely within each family, and it does not accept the numeric allele coding 1,2,3,4. To meet these requirements, convertPed can be run before prepPed. convertPed creates unique IDs, performs the necessary allele recoding and can optionally select and reorder SNPs. It also updates the corresponding map file.
convertPed(ped.infile, map.infile, ped.outfile, map.outfile, create.unique.id = FALSE,
convert, snp.select = NULL, choose.lines = NULL, col.sep = " ",
ask = TRUE, blank.lines.skip = TRUE, verbose = TRUE)
There is no useful return value; the purpose of convertPed is to write the converted ped file and the modified map file.
ped.infile: A character string giving the name of the standard ped file to be modified. The file name is relative to the current working directory unless it contains an absolute path. See Details for a description of the standard ped format.
map.infile: A character string giving the name and path of the standard map file to be modified. Optional if snp.select = NULL. The standard map format is described in Details.
ped.outfile: A character string giving the name and path of the converted ped file.
map.outfile: A character string giving the name and path of the modified map file.
create.unique.id: Logical. If TRUE, the function creates individual IDs that are unique across the whole file by prefixing the individual, father and mother IDs with the family ID (see the example in Details). Default is FALSE.
convert: No default. The option "ACGT_to_1234" recodes the SNP alleles from A,C,G,T to 1,2,3,4, whereas "1234_to_ACGT" recodes them from 1,2,3,4 to A,C,G,T. With "no_recode", no conversion takes place.
snp.select: A character vector of SNP identifiers (RS codes) or a numeric vector of SNP numbers to be extracted. Default is NULL, which selects all SNPs in their original order. The RS codes or SNP numbers may be listed in any order, and the selected SNPs appear in the modified files in the order in which they are listed.
choose.lines: A numeric vector of lines to be selected from the ped file. If NULL (default), all lines are selected.
col.sep: The separator that splits the columns in ped.infile. By default, col.sep = " " (space). To split at any type of space or blank character, set col.sep = "[[:space:]]" or col.sep = "[[:blank:]]".
ask: Logical. Default is TRUE, in which case the user is asked before an existing output file is overwritten. If FALSE, an existing output file is overwritten without asking.
blank.lines.skip: Logical. If TRUE (default), convertPed ignores blank lines in ped.infile and map.infile.
verbose: Logical. If TRUE (default), the line number of each line read and modified is displayed, together with the first ten columns of the converted line.
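The choice of col.sep matters when the ped file is tab-separated or uses runs of blanks. The sketch below shows how base R's strsplit behaves with the default and the documented alternatives. It assumes, without confirmation from this page, that convertPed treats col.sep as a regular expression in the same way, and the variable tab_line is purely illustrative.

```r
# A hypothetical tab-separated ped line
tab_line <- "1104\t1\t2\t3\t1\t2"

# With the default separator " ", the line is not split at all:
# the result is a single, unsplit string
strsplit(tab_line, " ")[[1]]

# "[[:space:]]" (or "[[:blank:]]") matches any single space or tab
strsplit(tab_line, "[[:space:]]")[[1]]
# [1] "1104" "1"    "2"    "3"    "1"    "2"

# Each blank is a separate split point, so two consecutive spaces
# produce an empty field between them
strsplit("1104  1", "[[:space:]]")[[1]]
# [1] "1104" ""     "1"
```

The last case suggests keeping the separators in a file consistent (one blank between columns) whichever col.sep is used.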
Miriam Gjerdevik,
with Hakon K. Gjessing
Professor of Biostatistics
Division of Epidemiology
Norwegian Institute of Public Health
hakon.gjessing@uib.no
convertPed assumes a standard ped file as input, which should look something like this:
1104 1 2 3 1 2 4 1 3 2 1 1
1104 2 0 0 1 1 4 1 2 2 4 1
1104 3 0 0 2 1 0 0 0 0 0 0
1105 1 2 3 2 2 1 1 2 2 4 1
1105 2 0 0 1 1 1 1 2 2 1 1
1105 3 0 0 2 1 1 1 3 2 4 4
The first six columns are: Family ID, Individual ID, Father's ID, Mother's ID, Sex (1 = male, 2 = female; alternatively 1 = male, 0 = female) and case-control status (1 = controls, 2 = cases; alternatively 0 = controls, 1 = cases).
Columns 7 onwards contain the genotype data, with the two alleles of each SNP in separate columns, so that two columns represent one SNP. A 0 denotes missing data.
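The column layout described above can be checked directly in base R. This is a minimal sketch, not part of Haplin: it reads the example ped file as text, confirms that six leading columns are followed by two allele columns per SNP, and counts missing genotypes per individual, assuming both alleles of a missing genotype are coded 0 as in the example.

```r
# The example ped file from above, with all columns kept as character
ped <- read.table(text = "1104 1 2 3 1 2 4 1 3 2 1 1
1104 2 0 0 1 1 4 1 2 2 4 1
1104 3 0 0 2 1 0 0 0 0 0 0
1105 1 2 3 2 2 1 1 2 2 4 1
1105 2 0 0 1 1 1 1 2 2 1 1
1105 3 0 0 2 1 1 1 3 2 4 4", colClasses = "character")

# Six leading columns, then two allele columns per SNP
stopifnot((ncol(ped) - 6) %% 2 == 0)
n_snps <- (ncol(ped) - 6) / 2
n_snps
# [1] 3

# Allele columns only; "0" marks a missing allele
geno <- as.matrix(ped[, -(1:6)])

# Missing SNPs per individual (assumes both alleles are 0 when missing)
missing_per_ind <- rowSums(geno == "0") / 2
missing_per_ind
# [1] 0 0 3 0 0 0
```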
The corresponding map file should look something like this:
Chromosome SNP-identifier Base-pair-position
1 RS9629043 554636
1 RS12565286 711153
1 RS12138618 740098
Alternatively, the map file may contain four columns, which should then be:
Chromosome, SNP-identifier, Genetic-distance, Base-pair-position.
If the map file does not already have a header line, one must be added.
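To make the link between snp.select and the map file more concrete, the base R sketch below resolves RS codes to SNP numbers through the map file and then picks the matching ped columns in the requested order. It illustrates the idea only and is not convertPed's code: the variable names are hypothetical, and it assumes the three-column map file with a header shown above and a space-separated ped line.

```r
# The example map file (three columns with a header)
map <- read.table(text = "Chromosome SNP-identifier Base-pair-position
1 RS9629043 554636
1 RS12565286 711153
1 RS12138618 740098", header = TRUE, check.names = FALSE,
  stringsAsFactors = FALSE)

# Hypothetical selection by RS code, in the desired output order
snp_select <- c("RS12138618", "RS9629043")
snp_numbers <- match(snp_select, map[["SNP-identifier"]])
if (anyNA(snp_numbers)) stop("some RS codes are not in the map file")
snp_numbers
# [1] 3 1

# Map rows reordered to match the selection
new_map <- map[snp_numbers, ]

# SNP k occupies ped columns 6 + 2k - 1 and 6 + 2k
cols <- as.vector(rbind(6 + 2 * snp_numbers - 1, 6 + 2 * snp_numbers))
fields <- strsplit("1104 1 2 3 1 2 4 1 3 2 1 1", " ")[[1]]
new_fields <- fields[c(1:6, cols)]
new_fields
# [1] "1104" "1"    "2"    "3"    "1"    "2"    "1"    "1"    "4"    "1"
```

Using match() keeps the order of snp_select, which is why the ped columns and the map rows end up in the same, user-defined order.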
After creating unique individual IDs and recoding the SNP alleles from 1,2,3,4 to A,C,G,T (using convertPed with the options create.unique.id = TRUE and convert = "1234_to_ACGT"), the ped file above should look like this:
1104 1104_1 1104_2 1104_3 1 2 T A G C A A
1104 1104_2 0 0 1 1 T A C C T A
1104 1104_3 0 0 2 1 0 0 0 0 0 0
1105 1105_1 1105_2 1105_3 2 2 A A C C T A
1105 1105_2 0 0 1 1 A A C C A A
1105 1105_3 0 0 2 1 A A G C T T
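The conversion in this example can be sketched for a single line in base R. This is a minimal illustration under stated assumptions, not Haplin's implementation: the helper convert_ped_line is hypothetical, it assumes a space-separated line and the allele mapping 1 = A, 2 = C, 3 = G, 4 = T implied by the example, and it leaves 0 (missing allele or unknown parent) unchanged.

```r
# Hypothetical helper, not part of Haplin: mimics create.unique.id = TRUE
# combined with convert = "1234_to_ACGT" for one ped line
convert_ped_line <- function(line, sep = " ") {
  fields <- strsplit(line, sep)[[1]]
  # Columns 2-4 (individual, father, mother): prefix with the family ID,
  # but keep "0" (unknown parent) as it is
  for (j in 2:4) {
    if (fields[j] != "0") fields[j] <- paste(fields[1], fields[j], sep = "_")
  }
  # Columns 7 onwards hold alleles: recode 1,2,3,4 to A,C,G,T and keep "0";
  # any other code would become NA in this sketch
  key <- c("1" = "A", "2" = "C", "3" = "G", "4" = "T")
  recode <- seq_along(fields) > 6 & fields != "0"
  fields[recode] <- unname(key[fields[recode]])
  paste(fields, collapse = " ")
}

convert_ped_line("1104 1 2 3 1 2 4 1 3 2 1 1")
# [1] "1104 1104_1 1104_2 1104_3 1 2 T A G C A A"
convert_ped_line("1104 3 0 0 2 1 0 0 0 0 0 0")
# [1] "1104 1104_3 0 0 2 1 0 0 0 0 0 0"
```

Applying the helper to each line of the example ped file gives the converted lines listed above; convertPed itself additionally handles the map file, SNP selection and the remaining options.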
Web Site: https://haplin.bitbucket.io
See also: lineByLine, Haplin:::lineConvert, snpPos
if (FALSE) {
# Create unique individual IDs and recode SNP alleles from 1,2,3,4 to A,C,G,T
convertPed(ped.infile = "mygwas.ped", map.infile = "mygwas.map",
ped.outfile = "mygwas_modified.ped", map.outfile = "mygwas_modified.map",
create.unique.id = TRUE, convert = "1234_to_ACGT", ask = TRUE)
}