plinkQC (version 0.2.2)

check_sex: Identification of individuals with discordant sex information

Description

Runs plink --check-sex and evaluates its results. check_sex returns the IIDs of individuals whose SNPSEX != PEDSEX (where SNPSEX is determined by the heterozygosity rate across X-chromosomal variants). Mismatching SNPSEX and PEDSEX can indicate plating errors, sample mix-ups or, more generally, samples with poor genotyping. In the latter case, these IDs are likely to fail other QC steps as well. Optionally, an extra data.frame (externalSex) with sample IDs and sex can be provided to double-check whether external and PEDSEX data (often processed at different centers) match. If a mismatch between PEDSEX and SNPSEX is detected while SNPSEX == Sex, the PEDSEX of these individuals can optionally be updated (fixMixup=TRUE). check_sex also plots the X-chromosomal heterozygosity (SNPSEX) of the individuals, split by their PEDSEX.
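
As a rough illustration of the comparison described above, the following sketch builds a small made-up table with columns like those of a PLINK .sexcheck file and flags samples whose SNPSEX differs from PEDSEX. The data, the helper name call_snpsex, and the use of the F column with cut-offs 0.8 and 0.2 (PLINK's default calling rule) are assumptions for illustration; this is not plinkQC's internal code.

```r
# Toy stand-in for a PLINK .sexcheck table (all values are made up).
sexcheck <- data.frame(
  FID    = c("fam1", "fam2", "fam3", "fam4"),
  IID    = c("id1", "id2", "id3", "id4"),
  PEDSEX = c(1, 2, 2, 1),            # 1 = male, 2 = female, 0 = unknown
  F      = c(0.98, 0.05, 0.95, 0.50),
  stringsAsFactors = FALSE
)

# Hypothetical helper: call sex from F using PLINK's default convention
# (F above maleTh -> male, F below femaleTh -> female, otherwise unknown).
call_snpsex <- function(f, maleTh = 0.8, femaleTh = 0.2) {
  ifelse(f > maleTh, 1, ifelse(f < femaleTh, 2, 0))
}

sexcheck$SNPSEX <- call_snpsex(sexcheck$F)

# Samples whose genotype-derived sex disagrees with the pedigree sex.
fail_sex <- sexcheck[sexcheck$SNPSEX != sexcheck$PEDSEX, ]
print(fail_sex$IID)
```

In this toy table, id4 has an F value between the two cut-offs, so no sex is called for it and it is flagged alongside the clear mismatch id3.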

Usage

check_sex(indir, name, qcdir = indir, maleTh = 0.8, femaleTh = 0.2,
  run.check_sex = TRUE, externalSex = NULL, externalFemale = "F",
  externalMale = "M", externalSexSex = "Sex", externalSexID = "IID",
  fixMixup = FALSE, interactive = FALSE, verbose = FALSE,
  label = TRUE, path2plink = NULL, showPlinkOutput = TRUE)

Arguments

indir

[character] /path/to/directory containing the basic PLINK data files name.bim, name.bed and name.fam.

name

[character] Prefix of PLINK files, i.e. name.bed, name.bim, name.fam and name.sexcheck.

qcdir

[character] /path/to/directory in which to save name.sexcheck as returned by plink --check-sex. By default, qcdir=indir. If run.check_sex is FALSE, it is assumed that plink --check-sex has already been run and qcdir/name.sexcheck is present. The user needs write permission for qcdir.

maleTh

[double] Threshold of X-chromosomal heterozygosity rate for males.

femaleTh

[double] Threshold of X-chromosomal heterozygosity rate for females.

run.check_sex

[logical] Should plink --check-sex be run? If set to FALSE, it is assumed that plink --check-sex has already been run and qcdir/name.sexcheck is present; otherwise check_sex will fail with a missing file error.

externalSex

[data.frame, optional] Data frame with sample IDs [externalSexID] and sex [externalSexSex], used to double-check whether external and PEDSEX data (often processed at different centers) match.

externalFemale

[integer/character] Identifier for 'female' in externalSex.

externalMale

[integer/character] Identifier for 'male' in externalSex.

externalSexSex

[character] Column identifier for column containing sex information in externalSex.

externalSexID

[character] Column identifier for column containing ID information in externalSex.

fixMixup

[logical] Should the PEDSEX of individuals with a mismatch between PEDSEX and Sex, while Sex == SNPSEX, be corrected automatically? This will directly change the name.bim/.bed/.fam files!

interactive

[logical] Should plots be shown interactively? When choosing this option, make sure you have X-forwarding or a graphical interface available for interactive plotting. Alternatively, set interactive=FALSE and save the returned plot object (p_sexcheck) via ggplot2::ggsave(plot=p_sexcheck, other_arguments) or pdf(outfile); print(p_sexcheck); dev.off().

verbose

[logical] If TRUE, progress info is printed to standard out.

label

[logical] Set to TRUE to add the IDs of failing samples as text labels in the scatter plot.

path2plink

[character] Absolute path to the PLINK executable (https://www.cog-genomics.org/plink/1.9/), i.e. plink should be accessible as path2plink -h. The full name of the executable should be specified: on Windows this means path/plink.exe, on Unix platforms path/plink. If not provided, it is assumed that the PATH is set up correctly and PLINK will be found by exec_wait('plink').

showPlinkOutput

[logical] If TRUE, plink log and error messages are printed to standard out.
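
Since path2plink must name the executable itself, it can help to check the path before calling check_sex. The sketch below is one possible way to do that in base R; the helper name find_plink is hypothetical and not part of plinkQC.

```r
# Hypothetical helper: return a usable path to the PLINK executable,
# or NULL if none can be found.
find_plink <- function(path2plink = NULL) {
  if (!is.null(path2plink)) {
    # An explicit path must point at the executable file, not its directory.
    if (file.exists(path2plink) && !dir.exists(path2plink)) {
      return(normalizePath(path2plink))
    }
    return(NULL)
  }
  # Otherwise look "plink" up on the PATH; Sys.which returns "" if absent.
  found <- unname(Sys.which("plink"))
  if (nzchar(found)) found else NULL
}

plink <- find_plink()
if (is.null(plink)) {
  message("PLINK not found on PATH; pass the full executable path as path2plink.")
}
```

The helper only checks that a file exists at the given location; it does not verify that the file is a working PLINK 1.9 binary.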

Value

Named list with i) fail_sex: [data.frame] with FID, IID, PEDSEX, SNPSEX and Sex (if externalSex was provided) of individuals failing the sex check, ii) mixup: [data.frame] with FID, IID, PEDSEX, SNPSEX and Sex (if externalSex was provided) of individuals whose PEDSEX != Sex and Sex == SNPSEX, and iii) p_sexcheck: a ggplot2 object containing a scatter plot of the X-chromosomal heterozygosity (SNPSEX) of the samples, split by their PEDSEX, which can be shown by print(p_sexcheck).
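
The condition behind the mixup element (PEDSEX != Sex and Sex == SNPSEX) can be sketched on made-up data as follows. The tables and the recoding step are illustrative assumptions, using the default identifiers "F" and "M" and the default column names "IID" and "Sex"; plinkQC's own implementation may differ.

```r
# Made-up sex-check results; PEDSEX/SNPSEX use PLINK coding (1 = male, 2 = female).
sexcheck <- data.frame(
  FID    = c("fam1", "fam2", "fam3"),
  IID    = c("id1", "id2", "id3"),
  PEDSEX = c(1, 1, 2),
  SNPSEX = c(1, 2, 1),
  stringsAsFactors = FALSE
)

# Made-up external records, using the default identifiers "F" and "M".
externalSex <- data.frame(
  IID = c("id1", "id2", "id3"),
  Sex = c("M", "F", "F"),
  stringsAsFactors = FALSE
)

# Recode external sex to PLINK coding and join on the ID column.
externalSex$Sex <- ifelse(externalSex$Sex == "M", 1,
                          ifelse(externalSex$Sex == "F", 2, 0))
merged <- merge(sexcheck, externalSex, by = "IID")

# Mix-up candidates: pedigree sex disagrees with the external record,
# but the external record agrees with the genotype-derived sex.
mixup <- merged[merged$PEDSEX != merged$Sex & merged$Sex == merged$SNPSEX, ]
print(mixup$IID)
```

Here id2 is a mix-up candidate, whereas id3 is not: its PEDSEX matches the external record, so the disagreement with SNPSEX points to a genotyping or sample problem and not to a recording error in the .fam file.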

Details

check_sex wraps around run_check_sex and evaluate_check_sex. If run.check_sex is TRUE, run_check_sex is executed; otherwise it is assumed that plink --check-sex has been run externally and qcdir/name.sexcheck exists. If the file is missing, check_sex will fail with a missing file error.
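
To avoid the missing file error, one option is to test for qcdir/name.sexcheck first and set run.check_sex accordingly. This is a minimal sketch; the helper has_sexcheck and the placeholder values for qcdir and name are assumptions for illustration.

```r
# Hypothetical helper: TRUE if qcdir/name.sexcheck already exists.
has_sexcheck <- function(qcdir, name) {
  file.exists(file.path(qcdir, paste0(name, ".sexcheck")))
}

qcdir <- tempdir()   # placeholder directory for illustration
name  <- "data"      # placeholder PLINK file prefix

# Only skip the PLINK run when its output is already there.
run.check_sex <- !has_sexcheck(qcdir, name)
```

The resulting logical could then be passed as the run.check_sex argument of check_sex.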

For details on the output data.frame fail_sex, check the original description on the PLINK output format page: https://www.cog-genomics.org/plink/1.9/formats#sexcheck.
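
For orientation, a .sexcheck file can also be inspected directly in base R. The sketch below writes a tiny made-up file with the columns listed on the PLINK format page (FID, IID, PEDSEX, SNPSEX, STATUS, F) and reads it back; the file contents are invented for illustration.

```r
# Write a tiny made-up .sexcheck file so the example is self-contained.
sexcheck_file <- tempfile(fileext = ".sexcheck")
writeLines(c(
  " FID  IID PEDSEX SNPSEX  STATUS      F",
  "fam1  id1      1      1      OK   0.98",
  "fam2  id2      2      1 PROBLEM   0.93"
), sexcheck_file)

# PLINK writes whitespace-delimited text with a header row.
sexcheck <- read.table(sexcheck_file, header = TRUE, stringsAsFactors = FALSE)

# PLINK marks discordant samples with STATUS "PROBLEM".
problems <- sexcheck[sexcheck$STATUS == "PROBLEM", ]
print(problems)
```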

Examples

# NOT RUN {
indir <- system.file("extdata", package="plinkQC")
name <- "data"
fail_sex <- check_sex(indir=indir, name=name, run.check_sex=FALSE,
                      interactive=FALSE, verbose=FALSE)
# }
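
With interactive=FALSE, the returned plot object can be written to a file. The sketch below uses a toy ggplot2 object in place of the p_sexcheck element so that it runs without PLINK data; the toy data, the output path and the plot dimensions are assumptions for illustration, and the block does nothing if ggplot2 is not installed.

```r
outfile <- file.path(tempdir(), "sexcheck.pdf")   # placeholder output path

if (requireNamespace("ggplot2", quietly = TRUE)) {
  # Toy plot standing in for the p_sexcheck element of the returned list.
  toy <- data.frame(PEDSEX = factor(c(1, 2, 2)), F = c(0.98, 0.05, 0.93))
  p_sexcheck <- ggplot2::ggplot(toy, ggplot2::aes(x = PEDSEX, y = F)) +
    ggplot2::geom_point()

  # With a real result this would be fail_sex$p_sexcheck.
  ggplot2::ggsave(filename = outfile, plot = p_sexcheck, width = 5, height = 4)
}
```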