Evaluates and depicts results from plink --check-sex (via run_check_sex or an externally conducted sex check).
Takes file qcdir/name.sexcheck and returns IIDs for samples whose
SNPSEX != PEDSEX (where the SNPSEX is determined by the heterozygosity rate
across X-chromosomal variants).
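As a minimal sketch of the comparison described above, the following base R snippet flags samples whose SNPSEX differs from PEDSEX. The toy data.frame and its values are invented for illustration. The column names follow the PLINK .sexcheck format (sex coded as 1 = male, 2 = female, 0 = unknown) and the thresholds mirror the maleTh and femaleTh defaults. How the thresholds are applied to the F column here is an assumption, not a copy of the package's internal code.

```r
# Toy stand-in for the contents of qcdir/name.sexcheck (values are invented).
sexcheck <- data.frame(
  FID = c("fam1", "fam2", "fam3", "fam4"),
  IID = c("id1", "id2", "id3", "id4"),
  PEDSEX = c(1, 2, 1, 2),
  F = c(0.98, 0.05, 0.10, 0.50),
  stringsAsFactors = FALSE
)

maleTh <- 0.8
femaleTh <- 0.2

# Assumed rule: F above maleTh -> male (1), F below femaleTh -> female (2),
# anything in between -> unknown (0).
sexcheck$SNPSEX <- ifelse(sexcheck$F > maleTh, 1,
                          ifelse(sexcheck$F < femaleTh, 2, 0))

# Samples failing the check: assigned sex differs from pedigree sex.
fail_ids <- sexcheck$IID[sexcheck$SNPSEX != sexcheck$PEDSEX]
print(fail_ids)
```

In this toy data, id3 is recorded as male but looks female, and id4 falls between the two thresholds, so both are flagged.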
Mismatching SNPSEX and PEDSEX IDs can indicate plating errors, sample mix-ups or samples with generally poor genotyping quality. In the latter case, these IDs are
likely to fail other QC steps as well.
Optionally, an extra data.frame (externalSex) with sample IDs and sex can be
provided to double check whether external and PEDSEX data (often processed at
different centers) match. If a mismatch between PEDSEX and SNPSEX is
detected while SNPSEX == Sex (the sex given in externalSex), the PEDSEX of these
individuals can optionally be updated (fixMixup=TRUE).
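The cross-check against externalSex can be sketched in base R as follows. All data values are invented, the helper column SexCode is hypothetical, and the recoding of "M"/"F" to PLINK's numeric coding is an assumption about how such a comparison could be done rather than the package's own implementation.

```r
# Toy sex-check results (invented values; 1 = male, 2 = female).
sexcheck <- data.frame(
  IID = c("id1", "id2", "id3"),
  PEDSEX = c(1, 2, 1),
  SNPSEX = c(1, 2, 2),
  stringsAsFactors = FALSE
)

# Toy external annotation using the default labels "F" and "M".
externalSex <- data.frame(
  IID = c("id1", "id2", "id3"),
  Sex = c("M", "F", "F"),
  stringsAsFactors = FALSE
)

# Recode external labels to numeric codes before comparing
# (SexCode is a hypothetical helper column).
externalSex$SexCode <- ifelse(externalSex$Sex == "M", 1,
                              ifelse(externalSex$Sex == "F", 2, 0))

merged <- merge(sexcheck, externalSex, by = "IID")

# Likely mix-up: pedigree sex disagrees with the external record,
# but the external record agrees with the genotype-derived sex.
mixup <- merged[merged$PEDSEX != merged$SexCode &
                merged$SexCode == merged$SNPSEX, ]
print(mixup$IID)
```

Only id3 qualifies here: its PEDSEX says male, while both the external record and SNPSEX say female.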
evaluate_check_sex depicts the X-chromosomal heterozygosity (SNPSEX) of the samples split by their PEDSEX.
evaluate_check_sex(qcdir, name, maleTh = 0.8, femaleTh = 0.2,
externalSex = NULL, fixMixup = FALSE, indir = qcdir,
externalFemale = "F", externalMale = "M", externalSexSex = "Sex",
externalSexID = "IID", verbose = FALSE, label = TRUE,
path2plink = NULL, showPlinkOutput = TRUE, interactive = FALSE)
qcdir: [character] /path/to/directory containing name.sexcheck as returned by plink --check-sex.
name: [character] Prefix of PLINK files, i.e. name.bed, name.bim, name.fam and name.sexcheck.
maleTh: [double] Threshold of X-chromosomal heterozygosity rate for males.
femaleTh: [double] Threshold of X-chromosomal heterozygosity rate for females.
externalSex: [data.frame, optional] with sample IDs [externalSexID] and sex [externalSexSex] to double check whether external and PEDSEX data (often processed at different centers) match.
fixMixup: [logical] Should the PEDSEX of individuals with a mismatch between PEDSEX and Sex, with Sex==SNPSEX, be corrected automatically? This will directly change the name.bim/.bed/.fam files!
indir: [character] /path/to/directory containing the basic PLINK data files name.bim, name.bed and name.fam; only required if fixMixup==TRUE. The user needs write permission for indir.
externalFemale: [integer/character] Identifier for 'female' in externalSex.
externalMale: [integer/character] Identifier for 'male' in externalSex.
externalSexSex: [character] Column identifier for the column containing sex information in externalSex.
externalSexID: [character] Column identifier for the column containing ID information in externalSex.
verbose: [logical] If TRUE, progress info is printed to standard out.
label: [logical] Set to TRUE to add fail IDs as text labels in the scatter plot.
path2plink: [character] Absolute path to the PLINK executable (https://www.cog-genomics.org/plink/1.9/), i.e. plink should be accessible as path2plink -h. The full name of the executable should be specified: for Windows, this means path/plink.exe, for Unix platforms this is path/plink. If not provided, it is assumed that the PATH set-up works and PLINK will be found by exec_wait('plink').
showPlinkOutput: [logical] If TRUE, plink log and error messages are printed to standard out.
interactive: [logical] Should plots be shown interactively? When choosing this option, make sure you have X-forwarding or a graphical interface available for interactive plotting. Alternatively, set interactive=FALSE and save the returned plot object (p_sexcheck) via ggplot2::ggsave(plot=p_sexcheck, other_arguments) or pdf(outfile); print(p_sexcheck); dev.off().
A named list with i) fail_sex: data.frame with FID, IID, PEDSEX, SNPSEX and Sex (if externalSex was provided) of individuals failing the sex check, ii) mixup: data.frame with FID, IID, PEDSEX, SNPSEX and Sex (if externalSex was provided) of individuals whose PEDSEX != Sex and Sex == SNPSEX, and iii) p_sexcheck: a ggplot2 object containing a scatter plot of the X-chromosomal heterozygosity (SNPSEX) of the individuals split by their PEDSEX, which can be shown by print(p_sexcheck).
Both run_check_sex and evaluate_check_sex can simply be invoked by check_sex.
For details on the output data.frame fail_sex, check the original description on the PLINK output format page: https://www.cog-genomics.org/plink/1.9/formats#sexcheck.
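For orientation, a .sexcheck file can also be inspected directly in base R. The sketch below writes a small, made-up file with the columns listed on the PLINK format page (FID, IID, PEDSEX, SNPSEX, STATUS, F) to a temporary location and reads it back. The file contents are invented, and filtering on STATUS == "PROBLEM" is shown only as one plausible way to list flagged samples.

```r
# Write a small made-up .sexcheck file to a temporary location.
sexcheck_file <- tempfile(fileext = ".sexcheck")
writeLines(c(
  " FID  IID  PEDSEX  SNPSEX  STATUS       F",
  "fam1  id1       1       1      OK    0.98",
  "fam2  id2       2       1 PROBLEM    0.95"
), sexcheck_file)

# PLINK output is whitespace-delimited, which read.table handles by default.
sexcheck <- read.table(sexcheck_file, header = TRUE, stringsAsFactors = FALSE)

# Keep only the samples PLINK flagged as problematic.
problem <- sexcheck[sexcheck$STATUS == "PROBLEM",
                    c("FID", "IID", "PEDSEX", "SNPSEX", "F")]
print(problem)
```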
# NOT RUN {
qcdir <- system.file("extdata", package="plinkQC")
name <- "data"
# }
# NOT RUN {
fail_sex <- evaluate_check_sex(qcdir=qcdir, name=name, interactive=FALSE,
verbose=FALSE)
# }