XGR (version 1.1.4)

xLDenricher: Function to conduct LD-based enrichment analysis using genomic annotations via sampling

Description

xLDenricher conducts LD-based enrichment analysis for input genomic region data (genome build hg19) using genomic annotations (e.g. active chromatin, transcription factor binding sites/motifs, conserved sites). Enrichment is assessed by comparing the observed number of overlaps against the expected number, which is estimated from a null distribution. Null LD blocks are generated by sampling from the background (for example, all GWAS SNPs or all common SNPs), respecting the MAF of the best SNP and/or the distance of the best SNP to the nearest gene, and optionally restricting sampling to the same chromosome.
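The sampling idea described above can be sketched in base R on toy data. This is an illustration only, not the XGR implementation: the background table, the MAF bins, the overlap flags and the helper `draw_null` are all hypothetical, and the empirical p-value used here is an assumption about how significance could be computed rather than a statement of what xLDenricher does internally.

```r
# Toy illustration only; all data and names below are hypothetical.
set.seed(825)

# Hypothetical background: 1000 candidate blocks, each with a MAF bin
# and a 0/1 flag for whether the block overlaps the annotation.
n_bg <- 1000
background <- data.frame(
  maf_bin = sample(1:5, n_bg, replace = TRUE),
  overlap = rbinom(n_bg, size = 1, prob = 0.2)
)

# Hypothetical observed set: 50 blocks with a higher overlap rate.
observed <- data.frame(
  maf_bin = sample(1:5, 50, replace = TRUE),
  overlap = rbinom(50, size = 1, prob = 0.4)
)
n_obs <- sum(observed$overlap)

# One null draw: for every observed block, pick a background block
# from the same MAF bin, then count how many picks overlap.
draw_null <- function() {
  idx <- vapply(observed$maf_bin, function(b) {
    candidates <- which(background$maf_bin == b)
    candidates[sample.int(length(candidates), 1)]
  }, integer(1))
  sum(background$overlap[idx])
}

# Null distribution of overlap counts and summary statistics.
null_counts <- replicate(2000, draw_null())
n_expect <- mean(null_counts)          # expected overlaps under the null
std      <- sd(null_counts)            # spread of the null
fc       <- n_obs / n_expect           # fold change
zscore   <- (n_obs - n_expect) / std   # standardised excess
pvalue   <- (sum(null_counts >= n_obs) + 1) / (length(null_counts) + 1)

print(c(nOverlap = n_obs, nExpect = n_expect, std = std,
        fc = fc, zscore = zscore, pvalue = pvalue))
```

Matching on a MAF bin stands in for the 'maf' option of `respect`; matching on distance to the nearest gene, or on both, would follow the same pattern with a different matching key.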

Usage

xLDenricher(bLD, GR.SNP = c("dbSNP_GWAS", "dbSNP_Common", "dbSNP_Single"),
  num.samples = 2000, respect = c("maf", "distance", "both"),
  restrict.chr = F, preserve = c("exact", "boundary"), seed = 825,
  p.adjust.method = c("BH", "BY", "bonferroni", "holm", "hochberg", "hommel"),
  GR.annotation = NA, verbose = T,
  RData.location = "http://galahad.well.ox.ac.uk/bigdata")

Arguments

bLD

a bLD object, containing a set of blocks based on which to generate a null distribution

GR.SNP

the genomic regions of SNPs. By default, it is 'dbSNP_GWAS', that is, SNPs from dbSNP (version 150) restricted to GWAS SNPs and their LD SNPs (hg19). It can also be 'dbSNP_Common', that is, common SNPs from dbSNP (version 150) plus GWAS SNPs and their LD SNPs (hg19). Alternatively, the user can provide a customised GR object directly

num.samples

the number of samples randomly generated

respect

how to respect the properties of the LD blocks to be sampled. It can be one of 'maf' (respecting the MAF of the best SNP), 'distance' (respecting the distance of the best SNP to the nearest gene), and 'both' (respecting both the MAF and the distance)

restrict.chr

logical indicating whether to restrict sampling to the same chromosome. By default, it is FALSE

preserve

how to preserve the resulting null LD block. It can be one of 'boundary' (preserving the boundary of the LD block) and 'exact' (exactly preserving the relative SNP locations within the LD block). Notably, the choice makes little difference when the enrichment analysis involves region-based genomic annotations, but it may make a difference when the genomic annotations are largely SNP-based (such as eQTLs)

seed

an integer specifying the seed

p.adjust.method

the method used to adjust p-values. It can be one of "BH", "BY", "bonferroni", "holm", "hochberg" and "hommel". The first two methods, "BH" (widely used) and "BY", control the false discovery rate (FDR: the expected proportion of false discoveries amongst the rejected hypotheses); the last four methods, "bonferroni", "holm", "hochberg" and "hommel", are designed to give strong control of the family-wise error rate (FWER). Note that FDR is a less stringent criterion than FWER

GR.annotation

the genomic regions of annotation data. By default, it is 'NA' to disable this option. Pre-built genomic annotation data are detailed in xDefineGenomicAnno. Alternatively, the user can also directly provide a customised GR object (or a list of GR objects)

verbose

logical indicating whether messages are displayed on the screen. By default, it is TRUE (messages are displayed), as given in Usage; set it to FALSE for no display

RData.location

a character string specifying the location of built-in RData files. See xRDataLoader for details
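To make the difference between the FDR and FWER options of p.adjust.method concrete, the following sketch applies base R's `stats::p.adjust` to a small vector of made-up p-values. The p-values are hypothetical; how xLDenricher applies the adjustment internally is not shown here.

```r
# Hypothetical raw p-values for five annotations.
pvals <- c(0.001, 0.008, 0.02, 0.04, 0.30)

# FDR control (Benjamini-Hochberg) versus FWER control (Bonferroni).
adj_bh   <- stats::p.adjust(pvals, method = "BH")
adj_bonf <- stats::p.adjust(pvals, method = "bonferroni")

# Side-by-side comparison: BH-adjusted values are never larger
# than Bonferroni-adjusted values.
print(data.frame(pvalue = pvals, BH = adj_bh, bonferroni = adj_bonf))
```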

Value

a data frame with 13 columns:

  • name: the annotation name

  • nAnno: the number of regions from annotation data

  • nOverlap: the observed number of LD blocks overlapped with annotation data

  • fc: fold change

  • zscore: z-score

  • pvalue: p-value

  • adjp: adjusted p-value, that is, the p-value after adjustment for multiple comparisons

  • or: the odds ratio

  • CIl: the lower bound of the confidence interval for the odds ratio

  • CIu: the upper bound of the confidence interval for the odds ratio

  • nData: the number of input LD blocks

  • nExpect: the expected number of LD blocks overlapped with annotation data

  • std: the standard deviation of expected number of LD blocks overlapped with annotation data
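As a sketch of how a result with these columns might be inspected using base R, the example below builds a small mock data frame (only a subset of the 13 columns, with invented values and hypothetical annotation names), keeps the rows passing an adjusted p-value cutoff and sorts them by z-score. The 0.05 cutoff is an arbitrary choice for illustration.

```r
# Mock result with a subset of the documented columns; values are invented.
res <- data.frame(
  name     = c("annoA", "annoB", "annoC"),
  nOverlap = c(12, 5, 9),
  nExpect  = c(4.0, 4.8, 6.1),
  zscore   = c(4.2, 0.1, 1.5),
  adjp     = c(0.0004, 0.92, 0.20),
  stringsAsFactors = FALSE
)

# Keep annotations passing the cutoff, strongest enrichment first.
sig <- res[res$adjp < 0.05, ]
sig <- sig[order(-sig$zscore), ]
print(sig)
```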

See Also

xDefineGenomicAnno

Examples

# NOT RUN {
# Load the XGR package and specify the location of built-in data
library(XGR)
RData.location <- "http://galahad.well.ox.ac.uk/bigdata"

# }
# NOT RUN {
# a) provide the seed SNPs with the significance info
## load ImmunoBase
data(ImmunoBase)
## get lead SNPs reported in AS GWAS and their significance info (p-values)
gr <- ImmunoBase$AS$variant
data <- GenomicRanges::mcols(gr)[,c('Variant','Pvalue')]

# b) get LD block (EUR population)
bLD <- xLDblock(data, include.LD="EUR", LD.r2=0.8,
RData.location=RData.location)

## c) perform enrichment analysis using the 'ReMap_Encode_mergedTFBS' annotations
eTerm <- xLDenricher(bLD, GR.annotation="ReMap_Encode_mergedTFBS",
RData.location=RData.location)

## d) view enrichment results for the top significant terms
xEnrichViewer(eTerm)

## e) barplot of enriched terms
bp <- xEnrichBarplot(eTerm, top_num='auto', displayBy="fdr")
bp

## f) forest plot of enrichment results
gp <- xEnrichForest(eTerm, FDR.cutoff=0.01)

## g) save enrichment results to the file called 'LD_enrichments.txt'
output <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp),
sortBy="adjp", details=TRUE)
utils::write.table(output, file="LD_enrichments.txt", sep="\t",
row.names=FALSE)

## h) compare boundary and exact
GR.SNP <- xRDataLoader("dbSNP_GWAS", RData.location=RData.location)
GR.annotation <- xRDataLoader("FANTOM5_CAT_Cell",
RData.location=RData.location)
eTerm_boundary <- xLDenricher(bLD, GR.SNP=GR.SNP,
GR.annotation=GR.annotation, num.samples=20000, preserve="boundary",
RData.location=RData.location)
eTerm_exact <- xLDenricher(bLD, GR.SNP=GR.SNP,
GR.annotation=GR.annotation, num.samples=20000, preserve="exact",
RData.location=RData.location)
ls_eTerm <- list(boundary=eTerm_boundary, exact=eTerm_exact)
### barplot
bp <- xEnrichCompare(ls_eTerm, displayBy="zscore")
### forest plot
eTerm_boundary$group <- 'boundary'
eTerm_exact$group <- 'exact'
df <- rbind(eTerm_boundary, eTerm_exact)
gp <- xEnrichForest(df, FDR.cutoff=0.01)
# }