xGScoreAdv: Function to calculate per base scores given a list of genomic regions in terms of overlaps with genomic annotations

Description

xGScoreAdv calculates per-base scores for an input list of genomic regions (genome build hg19), using genomic annotations (e.g. genomic segments, active chromatin, transcription factor binding sites/motifs, conserved sites). The per-base scores are calculated for overlaps with each genomic annotation. Scores for genomic regions/variants can measure constraint/conservation or impact/pathogenicity.

Usage

xGScoreAdv(data, format = c("data.frame", "bed", "chr:start-end",
    "GRanges"),
    build.conversion = c(NA, "hg38.to.hg19", "hg18.to.hg19"),
    GS.annotation = c("fitCons", "phastCons", "phyloP", "mcap", "cadd"),
    GR.annotation = NA, details = F, verbose = T,
    RData.location = "http://galahad.well.ox.ac.uk/bigdata")

Arguments

data

input genomic regions (GR). If formatted as "chr:start-end" (see the parameter 'format' below), GR should be provided as a vector in the form 'chrN:start-end', where N is either 1-22 or X, and start (or end) is a genomic position; for example, 'chr1:13-20'. If formatted as a 'data.frame', the first three columns correspond to the chromosome (1st column), the start position (2nd column), and the end position (3rd column). The 'bed' (browser extensible data) format is the same as the 'data.frame' format, except that positions are 0-based offsets from the chromosome position. If the genomic regions are single positions rather than ranges, the end position (3rd column) may be omitted. The data can also be a 'GRanges' object (in which case the format should be 'GRanges')
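The coordinate conventions described above can be sketched in base R. This is an illustration only: `parse_regions` and `bed_to_one_based` are hypothetical helpers, not XGR functions, and the BED conversion assumes the standard BED convention (0-based start, end not included), which may differ from how xGScoreAdv handles 'bed' input internally.

```r
# Hypothetical helper (not part of XGR): split 'chrN:start-end' strings
# into a data frame with chromosome, start and end columns.
parse_regions <- function(x) {
  chr <- sub(":.*$", "", x)           # text before the colon
  pos <- sub("^[^:]*:", "", x)        # text after the colon, 'start-end'
  start <- as.integer(sub("-.*$", "", pos))
  end <- as.integer(sub("^.*-", "", pos))
  data.frame(chr = chr, start = start, end = end, stringsAsFactors = FALSE)
}

# Hypothetical helper (not part of XGR): shift a 0-based BED start to the
# equivalent 1-based start, assuming the standard BED convention.
bed_to_one_based <- function(df) {
  df$start <- df$start + 1L
  df
}

regions <- parse_regions(c("chr1:13-20", "chrX:100-250"))
bed <- data.frame(chr = "chr1", start = 12L, end = 20L,
                  stringsAsFactors = FALSE)
one_based <- bed_to_one_based(bed)
```

Under these assumptions, the BED record with start 12 and end 20 describes the same bases as 'chr1:13-20'.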

format

the format of the input data. It can be one of "data.frame", "chr:start-end", "bed" or "GRanges"

build.conversion

the conversion from one genome build to another. The conversions supported are "hg38.to.hg19" and "hg18.to.hg19". The default is NA (no conversion)

GS.annotation

which genomic score (GS) annotations to use. It can be 'fitCons' (the probability of fitness consequences for point mutations; http://www.ncbi.nlm.nih.gov/pubmed/25599402), 'phastCons' (the probability, in the range [0,1], that each nucleotide belongs to a conserved element under negative selection), 'phyloP' (conservation at individual sites, expressed as -log p-values under a null hypothesis of neutral evolution, with positive scores for conservation and negative scores for acceleration), 'mcap' (eliminating a majority of variants of uncertain significance in clinical exomes at high sensitivity; http://www.ncbi.nlm.nih.gov/pubmed/27776117), or 'cadd' (combined annotation dependent depletion, estimating relative levels of pathogenicity of potential human variants; http://www.ncbi.nlm.nih.gov/pubmed/24487276)

GR.annotation

the genomic regions of annotation data. By default, it is 'NA' to disable this option. Pre-built genomic annotation data are detailed in xDefineGenomicAnno. Alternatively, the user can also directly provide a customised GR object (or a list of GR objects)

details

logical to indicate whether the detailed information (i.e. a_nBase, a_GS and ratio) is returned. By default, it is set to FALSE, so these details are not included

verbose

logical to indicate whether messages are displayed on the screen. By default, it is set to TRUE (see Usage), so messages are displayed

RData.location

a character string specifying the location of the built-in RData files. See xRDataLoader for details

Value

a data frame with 3 columns by default, or 6 columns when "details" is true:

name: the annotation name
o_nBase: the number of bases overlapped between input regions and annotation regions
o_GS: the per base genomic scores for overlaps between input regions and annotation regions
a_nBase: the number of bases covered by that annotation; optional, it is only appended when "details" is true
a_GS: the per base genomic scores for that annotation; optional, it is only appended when "details" is true
ratio: ratio of o_GS divided by a_GS; optional, it is only appended when "details" is true
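As a sketch of how the optional ratio column relates to the other columns, the base R snippet below builds a mock result table and recomputes ratio as o_GS divided by a_GS. All annotation names and numbers are made up for illustration; they are not real xGScoreAdv output.

```r
# Hypothetical values for illustration only; not real xGScoreAdv output.
res_df <- data.frame(
  name = c("annotation_A", "annotation_B"),
  o_nBase = c(12, 30),      # bases shared by input and annotation regions
  o_GS = c(0.30, 0.10),     # per-base score within the overlaps
  a_nBase = c(1000, 5000),  # bases covered by the annotation
  a_GS = c(0.15, 0.20),     # per-base score across the annotation
  stringsAsFactors = FALSE
)

# ratio as defined above: o_GS divided by a_GS
res_df$ratio <- res_df$o_GS / res_df$a_GS

# Sort annotations from highest to lowest ratio
res_sorted <- res_df[order(-res_df$ratio), ]
```

A ratio above 1 means the per-base score in the overlapping bases is higher than the per-base score across the annotation as a whole; a ratio below 1 means it is lower.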

Examples

# NOT RUN {
# Load the XGR package and specify the location of built-in data
library(XGR)
RData.location <- "http://galahad.well.ox.ac.uk/bigdata"

# a) provide the genomic regions
## load ImmunoBase
ImmunoBase <- xRDataLoader(RData.customised='ImmunoBase',
    RData.location=RData.location)
## get lead SNPs reported in AS GWAS
data <- ImmunoBase$AS$variant

# b) in terms of overlaps with genomic segments (Primary monocytes from peripheral blood)
## fitness consequence score
res_df <- xGScoreAdv(data=data, format="GRanges",
    GS.annotation="fitCons",
    GR.annotation="EpigenomeAtlas_15Segments_E029",
    RData.location=RData.location)
## phastCons conservation score
res_df <- xGScoreAdv(data=data, format="GRanges",
    GS.annotation="phastCons",
    GR.annotation="EpigenomeAtlas_15Segments_E029",
    RData.location=RData.location)

# c) in terms of overlaps with genic annotations
## phyloP conservation score
res_df <- xGScoreAdv(data=data, format="GRanges",
    GS.annotation="phyloP", GR.annotation="Genic_anno",
    RData.location=RData.location)
# }
