Interface to the g:Profiler tool for finding enrichments in gene lists. Organism names are constructed by concatenating the first letter of the genus name and the species name, in lower case. Example: human - 'hsapiens', mouse - 'mmusculus'. If PNG output is requested, the request is directed to the g:GOSt tool when 'query' is a vector and to the g:Cocoa tool (compact view of multiple queries) when 'query' is a list. PNG output can fail (return FALSE) if the input query is too large. In that case, it is advisable to fall back to a non-image request.
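The naming rule above can be illustrated with a small helper. This is only a sketch: `make_organism_code` is a hypothetical function, not part of the package, and it assumes a plain two-word binomial name. Names with a subspecies or strain are not handled, so the result should be checked against the organism list on the g:Profiler web site.

```r
# Hypothetical helper (not part of the package): build a g:Profiler-style
# organism code from a two-word binomial name such as "Homo sapiens".
make_organism_code <- function(binomial) {
  parts <- strsplit(trimws(binomial), "\\s+")[[1]]
  # Only genus + species is supported in this sketch.
  stopifnot(length(parts) == 2)
  # First letter of the genus, followed by the species name, in lower case.
  tolower(paste0(substr(parts[1], 1, 1), parts[2]))
}

make_organism_code("Homo sapiens")  # "hsapiens"
make_organism_code("Mus musculus")  # "mmusculus"
```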
gprofiler(query, organism = "hsapiens", sort_by_structure = T,
ordered_query = F, significant = T, exclude_iea = F,
underrep = F, evcodes = F, region_query = F, max_p_value = 1,
min_set_size = 0, max_set_size = 0, min_isect_size = 0,
correction_method = "analytical", hier_filtering = "none",
domain_size = "annotated", custom_bg = "", numeric_ns = "",
png_fn = NULL, include_graph = F, src_filter = NULL)
query: vector of gene IDs or a list of such vectors. In the latter case, the query is directed to g:Cocoa, which yields a different graphical output if requested with the png_fn parameter.
organism: organism name.
sort_by_structure: whether hierarchical sorting is enabled or disabled.
ordered_query: if the input gene lists are ranked, this option may be used to get GSEA-style p-values.
significant: whether all or only statistically significant results should be returned.
exclude_iea: exclude electronic annotations (IEA).
underrep: measure underrepresentation.
evcodes: include GO evidence codes as the final column of output. Note that this can make the query slower.
region_query: interpret query as chromosomal ranges.
max_p_value: custom p-value threshold; results with a larger p-value are excluded.
min_set_size: minimum size of functional category; smaller categories are excluded.
max_set_size: maximum size of functional category; larger categories are excluded.
min_isect_size: minimum size of the overlap (intersection) between query and functional category; smaller intersections are excluded.
correction_method: the algorithm used for determining the significance threshold, one of "gSCS", "fdr", "bonferroni".
hier_filtering: hierarchical filtering strength, one of "none", "moderate", "strong".
domain_size: statistical domain size, one of "annotated", "known".
custom_bg: vector of gene names to use as a statistical background.
numeric_ns: namespace to use for fully numeric IDs.
png_fn: request the result as a PNG image and write it to png_fn.
include_graph: request inclusion of network data with the result.
src_filter: a vector of data sources to use. Currently, these include GO (GO:BP, GO:MF, GO:CC to select a particular GO branch), KEGG, REAC, TF, MI, CORUM, HP, HPA, OMIM. Please see the g:GOSt web tool for the comprehensive list and details on incorporated data sources.
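As a rough illustration of how several of these arguments combine, the sketch below restricts the data sources, switches the correction method, and tightens the p-value and intersection thresholds. The gene symbols are taken from the example on this page and the threshold values are arbitrary. It assumes the function is provided by the gProfileR package. Because the call needs that package and network access, it is guarded and only runs in an interactive session.

```r
# Arguments are collected in a list so they can be inspected before the call.
# All names below appear in the usage signature; the values are illustrative.
args <- list(
  query = c("Klf4", "Pax5", "Sox2", "Nanog"),
  organism = "mmusculus",
  src_filter = c("GO:BP", "KEGG"),   # only GO biological process and KEGG
  correction_method = "fdr",         # one of "gSCS", "fdr", "bonferroni"
  max_p_value = 0.05,                # drop results with a larger p-value
  min_isect_size = 2                 # require at least 2 overlapping genes
)

# Assumption: gprofiler() comes from the gProfileR package.
if (interactive() && requireNamespace("gProfileR", quietly = TRUE)) {
  res <- do.call(gProfileR::gprofiler, args)
  print(head(res))
}
```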
A data frame with the enrichment analysis results. If the input consisted of several lists, the corresponding list is indicated with a variable 'query number'. When a PNG image is requested, the return value is instead TRUE or FALSE: TRUE if a non-empty result was received and a file was written, FALSE otherwise. If 'include_graph' is set, the return value may include the attribute 'networks', containing a list of all network sources, each in turn containing a list of graph edges. Each edge is a list containing the two interacting symbols and two boolean values (in that order), indicating whether the first or second interactor is part of the input query (core nodes).
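The advice to fall back to a non-image request when PNG output fails can be sketched as a small wrapper. `enrich_with_png_fallback` and its `fetch` argument are hypothetical names introduced here for illustration; `fetch` defaults to `gProfileR::gprofiler`, on the assumption that the function is provided by the gProfileR package, and can be replaced by a stub for offline testing.

```r
# Hypothetical wrapper (not part of the package): request a PNG first and,
# if that returns anything other than TRUE, repeat the query as a normal
# (data frame) request.
enrich_with_png_fallback <- function(query, png_fn, ...,
                                     fetch = gProfileR::gprofiler) {
  ok <- fetch(query, png_fn = png_fn, ...)
  if (isTRUE(ok)) {
    # Image was written; return its path invisibly.
    return(invisible(png_fn))
  }
  # PNG request failed (for example, the query was too large).
  fetch(query, ...)
}
```

With this sketch, a call such as `enrich_with_png_fallback(genes, "enrichment.png", organism = "mmusculus")` would return the file name if the image was written and a data frame otherwise, so the caller has to check which of the two came back.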
J. Reimand, M. Kull, H. Peterson, J. Hansen, J. Vilo: g:Profiler - a web-based toolset for functional profiling of gene lists from large-scale experiments (2007) NAR 35 W193-W200
# NOT RUN {
gprofiler(c("Klf4", "Pax5", "Sox2", "Nanog"), organism = "mmusculus")
# }
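A list query works the same way as the vector query above. The sketch below passes two illustrative gene lists and then splits the result by query. It assumes the function comes from the gProfileR package and that the 'query number' variable is stored in a column named `query.number`; the column name is an assumption and is checked before use. The call is guarded because it needs the package and network access.

```r
# Two illustrative query lists, reusing the gene symbols from the example above.
queries <- list(
  c("Klf4", "Pax5"),
  c("Sox2", "Nanog")
)

# Assumption: gprofiler() comes from the gProfileR package.
if (interactive() && requireNamespace("gProfileR", quietly = TRUE)) {
  res <- gProfileR::gprofiler(queries, organism = "mmusculus")
  # The column name is an assumption; inspect names(res) if it is absent.
  if (is.data.frame(res) && "query.number" %in% names(res)) {
    by_query <- split(res, res$query.number)
    print(sapply(by_query, nrow))  # number of enriched terms per query list
  }
}
```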