run_pathfindR is the wrapper function for the pathfindR workflow.
run_pathfindR(input, p_val_threshold = 0.05, visualize_pathways = TRUE,
human_genes = TRUE, enrichment_threshold = 0.05,
adj_method = "bonferroni", search_method = "GR",
use_all_positives = FALSE, saTemp0 = 1, saTemp1 = 0.01,
saIter = 10000, gaPop = 400, gaIter = 200, gaThread = 5,
gaMut = 0, grMaxDepth = 1, grSearchDepth = 1, grOverlap = 0.5,
grSubNum = 1000, iterations = 10, n_processes = NULL,
pin_name_path = "Biogrid", score_quan_thr = 0.8, sig_gene_thr = 10,
gene_sets = "KEGG", custom_genes = NULL, custom_pathways = NULL,
bubble = TRUE, output_dir = "pathfindR_Results",
list_active_snw_genes = FALSE, silent_option = TRUE)
input: the input data that pathfindR uses. The input must be a data frame with three columns:
  Gene Symbol (HGNC Gene Symbol)
  Change value, e.g. log(fold change) (optional)
  Adjusted p value associated with the test, e.g. differential expression/methylation
p_val_threshold: the adjusted-p value threshold to use when filtering the input data frame. Must be a numeric value between 0 and 1.
visualize_pathways: boolean value indicating whether or not to create pathway diagrams.
human_genes: boolean value indicating whether or not the input genes are human gene symbols (Default = TRUE)
enrichment_threshold: threshold used when filtering the pathway enrichment results of individual iterations
adj_method: correction method to be used for adjusting p-values of pathway enrichment results (Default = "bonferroni", see ?p.adjust)
search_method: algorithm to use when performing active subnetwork search. Options are greedy search (GR), simulated annealing (SA) or genetic algorithm (GA). Must be one of c("GR", "SA", "GA") (Default = "GR")
use_all_positives: if TRUE: in GA, adds an individual with all positive nodes. In SA, initializes the candidate solution with all positive nodes. (Default = FALSE)
saTemp0: initial temperature for SA (Default = 1.0)
saTemp1: final temperature for SA (Default = 0.01)
saIter: iteration number for SA (Default = 10000)
gaPop: population size for GA (Default = 400)
gaIter: iteration number for GA (Default = 200)
gaThread: number of threads to be used in GA (Default = 5)
gaMut: for GA, applies mutation with the given mutation rate (Default = 0, i.e. mutation off)
grMaxDepth: sets max depth in greedy search, 0 for no limit (Default = 1)
grSearchDepth: search depth in greedy search (Default = 1)
grOverlap: overlap threshold for results of greedy search (Default = 0.5)
grSubNum: number of subnetworks to be presented in the results (Default = 1000)
iterations: number of iterations for active subnetwork search and enrichment analyses (Default = 10. Gets set to 1 for Genetic Algorithm)
n_processes: optional argument for specifying the number of processes used by foreach. If not specified, the function determines this automatically (Default = NULL. Gets set to 1 for Genetic Algorithm)
pin_name_path: name of the chosen PIN or path/to/PIN.sif. If a PIN name, must be one of c("Biogrid", "GeneMania", "IntAct", "KEGG"). If path/to/PIN.sif, the file must comply with the PIN specifications. (Default = "Biogrid")
score_quan_thr: active subnetwork score quantile threshold (Default = 0.80)
sig_gene_thr: threshold for minimum number of significant genes (Default = 10)
gene_sets: the gene sets to be used for enrichment analysis. Available gene sets are KEGG, Reactome, BioCarta, GO-All, GO-BP, GO-CC, GO-MF or Custom. If "Custom", the arguments custom_genes and custom_pathways must be specified. (Default = "KEGG")
custom_genes: a list containing the genes involved in each custom pathway. Each element is a vector of gene symbols located in the given pathway. Names correspond to the ID of the pathway.
custom_pathways: a list containing the descriptions for each custom pathway. Names of the list correspond to the ID of the pathway.
bubble: boolean value. If TRUE, a bubble chart displaying the enrichment results is plotted. (Default = TRUE)
output_dir: the directory to be created under the current working directory where the output and intermediate files are saved (Default = "pathfindR_Results")
list_active_snw_genes: boolean value indicating whether or not to report the non-DEG active subnetwork genes for the active subnetwork which was enriched for the given pathway with the lowest p value (Default = FALSE)
silent_option: boolean value indicating whether to print the messages to the console (FALSE) or to a file (TRUE) during active subnetwork search (Default = TRUE). This option was added because console messages get mixed up during parallel runs.
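As a minimal sketch of the argument formats described above, the following base R code builds an input data frame in the three-column layout and a matching pair of custom gene set lists. The gene symbols, values, column names and pathway IDs are made up for illustration; only the argument names are taken from the usage above. The call itself is wrapped in `if (FALSE)` because it needs the pathfindR package and can take a long time, and a real analysis would need far more genes than this toy table.

```r
# Toy input: gene symbol, change value, adjusted p value (illustrative values)
toy_input <- data.frame(
  Gene.symbol = c("TP53", "EGFR", "MYC", "BRCA1"),
  logFC       = c(1.8, -2.1, 0.9, -1.2),
  adj.P.Val   = c(0.001, 0.020, 0.049, 0.300),
  stringsAsFactors = FALSE
)

# Basic checks mirroring the documented requirements
stopifnot(ncol(toy_input) == 3)
stopifnot(all(toy_input$adj.P.Val >= 0 & toy_input$adj.P.Val <= 1))

# Custom gene sets: both lists are named by (hypothetical) pathway IDs
custom_genes <- list(
  PW1 = c("TP53", "BRCA1"),
  PW2 = c("EGFR", "MYC", "TP53")
)
custom_pathways <- list(
  PW1 = "Hypothetical DNA damage pathway",
  PW2 = "Hypothetical growth signalling pathway"
)
stopifnot(identical(names(custom_genes), names(custom_pathways)))

# Not run: requires pathfindR and may take a long time
if (FALSE) {
  result <- pathfindR::run_pathfindR(
    toy_input,
    gene_sets       = "Custom",
    custom_genes    = custom_genes,
    custom_pathways = custom_pathways
  )
}
```

Keeping the two lists keyed by the same IDs is what lets each gene vector be paired with its description.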
Data frame of pathfindR enrichment results. Columns are:
KEGG ID of the enriched pathway
Description of the enriched pathway
Fold enrichment value for the enriched pathway
the number of iterations in which the given pathway was found to be enriched, over all iterations
the lowest adjusted-p value of the given pathway over all iterations
the highest adjusted-p value of the given pathway over all iterations
the non-DEG active subnetwork genes, comma-separated (reported only if list_active_snw_genes = TRUE)
the up-regulated genes in the input involved in the given pathway, comma-separated
the down-regulated genes in the input involved in the given pathway, comma-separated
The function also creates an HTML report with the pathfindR enrichment results linked to the visualizations of the pathways in addition to the table of converted gene symbols. This report can be found in "`output_dir`/results.html" under the current working directory.
Optionally, a bubble chart of enrichment results is plotted. The x-axis corresponds to fold enrichment values while the y-axis indicates the enriched pathways. The size of each bubble indicates the number of DEGs in the given pathway. Color indicates the -log10(lowest-p) value; the redder the bubble, the more significant the pathway.
Depending on the protein interaction network, the search algorithm and the number of iterations you choose, the active subnetwork search component of pathfindR may take a very long time to finish.
This function takes in a data frame consisting of Gene Symbol, log-fold-change
and adjusted-p values. After input testing, any gene symbols that are not in
the PIN are converted to alias symbols if the alias is in the PIN. Next,
active subnetwork search is performed. Pathway enrichment analysis is
performed using the genes in each of the active subnetworks. Pathways with
adjusted-p values higher than enrichment_threshold are discarded. The
lowest adjusted-p value (over all subnetworks) for each pathway is kept. This
process of active subnetwork search and enrichment is repeated for a selected
number of iterations, which is done in parallel. Over all iterations,
the lowest and the highest adjusted-p values, as well as number of occurrences
are reported for each enriched pathway.
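The filtering and summarizing steps described above can be illustrated with a small base R sketch. This is not the package's internal code: the toy table, its column names and the pathway IDs are assumptions, and it simplifies each iteration to one raw p value per pathway instead of one per active subnetwork. It only shows the idea of adjusting p values, dropping pathways above the threshold, and then reporting occurrence, lowest and highest adjusted-p values over iterations.

```r
# Toy enrichment results from 3 iterations (IDs and raw p values are made up)
iter_results <- data.frame(
  iteration = c(1, 1, 2, 2, 3, 3),
  ID        = c("PW1", "PW2", "PW1", "PW2", "PW1", "PW3"),
  p_value   = c(0.001, 0.030, 0.004, 0.010, 0.002, 0.020),
  stringsAsFactors = FALSE
)
enrichment_threshold <- 0.05

# Adjust p values within each iteration (Bonferroni, the default adj_method)
iter_results$adj_p <- ave(
  iter_results$p_value, iter_results$iteration,
  FUN = function(p) p.adjust(p, method = "bonferroni")
)

# Keep only pathways at or below the threshold in each iteration
kept <- iter_results[iter_results$adj_p <= enrichment_threshold, ]

# Summarize over iterations; table() and tapply() both order by sorted ID
occ <- table(kept$ID)
summary_df <- data.frame(
  ID         = names(occ),
  occurrence = as.vector(occ),
  lowest_p   = as.vector(tapply(kept$adj_p, kept$ID, min)),
  highest_p  = as.vector(tapply(kept$adj_p, kept$ID, max)),
  stringsAsFactors = FALSE
)
print(summary_df)
```

In this toy data PW2 passes the threshold in only one of the two iterations where it appears, so its occurrence count is lower than the number of iterations in which it was tested.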
input_testing for input testing,
input_processing for input processing,
active_snw_search for active subnetwork search and subnetwork filtering,
enrichment_analyses for enrichment analysis (using the active subnetworks),
summarize_enrichment_results for summarizing the active-subnetwork-oriented enrichment results,
annotate_pathway_DEGs for annotation of affected genes in the given gene sets,
visualize_pws for visualization of pathway diagrams,
enrichment_chart for a visual summary of the pathfindR enrichment results,
foreach for details on parallel execution of looping constructs,
cluster_pathways for clustering the resulting enriched pathways and partitioning into clusters.
# NOT RUN {
run_pathfindR(RA_input)
# }
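Because the active subnetwork search can take a very long time, an exploratory run may use non-default settings. The sketch below collects such settings in a list and passes them with `do.call`. All argument names come from the usage above, but the chosen values and the `quick_args` and `quick_result` names are only illustrative assumptions, and how much time they save depends on the data. The call is wrapped in `if (FALSE)` because it requires the pathfindR package and its `RA_input` example data.

```r
# Hypothetical settings for a shorter exploratory run (values are illustrative)
quick_args <- list(
  search_method      = "GR",   # greedy search (the default)
  iterations         = 5,      # fewer iterations than the default of 10
  n_processes        = 2,      # cap the number of parallel processes
  visualize_pathways = FALSE,  # skip creating pathway diagrams
  output_dir         = "pathfindR_quick_run"
)

# Not run: requires pathfindR and may still take a long time
if (FALSE) {
  library(pathfindR)
  quick_result <- do.call(run_pathfindR, c(list(RA_input), quick_args))
  head(quick_result)
}
```

Keeping the settings in a named list makes it easy to record exactly which options produced a given results directory.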