
drake (version 4.2.0)

max_useful_jobs: Function max_useful_jobs

Description

Get the maximum number of useful jobs in the next call to make(..., jobs = YOUR_CHOICE).

Usage

max_useful_jobs(plan, from_scratch = FALSE,
  targets = drake::possible_targets(plan), envir = parent.frame(),
  verbose = TRUE, cache = NULL, jobs = 1,
  parallelism = drake::default_parallelism(), packages = (.packages()),
  prework = character(0), config = NULL, imports = c("files", "all",
  "none"))

Arguments

plan

workflow plan data frame, same as for function make().

from_scratch

logical, whether to compute the max useful jobs as if the workflow were to run from scratch (with all targets out of date).
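
For example, to see the ceiling on useful jobs for a complete rebuild (a sketch using the my_plan workflow from the Examples below):

  # Pretend every target is out of date and report the maximum useful jobs.
  max_useful_jobs(my_plan, from_scratch = TRUE)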

targets

names of targets to build, same as for function make().

envir

environment to import from, same as for function make(). config$envir is ignored in favor of envir.

verbose

logical, whether to output messages to the console.

cache

optional drake cache. See new_cache(). The cache argument is ignored if a non-null config argument is supplied.
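
A minimal sketch of computing against a specific cache (assuming new_cache() accepts a path argument, and using my_plan from the Examples below):

  # Create (or reuse) a cache at a custom location and compute against it.
  my_cache <- new_cache(path = "my_cache")
  max_useful_jobs(my_plan, cache = my_cache)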

jobs

The outdated() function is called internally, and it needs to import objects and examine your input files to see what has changed. This can take some time, so the jobs argument sets the number of parallel jobs used to speed up that computation.
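
For instance, to speed up the internal outdated() check itself (a sketch using my_plan from the Examples below):

  # Use 2 parallel jobs to import objects and hash input files faster.
  max_useful_jobs(my_plan, jobs = 2)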

parallelism

Choice of parallel backend to speed up the computation. Execution order in make() is slightly different when parallelism equals 'Makefile' because, in that case, all the imports are processed before any target is built. Thus, max_useful_jobs() may give a different answer for Makefile parallelism. See ?parallelism_choices for details.
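
For example, to see the answer under Makefile scheduling (a sketch using my_plan from the Examples below):

  # Under Makefile parallelism, all imports are processed before any target
  # is built, so the reported maximum may differ from the default backend.
  max_useful_jobs(my_plan, parallelism = "Makefile")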

packages

same as for function make().

prework

same as for function make().

config

internal configuration list of make(...), also produced by config(). config$envir is ignored in favor of envir. Computing this in advance could save time if you plan multiple calls to max_useful_jobs() or other functions that accept a config argument.
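
A sketch of reusing a precomputed configuration (assuming config() accepts the plan the same way make() does):

  # Compute the configuration once and pass it to repeated calls.
  con <- config(my_plan)
  max_useful_jobs(my_plan, config = con)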

imports

Set the imports argument to change your assumptions about how fast objects/files are imported. Possible values:

  • 'all': Factor all imported files/objects into calculating the max useful number of jobs. Note: this is not appropriate for make(..., parallelism = 'Makefile') because imports are processed sequentially for the Makefile option.

  • 'files': Factor all imported files into the calculation, but ignore all the other imports.

  • 'none': Ignore all the imports and just focus on the max number of useful jobs for parallelizing targets.

Value

an integer: the maximum number of useful jobs to supply to make(..., jobs = YOUR_CHOICE).

Details

Any additional jobs beyond max_useful_jobs(...) will be superfluous, and could even slow you down for make(..., parallelism = 'parLapply'). Set the imports argument to change your assumptions about how fast objects/files are imported. IMPORTANT: you must be in the root directory of your project.
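
As a usage sketch, the return value can feed directly into make():

  # Cap the number of jobs at the useful maximum for this plan.
  jobs <- max_useful_jobs(my_plan)
  make(my_plan, jobs = jobs, parallelism = "parLapply")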

See Also

plot_graph, build_graph, shell_file

Examples

# NOT RUN {
library(drake) # the package must be attached for the example to run
load_basic_example()
plot_graph(my_plan) # Look at the graph to make sense of the output.
max_useful_jobs(my_plan) # 8
max_useful_jobs(my_plan, imports = 'files') # 8
max_useful_jobs(my_plan, imports = 'all') # 10
max_useful_jobs(my_plan, imports = 'none') # 8
make(my_plan)
plot_graph(my_plan)
# Ignore the targets already built.
max_useful_jobs(my_plan) # 1
max_useful_jobs(my_plan, imports = 'files') # 1
max_useful_jobs(my_plan, imports = 'all') # 10
max_useful_jobs(my_plan, imports = 'none') # 0
# Change a function so some targets are now out of date.
reg2 <- function(d) {
  d$x3 <- d$x^3
  lm(y ~ x3, data = d)
}
plot_graph(my_plan)
max_useful_jobs(my_plan) # 4
max_useful_jobs(my_plan, imports = 'files') # 4
max_useful_jobs(my_plan, imports = 'all') # 10
max_useful_jobs(my_plan, imports = 'none') # 4
# }
