
drake (version 7.5.2)

drake_plan: Create a workflow plan data frame for the plan argument of make().

Description

A drake plan is a data frame with columns "target" and "command". Each target is an R object produced in your workflow, and each command is the R code to produce it.
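Here is a minimal sketch of that structure, assuming drake is installed. The target names `raw` and `fit` are illustrative only. Each named argument becomes one row. The commands are stored as unevaluated R expressions, so nothing runs until make().

```r
library(drake)

# Each named argument becomes one row of the plan:
# the name is the target, the expression is its command.
plan <- drake_plan(
  raw = mtcars,
  fit = lm(mpg ~ wt, data = raw)
)

plan$target   # character vector of target names
plan$command  # list of R expressions, not yet evaluated
```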

Usage

drake_plan(..., list = character(0), file_targets = NULL,
  strings_in_dots = NULL, tidy_evaluation = NULL, transform = TRUE,
  trace = FALSE, envir = parent.frame(), tidy_eval = TRUE,
  max_expand = NULL)

Arguments

...

A collection of symbols/targets with commands assigned to them. See the examples for details.

list

Deprecated.

file_targets

Deprecated.

strings_in_dots

Deprecated.

tidy_evaluation

Deprecated. Use tidy_eval instead.

transform

Logical, whether to transform the plan into a larger plan with more targets. Requires the transform field in target(). See the examples for details.

trace

Logical, whether to add columns to show what happens during target transformations.

envir

Environment for tidy evaluation.

tidy_eval

Logical, whether to use tidy evaluation (e.g. unquoting/!!) when resolving commands. Tidy evaluation in transformations is always turned on regardless of the value you supply to this argument.
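The sketch below shows `!!` unquoting at plan-creation time. It assumes drake is installed and that the code runs at top level, so the default envir = parent.frame() can find `n_draws`, a hypothetical variable. With tidy_eval = TRUE, the value of `n_draws` is substituted into the command before the plan is returned.

```r
library(drake)

n_draws <- 10  # hypothetical value defined in the calling environment

# tidy_eval = TRUE (the default): `!!n_draws` is replaced by its value,
# so the stored command becomes rnorm(10).
plan_eval <- drake_plan(draws = rnorm(!!n_draws))

# tidy_eval = FALSE (and no transform): the `!!` should be left in the
# command as written.
plan_raw <- drake_plan(draws = rnorm(!!n_draws), tidy_eval = FALSE)

plan_eval$command[[1]]
plan_raw$command[[1]]
```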

max_expand

Positive integer, optional upper bound on the lengths of grouping variables for map() and cross(). Comes in handy when you have a massive number of targets and you want to test on a miniature version of your workflow before you scale up to production.
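This is a sketch of scaling down a plan for a trial run, assuming drake is installed. `simulate(n)` is a placeholder command: drake_plan() does not evaluate commands, so it is never called here. The same map() plan is built twice, once in full and once with max_expand = 2.

```r
library(drake)

# Full plan: one target per value of n (6 targets).
full <- drake_plan(
  sim = target(simulate(n), transform = map(n = c(1, 2, 3, 4, 5, 6)))
)

# Miniature plan for testing: the grouping variable is capped at 2 values.
small <- drake_plan(
  sim = target(simulate(n), transform = map(n = c(1, 2, 3, 4, 5, 6))),
  max_expand = 2
)

nrow(full)
nrow(small)
```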

Value

A data frame of targets, commands, and optional custom columns.

Columns

drake_plan() creates a special data frame. At minimum, that data frame must have columns target and command with the target names and the R code chunks to build them, respectively.

You can add custom columns yourself, either with target() (e.g. drake_plan(targ = target(my_cmd(), custom = "column"))) or by appending columns post-hoc (e.g. plan$col <- vals).
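Both approaches are sketched below, assuming drake is installed. `get_data()`, `fit_model()`, and the column names `owner` and `priority` are hypothetical. Targets that do not declare a custom column through target() are expected to get NA in that column.

```r
library(drake)

# Custom column declared per target through target().
plan <- drake_plan(
  raw = target(get_data(), owner = "alice"),  # get_data() is hypothetical
  fit = fit_model(raw)                         # fit_model() is hypothetical
)

# Custom column appended post hoc: supply one value per row.
plan$priority <- c(1, 2)

plan
```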

Some of these custom columns are special. They are optional, but drake looks for them at various points in the workflow.

  • elapsed and cpu: number of seconds to wait for the target to build before timing out (elapsed for elapsed time and cpu for CPU time).

  • hpc: logical values (TRUE/FALSE/NA) indicating whether to send each target to parallel workers. Visit https://ropenscilabs.github.io/drake-manual/hpc.html#selectivity to learn more.

  • resources: target-specific lists of resources for a computing cluster. See https://ropenscilabs.github.io/drake-manual/hpc.html#advanced-options for details.

  • retries: number of times to retry building a target in the event of an error.

  • seed: an optional pseudo-random number generator (RNG) seed for each target. drake usually comes up with its own unique, reproducible, target-specific seeds using the global seed (the seed argument to make() and drake_config()) and the target names, but you can override these automatic seeds. NA entries default back to drake's automatic seeds.

  • trigger: rule to decide whether a target needs to run. It is recommended that you define this one with target(). Details: https://ropenscilabs.github.io/drake-manual/triggers.html.
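Several of these special columns can be declared per target with target(). The sketch below assumes drake is installed. The values 123, 2, and 60 are arbitrary. Targets that omit a column are expected to get NA there, which falls back to drake's defaults.

```r
library(drake)

plan <- drake_plan(
  draws = target(
    rnorm(100),
    seed = 123,    # fixed RNG seed instead of drake's automatic one
    retries = 2,   # retry up to twice if the command errors
    elapsed = 60   # time out after 60 seconds of elapsed time
  ),
  report = target(
    summary(draws),
    trigger = trigger(condition = TRUE)  # rebuild on every make()
  )
)

plan
# make(plan) would consult these columns when building each target.
```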

Keywords

drake_plan() understands special keyword functions for your commands. With the exception of target(), each one is a proper function with its own help file.

  • target(): declare more than just the command, e.g. assign a trigger or transform. Examples: https://ropenscilabs.github.io/drake-manual/plans.html#large-plans.

  • file_in(): declare an input file dependency.

  • file_out(): declare an output file to be produced when the target is built.

  • knitr_in(): declare a knitr file dependency such as an R Markdown (*.Rmd) or R LaTeX (*.Rnw) file.

  • ignore(): force drake to entirely ignore a piece of code: do not track it for changes and do not analyze it for dependencies.

  • no_deps(): tell drake to not track the dependencies of a piece of code. drake still tracks the code itself for changes.

  • drake_envir(): get the environment where drake builds targets. Intended for advanced custom memory management.
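The sketch below shows a few of these keywords, assuming drake is installed. The file names and `analyze()` are hypothetical. No files are read or written here, because drake_plan() only records commands. deps_code() is used to check that code wrapped in ignore() is left out of dependency detection.

```r
library(drake)

# file_in()/file_out() only mark paths; nothing happens until make().
plan <- drake_plan(
  raw = read.csv(file_in("input.csv")),
  saved = write.csv(raw, file_out("output.csv"))
)

# ignore(): `debug_flag` should not be detected as a dependency.
deps <- deps_code(quote(analyze(raw, verbose = ignore(debug_flag))))
deps
```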

DSL

drake has special syntax for generating large plans. Your code will look something like drake_plan(x = target(cmd, transform = f(y, z), group = g)), where f() is either map(), cross(), split(), or combine() (similar to purrr::pmap(), tidyr::crossing(), base::split(), and dplyr::summarize(), respectively). These verbs mimic Tidyverse behavior to scale up existing plans to large numbers of targets. You can read about this interface at https://ropenscilabs.github.io/drake-manual/plans.html#large-plans.
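Here is a small sketch of cross() followed by combine(), assuming drake is installed. `model()` and `pick_best()` are hypothetical functions, and they are never called because drake_plan() does not evaluate commands. Crossing two values of `a` with two values of `b` should yield four `fit_*` targets. combine() then gathers them into one `best` target.

```r
library(drake)

plan <- drake_plan(
  # cross(): one target per combination of a and b (2 x 2 = 4 targets).
  fit = target(
    model(a, b),                                 # model() is hypothetical
    transform = cross(a = c(1, 2), b = c(3, 4))
  ),
  # combine(): a single target whose command receives every fit_* target.
  best = target(
    pick_best(fit),                              # pick_best() is hypothetical
    transform = combine(fit)
  )
)

plan$target
```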

Details

Besides "target" and "command", drake_plan() understands a special set of optional columns. For details, visit https://ropenscilabs.github.io/drake-manual/plans.html#special-custom-columns-in-your-plan

Examples

# NOT RUN {
isolate_example("contain side effects", {
# For more examples, visit
# https://ropenscilabs.github.io/drake-manual/plans.html.

# Create drake plans:
mtcars_plan <- drake_plan(
  write.csv(mtcars[, c("mpg", "cyl")], file_out("mtcars.csv")),
  value = read.csv(file_in("mtcars.csv"))
)
mtcars_plan
make(mtcars_plan) # Makes `mtcars.csv` and then `value`
head(readd(value))
# You can use knitr inputs too. See the top command below.

load_mtcars_example()
head(my_plan)

# The `knitr_in("report.Rmd")` tells `drake` to dive into the active
# code chunks to find dependencies.
# There, `drake` sees that `small`, `large`, and `coef_regression2_small`
# are loaded in with calls to `loadd()` and `readd()`.
deps_code("report.Rmd")

# Use transformations to generate large plans.
# Read more at
# <https://ropenscilabs.github.io/drake-manual/plans.html#create-large-plans-the-easy-way>. # nolint
drake_plan(
  data = target(
    simulate(nrows),
    transform = map(nrows = c(48, 64)),
    custom_column = 123
  ),
  reg = target(
    reg_fun(data),
    transform = cross(reg_fun = c(reg1, reg2), data)
  ),
  summ = target(
    sum_fun(data, reg),
    transform = cross(sum_fun = c(coef, residuals), reg)
  ),
  winners = target(
    min(summ),
    transform = combine(summ, .by = c(data, sum_fun))
  )
)

# Split data among multiple targets.
drake_plan(
  large_data = get_data(),
  slice_analysis = target(
    large_data %>%
      analyze(),
    transform = split(large_data, slices = 4)
  ),
  results = target(
    rbind(slice_analysis),
    transform = combine(slice_analysis)
  )
)

# Set trace = TRUE to show what happened during the transformation process.
drake_plan(
  data = target(
    simulate(nrows),
    transform = map(nrows = c(48, 64)),
    custom_column = 123
  ),
  reg = target(
    reg_fun(data),
    transform = cross(reg_fun = c(reg1, reg2), data)
  ),
  summ = target(
    sum_fun(data, reg),
    transform = cross(sum_fun = c(coef, residuals), reg)
  ),
  winners = target(
    min(summ),
    transform = combine(summ, .by = c(data, sum_fun))
  ),
  trace = TRUE
)

# You can create your own custom columns too.
# See ?trigger for more on triggers.
drake_plan(
  website_data = target(
    command = download_data("www.your_url.com"),
    trigger = trigger(condition = TRUE), # always rebuild this target
    custom_column = 5
  ),
  analysis = analyze(website_data)
)

# Tidy evaluation can help generate super large plans.
sms <- rlang::syms(letters) # To sub in character args, skip this.
drake_plan(x = target(f(char), transform = map(char = !!sms)))
})
# }
