Customize a target in drake_plan().
The target() function configures individual targets in a drake plan. It is most commonly used to invoke static or dynamic branching, but it can also set the values of custom columns such as format, elapsed, retries, and max_expand. Details: https://books.ropensci.org/drake/plans.html#special-columns.
Note: drake_plan(my_target = my_command()) is equivalent to drake_plan(my_target = target(my_command())).
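A minimal sketch of that equivalence, assuming drake is installed and attached. my_command() is a hypothetical placeholder; drake_plan() only records commands, so the function does not have to exist until make() runs.

```r
library(drake)

# my_command() is a hypothetical placeholder. drake_plan() records the
# command without evaluating it.
plain <- drake_plan(my_target = my_command())
wrapped <- drake_plan(my_target = target(my_command()))

# Both plans contain one target, my_target, whose command is my_command().
print(plain)
print(wrapped)
```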
Usage
target(command = NULL, transform = NULL, dynamic = NULL, ...)

Arguments
command: the command to build the target.
transform: a call to map(), split(), cross(), or combine() to apply a static transformation. Details: https://books.ropensci.org/drake/static.html
dynamic: a call to map(), cross(), or group() to apply a dynamic transformation. Details: https://books.ropensci.org/drake/dynamic.html
...: optional columns of the plan for a given target. See the Columns section of this help file for a selection of special columns that drake understands.

Value
A one-row workflow plan data frame with the named arguments as columns.
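To contrast the transform and dynamic arguments, here is a small sketch; fit() and the method values "a" and "b" are hypothetical. A static transform is expanded when drake_plan() runs, so it adds rows to the plan immediately, while a dynamic target stays a single row and is only split into sub-targets during make().

```r
library(drake)

plan <- drake_plan(
  # Static branching: expands into one row per method right away.
  model = target(fit(method), transform = map(method = c("a", "b"))),
  numbers = seq_len(3),
  # Dynamic branching: stays one row; make() creates the sub-targets.
  squared = target(numbers^2, dynamic = map(numbers))
)

# Two static model targets, plus numbers and squared.
print(plan)
```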
Columns
drake_plan() creates a special data frame. At minimum, that data frame must have columns target and command with the target names and the R code chunks to build them, respectively. You can add custom columns yourself, either with target() (e.g. drake_plan(y = target(f(x), transform = map(x = c(1, 2)), format = "fst"))) or by appending columns post hoc (e.g. plan$col <- vals).
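The two ways of adding custom columns might look like this. read_big_file() and summarize_data() are hypothetical functions, and the values chosen for format, retries, and hpc are illustrations only.

```r
library(drake)

# Custom columns declared inside target().
plan <- drake_plan(
  raw = target(read_big_file(), format = "fst", retries = 2),
  summary = summarize_data(raw)
)

# Targets that do not set a column get NA in that column.
print(plan)

# A custom column appended post hoc: one value per row of the plan.
plan$hpc <- c(FALSE, TRUE)
print(plan)
```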
Some of these custom columns are special. They are optional, but drake looks for them at various points in the workflow.
transform: a call to map(), split(), cross(), or combine() to create and manipulate large collections of targets. Details: https://books.ropensci.org/drake/plans.html#large-plans
format: set a storage format to save big targets more efficiently. See the Formats section of this help file for more details.
trigger: rule to decide whether a target needs to run. It is recommended that you define this one with target(). Details: https://books.ropensci.org/drake/triggers.html
hpc: logical values (TRUE/FALSE/NA) indicating whether to send each target to parallel workers. Visit https://books.ropensci.org/drake/hpc.html#selectivity to learn more.
resources: target-specific lists of resources for a computing cluster. See https://books.ropensci.org/drake/hpc.html#advanced-options for details.
caching: overrides the caching argument of make() for each target individually. Possible values:
  "main": tell the main process to store the target in the cache.
  "worker": tell the HPC worker to store the target in the cache.
  NA: default to the caching argument of make().
elapsed and cpu: number of seconds to wait for the target to build before timing out (elapsed for elapsed time and cpu for CPU time).
retries: number of times to retry building a target in the event of an error.
seed: an optional pseudo-random number generator (RNG) seed for each target. drake usually generates its own unique, reproducible, target-specific seeds from the global seed (the seed argument to make() and drake_config()) and the target names, but you can override these automatic seeds. NA entries default back to drake's automatic seeds.
max_expand: for dynamic branching only. Same as the max_expand argument of make(), but on a target-by-target basis. Limits the number of sub-targets created for a given target.
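A sketch that sets several of the special columns above in one plan. download_data() and fit_model() are hypothetical functions, and the numbers (timeout, retry count, seed, max_expand) are arbitrary illustrations rather than recommendations.

```r
library(drake)

plan <- drake_plan(
  raw = target(
    download_data(),                      # hypothetical function
    trigger = trigger(condition = TRUE),  # rebuild on every make()
    elapsed = 60,                         # time out after 60 seconds
    retries = 3,                          # retry up to 3 times on error
    hpc = FALSE                           # do not send to parallel workers
  ),
  model = target(
    fit_model(raw),                       # hypothetical function
    seed = 123,                           # override the automatic seed
    caching = "worker"                    # the HPC worker stores the result
  ),
  ids = seq_len(1000),
  squares = target(
    ids^2,
    dynamic = map(ids),
    max_expand = 10                       # create at most 10 sub-targets
  )
)
print(plan)
```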
Keywords
drake_plan() understands special keyword functions for your commands. With the exception of target(), each one is a proper function with its own help file.
target(): give the target more than just a command. Using target(), you can apply a transformation (examples: https://books.ropensci.org/drake/plans.html#large-plans), supply a trigger (https://books.ropensci.org/drake/triggers.html), or set any number of custom columns.
file_in(): declare an input file dependency.
file_out(): declare an output file to be produced when the target is built.
knitr_in(): declare a knitr file dependency such as an R Markdown (*.Rmd) or R LaTeX (*.Rnw) file.
ignore(): force drake to entirely ignore a piece of code: do not track it for changes and do not analyze it for dependencies.
no_deps(): tell drake not to track the dependencies of a piece of code. drake still tracks the code itself for changes.
id_chr(): get the name of the current target.
drake_envir(): get the environment where drake builds targets. Intended for advanced custom memory management.
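The keyword functions only have meaning inside plan commands. In this sketch the file paths (data/raw.csv, hist.png, report.Rmd, report.html) and the tidy_labels() helper are hypothetical; drake_plan() just records the commands, so none of them needs to exist until make() runs.

```r
library(drake)

plan <- drake_plan(
  # file_in(): raw is rebuilt when data/raw.csv changes.
  raw = read.csv(file_in("data/raw.csv")),
  # file_out(): hist.png is tracked as an output of this target.
  histogram = {
    png(file_out("hist.png"))
    hist(raw$value)
    dev.off()
  },
  # knitr_in(): report.Rmd is tracked as a knitr dependency.
  report = rmarkdown::render(
    knitr_in("report.Rmd"),
    output_file = file_out("report.html"),
    quiet = TRUE
  ),
  # no_deps(): changes to the hypothetical tidy_labels() function do not
  # invalidate this target, but edits to this command still do.
  tidy = no_deps(tidy_labels(c("A", "B"))),
  # id_chr() evaluates to "stamp"; ignore() hides the timestamp from drake.
  stamp = paste(id_chr(), ignore(format(Sys.time())))
)
print(plan)
```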
Formats
Specialized target formats increase efficiency and flexibility. Some let you save specialized objects such as keras models, while others save and load data faster while conserving storage and memory. You can declare target-specific formats in the plan (e.g. drake_plan(x = target(big_data_frame, format = "fst"))) or supply a global default format for all targets in make(). Either way, most formats have specialized installation requirements (e.g. R packages) that are not installed with drake by default. You will need to install them separately yourself.
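A sketch of target-specific formats in one plan. The targets are toy examples; make() would need the fst and qs packages to store them, although defining the plan does not. The commented make(plan, format = "qs") line illustrates the global default described above.

```r
library(drake)

plan <- drake_plan(
  # A large data frame stored with "fst" (requires the fst package).
  big_df = target(data.frame(x = runif(1e6)), format = "fst"),
  # An arbitrary R object stored with "qs" (requires the qs package).
  fit = target(lm(x ~ 1, data = big_df), format = "qs"),
  # "rds" works for any serializable object but uses gzip compression.
  fit_summary = target(summary(fit), format = "rds")
)
print(plan)

# Alternatively, set one default format for every target:
# make(plan, format = "qs")
```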
Available formats:
"file": dynamic files. To use this format, simply create local files and directories yourself and then return a character vector of paths as the target's value. Then, drake will watch for changes to those files in subsequent calls to make(). This is a more flexible alternative to file_in() and file_out(), and it is compatible with dynamic branching. See https://github.com/ropensci/drake/pull/1178 for an example.
"fst": save big data frames fast. Requires the fst package. Note: this format strips non-data-frame attributes.
"fst_tbl": like "fst", but for tibble objects. Requires the fst and tibble packages. Strips away non-data-frame non-tibble attributes.
"fst_dt": like "fst", but for data.table objects. Requires the fst and data.table packages. Strips away non-data-frame non-data-table attributes.
"diskframe": stores disk.frame objects, which can be larger than memory. Requires the fst and disk.frame packages. Coerces objects to disk.frames. Note: disk.frame objects get moved to the drake cache (a subfolder of .drake/ for most workflows). To ensure this data transfer is fast, it is best to save your disk.frame objects to the same physical storage drive as the drake cache, for example with as.disk.frame(your_dataset, outdir = drake_tempfile()).
"keras": save Keras models as HDF5 files. Requires the keras package.
"qs": save any R object that can be properly serialized with the qs package. Requires the qs package. Uses qsave() and qread() with the default settings of qs version 0.20.2.
"rds": save any R object that can be properly serialized. Requires R version >= 3.5.0 due to ALTREP. Note: the "rds" format uses gzip compression, which is slow; "qs" is a superior format.
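A sketch of the "file" format from the list above. write_numbers() and the out/numbers.csv path are hypothetical: the helper writes a file and returns its path, which is the value a "file" target is expected to return. Downstream targets then receive that path as the target's value.

```r
library(drake)

# Hypothetical helper: create a local file and return its path.
write_numbers <- function() {
  dir.create("out", showWarnings = FALSE)
  path <- file.path("out", "numbers.csv")
  write.csv(data.frame(x = 1:3), path, row.names = FALSE)
  path
}

plan <- drake_plan(
  numbers_file = target(write_numbers(), format = "file"),
  # numbers_file holds the path, so read.csv() can read it directly.
  numbers = read.csv(numbers_file)
)
print(plan)

# make(plan) would build both targets, and later calls to make() would
# notice if out/numbers.csv changes.
```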
Details
target() must be called inside drake_plan(). It is invalid otherwise.
See Also
drake_plan(), make()
Examples
library(drake)

# Use target() to create your own custom columns in a drake plan.
# See ?triggers for more on triggers.
drake_plan(
  website_data = target(
    download_data("www.your_url.com"),
    trigger = "always",
    custom_column = 5
  ),
  analysis = analyze(website_data)
)

# Static branching with map(), cross(), and combine().
models <- c("glm", "hierarchical")
plan <- drake_plan(
  data = target(
    get_data(x),
    transform = map(x = c("simulated", "survey"))
  ),
  analysis = target(
    analyze_data(data, model),
    transform = cross(data, model = !!models, .id = c(x, model))
  ),
  summary = target(
    summarize_analysis(analysis),
    transform = map(analysis, .id = c(x, model))
  ),
  results = target(
    bind_rows(summary),
    transform = combine(summary, .by = data)
  )
)
plan

# Print the plan as R code (only if the styler package is available).
if (requireNamespace("styler", quietly = TRUE)) {
  print(drake_plan_source(plan))
}