Learn R Programming

drake (version 7.5.2)

drake_cache_log: Get a table that represents the state of the cache.

Description

This functionality is like make(..., cache_log_file = TRUE), but separated and more customizable. Hopefully, this functionality is a step toward better data versioning tools.

Usage

drake_cache_log(path = NULL, search = NULL,
  cache = drake::drake_cache(path = path, verbose = verbose),
  verbose = 1L, jobs = 1, targets_only = FALSE)

Arguments

path

Path to a drake cache (usually a hidden .drake/ folder) or NULL.

search

Deprecated.

cache

drake cache. See new_cache(). If supplied, path is ignored.

verbose

Integer, control printing to the console/terminal.

  • 0: print nothing.

  • 1: print targets, retries, and failures.

  • 2: also show a spinner when preprocessing tasks are underway.

jobs

Number of jobs/workers for parallel processing.

targets_only

Logical, whether to output information only on the targets in your workflow plan data frame. If targets_only is FALSE, the output will include the hashes of both targets and imports.

Value

Data frame of the hash keys of the targets and imports in the cache

Details

A hash is a fingerprint of an object's value. Together, the hash keys of all your targets and imports represent the state of your project. Use drake_cache_log() to generate a data frame with the hash keys of all the targets and imports stored in your cache. This function is particularly useful if you are storing your drake project in a version control repository. The cache has a lot of tiny files, so you should not put it under version control. Instead, save the output of drake_cache_log() as a text file after each make(), and put the text file under version control. That way, you have a changelog of your project's results. See the examples below for details. Depending on your project's history, the targets may be different than the ones in your workflow plan data frame. Also, the keys depend on the hash algorithm of your cache. To define your own hash algorithm, you can create your own storr cache and give it a hash algorithm (e.g. storr_rds(hash_algorithm = "murmur32"))

See Also

cached(), drake_cache()

Examples

Run this code
# NOT RUN {
isolate_example("Quarantine side effects.", {
if (suppressWarnings(require("knitr"))) {
# Load drake's canonical example.
load_mtcars_example() # Get the code with drake_example()
# Run the project, build all the targets.
make(my_plan)
# Get a data frame of all the hash keys.
# If you want a changelog, be sure to do this after every make().
cache_log <- drake_cache_log()
head(cache_log)
# Suppress partial arg match warnings.
suppressWarnings(
  # Save the hash log as a flat text file.
  write.table(
    x = cache_log,
    file = "drake_cache.log",
    quote = FALSE,
    row.names = FALSE
  )
)
# At this point, put drake_cache.log under version control
# (e.g. with 'git add drake_cache.log') alongside your code.
# Now, every time you run your project, your commit history
# of hash_lot.txt is a changelog of the project's results.
# It shows which targets and imports changed on every commit.
# It is extremely difficult to track your results this way
# by putting the raw '.drake/' cache itself under version control.
}
})
# }

Run the code above in your browser using DataLab