hashDataset: Create a character vector or file hash of a dataset and each variable
Description
Given a data.frame or data.table, create a character vector containing
the MD5 hash of the overall dataset and of each variable. The goal is to create
a secure vector / text file that can be tracked using version control
(e.g., GitHub) without committing sensitive datasets.
Tracking the hashes makes it possible to evaluate whether two datasets are the
same, such as when sending data, and, when a dataset changes over time,
to know which variable(s) changed, if any.
Usage
hashDataset(x, file)
Value
A character vector of hashes, returned invisibly when file is given.
Optionally, the same hashes are also written to a text file.
Arguments
x
A data.frame or data.table to be hashed.
file
An optional character string. If given, it is taken to be the path/name of a
file to write the hashes to, for convenience. When
supplied, the character vector is returned invisibly and the file is written.
When missing (default), the character vector is returned directly.
Examples
hashDataset(mtcars)
## if a file is specified it will write the results to the text file
## nicely formatted, along these lines
cat(hashDataset(cars), sep = "\n")
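The change-tracking idea from the Description can be illustrated with a minimal sketch. This is not hashDataset's actual implementation or output format: it assumes the digest package is installed, and hash_columns is a hypothetical helper introduced only for this illustration.

```r
library(digest)  # assumption: the digest package is installed

# Hypothetical helper (not part of the documented function):
# one MD5 hash for the whole data frame plus one MD5 hash per column.
hash_columns <- function(d) {
  c(dataset = digest(d, algo = "md5"),
    vapply(d, digest, character(1), algo = "md5"))
}

old <- hash_columns(mtcars)

# Simulate a change to a single variable.
changed <- mtcars
changed$mpg[1] <- changed$mpg[1] + 1
new <- hash_columns(changed)

# Names whose hashes differ between the two versions:
# the overall dataset entry and the modified column.
names(old)[old != new]
```

Only the hash vectors need to be stored or shared for this comparison, so the underlying data never has to enter version control.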