JWileymisc (version 1.4.1)

hashDataset: Create a character vector or file hash of a dataset and each variable

Description

Given a data.frame or data.table, create a character vector of MD5 hashes for the overall dataset and for each variable. The goal is to produce a vector or text file that does not contain the raw data and can therefore be tracked under version control (e.g., on GitHub) without committing sensitive datasets. Tracking the hashes makes it possible to check whether two datasets are the same, for example after sending data to a collaborator, and, when a dataset changes over time, to see which variable(s) changed, if any.
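The comparison described above could be sketched roughly as follows. This is a minimal illustration, not part of the package documentation: the object names (original, modified, h1, h2) are made up for the example, and it assumes that hashing two datasets with the same columns yields vectors of the same length and order, so they can be compared element by element.

```r
library(JWileymisc)

## two versions of the same dataset; one variable is altered in the second
original <- mtcars
modified <- mtcars
modified$mpg[1] <- modified$mpg[1] + 1

h1 <- hashDataset(original)
h2 <- hashDataset(modified)

## are the two datasets the same?
identical(h1, h2)

## which entries of the hash vector differ (assumes same length and order)
h2[h1 != h2]
```

Only the hash vectors need to be stored or shared for this check; the datasets themselves never have to leave the machine they live on.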

Usage

hashDataset(x, file)

Value

A character vector, returned invisibly when file is given. If file is given, the same character vector is also written to a text file.

Arguments

x

A data.frame or data.table to be hashed.

file

An optional character string giving the path/name of a file to which the hash is written, for convenience. When supplied, the file is written and the character vector is returned invisibly. When missing (the default), the character vector is returned directly.
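A small sketch of how the file argument might be used is given here. The temporary path and the names hash_file and res are chosen purely for illustration; in practice the file would more likely sit inside a version-controlled project directory.

```r
library(JWileymisc)

## hypothetical output location; a temporary file is used here
hash_file <- tempfile(fileext = ".txt")

## withVisible() records whether the result was returned invisibly
res <- withVisible(hashDataset(mtcars, file = hash_file))

res$visible             # FALSE if the vector came back invisibly, as documented
file.exists(hash_file)  # the hash file should now exist
readLines(hash_file)    # inspect what was written
```

Assigning the result or wrapping it in withVisible() is only needed to inspect the return value; calling hashDataset(mtcars, file = hash_file) on its own writes the file without printing anything.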

Examples

hashDataset(mtcars)

## If a file is specified, the results are also written to that text file,
## formatted along these lines:

cat(hashDataset(cars), sep = "\n")
