datacheck (version 1.2.2)

datadict.profile: Create a data quality profile (main function)

Description

Tests a database against a set of rules (one per line) in a 'data dictionary file'. Each rule is summarized in the returned object: the variable/column, the rule, any comment after the rule, whether the rule executed successfully, the total number of rule violations (if any), and the record ids of any non-compliant records. Rules that cannot be executed for any reason are marked as 'failed'.
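To make the shape of such a summary concrete, here is a minimal base R sketch. It is not the package's implementation: the data frame `records`, the rule strings, and the helper `check_rule` are all hypothetical, and the sketch assumes a rule is an R expression that yields one logical value per record.

```r
# Hypothetical data and rules; not the package's example files
records <- data.frame(id = 1:4, age = c(25, -3, 40, 130),
                      stringsAsFactors = FALSE)
rules <- c("age >= 0", "age <= 120", "no_such_column > 1")

# Hypothetical helper: evaluate one rule against the table and summarise it
check_rule <- function(rule, data) {
  # A rule that raises an error is recorded as not executed ('failed')
  result <- tryCatch(eval(parse(text = rule), envir = data),
                     error = function(e) NULL)
  if (is.null(result)) {
    return(data.frame(rule = rule, executed = FALSE,
                      violations = NA_integer_, ids = NA_character_,
                      stringsAsFactors = FALSE))
  }
  bad <- which(!result)  # positions of non-compliant records
  data.frame(rule = rule, executed = TRUE,
             violations = length(bad),
             ids = paste(data$id[bad], collapse = ","),
             stringsAsFactors = FALSE)
}

# One summary row per rule
profile <- do.call(rbind, lapply(rules, check_rule, data = records))
profile
```

The third rule refers to a column that does not exist, so it ends up with `executed = FALSE` rather than stopping the whole run.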

Usage

datadict.profile(atable, adictionary)

Arguments

atable
a data.frame
adictionary
a list of rules in rule format, for example as returned by read_rules

Value

a data.profile object or NA

Details

The rule file must be a simple list with one rule per line. Functions can be used, but because each rule is applied to a vector (the column), functions that work on a single value should be wrapped in an sapply statement (see the example rule file). Rules may be separated by empty lines or by lines starting with the comment character #. A comment that follows a rule on the same line is displayed in the summary table and should therefore be short. A rule must test only one variable and one aspect at a time.
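The point about sapply can be illustrated in plain base R. The column `genus` and the helper `is_capitalised` below are hypothetical and are not taken from the package or its example rule file; the sketch only shows why a function written for a single value needs sapply when it is applied to a whole column.

```r
# Hypothetical column values
genus <- c("Solanum", "solanum", "Ipomoea")

# A vectorised expression already returns one result per record
nchar(genus) > 3

# Hypothetical helper that is only meaningful for a single value
is_capitalised <- function(x) {
  first <- substr(x, 1, 1)
  identical(first, toupper(first))
}

# Called on the whole column, it returns a single value for all records
whole_column <- is_capitalised(genus)

# Wrapped in sapply, it returns one result per record
per_record <- sapply(genus, is_capitalised, USE.NAMES = FALSE)
per_record
```

Only the per-record form can point to individual non-compliant records, which is what a rule needs in order to report record ids.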

See Also

Other datadict: as.rules; as_rules; datadict_profile; has.ruleErrors; has_rule_errors; is.datadict.profile; is_datadict_profile; prep4rep; read.rules; read_rules

Examples

library(datacheck)  # provides read_rules, datadict_profile, is_datadict_profile
library(stringr)
# Get example data files
atable <- system.file("examples/db.csv", package = "datacheck")
arule <- system.file("examples/rules1.R", package = "datacheck")
aloctn <- system.file("examples/location.csv", package = "datacheck")  # for use in is.oneOf

ctable <- basename(atable)
crule <- basename(arule)
cloctn <- basename(aloctn)

# Work in a temporary directory and remember the current one
cwd <- tempdir()
owd <- getwd()
setwd(cwd)

file.copy(atable, ctable)
file.copy(arule, crule)
file.copy(aloctn, cloctn)

# Read the table and the rules, then build the profile
at <- read.csv(ctable, stringsAsFactors = FALSE)
ad <- read_rules(crule)

db <- datadict_profile(at, ad)

# Check that a profile object was returned
is_datadict_profile(db)

db

setwd(owd)