validate (version 1.1.5)

contains_exactly: Check records using a predefined table of (im)possible values

Description

Given a set of keys or key combinations, check whether all those combinations occur, or check that they do not occur. Supports globbing and regular expressions.

Usage

contains_exactly(keys, by = NULL, allow_duplicates = FALSE)

contains_at_least(keys, by = NULL)

contains_at_most(keys, by = NULL)

does_not_contain(keys)

Value

For contains_exactly, contains_at_least, and contains_at_most: a logical vector with one entry for each record in the dataset. Any group not conforming to the test keys will have FALSE assigned to each record in the group (see examples).

For contains_at_least: a logical vector with size equal to the number of records under scrutiny. It is FALSE where key combinations do not match any value in keys.

For does_not_contain: a logical vector with size equal to the number of records under scrutiny. It is FALSE where a record's key combination matches one of the (forbidden) combinations in keys.

Arguments

keys

A data frame or bare (unquoted) name of a data frame passed as a reference to confront (see examples). The column names of keys must also occur in the columns of the data under scrutiny.

by

A bare (unquoted) variable or list of variable names that occur in the data under scrutiny. The data will be split into groups according to these variables and the check is performed on each group.

allow_duplicates

[logical] toggle whether key combinations can occur more than once.
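
The effect of by and allow_duplicates can be sketched in base R. This is only an illustration of the behaviour described above, not validate's implementation; the helper names group_ok and check are hypothetical, and the toy data is made up for the example.

```r
# Toy data: the 'export' group holds quarter Q2 twice
dat <- data.frame(
  variable = c("import", "import", "export", "export", "export"),
  quarter  = c("Q1", "Q2", "Q1", "Q2", "Q2"),
  stringsAsFactors = FALSE
)
keys <- data.frame(quarter = c("Q1", "Q2"), stringsAsFactors = FALSE)

# Hypothetical helper: does one group hold exactly the given keys?
# Duplicated keys only count as a failure when allow_duplicates is FALSE.
group_ok <- function(found, given, allow_duplicates = FALSE) {
  same_set <- setequal(found, given)
  if (allow_duplicates) same_set else same_set && anyDuplicated(found) == 0
}

# Hypothetical helper: evaluate each group (the role of 'by') and repeat
# the group's result for every record in that group.
check <- function(allow_duplicates) {
  groups <- split(dat$quarter, dat$variable)
  ok <- vapply(groups, group_ok, logical(1),
               given = keys$quarter, allow_duplicates = allow_duplicates)
  unname(ok[dat$variable])
}

strict  <- check(FALSE)  # the three 'export' records fail
lenient <- check(TRUE)   # every record passes
```

In this sketch, one duplicated key in the export group marks all three export records as failing, mirroring the statement that a non-conforming group has FALSE assigned to each of its records.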

Globbing

Globbing is a simple method of defining string patterns where the asterisk (*) is used as a wildcard. For example, the globbing pattern "abc*" stands for any string starting with "abc".
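
To get a feel for what a globbing pattern matches, base R's utils::glob2rx() translates a glob into an equivalent regular expression. The sketch below only illustrates the pattern semantics; it does not necessarily reflect how validate evaluates keys marked with glob().

```r
# Translate the glob "abc*" into a regular expression (base R, utils package)
pattern <- utils::glob2rx("abc*")

# Test a few candidate strings against the translated pattern
candidates <- c("abc", "abcdef", "xabc", "ab")
matches <- grepl(pattern, candidates)
names(matches) <- candidates
matches
# Only "abc" and "abcdef" start with "abc", so only those two match
```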

Details

contains_exactly: the dataset contains exactly the key set, no more, no less.
contains_at_least: the dataset contains at least the given keys.
contains_at_most: all keys in the dataset are contained in the given keys.
does_not_contain: the keys are interpreted as forbidden key combinations.
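
The four checks can be read as set comparisons between the given keys and the key combinations found in the data. The following base R sketch illustrates that reading under simplifying assumptions (no grouping, no wildcards, duplicates ignored); the helper key_string is hypothetical and this is not the package's implementation.

```r
# Hypothetical helper: collapse the key columns into one string per row
key_string <- function(df, cols) do.call(paste, c(df[cols], sep = "\r"))

keys <- expand.grid(year = c("2018", "2019"), quarter = c("Q1", "Q2"),
                    stringsAsFactors = FALSE)
# Toy data: the combination 2019/Q2 is missing
dat <- data.frame(year    = c("2018", "2018", "2019"),
                  quarter = c("Q1", "Q2", "Q1"),
                  stringsAsFactors = FALSE)

given <- key_string(keys, names(keys))
found <- key_string(dat, names(keys))

at_least <- all(given %in% found)  # every given key occurs in the data
at_most  <- all(found %in% given)  # the data holds no key outside the given set
exactly  <- at_least && at_most    # both directions hold
not_forbidden <- !(found %in% given)  # per record: TRUE unless it matches a key
```

Here at_least and exactly are FALSE because 2019/Q2 is absent, while at_most is TRUE. Read as forbidden combinations, the same keys would flag all three records.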

See Also

Other cross-record-helpers: do_by(), exists_any(), hb(), hierarchy(), is_complete(), is_linear_sequence(), is_unique()

Examples

## Check that data is present for all quarters in 2018-2019
dat <- data.frame(
    year    = rep(c("2018","2019"),each=4)
  , quarter = rep(sprintf("Q%d",1:4), 2)
  , value   = sample(20:50,8)
)

# Method 1: creating a data frame in-place (only for simple cases)
rule <- validator(contains_exactly(
           expand.grid(year=c("2018","2019"), quarter=c("Q1","Q2","Q3","Q4"))
          )
        )
out <- confront(dat, rule)
values(out)

# Method 2: pass the keyset to 'confront', and reference it in the rule.
# this scales to larger key sets but it needs a 'contract' between the
# rule definition and how 'confront' is called.

keyset <- expand.grid(year=c("2018","2019"), quarter=c("Q1","Q2","Q3","Q4"))
rule <- validator(contains_exactly(all_keys))
out <- confront(dat, rule, ref=list(all_keys = keyset))
values(out)

## Globbing (use * as a wildcard)

# transaction data 
transactions <- data.frame(
    sender   = c("S21", "X34", "S45","Z22")
  , receiver = c("FG0", "FG2", "DF1","KK2")
  , value    = sample(70:100,4)
)

# forbidden combinations: if the sender starts with "S",
# the receiver cannot start with "FG"
forbidden <- data.frame(sender="S*",receiver = "FG*")

rule <- validator(does_not_contain(glob(forbidden_keys)))
out <- confront(transactions, rule, ref=list(forbidden_keys=forbidden))
values(out)


## Quick interactive testing
# use 'with':
with(transactions, does_not_contain(forbidden)) 



## Grouping 

# data in 'long' format
dat <- expand.grid(
  year = c("2018","2019")
  , quarter = c("Q1","Q2","Q3","Q4")
  , variable = c("import","export")
)
dat$value <- sample(50:100,nrow(dat))


periods <- expand.grid(
  year = c("2018","2019")
  , quarter = c("Q1","Q2","Q3","Q4")
)

rule <- validator(contains_exactly(all_periods, by=variable))

out <- confront(dat, rule, ref=list(all_periods=periods))
values(out)

# remove one export record

dat1 <- dat[-15,]
out1 <- confront(dat1, rule, ref=list(all_periods=periods))
values(out1)
