A concise overview of the validate syntax.
The basic rule is that an R-statement that evaluates to a logical is a
validating statement. This is established by static code inspection when
validator reads a (set of) user-defined validation rule(s).
All basic comparisons, including >, >=, ==, !=, <=, <, %in% are validating
statements. When executing a validating statement, the %in% operator is
replaced with %vin%.
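As a minimal sketch of comparison rules in use, the following confronts a small data set with two validating statements. The data frame and its column names (x, y) are hypothetical and only serve as illustration.

```r
library(validate)

# Hypothetical example data; the column names are assumptions for illustration.
dat <- data.frame(
  x = c(1, -1, 3),
  y = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

# Two validating statements: a comparison and a set-membership test.
rules <- validator(x > 0, y %in% c("a", "b"))

# Confront the data with the rules and tabulate passes and fails per rule.
out <- confront(dat, rules)
res <- summary(out)
print(res[, c("name", "items", "passes", "fails")])
```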
The unary logical operator '!' and the functions all() and any() define
validating statements. Binary logical operations, including &, &&, |, ||,
are validating when P and Q in e.g. P & Q are validating. (Note that the
short-circuit operators && and || only use the first logical value in cases
where, for P && Q, P and/or Q are vectors. This holds for R versions before
4.3.0; later versions raise an error instead.) Binary logical implication
\(P\Rightarrow Q\) (P implies Q) is implemented as if ( P ) Q. The latter is
interpreted as !(P) | Q.
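The equivalence between implication and !(P) | Q can be checked with a small truth table, and the same idea can be written as a rule. This is a sketch only; the variables x and y are hypothetical.

```r
library(validate)

# Truth table in base R: "P implies Q" has the same values as !(P) | Q.
P <- c(TRUE, TRUE, FALSE, FALSE)
Q <- c(TRUE, FALSE, TRUE, FALSE)
implication <- !(P) | Q
print(implication)

# The same implication as a rule: if x is positive, y must be positive too.
# Hypothetical data, one record per combination of P and Q.
dat <- data.frame(x = c(1, 1, -1, -1), y = c(1, -1, 1, -1))
rules <- validator(if (x > 0) y > 0)
res <- summary(confront(dat, rules))
print(res[, c("name", "passes", "fails")])
```

Only the record where the condition holds and the consequent does not is counted as a fail.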
Any function starting with is. (e.g. is.numeric) is a validating expression.
grepl is a validating expression.
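A short sketch of both kinds of rule, assuming a hypothetical data set with a numeric column height and a character column postcode:

```r
library(validate)

# Hypothetical example data; column names and values are assumptions.
dat <- data.frame(
  height   = c(1.80, 1.65),
  postcode = c("1234AB", "AB1234"),
  stringsAsFactors = FALSE
)

rules <- validator(
  is.numeric(height),            # type check: one result for the whole column
  grepl("^[0-9]{4}", postcode)   # pattern check: one result per record
)

res <- summary(confront(dat, rules))
print(res[, c("name", "items", "passes", "fails")])
```

The type check yields a single value, while the pattern check yields one value for each record.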
Armstrong's functional dependencies, of the form \(A + B \to C + D\), are
represented using the ~ operator, e.g. A + B ~ C + D. For example,
postcode ~ city means that when two records have the same value for
postcode, they must have the same value for city.
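A minimal sketch of a functional dependency rule follows. The records are hypothetical: the first two share a postcode but name different cities, so the dependency is violated somewhere in that pair.

```r
library(validate)

# Hypothetical records; the first two share a postcode but not a city.
dat <- data.frame(
  postcode = c("1234", "1234", "5678"),
  city     = c("Amsterdam", "Utrecht", "Delft"),
  stringsAsFactors = FALSE
)

# Functional dependency: equal postcodes must come with equal cities.
rules <- validator(postcode ~ city)

res <- summary(confront(dat, rules))
print(res[, c("name", "items", "passes", "fails")])
```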
Metadata such as the number of rows, columns, column names and so on can be
tested by referencing the whole data set with the '.'. For example, the rule
nrow(.) == 15 checks whether there are 15 rows in the dataset at hand.
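The dot syntax might be used as in the sketch below, which takes the first 15 rows of the built-in iris data set as an arbitrary example.

```r
library(validate)

# Example data: the first 15 rows of the built-in iris data set.
dat <- iris[1:15, ]

rules <- validator(
  nrow(.) == 15,             # number of rows
  ncol(.) == 5,              # number of columns
  "Species" %in% names(.)    # presence of a column name
)

res <- summary(confront(dat, rules))
print(res[, c("name", "items", "passes", "fails")])
```

Each of these rules produces a single result for the data set as a whole.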
Uniqueness and completeness can in principle be tested with the 'dot'
syntax. However, there are some convenience functions: is_complete,
all_complete, is_unique, all_unique.
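The convenience functions can be combined in one rule set, as sketched here. The columns id and age are hypothetical; the data contain one missing age and one duplicated id.

```r
library(validate)

# Hypothetical data with one missing age and one duplicated id.
dat <- data.frame(
  id  = c(1, 2, 2),
  age = c(30, NA, 41)
)

rules <- validator(
  is_complete(id, age),   # per record: are id and age both present?
  all_complete(id),       # single result: is id present everywhere?
  is_unique(id),          # per record: is this id unique?
  all_unique(id)          # single result: are all ids unique?
)

res <- summary(confront(dat, rules))
print(res[, c("name", "items", "passes", "fails")])
```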
The operator ':=' can be used to set up local variables (during, for
example, validation) to save time (the rhs of an assignment is computed only
once) or to make your validation code more maintainable. Assignments work more
or less like common R assignments: they are only valid for statements coming
after the assignment and they may be overwritten. The result of computing the
rhs is not part of a confrontation with data.
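A sketch of a local variable in a rule set, assuming a hypothetical numeric column x. The name m is chosen for illustration.

```r
library(validate)

# Hypothetical data: the mean of x is 4.
dat <- data.frame(x = c(1, 2, 3, 10))

rules <- validator(
  m := mean(x),   # local variable, available to the statements that follow
  x < 2 * m       # each value must stay below twice the mean
)

res <- summary(confront(dat, rules))
print(res[, c("name", "items", "passes", "fails")])
```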
Often the same constraints/rules are valid for groups of variables. validate
allows for compact notation. Variable groups can be used in-statement or by
defining them with the := operator.
validator( var_group(a,b) > 0 )
is equivalent to
validator(G := var_group(a,b), G > 0)
which is equivalent to
validator(a>0,b>0)
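The group notation can be tried on a small data set, as in this sketch. The columns a and b are hypothetical; per the equivalence stated above, the group rule is expected to expand into one check per variable.

```r
library(validate)

# Hypothetical data with one negative value in column a.
dat <- data.frame(a = c(1, -1), b = c(2, 3))

# Define a variable group and apply one condition to every member.
rules <- validator(G := var_group(a, b), G > 0)

res <- summary(confront(dat, rules))
print(res[, c("name", "items", "passes", "fails", "expression")])
```

The expression column of the summary shows which check each row refers to.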
Using two groups results in the cartesian product of checks. So the statement
validator( f := var_group(c,d), g := var_group(a,b), g > f)
is equivalent to
validator(a > c, b > c, a > d, b > d)
Please see the cookbook on how to read rules from and write rules to file:
vignette("cookbook",package="validate")