EditImputeCont (version 1.1.6)

readData: Read Data and Edit Rules


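Reads the original dataset together with its edit rules (ratio edits, range restrictions, and balance edits) and returns an EditIn.data object.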


readData(Y.original, ratio = NULL, range = NULL, balance = NULL, eps.bal = 0.6)



Y.original: original dataset of (\(n\), \(p\)) dimension with missing and edit-failing values, where \(n\) is the number of records and \(p\) is the number of variables.

ratio: ratio edits.

range: range restrictions.

balance: balance edits.

eps.bal: threshold for balance edits. Defaults to 0.6.


readData returns an EditIn.data object which consists of

  • Y.input: input dataset which replaces NA in Y.original with -999 and zero values with 0.01.

  • Edit.editmatrix: editmatrix of edit rules. It can be used for functions of editrules package.

  • Edit.matrix: matrix of edit rules.

  • Bound.LU: range restrictions. For a variable X whose range is not specified in range, the default values are max(0.1*min(X), 1e-5) for the lower bound and 10*max(X) for the upper bound.

  • ratio: ratio edits.

  • n.balance: number of balance edits, i.e., the number of rows of balance.

  • FaultyRecordID: record IDs of Y.original whose values violate the edit rules.
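
The default bounds described for Bound.LU can be reproduced by hand. The following is a minimal base-R sketch, not the package's own code: the helper name default_bounds is hypothetical, and ignoring NA values when taking the minimum and maximum is an assumption.

```r
# Hypothetical helper (not part of EditImputeCont): default range for a
# variable whose range is not specified by the user.
# Assumption: missing values are ignored when computing min and max.
default_bounds <- function(x) {
  c(lower = max(0.1 * min(x, na.rm = TRUE), 1e-5),
    upper = 10 * max(x, na.rm = TRUE))
}

x <- c(12, NA, 3, 250)   # toy variable with one missing value
b <- default_bounds(x)
print(b)                 # lower = 0.3, upper = 2500
```

The 1e-5 floor keeps the lower bound strictly positive even when the smallest observed value is zero or negative.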


Y.original has \(n\) records and \(p\) variables. The variable names (column names) of Y.original are used to specify ratio edits.

The edit rules are either imported from text files or written using the syntax of the editrules package.

A balance edit is treated as two inequality constraints with the threshold eps.bal, i.e., 'A = B' is converted to '-eps.bal < A - B < eps.bal' before computation.
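
To make the relaxed balance concrete, here is a small base-R sketch of the two-sided check. It only illustrates the inequality stated above; the helper name balance_ok is hypothetical and is not a function of the package.

```r
# Hypothetical helper (not part of EditImputeCont): TRUE where the balance
# A = B holds within the tolerance, i.e. -eps.bal < A - B < eps.bal.
balance_ok <- function(A, B, eps.bal = 0.6) {
  d <- A - B
  d > -eps.bal & d < eps.bal
}

total <- c(100, 100, 100)        # reported totals
parts <- c(100.0, 100.5, 101.0)  # sums of the reported components
ok <- balance_ok(total, parts)
print(ok)                        # TRUE TRUE FALSE
```

With the default eps.bal = 0.6, a discrepancy of 0.5 is accepted while a discrepancy of 1 is flagged as a balance failure.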

For accurate computation, write nested balances in the form 'total variable = sum of component variables'. For example, it is recommended to replace 'X1 = X2 + X3' and 'X3 = X4 + X5' with 'X1 = X2 + X4 + X5' and 'X3 = X4 + X5' so that 'X3' does not appear on both sides of the balance edits.
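
The rewriting of nested balances can be checked on toy data. The sketch below uses base R only; the data frame and the balance_rules vector are illustrative assumptions, with the rule strings written in the editrules-style syntax used in the examples of this page.

```r
# Toy records that satisfy the nested balances X3 = X4 + X5 and X1 = X2 + X3.
d <- data.frame(X2 = c(10, 4), X4 = c(3, 1), X5 = c(2, 6))
d$X3 <- d$X4 + d$X5
d$X1 <- d$X2 + d$X3

# Recommended form: X3 is substituted out of the first balance, so each
# total appears only on the left-hand side.
balance_rules <- c("X1 == X2 + X4 + X5", "X3 == X4 + X5")

nested_ok <- all(d$X1 == d$X2 + d$X3)          # original nested form
flat_ok   <- all(d$X1 == d$X2 + d$X4 + d$X5)   # recommended flattened form
print(c(nested = nested_ok, flat = flat_ok))
```

Both forms accept the same records when the balances hold exactly; the flattened form simply avoids having X3 on both sides.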


Hang J. Kim, Lawrence H. Cox, Alan F. Karr, Jerome P. Reiter and Quanli Wang (2015). "Simultaneous Edit-Imputation for Continuous Microdata", Journal of the American Statistical Association, DOI: 10.1080/01621459.2015.1040881.

Edwin de Jonge and Mark van der Loo (2013). editrules: R package for parsing, applying, and manipulating data cleaning rules. R package version 2.7.2.


Examples
library(EditImputeCont)   # readData() and the NestedEx example data
library(editrules)        # editmatrix()

data(NestedEx)            # load the example data used by both options

### option 1. import from text files ###

D_obs1 = NestedEx$D.obs
Ratio1 = NestedEx$Ratio.edit
Range1 = NestedEx$Range.edit
Balance1 = NestedEx$Balance.edit

data1 = readData(Y.original=D_obs1, ratio=Ratio1, range=Range1,
                 balance=Balance1, eps.bal=0.6)

# print(data1$Edit.editmatrix)
# plot(data1$Edit.editmatrix)	  ## function of 'editrules' package

### option 2. Using the syntax of R package 'editrules' ###

D_obs2 = NestedEx$D.obs

# ratio edits, written as inequalities between pairs of variables
Ratio2 <- editmatrix(c(
 "X1 <= 1096.63*X5", "X1 <= 2980.96*X7", "X1 <= 148.41*X8", "X1 <= 7.39*X9",
 "X5 <= 0.37*X1", "X5 <= 54.60*X7", "X5 <= 2.72*X8", "X5 <= 0.14*X9",
 "X7 <= 0.14*X1", "X7 <= 1.65*X5", "X7 <= 7.39*X8", "X7 <= 0.05*X9",
 "X8 <= 1.65*X1", "X8 <= 54.60*X5", "X8 <= 403.43*X7", "X8 <= 1.65*X9",
 "X9 <= 20.09*X1", "X9 <= 403.43*X5", "X9 <= 13359.73*X7", "X9 <= 148.41*X8"
))

# range restrictions
Range2 <- editmatrix(c(
 "X1 >= 2", "X2 <= 1.2e+06", "X11 >= 0.002", "X11 <= 1.2e+04"
))

# balance edits
Balance2 <- editmatrix(c(
 "X1 == X2+X3+X4", "X5 == X6 + 0.4*X10 + 0.6*X11", "X7 == 0.4*X10 + 0.6*X11"
))

data2 = readData(D_obs2, Ratio2, Range2, Balance2)

# print(data2$Edit.editmatrix)
  # Note: data2 is equivalent to data1
# plot(data2$Edit.editmatrix)	  ## function of 'editrules' package