klausuR (version 0.12-14)

compare: Comparison of data sets

Description

The function compare takes two data.frames (or objects of class klausuR.answ-class) and compares them for equality. This is useful for catching typos before you calculate the results with klausur. Errors occur easily when the given answers are typed in by hand, so it is advisable to enter all data at least twice (perhaps by different persons) and check for differences with this function. Any differences can then be corrected by looking up the original answer in the test.

Usage

compare(
  set1,
  set2,
  select = NULL,
  ignore = NULL,
  new.set = FALSE,
  rename = c(),
  trim = FALSE,
  id = list(No = "No", Name = c("FirstName", "Name"))
)

Arguments

set1, set2

The data sets to be compared. Can be two data.frames or objects of class klausuR.answ-class. If the latter, their slots id and items will be compared.

select

A vector of variables that should be compared; all others are omitted. It must include at least all variables named in id, because they are needed for the output. If NULL, all variables are examined.

ignore

A vector with variables that should be dropped from both sets. See also select.

new.set

Logical. If TRUE, a data.frame of the compared sets is returned, with all unequal cells set to NA.

rename

A named vector defining which variables in set1 and set2 need to be renamed into the klausuR name scheme. Accepts elements named No, Name, FirstName, MatrNo, Pseudonym and Form. The values of these elements are the corresponding variable names in the input data.
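As a rough illustration of the mapping that rename describes, the following base R sketch renames columns of a made-up data.frame into the klausuR name scheme. The data and the column names ID, Surname and Given are hypothetical, and the renaming step shown here is not taken from the klausuR source; it only mirrors the documented convention that element names are the target names and values are the input names.

```r
# Hypothetical input data with non-klausuR column names
raw <- data.frame(
  ID      = 1:2,
  Surname = c("Lovelace", "Planck"),
  Given   = c("Ada", "Max"),
  Item1   = c(3, 1),
  stringsAsFactors = FALSE
)

# Element names = klausuR names, values = column names in the input data
rename <- c(No = "ID", Name = "Surname", FirstName = "Given")

# Base R sketch of the renaming (an assumption, not klausuR's own code)
renamed <- raw
names(renamed)[match(rename, names(renamed))] <- names(rename)

names(renamed)
```

A vector of this shape is what would be passed as the rename argument, so that both input sets end up with the column names the id argument expects.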

trim

Logical. Indicates whether whitespace in character variables should be trimmed.

id

A named list of character vectors to help identify differing cases in the input data. The element names of this list will become column names in the generated output table, their values define the respective column names of the input data. If a value has more than one element, they will be collapsed into one string for the output.
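To make the structure of id more concrete, here is a small base R sketch of how such a list could be turned into identifying columns. The input data are invented, and joining the values with a single space is an assumption: the documentation only states that multiple elements are collapsed into one string, not which separator is used.

```r
# The default id list from the Usage section
id <- list(No = "No", Name = c("FirstName", "Name"))

# Hypothetical input data
input <- data.frame(
  No        = c(11, 12),
  FirstName = c("Ada", "Max"),
  Name      = c("Lovelace", "Planck"),
  stringsAsFactors = FALSE
)

# For each id element, paste the referenced input columns into one string;
# the space separator is an assumption made for this sketch
labels <- as.data.frame(
  lapply(id, function(cols) do.call(paste, unname(as.list(input[cols])))),
  stringsAsFactors = FALSE
)

labels
```

In this sketch the output has one column per element of id, so the Name column combines FirstName and Name from the input.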

Value

If new.set=FALSE, a data.frame of the differences found; if there are none, only a message is returned. If new.set=TRUE, a combined data.frame is returned (see Details).

Details

If you don't want to compare all variables but only a subset, you can use the select option (see the examples below). Be careful with this, though: the selection must include at least all variables named in id, because they are needed to produce the output table.

If new.set=TRUE, a new data.frame is returned that contains the values on which both sets agree, with all dubious (differing) values replaced by NA.
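The idea behind new.set=TRUE can be sketched in base R as follows. This is not the implementation used by compare; it is a minimal illustration that assumes two made-up data.frames with identical dimensions, column names and row order, and it treats a value that is missing in only one set as a difference.

```r
# Hypothetical double-entry data: the same answers typed in twice
entry1 <- data.frame(No = 1:3, Item1 = c(2, 4, 1), Item2 = c(3, 3, 2))
entry2 <- data.frame(No = 1:3, Item1 = c(2, 1, 1), Item2 = c(3, 3, NA))

m1 <- as.matrix(entry1)
m2 <- as.matrix(entry2)

# TRUE where the two entries disagree
differs <- m1 != m2
# A value missing in only one set counts as a difference ...
differs[is.na(differs)] <- TRUE
# ... but cells missing in both sets are considered equal
differs[is.na(m1) & is.na(m2)] <- FALSE

# Keep the agreed values and blank out every dubious cell
merged <- entry1
merged[differs] <- NA

merged
```

The NA cells in such a merged set mark exactly the answers that need to be looked up in the original test before the results are calculated.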

See Also

klausur

Examples

# NOT RUN {
data(antworten)

# create some differences
antworten2 <- antworten[-3, -7]
antworten2[4,6] <- NA
antworten2[8,8:10] <- antworten2[8,8:10] + 1

# default comparison
compare(antworten, antworten2)

# compare only variables 1 to 12
compare(antworten, antworten2, select=c(1:12))

# omit variables 3 to 8 and create a new set called "antworten.comp"
# from the results
antworten.comp <- compare(antworten, antworten2, select=-c(3:8), new.set=TRUE)
# }