data.set: Data Set Objects

Description

"data.set" objects are collections of "item" objects, with similar semantics as data frames. They are distinguished from data frames so that coercion by as.data.fame leads to a data frame that contains only vectors and factors. Nevertheless most methods for data frames are inherited by data sets, except for the method for the within generic function. For the within method for data sets, see the details section.

Thus data preparation using data sets retains all informations about item annotations, labels, missing values etc. While (mostly automatic) conversion of data sets into data frames makes the data amenable for the use of R's statistical functions.

Usage

data.set(...,row.names = NULL, check.rows = FALSE, check.names = TRUE,
    stringsAsFactors = default.stringsAsFactors(),
                 document = NULL)
as.data.set(x, row.names=NULL, optional=NULL, ...)
is.data.set(x)
## S3 method for class 'data.set':
as.data.frame(x, row.names = NULL, optional = FALSE, \dots)
## S3 method for class 'data.set':
within(data, expr, \dots)

Arguments

...

For the data.set function several vectors or items, for within further, ignored arguments.

row.names, check.rows, check.names, stringsAsFactors, optional

arguments as in data.frame or as.data.frame, respectively.

document

NULL or an optional character vector that contains documenation of the data.

for is.data.set(x), any object; for as.data.frame(x,...) a "data.set" object.

data

a data set, that is, an object of class "data.set".

expr

an expression, or several expressions enclosed in curly braces.

Value

data.set and the within method for data sets returns a "data.set" object, is.data.set returns a logical value, and as.data.frame returns a data frame.

Details

The as.data.frame method for data sets is just a copy of the method for list. Consequently, all items in the data set are coerced in accordance to their measurement setting, see item and measurement.

The within method for data sets has the same effect as the within method for data frames, apart from two differences: all results of the computations are coerced into items if they have the appropriate length, otherwise, they are automatically dropped.

Currently only one method for the generic function as.data.set is defined: a method for "importer" objects.

Examples

Run this code

Data <- data.set(
          vote = sample(c(1,2,3,8,9,97,99),size=300,replace=TRUE),
          region = sample(c(rep(1,3),rep(2,2),3,99),size=300,replace=TRUE),
          income = exp(rnorm(300,sd=.7))*2000
          )

Data <- within(Data,{
  description(vote) <- "Vote intention"
  description(region) <- "Region of residence"
  description(income) <- "Household income"
  wording(vote) <- "If a general election would take place next tuesday,
                    the candidate of which party would you vote for?"
  wording(income) <- "All things taken into account, how much do all
                    household members earn in sum?"
  foreach(x=c(vote,region),{
    measurement(x) <- "nominal"
    })
  measurement(income) <- "ratio"
  labels(vote) <- c(
                    Conservatives         =  1,
                    Labour                =  2,
                    "Liberal Democrats"   =  3,
                    "Don't know"          =  8,
                    "Answer refused"      =  9,
                    "Not applicable"      = 97,
                    "Not asked in survey" = 99)
  labels(region) <- c(
                    England               =  1,
                    Scotland              =  2,
                    Wales                 =  3,
                    "Not applicable"      = 97,
                    "Not asked in survey" = 99)
  foreach(x=c(vote,region,income),{
    annotation(x)["Remark"] <- "This is not a real survey item, of course ..."
    })
  missing.values(vote) <- c(8,9,97,99)
  missing.values(region) <- c(97,99)

  # These to variables do not appear in the
  # the resulting data set, since they have the wrong length.
  junk1 <- 1:5
  junk2 <- matrix(5,4,4)
  
})
# Since data sets may be huge, only a
# part of them are 'show'n
Data
# If we insist on seeing all, we can use 'print' instead
print(Data)
str(Data)
summary(Data)

Data[[1]]
Data[1,]
head(as.data.frame(Data))

EnglandData <- subset(Data,region == "England")
EnglandData

xtabs(~vote+region,data=Data)
xtabs(~vote+region,data=within(Data, vote <- include.missings(vote)))

Run the code above in your browser using DataLab