data.set: Data Set Objects

Description

"data.set" objects are collections of "item" objects, with semantics similar to those of data frames. They are kept distinct from data frames so that coercion by as.data.frame yields a data frame that contains only vectors and factors. Nevertheless, most methods for data frames are inherited by data sets, with the exception of the method for the within generic function. For the within method for data sets, see the Details section.

Thus, data preparation using data sets retains all information about item annotations, labels, missing values, etc., while the (mostly automatic) conversion of data sets into data frames makes the data amenable to R's statistical functions.

dsView is a function that displays data sets in much the same way as View displays data frames. (View works with data sets as well, but converts them into data frames first.)

Usage

data.set(...,row.names = NULL, check.rows = FALSE, check.names = TRUE,
    stringsAsFactors = FALSE, document = NULL)
as.data.set(x, row.names=NULL, ...)
# S4 method for list
as.data.set(x,row.names=NULL,...)
is.data.set(x)
# S3 method for data.set
as.data.frame(x, row.names = NULL, optional = FALSE, ...)
# S4 method for data.set
within(data, expr, ...)
dsView(x)
# S4 method for data.set
head(x,n=20,...)
# S4 method for data.set
tail(x,n=20,...)

Value

data.set and the within method for data sets return a "data.set" object, is.data.set returns a logical value, and as.data.frame returns a data frame.

Arguments

...: for the data.set function, several vectors or items; for within, further arguments, which are ignored.
row.names, check.rows, check.names, stringsAsFactors, optional: arguments as in data.frame or as.data.frame, respectively.
document: NULL or an optional character vector that contains documentation of the data.
x: for is.data.set(x), any object; for as.data.frame(x,...) and dsView(x), a "data.set" object.
data: a data set, that is, an object of class "data.set".
expr: an expression, or several expressions enclosed in curly braces.
n: integer; the number of rows to be shown by head or tail.

Details

The as.data.frame method for data sets is just a copy of the method for list. Consequently, all items in the data set are coerced in accordance with their measurement setting; see as.vector,item-method and measurement.

The within method for data sets has the same effect as the within method for data frames, apart from two differences: results of the computations that have the appropriate length are coerced into items, and results of any other length are automatically dropped.

Currently only one method for the generic function as.data.set is defined: a method for "importer" objects.

Examples

Data <- data.set(
          vote = sample(c(1,2,3,8,9,97,99),size=300,replace=TRUE),
          region = sample(c(rep(1,3),rep(2,2),3,99),size=300,replace=TRUE),
          income = exp(rnorm(300,sd=.7))*2000
          )

Data <- within(Data,{
  description(vote) <- "Vote intention"
  description(region) <- "Region of residence"
  description(income) <- "Household income"
  wording(vote) <- "If a general election would take place next Tuesday,
                    the candidate of which party would you vote for?"
  wording(income) <- "All things taken into account, how much do all
                    household members earn in sum?"
  foreach(x=c(vote,region),{
    measurement(x) <- "nominal"
    })
  measurement(income) <- "ratio"
  labels(vote) <- c(
                    Conservatives         =  1,
                    Labour                =  2,
                    "Liberal Democrats"   =  3,
                    "Don't know"          =  8,
                    "Answer refused"      =  9,
                    "Not applicable"      = 97,
                    "Not asked in survey" = 99)
  labels(region) <- c(
                    England               =  1,
                    Scotland              =  2,
                    Wales                 =  3,
                    "Not applicable"      = 97,
                    "Not asked in survey" = 99)
  foreach(x=c(vote,region,income),{
    annotation(x)["Remark"] <- "This is not a real survey item, of course ..."
    })
  missing.values(vote) <- c(8,9,97,99)
  missing.values(region) <- c(97,99)

  # These two variables do not appear in the
  # resulting data set, since they have the wrong length.
  junk1 <- 1:5
  junk2 <- matrix(5,4,4)
  
})
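
# A small check, offered as a sketch: the two 'junk' results above
# should not show up among the item names of the data set.
# (Assumes that 'names' behaves for data sets as it does for data frames.)
c("junk1", "junk2") %in% names(Data)
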
# Since data sets may be huge, only
# part of them is 'show'n
Data

if (FALSE) {

# If we insist on seeing all, we can use 'print' instead
print(Data)
}

str(Data)
summary(Data)

if (FALSE) {
# If we want to 'View' a data set we can use 'dsView'
dsView(Data)
# Works also, but changes the data set into a data frame first:
View(Data)
}

Data[[1]]
Data[1,]
head(as.data.frame(Data))
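
# A sketch of the coercion described in the Details section: after
# conversion, items with nominal measurement are expected to be factors
# and items with ratio measurement numeric vectors.
# ('DataFrame' is a name chosen here for illustration only.)
DataFrame <- as.data.frame(Data)
sapply(DataFrame, class)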

EnglandData <- subset(Data,region == "England")
EnglandData

xtabs(~vote+region,data=Data)
xtabs(~vote+region,data=within(Data, vote <- include.missings(vote)))
