Two new classes in EdSurvey are described in this section: edsurvey.data.frame
and light.edsurvey.data.frame. The edsurvey.data.frame
class stores metadata about survey data while the data themselves stay on
disk (via the LaF package), so gigabytes of data can be used easily on a machine
otherwise unsuited to manipulating large datasets.
The light.edsurvey.data.frame is typically generated
by the getData function and stores the data in a
data.frame.
Both classes use attributes to manage metadata and to ensure that
the correct statistics are used in calculating results;
getAttributes acts as an accessor for these attributes, whereas
setAttributes acts as a mutator for them.
As a convenience, edsurvey.data.frame
implements the $ function to extract a variable.
edsurvey.data.frame(
  userConditions,
  defaultConditions,
  dataList = list(),
  weights,
  pvvars,
  subject,
  year,
  assessmentCode,
  dataType,
  gradeLevel,
  achievementLevels,
  omittedLevels,
  survey,
  country,
  psuVar,
  stratumVar,
  jkSumMultiplier,
  recodes = NULL,
  validateFactorLabels = FALSE,
  forceLower = TRUE,
  reqDecimalConversion = TRUE
)
# S3 method for edsurvey.data.frame
$(x, i)
# S3 method for edsurvey.data.frame
$(x, name) <- value
getAttributes(data, attribute = NULL)
setAttributes(data, attribute, value)
getPSUVar(data, weightVar = NULL)
getStratumVar(data, weightVar = NULL)
userConditions: a list of user conditions that includes subsetting or recoding conditions
defaultConditions: a list of default conditions that often are set for each survey
dataList: a list of dataListItem objects to model the data structure of the survey
weights: a list that stores information regarding weight variables. See Details.
pvvars: a list that stores information regarding plausible values. See Details.
subject: a character that indicates the subject domain of the given data
year: a character or numeric that indicates the year of the given data
assessmentCode: a character that indicates the code of the assessment. Can be National or International.
dataType: a character that indicates the unit level of the main data. Examples include Student, teacher, school, Adult Data.
gradeLevel: a character that indicates the grade level of the given data
achievementLevels: a list of achievement-level categories and cutpoints
omittedLevels: a list of default omitted levels for the given data
survey: a character that indicates the name of the survey
country: a character that indicates the country of the given data
psuVar: a character that indicates the PSU sampling unit variable. Ignored when weights have psuVar defined.
stratumVar: a character that indicates the stratum variable. Ignored when weights have stratumVar defined.
jkSumMultiplier: a numeric value of the jackknife coefficient (used in calculating the jackknife replication estimation)
recodes: a list of variable recodes of the given data
validateFactorLabels: a Boolean that indicates whether the getData function needs to validate factor variables
forceLower: a Boolean; when set to TRUE, will automatically lowercase variable names
reqDecimalConversion: a Boolean; when set to TRUE, a getData call will multiply the raw file value by a decimal multiplier
x: an edsurvey.data.frame
i: a character, the column name to extract
name: a character vector of the column to edit
value: outside of the assignment context, new value of the given attribute
data: an edsurvey.data.frame
attribute: a character, name of an attribute to get or set
weightVar: a character indicating the full sample weights
An object of class edsurvey.data.frame with the following elements:
Elements that store data connections and data codebooks
dataList: a list object containing the survey's dataListItem objects
userConditions: a list containing all user conditions, set using the subset.edsurvey.data.frame method
defaultConditions: the default subsample conditions
weights: a list containing the weights. See Details.
stratumVar: a character that indicates the default strata identification variable name in the data. Often used in Taylor series estimation.
psuVar: a character that indicates the default PSU (sampling unit) identification variable name in the data. Often used in Taylor series estimation.
pvvars: a list containing the plausible values. See Details.
achievementLevels: default achievement cutoff scores and names. See Details.
omittedLevels: the levels of the factor variables that will be omitted from the edsurvey.data.frame
survey: the type of survey data
subject: the subject of the data
year: the year of assessment
assessmentCode: the assessment code
dataType: the type of data (e.g., student or school)
gradeLevel: the grade of the dataset contained in the edsurvey.data.frame
edsurvey.data.frame is an object that stores a connection to data on the
disk along with important survey sample design information.
edsurvey.data.frame.list is a list of edsurvey.data.frame
objects. It often is used in trend or cross-regional analysis in the
gap function. See edsurvey.data.frame.list for
more information on how to create an edsurvey.data.frame.list. Users
also can refer to the vignette titled
Using EdSurvey for Trend Analysis
for examples.
Besides the edsurvey.data.frame class, the EdSurvey package also
implements the light.edsurvey.data.frame class, which can be used by both
EdSurvey and non-EdSurvey functions. More particularly,
light.edsurvey.data.frame is a data.frame that has basic
survey and sample design information (i.e., plausible values and weights), which
will be used for variance estimation in analytical functions. Because it
also is a base R data.frame, users can apply base R functions for
data manipulation.
See the vignette titled
Using the getData Function in EdSurvey
for more examples.
Many functions will remove attributes from a data frame, such as
a light.edsurvey.data.frame, and the
rebindAttributes function can add them back.
Users can get a light.edsurvey.data.frame object by using the
getData method with addAttributes=TRUE.
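The workflow described above might look like the following sketch. It assumes the EdSurvey and NAEPprimer packages are installed, and the variable names passed to getData (dsex, b017451, origwt) are chosen only for illustration.

```r
library(EdSurvey)

# connect to the example data (generated, not real student data)
sdf <- readNAEP(system.file("extdata/data", "M36NT2PM.dat", package = "NAEPprimer"))

# read the selected variables into memory and keep the survey attributes,
# which yields a light.edsurvey.data.frame
gddat <- getData(sdf, c("dsex", "b017451", "origwt"), addAttributes = TRUE)
class(gddat)

# if a later data manipulation step drops the survey attributes,
# copy them back from the original edsurvey.data.frame
gddat <- rebindAttributes(gddat, sdf)
```

Calling rebindAttributes after any step that might have stripped the attributes is a cautious habit rather than a requirement; it is only needed when the attributes have actually been lost.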
Extracting a column from an edsurvey.data.frame
Users can extract a column from an edsurvey.data.frame object using $ or [] like a normal data frame.
Extracting and updating attributes of an object of class edsurvey.data.frame or light.edsurvey.data.frame
Users can use the getAttributes method to extract any attribute of
an edsurvey.data.frame or a light.edsurvey.data.frame. 
A light.edsurvey.data.frame will not have attributes related to data connection
because data have already been read in memory.
If users want to update an attribute (e.g., omittedLevels), they can
use the setAttributes method.
The weight list has an element named after each weight variable name
that is a list with elements jkbase and jksuffixes. The
jkbase variable is a single character indicating the jackknife replicate
weight base name, whereas jksuffixes is a vector with one element for each
jackknife replicate weight. When the two are pasted together, they should form
the complete set of the jackknife replicate weights. The weights argument
also can have an attribute that is the default weight. If the primary sampling
unit and stratum variables change by weight, they also can be defined on the weight
list as psuVar and stratumVar. When this option is used, it overrides
the psuVar and stratumVar on the edsurvey.data.frame,
which can be left blank. A weight must define only one of psuVar
and stratumVar.
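The structure of the weights list can be sketched in base R as follows. The weight name origwt, the base name srwt, the three suffixes, and the attribute name "default" are assumptions made for illustration; a real survey defines its own names and usually has many more replicates.

```r
# Hypothetical weights list: one element per full-sample weight variable
weights <- list(
  origwt = list(
    jkbase = "srwt",                    # replicate weight base name
    jksuffixes = as.character(1:3)      # one suffix per replicate weight
  )
)

# mark the default weight with an attribute (attribute name is an assumption)
attr(weights, "default") <- "origwt"

# pasting jkbase and jksuffixes together gives the replicate weight names
w <- weights[[attr(weights, "default")]]
jkNames <- paste0(w$jkbase, w$jksuffixes)
jkNames
```

With these made-up values, jkNames holds the three replicate weight column names srwt1, srwt2, and srwt3.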
The pvvars list has an element for each subject or subscale score
that has plausible values. Each element is a list with a varnames
element that indicates the column names of the plausible values and an
achievementLevel element that is a named vector of the
achievement-level cutpoints.
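A pvvars list of this shape can be sketched in base R. The scale name composite, the column names mrpcm1 to mrpcm5, the cutpoint values, and the scores being classified are illustrative assumptions, not values taken from a particular dataset.

```r
# Hypothetical pvvars list: one element per scale that has plausible values
pvvars <- list(
  composite = list(
    varnames = paste0("mrpcm", 1:5),    # columns holding the plausible values
    achievementLevel = c(Basic = 262, Proficient = 299, Advanced = 333)
  )
)

# column names of the plausible values for the "composite" scale
pvvars$composite$varnames

# classify made-up scores against the cutpoints:
# 0 = below the first cutpoint, 1 = at or above "Basic", and so on
cuts <- pvvars$composite$achievementLevel
levelIndex <- findInterval(c(250, 280, 340), cuts)
levelIndex
```

Keeping the cutpoints as a named vector in ascending order means the level labels and thresholds travel together, so a lookup such as findInterval can map a score to its level directly.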
# NOT RUN {
# load the package
library(EdSurvey)
# read in the example data (generated, not real student data)
sdf <- readNAEP(system.file("extdata/data", "M36NT2PM.dat", package="NAEPprimer"))
# run a base R function on a column of edsurvey.data.frame
table(sdf$dsex)
# assignment
table(sdf$b013801)
sdf$books <- ifelse(sdf$b013801 %in% c("0-10", "11-25"), "0-25 books", "26+ books")
table(sdf$books, sdf$b013801)
# extract default omitted levels of NAEP primer data
getAttributes(sdf, "omittedLevels") #[1] "Multiple" NA         "Omitted"
# update default omitted levels of NAEP primer data
sdf <- setAttributes(sdf, "omittedLevels", c("Multiple", "Omitted", NA, "(Missing)"))
getAttributes(sdf, "omittedLevels") #[1] "Multiple"  "Omitted"   NA          "(Missing)"
# }