edsurvey.data.frame: EdSurvey Class Constructors

Description

Two new classes in EdSurvey are described in this section: the edsurvey.data.frame and light.edsurvey.data.frame. The edsurvey.data.frame class stores metadata about survey data while the data themselves remain on disk (via the LaF package), allowing gigabytes of data to be used easily on a machine otherwise unsuited to manipulating large datasets. The light.edsurvey.data.frame is typically generated by the getData function and stores the data in a data.frame. Both classes use attributes to manage metadata and allow for correct statistics to be used in calculating results; the getAttributes function acts as an accessor for these attributes, while setAttributes acts as a mutator. As a convenience, edsurvey.data.frame implements the $ function to extract a variable.

Usage

edsurvey.data.frame(userConditions, defaultConditions, data, dataSch,
  dataTch, dataListMeta, weights, pvvars, subject, year, assessmentCode,
  dataType, gradeLevel, achievementLevels, omittedLevels, fileFormat,
  fileFormatSchool, fileFormatTeacher, survey, country, psuVar, stratumVar,
  jkSumMultiplier, recodes = NULL, validateFactorLabels = FALSE,
  forceLower = TRUE)
# S3 method for edsurvey.data.frame
$(x, i)
getAttributes(data, attribute = NULL)
setAttributes(data, attribute, value)

Arguments

userConditions

a list of user conditions that includes subsetting or recoding conditions

defaultConditions

a list of default conditions that are often set for each survey

data

in the edsurvey.data.frame constructor, this is an LaF object that connects to the main data, often at the student level. For getAttributes and setAttributes, this argument is an edsurvey.data.frame or light.edsurvey.data.frame.

dataSch

an LaF object that connects to the school-level data (optional)

dataTch

an LaF object that connects to the teacher-level data (optional)

dataListMeta

a list that stores variables that can be used to link school-level and teacher-level data to the main data. See Details.

weights

a list that stores information regarding weight variables. See Details.

pvvars

a list that stores information regarding plausible values. See Details.

subject

a character that indicates subject domain of the given data

year

a character or numeric that indicates year of the given data

assessmentCode

a character that indicates the code of the assessment. Can be “National” or “International”.

dataType

a character that indicates the unit level of the main data. Examples include “student”, “teacher”, “school”, “Adult Data”.

gradeLevel

a character that indicates grade level of the given data

achievementLevels

a list of achievement level categories and cutpoints

omittedLevels

a list of default omitted levels for the given data

fileFormat

a data.frame that stores codebook information for the main data. See Details.

fileFormatSchool

a data.frame that stores codebook information for the school-level data (if it exists). See Details.

fileFormatTeacher

a data.frame that stores codebook information for the teacher-level data (if it exists). See Details.

survey

a character that indicates the name of the survey

country

a character that indicates the country of the given data

psuVar

a character that indicates the primary sampling unit (PSU) variable. Ignored when weights have psuVar defined.

stratumVar

a character that indicates the stratum variable. Ignored when weights have stratumVar defined.

jkSumMultiplier

a numeric value of the jackknife coefficient (used in calculating the jackknife replication estimation)

recodes

a list of variable recodes of the given data

validateFactorLabels

a Boolean that indicates whether the getData function needs to validate factor variables

forceLower

a Boolean; when set to TRUE, will automatically lowercase variable names

x

an edsurvey.data.frame

i

a character, the column name to extract

attribute

a character, name of an attribute to get or set

value

new value of the given attribute

Value

An object of class edsurvey.data.frame with the following elements:

Elements that store data connections and data codebooks

data: an LaF object containing a connection to the student dataset on disk
dataSch: an LaF object containing a connection to the school dataset on disk, if it exists; otherwise NULL.
dataTch: an LaF object containing a connection to the teacher dataset on disk, if it exists; otherwise NULL.
fileFormat: a data.frame containing the format of the file in the data parameter. See Details.
fileFormatSchool: a data.frame containing the format of the file in the dataSch parameter. See Details.
fileFormatTeacher: a data.frame containing the format of the file in the dataTch parameter. See Details.

Elements that store sample design and default subsetting information of the given survey data

userConditions: a list containing all user conditions, set using the subset.edsurvey.data.frame method
defaultConditions: the default subsample conditions
weights: a list containing the weights. See Details.
stratumVar: a character that indicates the default strata identification variable name in the data. Often used in Taylor series estimation.
psuVar: a character that indicates the default PSU (sampling unit) identification variable name in the data. Often used in Taylor series estimation.
pvvars: a list containing the plausible values. See Details.
achievementLevels: default achievement cutoff scores and names. See Details.
omittedLevels: the levels of the factor variables that will be omitted from the edsurvey.data.frame

Elements that store descriptive information of the survey

survey: the type of survey data
subject: the subject of the data
year: the year of assessment
assessmentCode: the assessment code
dataType: the type of data (e.g., “student” or “school”)
gradeLevel: the grade of the dataset contained in the edsurvey.data.frame

EdSurvey Classes

edsurvey.data.frame is an object that stores a connection to data on disk along with important survey sample design information.

edsurvey.data.frame.list is a list of edsurvey.data.frame objects. It is often used in trend or cross-regional analysis in the gap function. See edsurvey.data.frame.list for more information on how to create an edsurvey.data.frame.list. Users can also refer to the vignette titled Using EdSurvey for Trend Analysis for examples.

Besides the edsurvey.data.frame class, the EdSurvey package also implements the light.edsurvey.data.frame class, which can be used by both EdSurvey and non-EdSurvey functions. More specifically, a light.edsurvey.data.frame is a data.frame that also carries basic survey and sample design information (i.e., plausible values and weights), which is used for variance estimation in analytical functions. Because it is also a base R data.frame, users can apply base R functions for data manipulation. See the vignette titled getData for more examples.

Many functions will remove attributes from a data frame, such as a light.edsurvey.data.frame, and the rebindAttributes function can add them back.

Users can get a light.edsurvey.data.frame object by calling the getData function with addAttributes=TRUE.

Basic Methods for EdSurvey Classes

Extracting a column from an edsurvey.data.frame

Users can extract a column from an edsurvey.data.frame object using $ or [] like a normal data frame.

Extracting and updating attributes of an object of class edsurvey.data.frame or light.edsurvey.data.frame

Users can use the getAttributes function to extract any of the attributes of an edsurvey.data.frame or light.edsurvey.data.frame. Note that a light.edsurvey.data.frame will not have attributes related to the data connection because the data have already been read into memory.

To update an attribute (e.g., omittedLevels), users can use the setAttributes function.

Details

The dataListMeta argument is a list with an element student that is also a list. Each element of the student list is another dataset name (teacher or school) that indicates the variables used to link the student file to those files. The merge variables are shown with a caret character (“^”) between them. The first variable is the name of the merge variable on the student file, and the second variable is the name of the merge variable on the school file. When multiple variables are used to merge, a semicolon can separate pairs of variables; e.g., student=list(school="varA^varY;varB^varZ") would indicate that the student file can be merged to the school file using the varA and varB variables from the student file to merge it to varY and varZ, respectively, on the school file.
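The merge specification format described above can be illustrated with a short base R sketch. The helper parseMergeSpec is hypothetical and not part of EdSurvey; it only shows how a string such as "varA^varY;varB^varZ" breaks down into pairs of merge variables.

```r
# Hypothetical helper (not part of EdSurvey): split a merge specification
# such as "varA^varY;varB^varZ" into student-side and linked-file variables.
parseMergeSpec <- function(spec) {
  pairs <- strsplit(spec, ";", fixed = TRUE)[[1]]  # one entry per variable pair
  parts <- strsplit(pairs, "^", fixed = TRUE)      # split each pair at the caret
  data.frame(
    studentVar = vapply(parts, `[`, character(1), 1),
    linkedVar  = vapply(parts, `[`, character(1), 2),
    stringsAsFactors = FALSE
  )
}

# The dataListMeta structure from the text above
dataListMeta <- list(student = list(school = "varA^varY;varB^varZ"))
mergeVars <- parseMergeSpec(dataListMeta$student$school)
print(mergeVars)
```

Each row of mergeVars pairs a student-file variable with the school-file variable it is merged to.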

The weight list has an element named after each weight variable name that is a list with elements jkbase and jksuffixes. The jkbase variable is a single character indicating the jackknife replicate weight base name, while jksuffixes is a vector with one element for each jackknife replicate weight. When the two are pasted together, they should form the complete set of jackknife replicate weights. The weights argument can also have an attribute that is the default weight. If the primary sampling unit and stratum variables change by weight, they can also be defined on the weight list as psuVar and stratumVar. When this option is used, it overrides the psuVar and stratumVar on the edsurvey.data.frame, which can be left blank. A weight must define only one of psuVar and stratumVar.
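As a minimal sketch of this structure, the following builds a weights list for a single weight. The weight name origwt, the base name srwt, and the four suffixes are illustrative values only, and the attribute name "default" is an assumption about how the default weight is recorded.

```r
# Illustrative weight and replicate names; real data will differ.
weights <- list(
  origwt = list(
    jkbase     = "srwt",               # base name of the replicate weights
    jksuffixes = sprintf("%02d", 1:4)  # one suffix per replicate weight
  )
)

# Assumption: the default weight is stored in an attribute named "default".
attr(weights, "default") <- "origwt"

# Pasting jkbase and jksuffixes together gives the replicate weight columns.
repWeights <- paste0(weights$origwt$jkbase, weights$origwt$jksuffixes)
print(repWeights)
```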

The pvvars list has an element for each subject or subscale score that has plausible values. Each element is a list with a varnames element that indicates the column names of the plausible values and an achievementLevel element that is a named vector of the achievement level cut points.
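A minimal sketch of a pvvars list with one score follows. The score name composite, the column names, and the cut points are illustrative values, not taken from any particular dataset.

```r
# Illustrative score name, plausible value columns, and cut points.
pvvars <- list(
  composite = list(
    varnames = paste0("mrpcm", 1:5),  # columns holding the plausible values
    achievementLevel = c(Basic = 262, Proficient = 299, Advanced = 333)
  )
)

# Column names of the plausible values for the composite score
pvCols <- pvvars$composite$varnames
# Names of the achievement levels, in cut point order
levelNames <- names(pvvars$composite$achievementLevel)
print(pvCols)
print(levelNames)
```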

The fileFormat arguments are data frames that have the following columns:

variableName: name of the variable. Changed to lower case by the constructor if forceLower=TRUE.
Start: start column of the data
End: end column of the data
Width: number of characters wide the data is
Decimal: power of 10 that the data should be divided by
Labels: brief description of the variable
labelValues: a caret (“^”) delimited list of label value pairs, each of which is equal delimited (“=”) as code=value. For example, the string “1=true^2=false^3=invalid” would result in values of 1 being labeled “true”, values of 2 being labeled “false”, and values of 3 being labeled “invalid”.
dataType: one of “character”, “numeric”, or “integer”
Weights: Boolean set to TRUE to indicate that the column is a full sample (not replicate) weight column
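The labelValues and Decimal columns can be illustrated in base R. The helper parseLabelValues is hypothetical and not part of EdSurvey, and the raw codes and the stored value 2534 are made-up inputs; this is a sketch of the encoding, not of how the package applies it internally.

```r
# Hypothetical helper (not part of EdSurvey): turn a labelValues string
# into a named lookup of code -> label.
parseLabelValues <- function(labelValues) {
  pairs  <- strsplit(labelValues, "^", fixed = TRUE)[[1]]  # "1=true", "2=false", ...
  codes  <- sub("=.*$", "", pairs)     # text before the first "="
  labels <- sub("^[^=]*=", "", pairs)  # text after the first "="
  setNames(labels, codes)
}

lookup <- parseLabelValues("1=true^2=false^3=invalid")

# Apply the labels to some made-up raw codes
rawCodes <- c(1, 3, 2, 1)
labelled <- factor(rawCodes,
                   levels = as.numeric(names(lookup)),
                   labels = unname(lookup))
print(labelled)

# Decimal is the power of 10 to divide by: a stored 2534 with Decimal = 2
# represents 25.34.
rawValue <- 2534
scaled <- rawValue / 10^2
print(scaled)
```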

Examples

# NOT RUN {
# read in the example data (generated, not real student data)
sdf <- readNAEP(system.file("extdata/data", "M36NT2PM.dat", package="NAEPprimer"))

# Run a base R function on a column of edsurvey.data.frame
table(sdf$dsex)

# Extract default omitted levels of NAEP primer data
getAttributes(sdf, "omittedLevels") #[1] "Multiple" NA         "Omitted"

# Update default omitted levels of NAEP primer data
sdf <- setAttributes(sdf, "omittedLevels", c("Multiple", "Omitted", NA, "(Missing)"))
getAttributes(sdf, "omittedLevels") #[1] "Multiple"  "Omitted"   NA          "(Missing)"
# }
