Two new classes in EdSurvey are described in this section: the edsurvey.data.frame and the light.edsurvey.data.frame. The edsurvey.data.frame class stores metadata about survey data, while the data itself is stored on disk (via the LaF package), allowing gigabytes of data to be used easily on a machine otherwise inappropriate for manipulating large datasets.
The light.edsurvey.data.frame is typically generated by the getData function and stores the data in a data.frame.
Both classes use attributes to manage metadata and allow for correct statistics to be used in calculating results; getAttributes acts as an accessor for these attributes, while setAttributes acts as a mutator for them. As a convenience, edsurvey.data.frame implements the $ function to extract a variable.
edsurvey.data.frame(userConditions, defaultConditions, data, dataSch,
  dataTch, dataListMeta, weights, pvvars, subject, year, assessmentCode,
  dataType, gradeLevel, achievementLevels, omittedLevels, fileFormat,
  fileFormatSchool, fileFormatTeacher, survey, country, psuVar, stratumVar,
  jkSumMultiplier, recodes = NULL, validateFactorLabels = FALSE,
  forceLower = TRUE)

# S3 method for edsurvey.data.frame
$(x, i)

getAttributes(data, attribute = NULL)

setAttributes(data, attribute, value)
userConditions: a list of user conditions that includes subsetting or recoding conditions
defaultConditions: a list of default conditions that are often set for each survey
data: in the edsurvey.data.frame constructor, this is an LaF object that connects to the main data, often at the student level. For getAttributes and setAttributes, this argument is an edsurvey.data.frame or light.edsurvey.data.frame.
dataSch: an LaF object that connects to the school-level data (optional)
dataTch: an LaF object that connects to the teacher-level data (optional)
dataListMeta: a list that stores variables that can be used to link school-level and teacher-level data to the main data. See Details.
weights: a list that stores information regarding weight variables. See Details.
pvvars: a list that stores information regarding plausible values. See Details.
subject: a character that indicates the subject domain of the given data
year: a character or numeric that indicates the year of the given data
assessmentCode: a character that indicates the code of the assessment. Can be “National” or “International”.
dataType: a character that indicates the unit level of the main data. Examples include “Student”, “teacher”, “school”, “Adult Data”.
gradeLevel: a character that indicates the grade level of the given data
achievementLevels: a list of achievement level categories and cutpoints
omittedLevels: a list of default omitted levels for the given data
fileFormat: a data.frame that stores codebook information for the main data. See Details.
fileFormatSchool: a data.frame that stores codebook information for the school-level data (if it exists). See Details.
fileFormatTeacher: a data.frame that stores codebook information for the teacher-level data (if it exists). See Details.
survey: a character that indicates the name of the survey
country: a character that indicates the country of the given data
psuVar: a character that indicates the primary sampling unit (PSU) variable. Ignored when weights have psuVar defined.
stratumVar: a character that indicates the stratum variable. Ignored when weights have stratumVar defined.
jkSumMultiplier: a numeric value of the jackknife coefficient (used in calculating the jackknife replication estimation)
recodes: a list of variable recodes of the given data
validateFactorLabels: a Boolean that indicates whether the getData function needs to validate factor variables
forceLower: a Boolean; when set to TRUE, variable names are automatically converted to lowercase
x: an edsurvey.data.frame
i: a character, the column name to extract
attribute: a character, the name of an attribute to get or set
value: the new value of the given attribute
An object of class edsurvey.data.frame with the following elements:
Elements that store data connections and data codebooks
data: an LaF object containing a connection to the student dataset on disk
dataSch: an LaF object containing a connection to the school dataset on disk, if it exists. If not, will be NULL.
dataTch: an LaF object containing a connection to the teacher dataset on disk, if it exists. If not, will be NULL.
fileFormat: a data.frame containing the format of the file in the data parameter. See Details.
fileFormatSchool: a data.frame containing the format of the file in the dataSch parameter. See Details.
fileFormatTeacher: a data.frame containing the format of the file in the dataTch parameter. See Details.
Elements that store sample design and default subsetting information of the given survey data
userConditions: a list containing all user conditions, set using the subset.edsurvey.data.frame method
defaultConditions: the default subsample conditions
weights: a list containing the weights. See Details.
stratumVar: a character that indicates the default strata identification variable name in the data. Often used in Taylor series estimation.
psuVar: a character that indicates the default PSU (sampling unit) identification variable name in the data. Often used in Taylor series estimation.
pvvars: a list containing the plausible values. See Details.
achievementLevels: default achievement cutoff scores and names. See Details.
omittedLevels: the levels of the factor variables that will be omitted from the edsurvey.data.frame
Elements that store descriptive information of the survey
survey: the type of survey data
subject: the subject of the data
year: the year of assessment
assessmentCode: the assessment code
dataType: the type of data (e.g., “student” or “school”)
gradeLevel: the grade of the dataset contained in the edsurvey.data.frame
An edsurvey.data.frame is an object that stores a connection to data on disk along with important survey sample design information.
An edsurvey.data.frame.list is a list of edsurvey.data.frame objects. It is often used in trend or cross-regional analysis in the gap function. See edsurvey.data.frame.list for more information on how to create an edsurvey.data.frame.list. Users can also refer to the vignette titled Using EdSurvey for Trend Analysis for examples.
Besides the edsurvey.data.frame class, the EdSurvey package also implements the light.edsurvey.data.frame class, which can be used by both EdSurvey and non-EdSurvey functions. More particularly, a light.edsurvey.data.frame is a data.frame that also has basic survey and sample design information (i.e., plausible values and weights), which is used for variance estimation in analytical functions. Because it is also a base R data.frame, users can also apply base R functions for data manipulation. See the vignette titled getData for more examples.
Many functions remove attributes from a data frame, such as a light.edsurvey.data.frame; the rebindAttributes function can add them back.
Users can get a light.edsurvey.data.frame object by using the getData method with addAttributes=TRUE.
Extracting a column from an edsurvey.data.frame
Users can extract a column from an edsurvey.data.frame object using $ or [] like a normal data frame.
Extracting and updating attributes of an object of class edsurvey.data.frame or light.edsurvey.data.frame
Users can use the getAttributes method to extract any of the attributes of an edsurvey.data.frame or light.edsurvey.data.frame. Note that a light.edsurvey.data.frame will not have attributes related to the data connection because the data have already been read into memory.
If users want to update an attribute (e.g., omittedLevels), they can use the setAttributes method.
The dataListMeta argument is a list with an element student that is also a list. Each element of the student list is named after another dataset (teacher or school) and indicates the variables used to link the student file to that file. The merge variables are shown with a caret character (“^”) between them. The first variable is the name of the merge variable on the student file, and the second variable is the name of the merge variable on the other file (for example, the school file). When multiple variables are used to merge, a semicolon can separate pairs of variables; e.g., student=list(school="varA^varY;varB^varZ") would indicate that the student file can be merged to the school file using the varA and varB variables from the student file to merge it to varY and varZ, respectively, on the school file.
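The merge specification described above can be unpacked with a few lines of base R. This is only a sketch of the string format: parseMergeSpec is a hypothetical helper written for illustration (it is not an EdSurvey function), and varA, varB, varY, and varZ are the placeholder names from the text.

```r
# Hypothetical dataListMeta using the placeholder variable names from the text.
dataListMeta <- list(
  student = list(school = "varA^varY;varB^varZ")
)

# Split a merge specification into one row per pair of merge variables.
# parseMergeSpec is an illustrative helper, not part of EdSurvey.
parseMergeSpec <- function(spec) {
  pairs <- strsplit(spec, ";", fixed = TRUE)[[1]]   # "varA^varY", "varB^varZ"
  parts <- strsplit(pairs, "^", fixed = TRUE)       # split each pair on the caret
  data.frame(
    studentVar = vapply(parts, `[`, character(1), 1),
    otherVar   = vapply(parts, `[`, character(1), 2),
    stringsAsFactors = FALSE
  )
}

schoolLink <- parseMergeSpec(dataListMeta$student$school)
print(schoolLink)
```

Each row of schoolLink pairs a student-file variable with the matching variable on the school file.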
The weights list has an element named after each weight variable name that is a list with elements jkbase and jksuffixes. The jkbase variable is a single character indicating the jackknife replicate weight base name, while jksuffixes is a vector with one element for each jackknife replicate weight. When the two are pasted together, they should form the complete set of jackknife replicate weights. The weights argument can also have an attribute that is the default weight. If the primary sampling unit and stratum variables change by weight, they can also be defined on the weights list as psuVar and stratumVar. When this option is used, it overrides the psuVar and stratumVar on the edsurvey.data.frame, which can be left blank. A weight must define only one of psuVar and stratumVar.
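As a rough illustration of how jkbase and jksuffixes combine, the sketch below builds a weights list by hand. The names origwt and srwt and the count of 62 replicates are illustrative values only, and storing the default weight in an attribute named "default" is an assumption made for this example.

```r
# Hypothetical weights list; "origwt", "srwt", and the 62 replicates are
# illustrative values, not read from any data file.
weights <- list(
  origwt = list(
    jkbase = "srwt",
    jksuffixes = sprintf("%02d", 1:62)  # "01", "02", ..., "62"
  )
)

# Assumption: the default weight is recorded in an attribute named "default".
attr(weights, "default") <- "origwt"

# Pasting the base name and the suffixes gives the replicate weight columns.
w <- weights[[attr(weights, "default")]]
replicateVars <- paste0(w$jkbase, w$jksuffixes)
head(replicateVars, 3)
length(replicateVars)
```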
The pvvars list has an element for each subject or subscale score that has plausible values. Each element is a list with a varnames element that indicates the column names of the plausible values and an achievementLevel argument that is a named vector of the achievement level cut points.
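The shape of a pvvars list can be sketched as follows. The score name composite, the column names, and the cut points are all placeholders, and the classification step (including the "Below Basic" label) is an illustrative use of the cut points rather than a description of how EdSurvey applies them.

```r
# Hypothetical pvvars list; the score name, column names, and cut points are
# placeholders chosen for illustration.
pvvars <- list(
  composite = list(
    varnames = paste0("pv", 1:5),
    achievementLevel = c(Basic = 200, Proficient = 250, Advanced = 300)
  )
)

# Column names of the plausible values for the "composite" score
pvvars$composite$varnames

# Illustrative only: classify one score against the cut points.
# findInterval counts how many cut points the score meets or exceeds.
score <- 262
cuts <- pvvars$composite$achievementLevel
levelIndex <- findInterval(score, cuts)
levelName <- if (levelIndex == 0) "Below Basic" else names(cuts)[levelIndex]
levelName
```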
The fileFormat arguments are data frames that have the following columns:
name of the variable. Changed to lowercase by the constructor if forceLower=TRUE.
start column of the data
end column of the data
number of characters wide the data is
power of 10 that the data should be divided by
brief description of the variable
a caret (“^”) delimited list of label-value pairs, each of which is delimited by an equal sign (“=”) as code=value. For example, the string “1=true^2=false^3=invalid” would result in values of 1 being labeled “true”, values of 2 being labeled “false”, and values of 3 being labeled “invalid”.
one of “character”, “numeric”, or “integer”
Boolean set to TRUE to indicate that the column is a full sample (not replicate) weight column
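The caret-and-equal-sign format for label-value pairs can be decoded in base R. The sketch below uses the example string from the text; parseLabelValues is a hypothetical helper written for illustration and is not an EdSurvey function.

```r
# The example string from the text above
labelValues <- "1=true^2=false^3=invalid"

# Split on the caret, then separate each pair at its first equal sign.
# parseLabelValues is an illustrative helper, not part of EdSurvey.
parseLabelValues <- function(x) {
  pairs  <- strsplit(x, "^", fixed = TRUE)[[1]]
  codes  <- sub("=.*$", "", pairs)      # text before the first "="
  labels <- sub("^[^=]*=", "", pairs)   # text after the first "="
  setNames(labels, codes)
}

lv <- parseLabelValues(labelValues)

# Apply the labels to some raw codes
raw <- c(1, 3, 2, 1)
labeled <- factor(raw, levels = as.numeric(names(lv)), labels = unname(lv))
labeled
```

Splitting at the first equal sign only means a label that itself contains “=” is kept whole.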
# NOT RUN {
# read in the example data (generated, not real student data)
sdf <- readNAEP(system.file("extdata/data", "M36NT2PM.dat", package="NAEPprimer"))
# Run a base R function on a column of edsurvey.data.frame
table(sdf$dsex)
# Extract default omitted levels of NAEP primer data
getAttributes(sdf, "omittedLevels") #[1] "Multiple" NA "Omitted"
# Update default omitted levels of NAEP primer data
sdf <- setAttributes(sdf, "omittedLevels", c("Multiple", "Omitted", NA, "(Missing)"))
getAttributes(sdf, "omittedLevels") #[1] "Multiple" "Omitted" NA "(Missing)"
# }