
EdSurvey (version 2.2.3)

getData: Read Data to a Data Frame

Description

Reads in selected columns to a data.frame or a light.edsurvey.data.frame. On an edsurvey.data.frame, the data are stored on disk.

Usage

getData(data, varnames = NULL, drop = FALSE, dropUnusedLevels = TRUE,
  omittedLevels = TRUE, defaultConditions = TRUE, formula = NULL,
  recode = NULL, includeNaLabel = FALSE, addAttributes = FALSE,
  returnJKreplicates = TRUE)

Arguments

data

an edsurvey.data.frame or a light.edsurvey.data.frame

varnames

a character vector of variable names that will be returned. When both varnames and a formula are specified, variables associated with both are returned. Set to NULL by default.

drop

a logical value. When set to the default value of FALSE, a single returned column is still represented as a data.frame and is not converted to a vector.

dropUnusedLevels

a logical value. When set to the default value of TRUE, drops unused levels of all factor variables.

omittedLevels

a logical value. When set to the default value of TRUE, drops from all factor variables the omitted levels that are specified in the edsurvey.data.frame. Use print on an edsurvey.data.frame to see the omitted levels.

defaultConditions

a logical value. When set to the default value of TRUE, uses the default conditions stored in an edsurvey.data.frame to subset the data. Use print on an edsurvey.data.frame to see the default conditions.

formula

a formula. When included, getData returns data associated with all variables of the formula. When both varnames and a formula are specified, the variables associated with both are returned. Set to NULL by default.

recode

a list of lists to recode variables. Defaults to NULL. Can be set as recode = list(var1 = list(from = c("a","b","c"), to = "d")). See Examples.

includeNaLabel

a logical value to indicate if NA (missing) values are returned as literal NA values or as factor levels coded as NA.

addAttributes

a logical value. When set to TRUE, getData returns a data.frame that can be used in calls to other functions that would usually take an edsurvey.data.frame. This data.frame is called a light.edsurvey.data.frame. See the Details section in edsurvey.data.frame for more information on light.edsurvey.data.frame. Defaults to FALSE.

returnJKreplicates

a logical value indicating if JK replicate weights should be returned. Defaults to TRUE.
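
The drop and dropUnusedLevels arguments resemble two familiar base R behaviors. The sketch below uses only base R on a made-up toy data frame (the values and the extra "Other" level are hypothetical) to show the distinction each argument controls. It is an analogy, not a description of how getData is implemented.

```r
# Toy data standing in for survey data (hypothetical values).
toy <- data.frame(
  dsex  = factor(c("Male", "Female", "Male"),
                 levels = c("Male", "Female", "Other")),
  score = c(250, 270, 265)
)

# Analogue of drop = FALSE: a single column stays a data.frame.
one_col_df <- toy[, "dsex", drop = FALSE]
class(one_col_df)

# Analogue of drop = TRUE: a single column collapses to a vector (here, a factor).
one_col_vec <- toy[, "dsex", drop = TRUE]
is.factor(one_col_vec)

# Analogue of dropUnusedLevels = TRUE: levels with no observations are removed.
levels(toy$dsex)               # includes the unused level "Other"
levels(droplevels(toy)$dsex)   # only levels that occur in the data
```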

Value

When addAttributes is FALSE, returns a data.frame containing data associated with requested variables. When addAttributes is TRUE, returns a light.edsurvey.data.frame.

Details

By default, an edsurvey.data.frame does not have data read into memory until getData is called and returns a data frame. This structure allows EdSurvey to have a minimal memory footprint. To keep the footprint small, you need to limit varnames to just the necessary variables.
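
As a rough illustration of the point about memory, the following sketch requests only two variables and then inspects the size of the result. It assumes the EdSurvey and NAEPprimer packages are installed and reuses the example file from the Examples section; the reported size will depend on your system.

```r
library(EdSurvey)

# Connect to the example data; nothing is read into memory yet.
sdf <- readNAEP(system.file("extdata/data", "M36NT2PM.dat", package = "NAEPprimer"))

# Only the requested columns are returned as an in-memory data.frame.
df_small <- getData(data = sdf, varnames = c("dsex", "b017451"))
dim(df_small)

# Inspect the in-memory size of the returned data.frame.
print(object.size(df_small), units = "Kb")
```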

When getData is called, it returns a data.frame. When the addAttributes argument is set to TRUE, that data.frame has several attributes added to make it usable by the functions in the EdSurvey package (e.g., lm.sdf), and the class is a light.edsurvey.data.frame.
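
A minimal sketch of this workflow follows. It assumes the NAEP Primer example data, and it assumes that the full-sample weight in that file is named origwt; the weight is requested explicitly so that lm.sdf can find it in the returned object.

```r
library(EdSurvey)

sdf <- readNAEP(system.file("extdata/data", "M36NT2PM.dat", package = "NAEPprimer"))

# addAttributes = TRUE returns a light.edsurvey.data.frame.
# "origwt" is assumed to be the weight variable in this example file.
gddat <- getData(data = sdf,
                 varnames = c("composite", "dsex", "b017451", "origwt"),
                 addAttributes = TRUE)
class(gddat)

# The result can be passed to EdSurvey analysis functions such as lm.sdf.
lm1 <- lm.sdf(composite ~ dsex + b017451, data = gddat)
summary(lm1)
```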

Note that if both formula and varnames are supplied, the variables from both are included in the result.
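
For instance, the call sketched below supplies one variable through varnames and two through formula. It assumes the NAEP Primer example data used in the Examples section; the formula itself is only an illustration.

```r
library(EdSurvey)

sdf <- readNAEP(system.file("extdata/data", "M36NT2PM.dat", package = "NAEPprimer"))

# b017451 comes from varnames; composite and dsex come from the formula.
df_both <- getData(data = sdf,
                   varnames = "b017451",
                   formula = composite ~ dsex)

# Inspect which columns were returned.
names(df_both)
```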

See the vignette titled getData for long-form documentation on this function.

See Also

subset.edsurvey.data.frame for how to remove rows from the output
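
A short sketch of combining the two functions: subset the edsurvey.data.frame first, then read the remaining rows with getData. The level label "Male" is an assumption about how dsex is coded in the example file; print the sdf or tabulate dsex to confirm the labels before relying on it.

```r
library(EdSurvey)

sdf <- readNAEP(system.file("extdata/data", "M36NT2PM.dat", package = "NAEPprimer"))

# Keep only rows where dsex is "Male" (label assumed; check with table()).
sdf_male <- subset(sdf, dsex == "Male")

# getData then returns only the rows that satisfy the subset condition.
df_male <- getData(data = sdf_male, varnames = c("dsex", "b017451"))
table(df_male$dsex)
```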

Examples

# NOT RUN {
# read in the example data (generated, not real student data)
sdf <- readNAEP(system.file("extdata/data", "M36NT2PM.dat", package = "NAEPprimer"))

# get two variables, without weights
df <- getData(data=sdf, varnames=c("dsex", "b017451"))
table(df)

# example of using recode
df2 <- getData(data=sdf, varnames=c("dsex", "t088301"),
               recode=list(t088301=list(from=c("Yes, available","Yes, I have access"),
                                        to=c("Yes")),
                           t088301=list(from=c("No, have no access"),
                                        to=c("No"))))
table(df2)

# When readNAEP is called on a data file, it appends a default 
# condition to the edsurvey.data.frame. You can see these conditions
# by printing the sdf
sdf

# As per the default condition specified, getData restricts the data to only
# the Reporting Sample. This behavior can be changed as follows:
df2 <- getData(data=sdf, varnames=c("dsex", "b017451"), defaultConditions = FALSE)
table(df2)

# Similarly, the default behavior of omitting certain levels specified
# in the edsurvey.data.frame can be changed as follows:
df2 <- getData(data=sdf, varnames=c("dsex", "b017451"), omittedLevels = FALSE)
table(df2)

# the variable "c052601" is from the school-level data file; merging is handled automatically
# returns a light.edsurvey.data.frame using addAttributes=TRUE argument
gddat <- getData(data=sdf, 
                 varnames=c("composite", "dsex", "b017451","c052601"),
                 addAttributes = TRUE)
class(gddat)
# look at the first few lines
head(gddat)
# }
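
The recode example above collapses several response labels into one. The base R sketch below mimics the effect of a single from/to rule on a plain factor, which may help when planning a recode list. It is an analogy on made-up values, not the code getData runs internally, and the object names f and rule are hypothetical.

```r
# A plain factor with the labels used in the recode example (toy values).
f <- factor(c("Yes, available", "Yes, I have access", "No, have no access"))

# One from/to rule, shaped like an element of the recode list.
rule <- list(from = c("Yes, available", "Yes, I have access"), to = "Yes")

# Rename every level listed in `from` to the single label in `to`;
# levels that end up with the same name are merged.
levels(f)[levels(f) %in% rule$from] <- rule$to

as.character(f)
nlevels(f)
```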
