convert: Converts a dataset from one statistical software to another

Description

This function converts (or transfers) datasets between R, Stata, SPSS, SAS, Excel and DDI XML files. Unlike the regular import / export functions from packages haven or rio, it uses the DDI standard as an exchange platform and facilitates a consistent conversion of the missing values.

Usage

convert(
  from,
  to = NULL,
  declared = TRUE,
  chartonum = FALSE,
  recode = TRUE,
  encoding = "UTF-8",
  csv = NULL,
  ...
)

Value

An invisible R data frame, when the argument to is NULL.

Arguments

from: A path to a file, or a data.frame object
to: Character, the name of a software package or a path to a specific file
declared: Logical, return the resulting dataset as a declared object
chartonum: Logical, recode character categorical variables to numerical categorical variables
recode: Logical, recode missing values
encoding: The character encoding used to read a file
csv: Complex argument, see the Details section
...: Additional parameters passed to other functions, see the Details section

Author

Adrian Dusa

Details

When the argument to specifies a statistical package ("R", "Stata", "SPSS", "SAS", "XPT") or "Excel", the destination file takes the same name as the file in the argument from, with the software-specific extension added automatically.

SPSS portable files (with the extension ".por") can be read, but not written.

The argument to can also be specified as a path to a specific file, in which case the software package is determined from its file extension. The following extensions are currently recognized: .xml for DDI, .rds for R, .dta for Stata, .sav for SPSS, .xpt for SAS, and .xlsx for Excel.
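
As an illustration of the extension lookup described above, here is a minimal sketch in base R. The table ext_to_software and the helper guess_software() are hypothetical names introduced for this example; they are not part of DDIwR and only mirror the list of extensions given above.

```r
# Hypothetical lookup table built from the extensions listed above;
# this is an illustration, not DDIwR's internal code.
ext_to_software <- c(
  xml  = "DDI",
  rds  = "R",
  dta  = "Stata",
  sav  = "SPSS",
  xpt  = "SAS",
  xlsx = "Excel"
)

# Hypothetical helper: map a destination path to a software name
guess_software <- function(path) {
  # file_ext() returns the text after the last dot, without the dot
  ext <- tolower(tools::file_ext(path))
  software <- unname(ext_to_software[ext])
  if (length(software) != 1 || is.na(software)) {
    stop("Unrecognized file extension: '", ext, "'")
  }
  software
}

guess_software("survey/wave1.dta")
guess_software("codebook.XML")
```

The lookup is case insensitive here by choice; whether convert() treats upper case extensions the same way is not stated in this page.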

Additional parameters can be specified via the three dots argument ..., and they are passed to the respective functions from packages haven and readxl. For instance, the function write_dta() has an additional argument called version, used when writing a Stata file.

The most important argument to consider is called user_na, part of the function read_sav(). It defaults to FALSE in package haven, but package DDIwR uses it with the value TRUE. It can be deactivated by explicitly specifying user_na = FALSE in function convert().
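
A minimal sketch of passing such parameters through the three dots argument follows. It assumes a hypothetical SPSS file called test.sav in the working directory and an installed DDIwR package; nothing is converted unless both are available. The value 13 for version is only an example.

```r
# Hypothetical input file, assumed to sit in the working directory
input <- "test.sav"

# Run the conversions only if DDIwR is installed and the file exists
can_run <- requireNamespace("DDIwR", quietly = TRUE) && file.exists(input)

if (can_run) {
  # 'version' is forwarded to haven::write_dta() when writing the Stata file
  DDIwR::convert(input, to = "Stata", version = 13)

  # 'user_na = FALSE' is forwarded to haven::read_sav(), switching off
  # the DDIwR default of reading user-defined missing values
  plain <- DDIwR::convert(input, user_na = FALSE)
}
```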

The same three dots argument is used to pass additional parameters to other functions in this package, for instance exportCodebook() when writing to a DDI file. One of its arguments, embed (activated by default), controls whether the data is embedded in the XML file. Deactivating it will create a CSV file in the same directory, using the same file name as the XML file.

When converting from DDI, if the dataset is not embedded in the XML file, the CSV file is expected to be found in the same directory as the DDI Codebook, and it should have the same file name as the XML file. The path to the CSV file can be provided via the csv argument. Additional formal parameters of the function read.csv() can be passed via the same three dots ... argument. Alternatively, the csv argument can also be an R data frame.

When converting to DDI, if the argument embed is set to FALSE, users have the option to save the data in a separate CSV file (the default) or not to save the data at all, by setting csv to FALSE.
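
The following sketch puts the embed and csv arguments together in a round trip through DDI. It assumes a hypothetical SPSS file test.sav in the working directory, so that the codebook and the data land in test.xml and test.csv next to it, as described above. The guard keeps the code inert when DDIwR or the file is missing.

```r
# Hypothetical input file, assumed to sit in the working directory
input <- "test.sav"

can_run <- requireNamespace("DDIwR", quietly = TRUE) && file.exists(input)

if (can_run) {
  # Write the codebook to test.xml and the data to a separate test.csv
  DDIwR::convert(input, to = "DDI", embed = FALSE)

  # Read the codebook back, pointing explicitly at the CSV file
  dat <- DDIwR::convert("test.xml", csv = "test.csv")

  # Write only the codebook, without saving the data at all
  DDIwR::convert(input, to = "DDI", embed = FALSE, csv = FALSE)
}
```

Because the CSV file shares its name and directory with the XML file, the csv argument in the second call is redundant and is shown only to make the link explicit.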

When a DDI .xml file is written, unique IDs are generated for all variables, if not already present in the attributes. These IDs are useful for newer versions of the DDI Codebook, for referencing purposes.

The argument chartonum signals recoding character categorical variables, and employs the function recodeCharcat(). This only makes sense when converting to Stata, which does not allow allocating labels for anything but integer variables.

If the argument to is left as NULL, the data is (invisibly) returned to the R environment. Conversion to R, either in the working space or as a data file, will result (by default) in a data frame containing declared labelled variables, as defined in package declared.

The current version reads and creates DDI Codebook version 2.6. Future versions are planned to extend the functionality to DDI Lifecycle versions 3.x and to link to the future package DDI4R for the UML model-based version 4. The package extends the standard DDI Codebook by offering the possibility to embed a serialized version of the R dataset into the XML file containing the Codebook, within a notes child of the fileDscr component. This type of generated codebook is unique to this package and is automatically detected when converting to another statistical software. This will likely be replaced with a time insensitive text version.

Converting to SAS is experimental, and it relies on the same package haven that uses the ReadStat C library. The safest way to convert, which at the same time consistently converts the missing values, is to export the data to a CSV file, create a setup file with the function setupfile(), and run the commands manually.

Converting data from SAS is possible, but reading the metadata is also experimental (the current version of haven only partially imports the metadata). Either specify the path to the catalog file using the argument catalog_file from the function read_sas(), or have the catalog file in the same directory as the data set, with the same file name and the extension .sas7bcat.

The argument recode controls how missing values are treated. If the input file has SPSS-like numeric codes, they will be recoded to extended (a-z) missing types when converting to Stata or SAS. If the input has Stata-like extended codes, they will be recoded to SPSS-like numeric codes.
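
To make the idea of recoding numeric missing codes to extended missing types concrete, here is a small base R sketch. The helper map_missing_codes() is hypothetical and is not how DDIwR performs the recoding; in particular, the ascending order used to pair codes with letters is an assumption made for this example only.

```r
# Hypothetical helper (not DDIwR code): pair each numeric missing code
# with a letter, in ascending order of the codes.
map_missing_codes <- function(codes) {
  codes <- sort(unique(codes))
  if (length(codes) > length(letters)) {
    stop("Only 26 extended missing types (a-z) are available")
  }
  # Names are the numeric codes coerced to character, values are letters
  stats::setNames(letters[seq_along(codes)], codes)
}

# Example SPSS-like missing codes, invented for illustration
spss_codes <- c(-9, -7, -8)
mapping <- map_missing_codes(spss_codes)

# Look up the extended missing type assigned to one code
mapping[["-8"]]
```

The limit of 26 codes follows directly from the a-z range mentioned above.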

The character encoding is usually passed to the corresponding functions from package haven. It can be set to NULL to reset it to the default in that package.

Converting to SPSS works with numerical and character labelled vectors, with or without labels. Date/Time variables are only partially supported by package haven: such a variable is preserved if it has no labels and no missing values, but if labels or missing values are declared it is automatically coerced to numeric, and users may have to make the proper settings in SPSS.

References

DDI - Data Documentation Initiative, see the DDI Alliance website.

Examples

if (FALSE) {
# Assuming an SPSS file called test.sav is located in the working directory
# The following command imports the file into the R environment:
test <- convert("test.sav")

# The following command will extract the metadata in a DDI Codebook and
# produce a test.xml file in the same directory
convert("test.sav", to = "DDI")

# The data may be saved separately from the DDI file, using:
convert("test.sav", to = "DDI", embed = FALSE)

# To produce a Stata file:
convert("test.sav", to = "Stata")

# To produce an R file:
convert("test.sav", to = "R")

# To produce an Excel file:
convert("test.sav", to = "Excel")
}
