data_read: Read (import) data files from various sources

Description

This function imports data from various file types. It is a small wrapper around haven::read_spss(), haven::read_stata(), haven::read_sas(), readxl::read_excel() and data.table::fread() (or readr::read_delim() if the data.table package is not installed). Thus, supported file types are data files from SPSS, SAS or Stata, Excel files, and text files (such as '.csv' files). All other file types are passed to rio::import().

Usage

data_read(path, path_catalog = NULL, encoding = NULL, verbose = TRUE, ...)

Value

A data frame.

Arguments

path: Character string, the file path to the data file.
path_catalog: Character string, path to the catalog file. Only relevant for SAS data files.
encoding: The character encoding used for the file. Usually not needed.
verbose: Toggle warnings and messages.
...: Arguments passed to the related read_*() function.
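
As a minimal sketch of the typical call, the following writes a small CSV file to a temporary location and reads it back using only the path argument. The file contents and the object names (csv_path, dat) are made up for illustration, and the example assumes that datawizard and either data.table or readr are installed.

```r
library(datawizard)

# Create a small CSV file in a temporary location so the example is self-contained.
# The columns `id` and `score` are illustrative only.
csv_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(id = 1:3, score = c(2.5, 3.1, 4.8)),
  csv_path,
  row.names = FALSE
)

# Only `path` is supplied; the reader is chosen from the ".csv" extension.
dat <- data_read(csv_path, verbose = FALSE)

str(dat)
```

The result is an ordinary data frame, so it can be passed directly to other data-wrangling functions.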

Supported file types

data_read() is a wrapper around the haven, data.table, readr, readxl and rio packages. Currently supported file types are .txt, .csv, .xls, .xlsx, .sav, .por, .dta and .sas (and related files). All other file types are passed to rio::import().

Compressed files (zip) and URLs

data_read() can also read the above-mentioned file types from URLs or from inside zip-compressed files. Thus, path can also be a URL to a file, like "http://www.url.com/file.csv". When path points to a zip-compressed file that contains multiple files, the first supported file is extracted and loaded.

General behaviour

data_read() detects the appropriate read_*() function based on the file extension of the data file. Thus, in most cases it is enough to specify only the path argument. However, if more control is needed, all arguments in ... are passed down to the related read_*() function.
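
The following sketch illustrates passing an extra argument through ... to the underlying reader. Because text files are read with data.table::fread() or readr::read_delim(), depending on which package is installed, the name of the row-limit argument differs: nrows belongs to fread(), n_max to read_delim(). The file contents and object names are made up for illustration, and the example assumes that ... is forwarded unchanged to the reader.

```r
library(datawizard)

# Illustrative CSV file with five rows.
csv_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(id = 1:5, group = c("a", "b", "a", "b", "a")),
  csv_path,
  row.names = FALSE
)

# Default call: only `path` is needed.
full <- data_read(csv_path, verbose = FALSE)

# Extra arguments are handed to the reader that is actually used,
# so the argument name must match that reader.
if (requireNamespace("data.table", quietly = TRUE)) {
  # data.table::fread() limits rows with `nrows`
  head_rows <- data_read(csv_path, verbose = FALSE, nrows = 2)
} else {
  # readr::read_delim() limits rows with `n_max`
  head_rows <- data_read(csv_path, verbose = FALSE, n_max = 2)
}

nrow(full)
nrow(head_rows)
```

Since the valid arguments depend on the reader behind each file type, it is worth checking the help page of the corresponding read_*() function before relying on a particular argument.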

Differences to other packages that read foreign data formats

data_read() is most comparable to rio::import(). For data files from SPSS, SAS or Stata, which support labelled data, variables are converted into their most appropriate type. The major difference to rio::import() is that data_read() automatically converts fully labelled variables into factors. Variables in which all values are labelled are converted into factors, with the imported value labels set as factor levels. Variables that have no value labels, or fewer value labels than values, are converted into numeric vectors; character vectors are preserved as character vectors. In that case, value labels are preserved as the "labels" attribute.
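
The conversion rules above can be sketched with a small SPSS file written by haven::write_sav(). The variable names (rating, age) and their labels are made up for illustration, and the example assumes that datawizard and haven are installed. rating is fully labelled (every value has a label), whereas age is only partially labelled.

```r
library(datawizard)

# Build a small data frame with labelled variables (illustrative data).
df <- data.frame(id = 1:4)
# Fully labelled: both occurring values (1 and 2) have a value label.
df$rating <- haven::labelled(c(1, 2, 1, 2), labels = c(low = 1, high = 2))
# Partially labelled: only the value 99 has a label.
df$age <- haven::labelled(c(25, 31, 99, 47), labels = c(missing = 99))

sav_path <- tempfile(fileext = ".sav")
haven::write_sav(df, sav_path)

# haven keeps labelled variables as "haven_labelled" vectors.
raw <- haven::read_sav(sav_path)
class(raw$rating)

# data_read() converts them according to the rules described above.
dat <- data_read(sav_path, verbose = FALSE)
class(dat$rating)
class(dat$age)
```

Under the rules described in this section, rating should come back as a factor and age as a numeric vector, while the haven import keeps both as labelled vectors.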