The SAS transport format is an open format, as is required for submission of data to the FDA.
read_xpt(
  file,
  col_select = NULL,
  skip = 0,
  n_max = Inf,
  .name_repair = "unique"
)

write_xpt(
  data,
  path,
  version = 8,
  name = NULL,
  label = attr(data, "label"),
  adjust_tz = TRUE
)
read_xpt() returns a tibble, a data frame variant with nice defaults.
Variable labels are stored in the "label" attribute of each variable. It is not printed on the console, but the RStudio viewer will show it.
If a dataset label is defined, it will be stored in the "label" attribute of the tibble.
write_xpt() returns the input data invisibly.
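As a minimal sketch of how variable labels travel through a transport file, the example below attaches a "label" attribute to one column, writes the data, and reads it back. The object names `cars`, `out`, and `back` and the label text are illustrative only, and the example assumes the haven package is installed.

```r
library(haven)

# Copy a built-in dataset and attach a variable label to one column.
cars <- mtcars
attr(cars$mpg, "label") <- "Miles per gallon"

tmp <- tempfile(fileext = ".xpt")

# write_xpt() returns its input invisibly, so the result can be captured.
out <- write_xpt(cars, tmp)

# Read the file back; the result is a tibble.
back <- read_xpt(tmp)

# The variable label is available as an attribute, even though it is
# not shown when the tibble is printed on the console.
attr(back$mpg, "label")
```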
Either a path to a file, a connection, or literal data (either a single string or a raw vector).
Files ending in .gz, .bz2, .xz, or .zip will be automatically uncompressed. Files starting with http://, https://, ftp://, or ftps:// will be automatically downloaded. Remote gz files can also be automatically downloaded and decompressed.
Literal data is most useful for examples and tests. To be recognised as literal data, the input must be either wrapped with I(), be a string containing at least one new line, or be a vector containing at least one string with a new line.
Using a value of clipboard() will read from the system clipboard.
One or more selection expressions, like in dplyr::select(). Use c() or list() to use more than one expression. See ?dplyr::select for details on available selection options. Only the specified columns will be read from data_file.
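A short sketch of column selection, assuming haven is installed: the file is written from the built-in mtcars data and only two of its columns are read back. The name `subset_cols` is illustrative.

```r
library(haven)

tmp <- tempfile(fileext = ".xpt")
write_xpt(mtcars, tmp)

# Read only the mpg and cyl columns; c() combines several selections.
subset_cols <- read_xpt(tmp, col_select = c(mpg, cyl))

names(subset_cols)
```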
Number of lines to skip before reading data.
Maximum number of lines to read.
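The two arguments above can be combined to read a window of rows. This is a minimal sketch assuming haven is installed; the name `chunk` and the particular offsets are illustrative.

```r
library(haven)

tmp <- tempfile(fileext = ".xpt")
write_xpt(mtcars, tmp)

# Skip the first two rows of data, then read at most five rows.
chunk <- read_xpt(tmp, skip = 2, n_max = 5)

nrow(chunk)
```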
Treatment of problematic column names:
"minimal": No name repair or checks, beyond basic existence,
"unique": (default value) Make sure names are unique and not empty,
"check_unique": No name repair, but check they are unique,
"universal": Make the names unique and syntactic,
a function: apply custom name repair (e.g., .name_repair = make.names for names in the style of base R),
a purrr-style anonymous function, see rlang::as_function().
This argument is passed on as repair to vctrs::vec_as_names(). See there for more details on these terms and the strategies used to enforce them.
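As an illustration of the function form of .name_repair, the sketch below passes base R's toupper() so that every column name is upper-cased on read. It assumes haven is installed; `upper` is an illustrative name, and toupper() is just one convenient function to demonstrate with.

```r
library(haven)

tmp <- tempfile(fileext = ".xpt")
write_xpt(mtcars, tmp)

# A function supplied as .name_repair receives the column names
# and returns the repaired names.
upper <- read_xpt(tmp, .name_repair = toupper)

names(upper)
```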
Data frame to write.
Path to a file where the data will be written.
Version of transport file specification to use: either 5 or 8.
Member name to record in file. Defaults to file name sans extension. Must be <= 8 characters for version 5, and <= 32 characters for version 8.
Dataset label to use, or NULL. Defaults to the value stored in the "label" attribute of data.
Note that although SAS itself supports dataset labels up to 256 characters long, dataset labels in SAS transport files must be <= 40 characters.
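The writing options can be sketched together as follows, assuming a haven version whose write_xpt() has the label argument shown in the usage. The member name "CARS" and the label text are made-up values chosen to stay within the version 5 limits (8 and 40 characters respectively).

```r
library(haven)

tmp <- tempfile(fileext = ".xpt")

# Write a version 5 transport file with an explicit member name
# and a dataset label of at most 40 characters.
write_xpt(
  mtcars,
  tmp,
  version = 5,
  name = "CARS",
  label = "Motor Trend road tests"
)

# On reading, a dataset label is stored in the "label" attribute
# of the returned tibble.
back <- read_xpt(tmp)
attr(back, "label")
```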
Stata, SPSS and SAS do not have a concept of time zone, and all date-time variables are treated as UTC. adjust_tz controls how the timezone of date-time values is treated when writing.
If TRUE (the default) the timezone of date-time values is ignored, and they will display the same in R and Stata/SPSS/SAS, e.g. "2010-01-01 09:00:00 NZDT" will be written as "2010-01-01 09:00:00". Note that this changes the underlying numeric data, so use caution if preserving between-time-point differences is critical.
If FALSE, date-time values are written as the corresponding UTC value, e.g. "2010-01-01 09:00:00 NZDT" will be written as "2009-12-31 20:00:00".
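The difference between the two settings can be sketched by writing the same New Zealand date-time twice and comparing what is read back. This assumes a haven version that has the adjust_tz argument and a system time zone database that includes "Pacific/Auckland"; the column name `when` and the other object names are illustrative.

```r
library(haven)

# One date-time value in New Zealand daylight time (UTC+13 in January).
df <- data.frame(
  when = as.POSIXct("2010-01-01 09:00:00", tz = "Pacific/Auckland")
)

local_path <- tempfile(fileext = ".xpt")
utc_path <- tempfile(fileext = ".xpt")

# adjust_tz = TRUE keeps the displayed clock time (09:00);
# adjust_tz = FALSE writes the corresponding UTC instant.
write_xpt(df, local_path, adjust_tz = TRUE)
write_xpt(df, utc_path, adjust_tz = FALSE)

as_displayed <- read_xpt(local_path)$when
as_utc <- read_xpt(utc_path)$when

# Gap between the two stored values, in hours.
offset_hours <- (as.numeric(as_displayed) - as.numeric(as_utc)) / 3600
offset_hours
```

The gap between the two stored values corresponds to the UTC offset of the original time zone, which is why the default setting calls for caution when differences between time points matter.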
tmp <- tempfile(fileext = ".xpt")
write_xpt(mtcars, tmp)
read_xpt(tmp)