Hmisc (version 2.2-3)

sas.get: Convert a SAS Dataset to an S Data Frame

Description

Converts a SAS dataset into an S data frame. You may choose to extract only a subset of variables or a subset of observations in the SAS dataset. The function automatically converts PROC FORMAT-coded variables to factor objects. The original SAS codes are stored in an attribute called sas.codes, and these may be added back to the levels of a factor variable using the code.levels function.

Information about special missing values may be captured in an attribute of each variable having special missing values. This attribute is called special.miss, and such variables are given class special.miss. There are print, [], format, and is.special.miss methods for such variables.

Date, time, and date-time variables use, respectively, Dates, DateTimeClasses, and chron variables. If using S-Plus 5 or 6 or later, the timeDate function is used instead. If a date variable represents a partial date (.5 added if month missing, .25 added if day missing, .75 if both), an attribute partial.date is added to the variable, and the variable is also given class imputed. The describe function uses information about partial dates and special missing values.

There is an option to automatically PKUNZIP compressed SAS datasets.

sas.get works by composing and running a SAS job that creates various ASCII files that are read and analyzed by sas.get. You can also run the SAS sas_get macro, which writes the ASCII files for downloading, in a separate step or on another computer, and then tell sas.get (through the sasout argument) to access these files instead of running SAS.
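
The partial-date encoding described above can be decoded from the fractional part of the stored value. The following is a minimal sketch in base R, not code from Hmisc: the helper name partial_date_status and the sample day numbers are made up for illustration, and it assumes the fractions are exactly .25, .5, and .75 as stated in the Description.

```r
# Hypothetical helper (not part of Hmisc): classify the fraction that the
# Description says is added to a partial date.
partial_date_status <- function(x) {
  frac <- x - floor(x)                       # 0, 0.25, 0.5 or 0.75
  status <- rep("complete", length(x))
  status[which(frac == 0.5)]  <- "month missing"
  status[which(frac == 0.25)] <- "day missing"
  status[which(frac == 0.75)] <- "month and day missing"
  status[is.na(x)] <- NA_character_          # ordinary missing dates stay NA
  status
}

# Invented day numbers, one per case
dates <- c(15401, 15401.5, 15401.25, 15401.75, NA)
data.frame(day_number = floor(dates), status = partial_date_status(dates))
```

The quarters are exactly representable in floating point, so comparing with == is safe here; data that has been through arithmetic may need a tolerance instead.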

Usage

sas.get(library, member, variables=character(0), ifs=character(0),
     format.library=library, id,
     dates.=c("sas","yymmdd","yearfrac","yearfrac2"),
     keep.log=TRUE, log.file="_temp_.log", macro=sas.get.macro,
     data.frame.out=existsFunction("data.frame"), clean.up=!.R., quiet=FALSE,
     temp=tempfile("SaS"), formats=TRUE, 
     recode=formats, special.miss=FALSE, sasprog="sas",
     as.is=.5, check.unique.id=TRUE, force.single=FALSE, where,
     uncompress=FALSE)

is.special.miss(x, code)

x[...]

## S3 method for class 'special.miss'
print(x, ...)

## S3 method for class 'special.miss'
format(x, ...)

sas.codes(object)

code.levels(object)

Arguments

library
character string naming the directory in which the dataset is kept. The default is library=".", indicating that the current directory is to be used.
member
character string giving the second part of the two-part SAS dataset name. (The first part is irrelevant here; it is mapped to the directory name.)
x
a variable that may have been created by sas.get with special.miss=T or with recode in effect.
variables
vector of character strings naming the variables in the SAS dataset. The resulting data frame will contain only those variables from the SAS dataset. To get all of the variables (the default), an empty string may be given. It is a fatal error if any o
ifs
a vector of character strings, each containing one SAS "subsetting if" statement. These will be used to extract a subset of the observations in the SAS dataset.
format.library
The directory containing the file formats.sc2, which contains the definitions of the user defined formats used in this dataset. By default, we look for the formats in the same directory as the data. The user defined formats must be available (so SA
formats
Set formats to FALSE to keep sas.get from telling the SAS macro to retrieve value label formats from format.library. When you do not specify formats or recode, sas.get
recode
This parameter defaults to T if formats is T. If it is T, variables that have an appropriate format (see above) are recoded as factor objects, which map the values to the value labels for t
special.miss
For numeric variables, any missing values are stored as NA in S. You can recover special missing values by setting special.miss to T. This will cause the special.miss attribute and the special.miss clas
id
The name of the variable to be used as the row names of the S dataset. The id variable becomes the row.names attribute of a data frame, but the id variable is still retained as a variable in the data frame. You can also specify a vector of va
dates.
specifies the format for storing SAS dates in the resulting data frame
as.is
SAS character variables are converted to S factor objects if as.is=FALSE or if as.is is a number between 0 and 1 inclusive and the number of unique values of the variable is less than the number of observations (n) t
check.unique.id
If id is specified, the row names are checked for uniqueness if check.unique.id=T. If any are duplicated, a warning is printed. Note that if a data frame is being created with duplicate row names, statements such as my.da
force.single
By default, SAS numeric variables having LENGTHs > 4 are stored as S double precision numerics, which allow for the same precision as a SAS LENGTH 8 variable. Set force.single=T to store every numeric variable in si
keep.log
logical flag: if FALSE, delete the SAS log file upon completion.
log.file
the name of the SAS log file.
macro
the name of an S object in the current search path that contains the text of the SAS macro called by S. The S object is a character vector that can be edited using, for example, sas.get.macro <- editor(sas.get.macro).
data.frame.out
set to FALSE to make the result a list instead of a data frame
clean.up
logical flag: if TRUE, remove all temporary files when finished. You may want to keep these while debugging the SAS macro. Not needed for R.
quiet
logical flag: if FALSE, print the contents of the SAS log file if there has been an error.
temp
the prefix to use for the temporary files. Two characters will be added to this prefix, and the resulting name must be valid on your file system.
sasprog
the name of the system command to invoke SAS
uncompress
set to FALSE by default. Set it to T to automatically invoke the DOS PKUNZIP command if member.zip exists, to uncompress the SAS dataset before proceeding. This assumes you have the file permissions to
where
by default, a list or data frame which contains all the variables is returned. If you specify where, each individual variable is placed into a separate object (whose name is the name of the variable) using the assign function wi
code
a special missing value code (A through Z or underscore) to check against. If code is omitted, is.special.miss will return a T for each observation that has any special missing value.
object
a variable in a data frame created by sas.get
...
ignored
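
The behaviour described for is.special.miss and its code argument can be pictured with a small stand-in. This sketch does not use Hmisc and does not reproduce the real layout of the special.miss attribute, which this page does not describe. The parallel vector of codes, the function is_special, and the data are all assumptions made for illustration.

```r
# Stand-in data: the numeric values as S sees them (every SAS missing value
# is NA) and a parallel vector of SAS special-missing codes.
# This layout is an assumption, not the real special.miss attribute.
q1_values <- c(1, NA, 3, NA, NA, 2)
q1_codes  <- c("", "D", "", "R", "", "")   # "" means no special code

# Hypothetical analogue of is.special.miss(x, code)
is_special <- function(codes, code) {
  if (missing(code)) codes != "" else codes == code
}

is_special(q1_codes, "D")   # observations coded .D
is_special(q1_codes)        # observations with any special missing code

# Position 5 is an ordinary missing value: NA, but no special code
is.na(q1_values) & !is_special(q1_codes)
```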

Value

A data frame resembling the SAS dataset. If id was specified, that column of the data frame will be used as the row names of the data frame. Each variable in the data frame or vector in the list will have the attributes label and format containing SAS labels and formats. Underscores in formats are converted to periods. Formats for character variables have $ placed in front of their names. If formats is T and there are any appropriate format definitions in format.library, the returned object will have attribute formats containing lists named the same as the format names (with periods substituted for underscores and character formats prefixed by $). Each of these lists has a vector called values and one called labels with the PROC FORMAT; VALUE ... definitions.
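
To make the shape of the formats attribute concrete, here is a mock built by hand in base R. It is a sketch under assumptions: the format names qual_fmt and sex_fmt, their contents, and the helper s_format_name are invented, and the real attribute may carry more detail than the values and labels vectors described above.

```r
# Hypothetical helper: apply the two naming rules from the Value section.
s_format_name <- function(sas_name, character = FALSE) {
  nm <- gsub("_", ".", sas_name, fixed = TRUE)  # underscores become periods
  if (character) paste0("$", nm) else nm        # character formats get "$"
}

# Mock "formats" attribute: one list per format, each with parallel
# vectors `values` and `labels`. Names and contents are invented.
formats <- list()
formats[[s_format_name("qual_fmt")]] <-
  list(values = 1:3, labels = c("good", "better", "best"))
formats[[s_format_name("sex_fmt", character = TRUE)]] <-
  list(values = c("F", "M"), labels = c("Female", "Male"))

names(formats)

# Look up the label that belongs to code 2
with(formats[["qual.fmt"]], labels[match(2, values)])
```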

Side Effects

If a SAS error occurs, the SAS log file will be printed under the control of the pager function.

Background

The references cited below explain the structure of SAS datasets and how they are stored. See SAS Language for a discussion of the "subsetting if" statement.

Details

If you specify special.miss=T and there are no special missing values in the SAS dataset, the SAS step will fail.

For variables having a PROC FORMAT VALUE format with some of the levels undefined, sas.get will interpret those values as NA if you are using recode.
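
The effect of undefined levels can be reproduced with base R's factor(), used here as a stand-in for whatever sas.get does internally. The codes and labels are invented. The point is only that a code with no matching label comes out as NA.

```r
# Invented example: the format defines labels for 1, 2, 3 but not for 4
codes  <- c(1, 2, 3, 4, 2)
values <- 1:3
labels <- c("good", "better", "best")

# Codes without a defined label become NA in the resulting factor
recoded <- factor(codes, levels = values, labels = labels)
recoded
which(is.na(recoded))   # positions of codes that had no label
```

If the unlabeled codes matter, it may be worth tabulating the raw codes before relying on the recoded factor.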

If you leave the sasprog argument at its default value of "sas", be sure that the SAS executable is in the PATH specified in your autoexec.bat file. Also make sure that you invoke S so that your project directory is the current working directory. This is best done by creating a shortcut in Windows 95, for which the command to execute will be something like drive:\spluswin\cmd\splus.exe HOME=. and the program is flagged to start in drive:\myproject, for example. In this way, you will be able to examine the SAS log file easily, since it will be placed in drive:\myproject by default.

SAS will create SASWORK and SASUSER directories in what it takes to be the current working directory. To specify where SAS should put these instead, edit the config.sas file or specify a sasprog argument of the following form: sasprog="\sas\sas.exe -saswork c:\saswork -sasuser c:\sasuser".
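
When the sasprog string above is typed as an R string literal, each backslash has to be doubled, because a single backslash starts an escape sequence. The sketch below only builds and prints the string. It does not call sas.get, and the paths are the ones from the paragraph above.

```r
# Each "\\" in the source is one backslash in the resulting string
sasprog <- paste("\\sas\\sas.exe",
                 "-saswork", "c:\\saswork",
                 "-sasuser", "c:\\sasuser")

cat(sasprog, "\n")   # shows the command as the system would receive it
```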

When sas.get needs to run SAS, SAS is run in iconized (minimized) form.

The SAS macro sas_get uses record lengths of up to 4096 in two places. If you are exporting records that are very long (because of a large number of variables and/or long character variables), you may want to edit these LRECLs to quadruple them, for example.
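
Because the macro is held as a character vector, an edit of this kind can be done with a text substitution. The sketch below works on a mock vector, not on the real sas.get.macro. The four lines of macro_text are invented, and the real macro may spell its LRECL settings differently, so inspect the actual text before adapting this.

```r
# Mock macro text: invented stand-ins for the two places that set LRECL
macro_text <- c("%macro sas_get(dataset, dict, data);",
                "  file \"&dict\" lrecl=4096;",
                "  file \"&data\" lrecl=4096;",
                "%mend sas_get;")

# Quadruple the record length: 4 * 4096 = 16384
edited <- gsub("lrecl=4096", "lrecl=16384", macro_text, ignore.case = TRUE)

sum(edited != macro_text)   # number of lines that changed
```

An edited copy could then be supplied through the macro argument described under Arguments.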

References

SAS Institute Inc. (1990). SAS Language: Reference, Version 6. First Edition. SAS Institute Inc., Cary, North Carolina.

SAS Institute Inc. (1988). SAS Technical Report P-176, Using the SAS System, Release 6.03, under UNIX Operating Systems and Derivatives. SAS Institute Inc., Cary, North Carolina.

SAS Institute Inc. (1985). SAS Introductory Guide. Third Edition. SAS Institute Inc., Cary, North Carolina.

See Also

data.frame, describe, label, upData

Examples

mice <- sas.get("saslib", mem="mice", var=c("dose", "strain", "ld50"))
plot(mice$dose, mice$ld50)

nude.mice <- sas.get(lib=file.path(Sys.getenv("HOME"), "saslib"), mem="mice",
                     ifs="if strain='nude'")

nude.mice.dl <- sas.get(lib=file.path(Sys.getenv("HOME"), "saslib"), mem="mice",
                        var=c("dose", "ld50"), ifs="if strain='nude'")

# Get a dataset from current directory, recode PROC FORMAT; VALUE ...
# variables into factors with labels of the form "good(1)" "better(2)",
# get special missing values, recode missing codes .D and .R into new
# factor levels "Don't know" and "Refused" for variable q1
d <- sas.get(mem="mydata", recode=2, special.miss=TRUE)
attach(d)
nl <- length(levels(q1))
lev <- c(levels(q1), "Don't know", "Refused")
q1.new <- as.integer(q1)
q1.new[is.special.miss(q1,"D")] <- nl+1
q1.new[is.special.miss(q1,"R")] <- nl+2
q1.new <- factor(q1.new, 1:(nl+2), lev)
# Note: would like to use factor() in place of as.integer ... but
# factor in this case adds "NA" as a category level

d <- sas.get(mem="mydata")
sas.codes(d$x)    # for PROC FORMATted variables returns original data codes
d$x <- code.levels(d$x)   # or attach(d); x <- code.levels(x)
# This makes levels such as "good" "better" "best" into e.g.
# "1:good" "2:better" "3:best", if the original SAS values were 1,2,3

# For the following example, suppose that SAS is run on a
# different machine from the one on which S is run.
# The sas_get macro is used to create files needed by
# sas.get.  To make a text file containing the sas_get macro
# run the following S command, for example:
#   cat(sas.get.macro, file='/sasmacro/sas_get.sas', sep='\n')

# Here is the SAS job.  This job assumes that you put
# sas_get.sas in an autocall macro library.


#  libname db '/my/sasdata/area';
#  %sas_get(db.mydata, dict, data, formats, specmiss,
#           formats=1, specmiss=1)


# Substitute whatever file names you may want.
# Next the 4 files are moved to the S machine (using
# ASCII file transfer mode) and the following S
# program is run:


mydata <- sas.get(sasout=c('dict','data','formats','specmiss'),
                  id='idvar')


# If PKZIP is run after %sas_get, e.g. "PKZIP port dict data formats"
# (assuming that specmiss was not used here), use


mydata <- sas.get(sasout='a:port', id='idvar')


# which will run PKUNZIP port to unzip a:port.zip, creating the
# dict, data, and formats files which are generated (and later
# deleted) by sas.get


# Retrieve the same variables from another dataset (or an update of
# the original dataset)
mydata2 <- sas.get(mem='mydata2', var=names(mydata))
# This only works if none of the original SAS variable names contained _

# Code from Don MacQueen to generate SAS dataset to test import of
# date, time, date-time variables
# data ssd.test;
#     d1='3mar2002'd ;
#     dt1='3mar2002 9:31:02'dt;
#     t1='11:13:45't;
#     output;
#
#     d1='3jun2002'd ;
#     dt1='3jun2002 9:42:07'dt;
#     t1='11:14:13't;
#     output;
#     format d1 mmddyy10. dt1 datetime. t1 time.;
# run;
