Read comma-separated text data files, allowing optional translation
to lower case for variable names after making them valid S names.
There is a facility for reading long variable labels as one of the
rows. If labels are not specified and a final variable name is not
the same as that in the header, the original variable name is saved as
a variable label. Uses read.csv if the data.table package is not in effect; otherwise calls fread.
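The name handling described above can be approximated in base R. This is a minimal sketch, not the Hmisc code. It assumes that make.names() supplies validity and tolower() supplies the lower-case translation. The header names are made-up example data.

```r
# Hypothetical header names as they might appear in a .csv file
raw_names <- c("Patient ID", "Age (yrs)", "sex")

# Make names syntactically valid, then translate to lower case
# (roughly what lowernames=TRUE requests; not the Hmisc implementation)
final_names <- tolower(make.names(raw_names))

# Where the final name differs from the header, keep the original as a label
changed <- final_names != raw_names
var_labels <- setNames(raw_names[changed], final_names[changed])
print(var_labels)
```

Here "sex" is unchanged, so only the first two variables get a label.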
csv.get(file, lowernames=FALSE, datevars=NULL, datetimevars=NULL,
dateformat='%F',
fixdates=c('none','year'), comment.char="", autodate=TRUE,
allow=NULL, charfactor=FALSE,
sep=',', skip=0, vnames=NULL, labels=NULL, text=NULL, ...)
Returns a new data frame.
file: the file name for import.
lowernames: set this to TRUE to change variable names to lower case.
datevars: character vector of names (after lowernames is applied) of factor or character variables containing dates in a format matching dateformat. The default dateformat is "%F", which uses the yyyy-mm-dd format.
datetimevars: character vector of names (after lowernames is applied) of variables to consider to be date-time variables, with date formats as described under datevars, followed by a space, followed by a time in hh:mm:ss format. chron is used to store such variables. If all times in a variable are 00:00:00, the variable is converted to an ordinary date variable.
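The midnight rule can be sketched in base R. Note that this swaps in POSIXct for chron, which csv.get actually uses. The helper to_date_or_datetime is hypothetical, and times are assumed to be in UTC.

```r
# Hypothetical helper; base R POSIXct stands in for chron here.
to_date_or_datetime <- function(x, datefmt = "%Y-%m-%d") {
  # Date part, a space, then hh:mm:ss
  dt <- as.POSIXct(x, format = paste(datefmt, "%H:%M:%S"), tz = "UTC")
  times <- format(dt, "%H:%M:%S")
  # If every non-missing time is midnight, drop the time component
  if (all(times == "00:00:00", na.rm = TRUE)) as.Date(dt) else dt
}

a <- to_date_or_datetime(c("2023-01-15 00:00:00", "2023-02-01 00:00:00"))
b <- to_date_or_datetime(c("2023-01-15 08:30:00", "2023-02-01 00:00:00"))
class(a)  # "Date"
class(b)  # "POSIXct" "POSIXt"
```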
dateformat: for cleanup.import, the input format (see strptime).
fixdates: for any of the variables listed in datevars that have a dateformat that cleanup.import understands, specifying fixdates allows corrections of certain formatting inconsistencies before conversion to dates is attempted. The default, 'none', assumes that dateformat is followed for all observations of datevars. Currently fixdates='year' is implemented. It causes 2-digit or 4-digit years to be shifted to the alternate number of digits when dateformat is the default "%F" or is "%y-%m-%d", "%m/%d/%y", or "%m/%d/%Y". Two-digit years are padded with 20 on the left. Set dateformat to the desired format, not the exceptional format.
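The idea behind fixdates='year' for the default "%F" format can be sketched with a regular expression. The helper pad_two_digit_years is hypothetical and covers only the leading-year case. It is not the cleanup.import implementation.

```r
# Hypothetical helper illustrating fixdates='year' for dateformat "%F";
# not the cleanup.import implementation.
pad_two_digit_years <- function(x) {
  # A leading 2-digit year such as "23-02-01" gets "20" prepended;
  # 4-digit years do not match because the third character is not "-".
  sub("^([0-9]{2})-", "20\\1-", x)
}

x <- c("2023-01-15", "23-02-01")
fixed <- pad_two_digit_years(x)
as.Date(fixed, format = "%F")
```

As the text advises, dateformat stays at the desired "%F" and only the exceptional entries are rewritten.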
comment.char: a character vector of length one containing a single character or an empty string. Use "" to turn off the interpretation of comments altogether.
autodate: set to TRUE to allow the function to guess which variables are dates.
allow: a vector of characters allowed by R that should not be converted to periods in variable names. By default, underscores in variable names are converted to periods, as in R before version 1.9.
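A simplified base R sketch of the allow behavior follows. It handles only the underscore case. The helper clean_names is hypothetical and is not how Hmisc implements allow.

```r
# Hypothetical helper; simplified to the underscore case only.
clean_names <- function(nm, allow = NULL) {
  nm <- make.names(nm)
  # Convert underscores to periods unless "_" is listed in allow
  if (!("_" %in% allow)) nm <- gsub("_", ".", nm, fixed = TRUE)
  nm
}

clean_names(c("my_var", "a b"))     # underscores and spaces become periods
clean_names("my_var", allow = "_")  # underscore kept
```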
charfactor: set to TRUE to change character variables to factors if they have fewer than n/2 unique values. Blanks and null strings are converted to NAs.
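The charfactor rule can be sketched in base R as follows. The helper char_to_factor is hypothetical. It takes n to be the number of observations including missing values, which is an assumption.

```r
# Hypothetical helper sketching the charfactor rule; n is taken to be
# the number of observations (an assumption).
char_to_factor <- function(x) {
  # Blanks and null strings become NA
  x[!is.na(x) & trimws(x) == ""] <- NA
  n <- length(x)
  n_unique <- length(unique(x[!is.na(x)]))
  # Convert only when there are fewer than n/2 distinct values
  if (n_unique < n / 2) factor(x) else x
}

sex <- char_to_factor(c("m", "f", "m", " ", "", "f", "m", "m"))
ids <- char_to_factor(c("a1", "b2", "c3"))
```

sex has 2 distinct values out of 8 observations and becomes a factor. ids has 3 out of 3 and stays character.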
sep: field separator; defaults to comma.
skip: number of records to skip before the data start. Required if vnames or labels is given.
vnames: number of the row containing variable names; default is 1.
labels: number of the row containing variable labels; default is no labels.
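The following base R sketch illustrates how vnames, labels, and skip relate. It uses the same layout as the csv.get example: junk, variable names, long labels, junk, then data. It reads the pieces with strsplit() and read.csv() rather than calling csv.get, and the file contents are made up.

```r
# Made-up file contents: junk, variable names, long labels, junk, then data
csv <- paste("junk line", "id,age", "Patient ID,Age in years",
             "more junk", "1,34", "2,51", sep = "\n")
vnames_row <- 2; labels_row <- 3; skip <- 4

lines <- strsplit(csv, "\n", fixed = TRUE)[[1]]
nm  <- strsplit(lines[vnames_row], ",", fixed = TRUE)[[1]]
lab <- setNames(strsplit(lines[labels_row], ",", fixed = TRUE)[[1]], nm)

# Data rows start after the first `skip` records
dat <- read.csv(text = csv, skip = skip, header = FALSE, col.names = nm)
print(dat)
print(lab)
```

skip must cover every non-data row, which is why csv.get requires it whenever vnames or labels is given.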
text: a character string containing the .csv file to use instead of file=. Passed to read.csv as the text= argument.
...: arguments to pass to read.csv other than skip and sep.
Frank Harrell, Vanderbilt University
csv.get reads comma-separated text data files, allowing optional translation to lower case for variable names after making them valid S names. Original, possibly non-legal, names are taken to be variable labels if labels is not specified. Character or factor variables containing dates can be converted to date variables. cleanup.import is invoked to finish the job.
if (FALSE) {
dat <- csv.get('myfile.csv')
# Read a csv file with junk in the first row, variable names in the
# second, long variable labels in the third, and junk in the 4th row
dat <- csv.get('myfile.csv', vnames=2, labels=3, skip=4)
}