csv.get: Read Comma-Separated Text Data Files

Description

Read comma-separated text data files, allowing optional translation to lower case for variable names after making them valid S names. There is a facility for reading long variable labels as one of the rows. If labels are not specified and a final variable name is not the same as that in the header, the original variable name is saved as a variable label. Uses read.csv if the data.table package is not in effect, otherwise calls fread.

Usage

csv.get(file, lowernames=FALSE, datevars=NULL, datetimevars=NULL,
        dateformat='%F',
        fixdates=c('none','year'), comment.char="", autodates=TRUE,
        allow=NULL, charfactor=FALSE,
        sep=',', skip=0, vnames=NULL, labels=NULL, …)

Arguments

file

the file name for import.

lowernames

set this to TRUE to change variable names to lower case.

datevars

character vector of names (after lowernames is applied) of variables to consider as a factor or character vector containing dates in a format matching dateformat. The default is "%F" which uses the yyyy-mm-dd format.

datetimevars

character vector of names (after lowernames is applied) of variables to consider to be date-time variables, with date formats as described under datevars followed by a space followed by time in hh:mm:ss format. chron is used to store such variables. If all times in the variable are 00:00:00 the variable will be converted to an ordinary date variable.

dateformat

for cleanup.import is the input format (see strptime)

fixdates

for any of the variables listed in datevars that have a dateformat that cleanup.import understands, specifying fixdates allows corrections of certain formatting inconsistencies before the fields are attempted to be converted to dates (the default is to assume that the dateformat is followed for all observation for datevars). Currently fixdates='year' is implemented, which will cause 2-digit or 4-digit years to be shifted to the alternate number of digits when dateform is the default "%F" or is "%y-%m-%d", "%m/%d/%y", or "%m/%d/%Y". Two-digits years are padded with 20 on the left. Set dateformat to the desired format, not the exceptional format.

comment.char

a character vector of length one containing a single character or an empty string. Use '""' to turn off the interpretation of comments altogether.

autodates

Set to true to allow function to guess at which variables are dates

allow

a vector of characters allowed by R that should not be converted to periods in variable names. By default, underscores in variable names are converted to periods as with R before version 1.9.

charfactor

set to TRUE to change character variables to factors if they have fewer than n/2 unique values. Blanks and null strings are converted to NAs.

sep

field separator, defaults to comma

skip

number of records to skip before data start. Required if vnames or labels is given.

vnames

number of row containing variable names, default is one

labels

number of row containing variable labels, default is no labels

…

arguments to pass to read.csv other than skip and sep.

Value

a new data frame.

Details

csv.get reads comma-separated text data files, allowing optional translation to lower case for variable names after making them valid S names. Original possibly non-legal names are taken to be variable labels if labels is not specified. Character or factor variables containing dates can be converted to date variables. cleanup.import is invoked to finish the job.

Examples

Run this code

# NOT RUN {
dat <- csv.get('myfile.csv')

# Read a csv file with junk in the first row, variable names in the
# second, long variable labels in the third, and junk in the 4th row
dat <- csv.get('myfile.csv', vnames=2, labels=3, skip=4)
# }

Run the code above in your browser using DataLab