tisFromCsv: Read time series from Comma Separated Values (.csv) file

Description

Reads tis (Time Indexed Series) from a csv file, returning the series in a list, and optionally storing them in an environment.

Usage

tisFromCsv(csvFile, dateCol = "date", dateFormat = "%Y%m%d", tz = "",
           tif = NULL, defaultTif = "business",
           save = F, envir = parent.frame(),
           naNumber = NULL, chopNAs = TRUE,
           tolerance = sqrt(.Machine$double.eps), ...)

Arguments

csvFile

A file name, connection, or URL acceptable to read.csv. See the rest of this help entry for required attributes of this file.

dateCol

name of the column holding dates. This column must be present in the file.

dateFormat

format of the dates in dateCol. If the dateCol cells contain Excel dates, use dateFormat == "excel". If they are strings, see strptime for date formats.

tz

the time zone to be used by strptime when converting date strings into POSIXlt timestamps. The default is to use the current time zone, which means it can change from, say, EST to EDT in the spring, and back to EST in the fall. If your csv file contains an "impossible" time, like 2 am on March 13, 2011, the result is an unexpected NA in the created ti dates, and the corresponding rows of the csv file are effectively ignored.
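As a rough illustration of the tz discussion above, the sketch below parses the "impossible" timestamp with base R's strptime in a zone that has no daylight saving time. The timestamp string and its format are made up for the example, and whether the same string parsed in a daylight-saving zone comes back as NA may depend on the platform and R version.

```r
# US clocks jumped from 2:00 am to 3:00 am on March 13, 2011, so 2:00 am
# does not exist in zones such as America/New_York on that date.
stamp <- "20110313 0200"   # hypothetical cell from a dateCol column
fmt   <- "%Y%m%d %H%M"     # hypothetical dateFormat with a time part

# UTC has no daylight saving transitions, so the parse is well defined.
utc <- strptime(stamp, fmt, tz = "UTC")
print(format(utc, "%Y-%m-%d %H:%M %Z"))
print(is.na(utc))
```

Passing a fixed zone such as tz = "UTC" is one way to keep rows from being dropped when the dates carry a time of day.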

tif

time index frequency of the data. If this is NULL (the default), the function tries to infer the frequency from the dates in the dateCol column.

defaultTif

If the frequency can't be inferred from the dates in the dateCol column, this tif frequency will be used. This should be a rare occurrence.

save

If TRUE, save the individual series in the environment given by the envir argument. Default is FALSE.

envir

if save == TRUE, the individual series (one per column) are saved in this environment. Default is the frame of the caller.

naNumber

if non-NULL, numbers within tolerance of this number are considered to be NA values. NA strings can be specified by including an na.strings argument as one of the … arguments that are passed along to read.csv.

chopNAs

if TRUE (the default), leading and trailing NA values are cut off of each column.

tolerance

Used to determine whether or not numbers in the file are close enough to naNumber to be regarded as equal to it. The default is about 1.49e-08.
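The interplay of naNumber and tolerance can be sketched in base R as follows. This assumes the comparison is an absolute difference against tolerance, which this help entry does not state explicitly; the sentinel value -999 and the data vector are invented for the example.

```r
naNumber  <- -999                        # hypothetical "missing" sentinel
tolerance <- sqrt(.Machine$double.eps)   # same default as tisFromCsv

x <- c(1.5, -999, -999 + 1e-10, -998.9)  # made-up column values

# Assumed rule: values within tolerance of naNumber become NA.
x[abs(x - naNumber) < tolerance] <- NA
print(x)
print(signif(tolerance, 3))
```

Under that assumption the second and third values are treated as missing, while -998.9 is kept because it differs from the sentinel by far more than the tolerance.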

…

Additional arguments passed along to the underlying read.csv function.

Value

A list of tis time series, one per column of the csv file. The list is returned invisibly if save is TRUE.

Details

File Requirements: The csv file must have column names across the top, and everything but the first row should be numeric. There must be as many column names (enclosed in quotes) as there are columns, and the column named by dateCol must have dates in the format indicated by dateFormat. The dateCol column must be present.

Missing (NA) values: Missing and NA values are the same thing. The underlying read.csv has "," as its default separator and "NA" as its default na.strings value, so the rows

20051231,,13,,42,NA,
20060131,NA,14,,43,,NA

indicate NA values for both the Dec 2005 and Jan 2006 observations of the first, third, fifth and sixth series.
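The claim about these two rows can be checked directly with read.csv. The header line and the series names a through f below are invented for the example; only the two data rows come from the text above.

```r
# Two data rows from the help text, under a hypothetical header.
csvText <- paste(
  '"date","a","b","c","d","e","f"',
  "20051231,,13,,42,NA,",
  "20060131,NA,14,,43,,NA",
  sep = "\n"
)

df <- read.csv(text = csvText)

# Count the missing values in each column; empty fields and "NA" both count.
naCounts <- colSums(is.na(df))
print(naCounts)
```

Columns a, c, e and f (the first, third, fifth and sixth series) are missing in both rows, while b and d hold the values 13, 14 and 42, 43.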

The values in the file are read into a single large tis series, with a tif (Time Index Frequency) inferred from the first six dates in the dateCol column. The first date is converted to a ti (Time Index) of that frequency and becomes the start of the series. If chopNAs is TRUE, each individual column is then windowed via naWindow to strip off leading and trailing NA values, and the resulting series are put into a list with names given by lower-casing the column names from the csv file. If save is TRUE, the series are also stored in envir using those same names.
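A minimal usage sketch of the steps above, assuming the tis package is installed. The file contents, the column names GDP and CPI, and the variable names are all made up for illustration; the call is skipped when tis is not available.

```r
# Hypothetical data: six month-end observations of two invented series.
# CPI has a leading missing value, which chopNAs = TRUE should strip.
csvPath <- tempfile(fileext = ".csv")
writeLines(c(
  '"date","GDP","CPI"',
  "20050831,1.0,",
  "20050930,1.1,2.1",
  "20051031,1.2,2.2",
  "20051130,1.3,2.3",
  "20051231,1.4,2.4",
  "20060131,1.5,2.5"
), csvPath)

if (requireNamespace("tis", quietly = TRUE)) {
  # tif is left NULL so the frequency is inferred from the dates.
  seriesList <- tis::tisFromCsv(csvPath, dateCol = "date",
                                dateFormat = "%Y%m%d")
  print(names(seriesList))   # per the text above, lower-cased column names
  print(seriesList[[1]])
}
```

With save = TRUE and an explicit envir, the same series would also be assigned in that environment under those names.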
