import: Generic data import for openair

Description

Generic (workhorse) function for importing and formatting data for use with the openair package. The function uses read.table (in utils).

Usage

mydata <- import(file = file.choose(), file.type = "csv",
		header.at = 1, data.at = 2,
		eof.report = NULL, na.strings = c("", "NA"), quote = "\"",
		date.name = "date", date.break = "/", date.order = "dmy",
		time.name = "date", time.break = ":", time.order = "hm",
		time.format = "GMT",
		is.ws = NULL, is.wd = NULL, is.site = NULL,
		misc.info = NULL, 
		bad.24 = FALSE, 
		correct.time = NULL, 
		output = "final")

Arguments

file

The name of the file to be imported. The default, file = file.choose(), opens a file browser. Alternatively, the use of read.table (in utils) also allows this to be a character vector of a file path, connection or url (altho

file.type

The file format, defaults to common "csv" (comma delimited) format, but also allows "txt" (tab delimited).

header.at

The file row holding header information. This is used to set names for the resulting imported data frame.

data.at

The file row to start reading data from. When generating the data frame, the function will ignore all information before this row, and attempt to include all data from this row onwards unless eof.report is enabled.

eof.report

End of file marker. When generating the data frame, the function will ignore all information after eof.report is encountered. The default setting (NULL) turns this argument off.
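The roles of header.at, data.at and eof.report can be illustrated with a minimal base R sketch. This is not the code used inside import(); the file contents are invented for illustration, and matching the marker as a fixed string anywhere in a line is an assumption.

```r
# Hypothetical file contents: one line of notes, a header row, two data rows,
# then an end marker followed by material that should be ignored.
lines <- c(
  "Logger export, site A",
  "date,nox,ws",
  "01/01/2009,12,3.1",
  "02/01/2009,,2.4",
  "END OF FILE",
  "checksum 1234"
)

header.at  <- 2              # row holding the column names
data.at    <- 3              # first row of data
eof.report <- "END OF FILE"  # marker after which everything is ignored

# Drop the marker line and everything after it, if the marker is present.
eof.row <- grep(eof.report, lines, fixed = TRUE)
if (length(eof.row) > 0) lines <- lines[seq_len(eof.row[1] - 1)]

# Take the names from the header row and read the data from data.at onwards.
col.names <- strsplit(lines[header.at], ",", fixed = TRUE)[[1]]
mydata <- read.table(text = lines[data.at:length(lines)], sep = ",",
                     col.names = col.names, na.strings = c("", "NA"),
                     quote = "\"", stringsAsFactors = FALSE)
```

The empty nox field in the second data row is read as NA, as the na.strings setting requests.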

na.strings

Strings of any terms that are to be interpreted as NA values within the file.

quote

String of characters (or character equivalents) the imported file may use to represent a character field.

date.name

Header name of the column (or columns) holding date information. This is combined with time information as a single date column in the generated data frame.

date.break

The break character separating days, months and years in date information. For example, "-" in "01-01-2009".

date.order

The order of date information, using d for days, j for Julian date, m for months and y for years. So, "dmy" or "mdy" for common UK or US logger date stamp formats. Allows any logical combination ("y", "ymd", etc). Can also handle more complex date s

time.name

Header name of the column (or columns) holding time information. This is combined with date information as a single date column in the generated data frame.

time.break

The break character separating hours, minutes and seconds in time information. For example, ":" in "12:00:00".

time.order

The order of time information, using h for hours, m for minutes and s for seconds. The argument allows any logical combination ("hm", "hms", etc). Like date.order, can also handle more complex date structures by calling POSIX* directly. For example,

time.format

The time format the imported data was logged in. Allows most common formats, e.g.: "GMT" (default), "UTC", etc. See as.POSIX* functions for further information.
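One way to read the date and time arguments is as a description of a strptime-style format string. The sketch below uses a hypothetical helper, make.format, which is not part of openair, to build such a string from an order and a break character. The letter-to-code tables, including the mapping of y to a four-digit year (%Y), are assumptions made for illustration.

```r
# Hypothetical helper (not part of openair): build a strptime format from an
# order string such as "dmy" or "hm" and a break character such as "/" or ":".
make.format <- function(order, brk, codes) {
  parts <- strsplit(order, "")[[1]]      # "dmy" -> "d", "m", "y"
  paste(codes[parts], collapse = brk)
}

# Assumed mapping from order letters to strptime conversion codes.
date.codes <- c(d = "%d", m = "%m", y = "%Y", j = "%j")
time.codes <- c(h = "%H", m = "%M", s = "%S")

date.fmt <- make.format("dmy", "/", date.codes)
time.fmt <- make.format("hm", ":", time.codes)

# Combine date and time into a single time stamp in the stated time zone.
stamp <- as.POSIXct(paste("25/12/2009", "13:30"),
                    format = paste(date.fmt, time.fmt), tz = "GMT")
```

Note that m means months in date.order and minutes in time.order, which is why the sketch keeps separate lookup tables for the two.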

is.ws

Wind speed information identifier. The default, NULL, turns this option off. When set to a valid header/data column name, it is used to select wind speed data. Note: the data are renamed "ws" as part of this operation.

is.wd

Wind direction information identifier. The default, NULL, turns this option off. When set to a valid header/data column name, it is used to select wind direction data. Note: the data are renamed "wd" as part of this operation.
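The renaming performed for is.ws and is.wd amounts to the following, shown here in base R on an invented data frame. The column names wind.speed and wind.dir are hypothetical, and this is a sketch of the effect, not of the internal code.

```r
# Hypothetical imported data with logger-specific column names.
mydata <- data.frame(wind.speed = c(3.1, 2.4), wind.dir = c(270, 90))

is.ws <- "wind.speed"
is.wd <- "wind.dir"

# Rename the identified columns to the names openair expects.
names(mydata)[names(mydata) == is.ws] <- "ws"
names(mydata)[names(mydata) == is.wd] <- "wd"
```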

is.site

Site information identifier. Default NULL turns this option off. When set, the standard (import) method uses this information to generate a "site" data column.

misc.info

Row number(s) of any additional information that may be required from the original file. Each line is retained as a character vector in the generated data frame comment.

bad.24

Time stamp reset. Some time series are logged as 00:00:01 to 24:00:00 as opposed to the more conventional 00:00:00 to 23:59:59. bad.24 = TRUE resets the time stamp for the former, which is not allowed by some R time series classes and functions.
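As an illustration of the problem, the sketch below shows one possible way to reset a 24:00:00 stamp by hand: treat it as 00:00:00 on the following day. This is an assumption about a reasonable correction, not a description of what bad.24 = TRUE does internally.

```r
# Hypothetical logger output using the 24:00:00 convention.
dates <- c("01/01/2009", "01/01/2009")
times <- c("23:00:00", "24:00:00")

# Flag the 24-hour stamps and rewrite them as midnight.
is.24 <- substr(times, 1, 2) == "24"
times[is.24] <- sub("^24", "00", times[is.24])

stamp <- as.POSIXct(paste(dates, times),
                    format = "%d/%m/%Y %H:%M:%S", tz = "GMT")

# Midnight belongs to the next day, so move the flagged stamps on by 24 hours.
stamp[is.24] <- stamp[is.24] + 24 * 60 * 60
```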

correct.time

Numerical correction (in seconds) for the imported date. The default, NULL, turns this option off. When enabled, it is used to offset "date" entries.
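An offset in seconds can be applied directly to a POSIXct time stamp, as in this small sketch. The value -3600 is a hypothetical example, and adding the correction (rather than subtracting it) is an assumption about the sign convention.

```r
stamp <- as.POSIXct("2009-01-01 12:00:00", tz = "GMT")

correct.time <- -3600            # hypothetical offset: one hour back, in seconds
shifted <- stamp + correct.time  # POSIXct arithmetic works in seconds
```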

output

Type of data object to be generated. The default, "final", returns a standard data set for use in openair. The alternative, "working", returns a list of file components without testing file structure. This option is intended to be used with wrapper functions.

Value

Using the default output = "final" setting, the function returns a data frame for use in openair. By comparison to the original file, the resulting data frame is modified as follows: Time and date information will be combined in a single column "date", formatted as a conventional timeseries (as.POSIX*). Time adjustments may also be made, subject to bad.24 and correct.time argument settings. Columns identified as wind speed and wind direction information using "is.ws" and "is.wd", respectively, will be renamed "ws" and "wd", respectively. An additional "site" column will be generated if enabled by "is.site". Any additional information (as defined in "misc.info") and data adjustments (as set in "bad.24" and "correct.time") will be retained in the data frame comment. Using the alternative output = "working" setting, the function returns a list containing separate data frames for the different elements of the data frame (data, names, date, misc.info, etc.).
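The data frame comment mentioned above is the standard R comment attribute, which is not shown when the data frame is printed and has to be requested with comment() (in base). A minimal sketch, using an invented data frame and invented comment lines in place of a real import() result:

```r
# Hypothetical stand-in for a data frame returned by import().
mydata <- data.frame(date = as.POSIXct("2009-01-01 00:00", tz = "GMT"),
                     nox = 12)

# Attach two lines of retained information, then read them back.
comment(mydata) <- c("Logger export, site A", "correct.time: -3600 s")
comment(mydata)
```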

Details

The import() function was developed to import and format data for direct use with the openair package. The main intention was to simplify initial data handling for those unfamiliar with R, and, in particular, associated time series formatting requirements. Using default settings, import() imports files configured like the example file "example data long.csv" (supplied with openair or available from the openair website). Other similar file structures can be readily imported by modifying the function arguments. More complex data importing and formatting can be achieved using an import wrapper. For example, importAURNCsv is an import wrapper that uses import() with modified arguments to import data previously downloaded from the UK AURN database. This enforces unique handling of "is.site" and employs two additional arguments, "data.order" and "simplify.names", and rbind (in reshape) to complete additional reformatting.

Examples

##########
# example 1
##########
# data obtained from http://www.openair-project.org

#import data as mydata
## mydata <- import("example data long.csv")

#use openair function
## polar.plot(mydata, pollutant="nox")