Utility function to split data frames in various ways for conditioning
plots. Users would generally not be expected to call this function directly.
It is widely used by many openair functions, usually through the option type.
cutData(
x,
type = "default",
hemisphere = "northern",
n.levels = 4,
start.day = 1,
is.axis = FALSE,
local.tz = NULL,
latitude = 51,
longitude = -0.5,
...
)
Returns a data frame with a column cond that is defined by type.
x: A data frame containing a field date.
type: A string giving the way in which the data frame should be split. Pre-defined values are: "default", "year", "hour", "month", "season", "weekday", "site", "weekend", "monthyear", "daylight", "dst" (daylight saving time).
type can also be the name of a numeric or factor column. If a numeric
column name is supplied, cutData will split the data into four
quantiles (the number of quantiles is set by n.levels). Factor levels will be
used to split the data without any adjustment.
hemisphere: Either "northern" or "southern"; used to split data into seasons.
n.levels: Number of quantiles to split numeric data into.
start.day: What day of the week should type = "weekday" start on? The user
can change the start day by supplying an integer between 0 and 6: Sunday = 0,
Monday = 1, ... For example, to start the weekday plots on a Saturday, choose
start.day = 6.
is.axis: A logical (TRUE/FALSE), used to request shortened cut labels for axes.
local.tz: Used for identifying whether a date has daylight saving time
(DST) applied or not. Examples include local.tz = "Europe/London" and
local.tz = "America/New_York", i.e. time zones that observe DST.
https://en.wikipedia.org/wiki/List_of_zoneinfo_time_zones lists time
zones that should be valid for most systems. It is important that the
original data are in GMT (UTC) or a fixed offset from GMT. See import
and the openair manual for information on how to import data and ensure
no DST is applied.
latitude: The decimal latitude used in type = "daylight".
longitude: The decimal longitude used in type = "daylight". Note that locations west of Greenwich are negative.
...: All additional parameters are passed on to the next function(s).
Authors: David Carslaw (cutData) and Karl Ropkins (cutDaylight).
This section gives a brief description of each of the defined levels of
type. Note that all time-dependent types require a column date.
"default" does not split the data but will describe the levels as a date range in the format "day month year".
"year" splits the data by each year.
"month" splits the data by month of the year.
"hour" splits the data by hour of the day.
"monthyear" splits the data by year and month. It differs from "month" in that a level is defined for each month of the data set. This is sometimes useful to show an ordered sequence of months if the data set starts halfway through a year, rather than starting in January.
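The idea behind "monthyear" can be sketched in base R as below. This is not openair's implementation: the helper cut_monthyear is hypothetical, it assumes date is a Date or POSIXct vector, and the "%B %Y" label format (month names follow the current locale) is an assumption.

```r
# Hypothetical helper: one factor level per calendar month present in the data,
# ordered chronologically rather than starting in January
cut_monthyear <- function(date) {
  # first day of each observation's month, as a Date, used for ordering
  month_start <- as.Date(format(date, "%Y-%m-01"))
  # level labels such as "November 2020" (month names depend on the locale)
  lev <- format(sort(unique(month_start)), "%B %Y")
  factor(format(month_start, "%B %Y"), levels = lev)
}

dates <- as.POSIXct(c("2021-01-15", "2020-11-03", "2020-12-25", "2020-11-20"),
                    tz = "UTC")
monthyear <- cut_monthyear(dates)
levels(monthyear)
```

Because the levels come from the sorted months actually present, a record starting in November lists November first rather than January.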
"weekend" splits the data by weekday and weekend.
"weekday" splits the data by day of the week, ordered to start on Monday by default (see start.day).
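The effect of start.day on the "weekday" levels can be sketched as a rotation of the day order. The helper cut_weekday is hypothetical and uses fixed English day names to avoid locale dependence; openair's own labels may differ.

```r
# Hypothetical helper: weekday factor whose first level is start.day
# (0 = Sunday, 1 = Monday, ..., 6 = Saturday)
cut_weekday <- function(date, start.day = 1) {
  stopifnot(start.day %in% 0:6)
  # names indexed by POSIXlt$wday + 1, where wday 0 = Sunday
  day_names <- c("Sunday", "Monday", "Tuesday", "Wednesday",
                 "Thursday", "Friday", "Saturday")
  wday <- as.POSIXlt(date)$wday
  # rotate the level order so that it begins on start.day
  level_order <- day_names[(start.day:(start.day + 6)) %% 7 + 1]
  factor(day_names[wday + 1], levels = level_order)
}

d <- as.Date(c("2020-06-13", "2020-06-15"))  # a Saturday and a Monday
wk <- cut_weekday(d, start.day = 6)
levels(wk)  # starts with "Saturday"
```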
"season" splits the data by season. In the northern hemisphere, winter =
December, January, February; spring = March, April, May, etc. These
definitions change if hemisphere = "southern".
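A minimal sketch of the month-to-season mapping described above, including the swap for hemisphere = "southern". The helper cut_season and its plain labels ("winter", "spring", ...) are assumptions for illustration; openair may label the levels differently.

```r
# Hypothetical helper mapping calendar month to meteorological season
cut_season <- function(date, hemisphere = "northern") {
  hemisphere <- match.arg(hemisphere, c("northern", "southern"))
  month <- as.POSIXlt(date)$mon + 1  # 1 = January, ..., 12 = December
  # northern-hemisphere labels for months 1..12 (DJF winter, MAM spring, ...)
  north <- c("winter", "winter", "spring", "spring", "spring", "summer",
             "summer", "summer", "autumn", "autumn", "autumn", "winter")
  # in the southern hemisphere each season is swapped with its opposite
  swap <- c(winter = "summer", spring = "autumn",
            summer = "winter", autumn = "spring")
  season <- north[month]
  if (hemisphere == "southern") season <- unname(swap[season])
  factor(season, levels = c("spring", "summer", "autumn", "winter"))
}

d <- as.Date(c("2020-01-15", "2020-07-15"))
cut_season(d)
cut_season(d, hemisphere = "southern")
```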
"seasonyear" (or "yearseason") splits the data into year-season intervals,
keeping the months of a season together. For example, December 2010 is
considered part of winter 2011 (with January and February 2011). This
makes it easier to consider contiguous seasons. In contrast, type =
"season" just splits the data into four seasons regardless of the year.
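The "seasonyear" rule that December counts towards the following year's winter can be sketched as follows. The helper cut_seasonyear is hypothetical, handles only the northern-hemisphere definitions, and returns plain character labels such as "winter 2011"; the exact label format used by openair is an assumption here.

```r
# Hypothetical helper: northern-hemisphere season plus the year it belongs to
cut_seasonyear <- function(date) {
  lt <- as.POSIXlt(date)
  month <- lt$mon + 1
  year <- lt$year + 1900
  # December is grouped with the following January and February
  season_year <- ifelse(month == 12, year + 1, year)
  season <- c("winter", "winter", "spring", "spring", "spring", "summer",
              "summer", "summer", "autumn", "autumn", "autumn",
              "winter")[month]
  paste(season, season_year)
}

d <- as.Date(c("2010-12-15", "2011-01-10", "2011-02-20", "2011-03-01"))
cut_seasonyear(d)
```

The first three dates share the label "winter 2011", which keeps the season contiguous across the change of calendar year.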
"daylight" splits the data relative to estimated sunrise and sunset to give
either daylight or nighttime. The cut is made by cutDaylight but is more
conveniently accessed via cutData, e.g. cutData(mydata, type = "daylight",
latitude = my.latitude, longitude = my.longitude). The daylight
estimation, which is valid for dates between 1901 and 2099, is made using the
measurement location, date, time and astronomical algorithms to estimate the
relative positions of the Sun and the measurement location on the Earth's
surface, and is based on NOAA methods. Measurement location should be set
using latitude (+ to North; - to South) and longitude (+ to East; - to West).
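To illustrate the geometry behind "daylight", the sketch below classifies a timestamp as daylight when the Sun's approximate elevation is above the horizon. It is deliberately simpler than the NOAA-based method used by cutDaylight: it uses an approximate solar declination formula and ignores the equation of time and atmospheric refraction, so it can disagree with cutDaylight by several minutes around sunrise and sunset. The helper is_daylight_approx is hypothetical, assumes timestamps are in UTC, and uses the same sign conventions for latitude and longitude as above.

```r
# Hypothetical, simplified daylight test (NOT the NOAA algorithm):
# daylight when the approximate solar elevation is above zero
is_daylight_approx <- function(date, latitude = 51, longitude = -0.5) {
  lt <- as.POSIXlt(date, tz = "UTC")
  doy <- lt$yday + 1  # day of year, 1..366
  # approximate solar declination in degrees (Cooper's formula)
  decl <- 23.45 * sin(2 * pi * (284 + doy) / 365)
  # approximate local solar time: UTC hours shifted 1 h per 15 deg of longitude
  solar_time <- lt$hour + lt$min / 60 + longitude / 15
  hour_angle <- 15 * (solar_time - 12)  # degrees from solar noon
  rad <- pi / 180
  sin_elev <- sin(latitude * rad) * sin(decl * rad) +
    cos(latitude * rad) * cos(decl * rad) * cos(hour_angle * rad)
  ifelse(sin_elev > 0, "daylight", "nighttime")
}

is_daylight_approx(as.POSIXct(c("2020-06-21 12:00", "2020-12-21 00:00"),
                              tz = "UTC"))
```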
"dst" splits the data into hours that are in daylight saving time (DST) and
hours that are not, for appropriate time zones. The option "dst" also requires
that the local time zone is given, e.g. local.tz = "Europe/London" or
local.tz = "America/New_York". Each of the two periods will be in
local time. The main purpose of this option is to test whether there
is a shift in the diurnal profile when DST and non-DST hours are compared.
This option is particularly useful with the timeVariation function.
For example, close to the source of road vehicle emissions, "rush hour" will
tend to occur at the same local time throughout the year, e.g. 8 am and
5 pm. Therefore, comparing non-DST hours with DST hours will tend to show
similar diurnal patterns (at least in the timing of the peaks, if not the
magnitude) when expressed in local time. By contrast, a variable such as wind
speed or temperature should show a clear shift when expressed in local time.
In essence, this option, when used with timeVariation, may help
determine whether the variation in a pollutant is driven by man-made
emissions or natural processes.
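Whether DST is in force can be read from R's own time zone database, which gives a minimal sketch of the "dst" split. The helper cut_dst and its level labels are hypothetical; the sketch assumes the input timestamps are POSIXct values in UTC and that local.tz is a valid zoneinfo name on the system.

```r
# Hypothetical helper: label UTC timestamps by whether DST applies in local.tz
cut_dst <- function(date, local.tz = "Europe/London") {
  # POSIXlt$isdst is 1 during DST, 0 otherwise, and negative if unknown
  isdst <- as.POSIXlt(date, tz = local.tz)$isdst
  factor(ifelse(isdst > 0, "daylight saving", "non-daylight saving"),
         levels = c("daylight saving", "non-daylight saving"))
}

utc <- as.POSIXct(c("2020-01-15 12:00", "2020-07-15 12:00"), tz = "UTC")
cut_dst(utc, local.tz = "Europe/London")
# the same instants expressed as local clock time
format(utc, tz = "Europe/London", format = "%H:%M")
```

Formatting utc with tz = "Europe/London" shows the local clock times (12:00 in January, 13:00 in July), which is the time base in which the two periods are compared.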
"wd" splits the data by 8 wind sectors and requires a column wd: "NE",
"E", "SE", "S", "SW", "W", "NW", "N".
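The 8-sector split can be sketched by shifting each direction by half a sector (22.5 degrees) before binning into 45-degree bins, so that "N" is centred on 0/360 degrees. The helper cut_wd is hypothetical, and which sector an exact boundary value such as 22.5 falls into is an assumption of this sketch.

```r
# Hypothetical helper: 8 compass sectors of 45 degrees each
cut_wd <- function(wd) {
  sectors <- c("N", "NE", "E", "SE", "S", "SW", "W", "NW")
  # shift by half a sector so that N covers 337.5 up to (not including) 22.5
  idx <- floor(((wd + 22.5) %% 360) / 45) + 1
  # level order as listed above, ending with "N"
  factor(sectors[idx],
         levels = c("NE", "E", "SE", "S", "SW", "W", "NW", "N"))
}

cut_wd(c(0, 45, 180, 270, 350))
```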
"ws" splits the data by 8 quantiles of wind speed and requires a column ws.
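Quantile splitting, used here for ws and also when type names a numeric column (with n.levels quantiles, 4 by default), can be sketched with base R quantile() and cut(). The helper cut_quantiles is hypothetical; openair's exact break points and interval labels may differ.

```r
# Hypothetical helper: split a numeric vector into n.levels quantile groups
cut_quantiles <- function(x, n.levels = 4) {
  # break points from the minimum to the maximum of x
  breaks <- quantile(x, probs = seq(0, 1, length.out = n.levels + 1),
                     na.rm = TRUE)
  # include.lowest keeps the minimum value in the first interval
  cut(x, breaks = unique(breaks), include.lowest = TRUE)
}

ws <- c(0.5, 1.2, 2.0, 2.8, 3.1, 4.4, 5.0, 6.7, 7.2, 9.9)
groups <- cut_quantiles(ws, n.levels = 8)
table(groups)
```

Using unique() on the breaks guards against repeated quantiles (for example, many calm hours with identical wind speed), at the cost of returning fewer than n.levels groups.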
"site" splits the data by site and therefore requires a column site.
Note that all the date-based types, e.g. month/year, are derived from a column
date. If a user already has a column with the name of one of the
date-based types, it will not be used.
library(openair)
## split data by day of the week
mydata <- cutData(mydata, type = "weekday")