Principal Variable Analysis (PVA) (Cumming and Wood, 2007) selects a subset from a set of variables such that the variables in the subset are as uncorrelated as possible, in an effort to ensure that all aspects of the variation in the data are covered. Here, all observations in a specified time interval are used to calculate the correlations on which the selection is based.
intervalPVA(responses, data, times.factor = "Days", start.time, end.time,
nvarselect = NULL, p.variance = 1, include = NULL,
plot = TRUE, ...)
A data.frame giving the results of the variable selection. It will contain the columns Variable, Selected, h.partial, Added.Propn and Cumulative.Propn.
responses: A character giving the names of the columns in data from which the variables are to be selected.
data: A data.frame containing the columns of variables from which the selection is to be made.
times.factor: A character giving the name of the column in data containing the factor for the times at which the data were collected. Its levels will be used to identify the subset and should be numeric values stored as characters.
start.time: A numeric giving the time, in terms of a level of times.factor, at which the time interval begins; observations at this time and up to and including end.time will be included.
end.time: A numeric giving the time, in terms of a level of times.factor, at the end of the interval; observations after this time will not be included.
nvarselect: A numeric specifying the number of variables to be selected, which includes those listed in include. If nvarselect = NULL, as many variables are selected as are needed to satisfy p.variance.
p.variance: A numeric specifying the minimum proportion of the variance that the selected variables must account for.
include: A character giving the names of the columns in data for the variables whose selection is mandatory.
plot: A logical indicating whether a plot of the cumulative proportion of the variance explained is to be produced.
...: Allows arguments to be passed to other functions.
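The interval selection controlled by times.factor, start.time and end.time can be illustrated with a small base-R helper. This is a sketch, not the package's code, and the name subset_interval is hypothetical. It shows why the levels should be numeric values stored as characters: they are converted with as.numeric(as.character(...)), because as.numeric() applied directly to a factor returns the internal level codes rather than the times.

```r
# subset_interval() is a hypothetical helper, not part of the package:
# it keeps the rows whose time lies in [start.time, end.time].
subset_interval <- function(data, times.factor = "Days", start.time, end.time) {
  # Levels hold numeric values stored as characters; convert via character
  # because as.numeric() on a factor would return the level codes instead
  times <- as.numeric(as.character(data[[times.factor]]))
  in.interval <- which(times >= as.numeric(start.time) &
                         times <= as.numeric(end.time))
  data[in.interval, , drop = FALSE]
}

# The levels sort as characters ("10", "11", "12", "9"), so the level
# codes do not match the times themselves
dat <- data.frame(Days = factor(c("9", "10", "11", "12")), y = 1:4)
subset_interval(dat, "Days", start.time = 10, end.time = 11)  # rows with Days 10 and 11
```

Both numeric and character times are accepted by this sketch, since start.time and end.time are passed through as.numeric(); a single time point is obtained by setting start.time equal to end.time.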
Chris Brien
The variable that is most correlated with the other variables is selected first for inclusion. The partial correlation for each of the remaining variables, given the first selected variable, is calculated and the most correlated of these variables is selected for inclusion next. The partial correlations are then adjusted for the second selected variable as well. This process is repeated until the specified criterion has been satisfied. The possibilities are to:
- the default (nvarselect = NULL and p.variance = 1): select all variables in decreasing order of the amount of information they provide;
- select exactly nvarselect variables;
- select just enough variables, up to a maximum of nvarselect variables, to explain at least p.variance*100 per cent of the total variance.
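The greedy procedure above can be sketched in base R. This is a minimal illustration, not the implementation used by intervalPVA, and the helper pva_sketch is hypothetical. It assumes that a candidate's contribution is the sum of its squared partial covariances with all variables divided by its own partial variance, expressed as a proportion of the total variance of the standardised variables; it ignores the include argument.

```r
# pva_sketch() is a hypothetical helper illustrating greedy principal-variable
# selection; it is not the implementation used by intervalPVA().
pva_sketch <- function(X, nvarselect = NULL, p.variance = 1) {
  X <- as.matrix(X)
  if (is.null(nvarselect)) nvarselect <- ncol(X)
  S <- cor(X)                   # working (partial) covariance matrix
  total <- sum(diag(S))         # total variance of the standardised variables
  candidates <- colnames(X)
  selected <- character(0)
  added <- numeric(0)
  while (length(selected) < nvarselect && length(candidates) > 0) {
    # Variance explained by each candidate: its squared partial covariances
    # with all variables, divided by its own partial variance
    h <- vapply(candidates, function(v) sum(S[, v]^2) / S[v, v], numeric(1))
    best <- names(which.max(h))
    selected <- c(selected, best)
    added <- c(added, h[[best]] / total)
    # Adjust (partial out) every variable for the one just selected
    S <- S - outer(S[, best], S[best, ]) / S[best, best]
    candidates <- setdiff(candidates, best)
    # Drop candidates with no variance left after the adjustment
    keep <- vapply(candidates, function(v) S[v, v] > 1e-10, logical(1))
    candidates <- candidates[keep]
    # Stop once enough of the total variance has been explained
    if (sum(added) >= p.variance - 1e-12) break
  }
  data.frame(Variable = selected, Added.Propn = added,
             Cumulative.Propn = cumsum(added))
}
```

In this sketch each adjustment removes exactly the variance explained by the chosen variable, so Cumulative.Propn reaches 1 once a full-rank set of variables has been selected; this is why p.variance = 1 with nvarselect = NULL ends up selecting every variable. The column names mirror those of the returned data.frame, but the values are only those produced by this sketch.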
Cumming, J. A. and D. A. Wood (2007) Dimension reduction via principal variables. Computational Statistics and Data Analysis, 52, 550-565.
PVA, rcontrib
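# Load the example data, which provides longi.dat
data(exampleData)
# Variables from which the principal variables are to be chosen
responses <- c("Area", "Area.SV", "Area.TV", "Image.Biomass", "Max.Height", "Centre.Mass",
               "Density", "Compactness.TV", "Compactness.SV")
# Use only the observations for day 31 and select just enough variables
# to explain at least 90 per cent of the total variance
results <- intervalPVA(responses, longi.dat,
                       start.time = "31", end.time = "31",
                       p.variance = 0.9, plot = FALSE)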