FPCA: Functional Principal Component Analysis

Description

FPCA for dense or sparse functional data.

Usage

FPCA(Ly, Lt, optns = list())

Value

A list containing the following fields:

sigma2: Variance for measurement error.
lambda: A vector of length K containing eigenvalues.
phi: An nWorkGrid by K matrix containing eigenfunctions, supported on workGrid.
xiEst: An n by K matrix containing the FPC estimates.
xiVar: A list of length n, each entry containing the variance estimates for the FPC estimates.
obsGrid: The (sorted) grid points where all observation points are pooled.
mu: A vector of length nWorkGrid containing the mean function estimate.
workGrid: A vector of length nWorkGrid. The internal regular grid on which the eigen analysis is carried out.
smoothedCov: An nWorkGrid by nWorkGrid matrix of the smoothed covariance surface.
fittedCov: An nWorkGrid by nWorkGrid matrix of the fitted covariance surface, which is guaranteed to be non-negative definite.
fittedCorr: An nWorkGrid by nWorkGrid matrix of the fitted correlation surface computed from fittedCov.
optns: A list of actually used options.
timings: A vector with execution times for the basic parts of the FPCA call.
bwMu: The selected (or user specified) bandwidth for smoothing the mean function.
bwCov: The selected (or user specified) bandwidth for smoothing the covariance function.
rho: A regularizing scalar for the measurement error variance estimate.
cumFVE: A vector with the fraction of the cumulative total variance explained with each additional FPC.
FVE: A fraction indicating the total variance explained by chosen FPCs with corresponding 'FVEthreshold'.
selectK: Number K of selected components.
criterionValue: A scalar specifying the criterion value obtained by the selected number of components with specific methodSelectK: FVE, AIC, BIC values or NULL for fixed K.
inputData: A list containing the original 'Ly' and 'Lt' lists used as inputs to FPCA. NULL if 'lean' was specified to be TRUE.
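
The relationship between eigenvalues, cumFVE and selectK under an FVE threshold can be sketched in base R. This is only an illustration of the idea, not the package's internal code: the eigenvalues in lambdaAll and the threshold value are hypothetical, and the returned lambda field holds only the K selected eigenvalues, so it cannot be used to recompute cumFVE exactly.

```r
# Hypothetical eigenvalues of a covariance operator, in decreasing order.
lambdaAll <- c(4, 2, 1, 0.5, 0.25, 0.25)

# Cumulative fraction of variance explained by the first k components.
cumFVE <- cumsum(lambdaAll) / sum(lambdaAll)

# Smallest number of components whose cumulative FVE reaches the threshold.
FVEthreshold <- 0.9  # illustrative value, not the package default
selectK <- min(which(cumFVE >= FVEthreshold))
```

With these toy values four components are needed to pass the 0.9 threshold.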

Arguments

Ly: A list of n vectors containing the observed values for each individual. Missing values specified by NAs are supported for the dense case (dataType='Dense').
Lt: A list of n vectors containing the observation time points for each individual, corresponding to the values in Ly. Each vector should be sorted in ascending order.
optns: A list of control options specified by list(name=value). See 'Details'.

Details

If the input is sparse data, check that the design plot is dense and that the 2D domain is well covered by support points, using plot or CreateDesignPlot. Some study designs, such as snippet data (where each subject is observed only on a sub-interval of the period of study), will have an ill-covered design plot, in which case the nonparametric covariance estimate will be unreliable.

WARNING! Slow computation times may occur if the dataType argument is incorrect. If FPCA is taking a while, please double check that a dense design is not mistakenly coded as 'Sparse'. Applying FPCA to a mixture of very dense and sparse curves may result in computational issues.
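
To see why snippet data give an ill-covered design plot, the following base R sketch collects all within-subject pairs of time points, which is what a design plot displays. The simulated LtSnippet list and the coverage summary are hypothetical and only meant to illustrate the point; they are not part of the package.

```r
# Hypothetical snippet data: each subject is observed on a short sub-interval of [0, 1].
set.seed(1)
LtSnippet <- lapply(1:30, function(i) {
  start <- runif(1, 0, 0.8)
  sort(runif(4, start, start + 0.2))
})

# All within-subject pairs (s, t) of time points: the support of the design plot.
designPairs <- do.call(rbind, lapply(LtSnippet, function(tt) expand.grid(s = tt, t = tt)))

# Largest within-subject gap |s - t|, relative to the pooled time range.
maxGap <- max(abs(designPairs$s - designPairs$t))
coverage <- maxGap / diff(range(unlist(LtSnippet)))
coverage  # well below 1: regions far from the diagonal contain no pairs
```

Plotting designPairs$s against designPairs$t shows the points hugging the diagonal. With the package loaded, CreateDesignPlot applied to the list of time points should draw the corresponding design plot, assuming it takes that list as its first argument.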

Available control options are

userBwCov: The bandwidth value for the smoothed covariance function; positive numeric - default: determine automatically based on 'methodBwCov'
methodBwCov: The bandwidth choice method for the smoothed covariance function; 'GMeanAndGCV' (the geometric mean of the GCV bandwidth and the minimum bandwidth), 'CV', 'GCV' - default: 10% of the support
userBwMu: The bandwidth value for the smoothed mean function (using 'CV' or 'GCV'); positive numeric - default: determine automatically based on 'methodBwMu'
methodBwMu: The bandwidth choice method for the mean function; 'GMeanAndGCV' (the geometric mean of the GCV bandwidth and the minimum bandwidth), 'CV', 'GCV' - default: 5% of the support
dataType: The type of design we have (usually distinguishing between sparse or dense functional data); 'Sparse', 'Dense', 'DenseWithMV', 'p>>n' - default: determine automatically based on 'IsRegular'
diagnosticsPlot: Deprecated. Same as the option 'plot'
plot: Plot FPCA results (design plot, mean, scree plot and first K (<=3) eigenfunctions); logical - default: FALSE
error: Assume measurement error in the dataset; logical - default: TRUE
fitEigenValues: Whether also to obtain a regression fit of the eigenvalues - default: FALSE
FVEthreshold: Fraction-of-Variance-Explained threshold used during the SVD of the fitted covariance function; numeric (0,1] - default: 0.99
FVEfittedCov: Fraction-of-Variance explained by the components that are used to construct fittedCov; numeric (0,1] - default: NULL (all components available will be used)
kernel: Smoothing kernel choice, common for mu and covariance; "rect", "gauss", "epan", "gausvar", "quar" - default: "gauss"; dense data are assumed noise-less so no smoothing is performed.
kFoldMuCov: The number of folds to be used for mean and covariance smoothing. Default: 10
lean: If TRUE the 'inputData' field in the output list is empty. Default: FALSE
maxK: The maximum number of principal components to consider - default: min(20, N-2, nRegGrid-2), where N is the number of curves and nRegGrid is the number of support points in each direction of the covariance surface
methodXi: The method to estimate the PC scores; 'CE' (Conditional Expectation), 'IN' (Numerical Integration) - default: 'CE' for sparse data and dense data with missing values, 'IN' for dense data. If time points are irregular but spacing is small enough, 'IN' method is utilized by default.
methodMuCovEst: The method to estimate the mean and covariance in the case of dense functional data; 'cross-sectional', 'smooth' - default: 'cross-sectional'
nRegGrid: The number of support points in each direction of covariance surface; numeric - default: 51
numBins: The number of bins to bin the data into; positive integer > 10, default: NULL
methodSelectK: The method of choosing the number of principal components K; 'FVE', 'AIC', 'BIC', or a positive integer giving the number of components - default: 'FVE'
shrink: Whether to use shrinkage method to estimate the scores in the dense case (see Yao et al 2003) - default FALSE
outPercent: A 2-element vector in [0,1] indicating the percentages of the time range to be considered as left and right boundary regions of the time window of observation - default (0,1) which corresponds to no boundary
methodRho: The method of regularization (add to diagonal of covariance surface) in estimating principal component scores; 'trunc': rho is truncation of sigma2, 'ridge': rho is a ridge parameter, 'vanilla': vanilla approach - default "vanilla".
rotationCut: The 2-element vector in [0,1] indicating the percent of data truncated during sigma^2 estimation - default: (0.25, 0.75)
useBinnedData: Should the data be binned? 'FORCE' (Enforce the # of bins), 'AUTO' (Select the # of bins automatically), 'OFF' (Do not bin) - default: 'AUTO'
useBinnedCov: Whether to use the binned raw covariance for smoothing; logical - default: TRUE
usergrid: Whether to use the observation grid for fitting; if FALSE, an equidistant grid is used; logical - default: FALSE
userCov: The user-defined smoothed covariance function; list of two elements: numerical vector 't' and matrix 'cov', 't' must cover the support defined by 'Ly' - default: NULL
userMu: The user-defined smoothed mean function; list of two numerical vectors 't' and 'mu' of equal size, 't' must cover the support defined by 'Ly' - default: NULL
userSigma2: The user-defined measurement error variance. A positive scalar. If specified then the vanilla approach is used (methodRho is set to 'vanilla', unless specified otherwise). Default: NULL
userRho: The user-defined measurement truncation threshold used for the calculation of functional principal components scores. A positive scalar. Default: NULL
useBW1SE: Pick the largest bandwidth such that CV-error is within one Standard Error from the minimum CV-error, relevant only if methodBwMu ='CV' and/or methodBwCov ='CV'; logical - default: FALSE
imputeScores: Whether to impute the FPC scores or not; logical - default: TRUE
verbose: Display diagnostic messages; logical - default: FALSE
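
As a minimal sketch of how these options are passed, the block below builds an optns list from names documented above and hands it to FPCA. The particular values are illustrative rather than recommended, and the call is guarded so that it only runs when the fdapace package is installed.

```r
# Option names come from the list above; the values are illustrative only.
optns <- list(
  dataType = 'Sparse',
  methodSelectK = 'FVE',
  FVEthreshold = 0.95,
  kernel = 'epan',
  nRegGrid = 51,
  lean = TRUE
)

# Run the fit only if fdapace is available, so the snippet is safe to source anywhere.
if (requireNamespace("fdapace", quietly = TRUE)) {
  set.seed(1)
  pts <- seq(0, 1, by = 0.05)
  samp <- fdapace::Sparsify(fdapace::Wiener(20, pts), pts, 10)
  res <- fdapace::FPCA(samp$Ly, samp$Lt, optns)
  print(res$selectK)             # number of components retained
  print(is.null(res$inputData))  # with lean = TRUE, inputData is documented to be NULL
}
```

Options left out of the list keep the defaults described above, and the options actually used are reported back in the optns field of the result.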

References

Yao, F., Müller, H.G., Clifford, A.J., Dueker, S.R., Follett, J., Lin, Y., Buchholz, B., Vogel, J.S. (2003). "Shrinkage estimation for functional principal component scores, with application to the population kinetics of plasma folate." Biometrics 59, 676-685. (Shrinkage estimates for dense data)

Yao, Fang, Müller, Hans-Georg and Wang, Jane-Ling (2005). "Functional data analysis for sparse longitudinal data." Journal of the American Statistical Association 100, no. 470, 577-590. (Sparse data FPCA)

Liu, Bitao and Müller, Hans-Georg (2009). "Estimating derivatives for samples of sparsely observed functions, with application to online auction dynamics." Journal of the American Statistical Association 104, no. 486, 704-717. (Sparse data FPCA)

Castro, P. E., Lawton, W.H. and Sylvestre, E.A. (1986). "Principal modes of variation for processes with continuous sample curves." Technometrics 28, no. 4, 329-337. (modes of variation for dense data FPCA)

Examples

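library(fdapace)  # provides Wiener, Sparsify, FPCA and the plotting helpers

set.seed(1)
n <- 20
pts <- seq(0, 1, by=0.05)
# Simulate n Wiener process paths on pts, then keep 10 observations per curve
sampWiener <- Wiener(n, pts)
sampWiener <- Sparsify(sampWiener, pts, 10)
# Sparse-design FPCA without measurement error, using the Epanechnikov kernel
res <- FPCA(sampWiener$Ly, sampWiener$Lt,
            list(dataType='Sparse', error=FALSE, kernel='epan', verbose=TRUE))
plot(res) # The design plot covers [0, 1] * [0, 1] well.
CreateCovPlot(res, 'Fitted')
CreateCovPlot(res, corr = TRUE)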
