tempGP: temporal Gaussian process

Description

A Gaussian-process-based power curve model that explicitly models the temporal aspect of the power curve. The model consists of two parts: f(x), a function of the input variables, and g(t), a function of time.
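Read together with the residuals entry under modelG (the residuals are what remains after subtracting f(x) from the response), the description suggests an additive decomposition. A sketch in LaTeX, where the noise term \varepsilon is an assumption not stated on this page:

```latex
y(x, t) = f(x) + g(t) + \varepsilon
```

Here f(x) captures the dependence on the input variables in trainX, and g(t) captures the temporal component indexed by trainT.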

Usage

tempGP(
  trainX,
  trainY,
  trainT = NULL,
  fast_computation = TRUE,
  limit_memory = 5000L,
  optim_control = list(batch_size = 100L, learn_rate = 0.05, max_iter = 5000L, tol =
    1e-06, beta1 = 0.9, beta2 = 0.999, epsilon = 1e-08, logfile = NULL)
)

Value

An object of class tempGP with the following attributes:

trainX - same as the input matrix trainX.
trainY - same as the input vector trainY.
thinningNumber - the thinning number computed by the algorithm.
modelF - A list containing the details of the model for predicting function f(x):
- X - The input variable matrix for computing the cross-covariance for predictions, same as trainX unless the model is updated. See updateData.tempGP method for details on updating the model.
- y - The response vector, again same as trainY unless the model is updated.
- weightedY - The weighted response, that is, the response left-multiplied by the inverse of the covariance matrix.
modelG - A list containing the details of the model for predicting function g(t):
- residuals - The residuals after subtracting function f(x) from the response. Used to predict g(t). See updateData.tempGP method for updating the residuals.
- time_index - The time indices of the residuals, same as trainT.
estimatedParams - Estimated hyperparameters for function f(x).
llval - The log-likelihood value at the optimized hyperparameters for f(x).
gradval - The gradient vector at the optimal log-likelihood value.

Arguments

trainX

A matrix with each column corresponding to one input variable.

trainY

A vector with each element corresponding to the output at the corresponding row of trainX.

trainT

A vector for time indices of the data points. By default, the function assigns natural numbers starting from 1 as the time indices.

fast_computation

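A Boolean that specifies whether to use exact inference or a fast approximation. When TRUE (the default), the fast approximation is used, with the hyperparameters optimized by Adam (see optim_control).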

limit_memory

An integer or NULL. The integer is used to sample training points during prediction to limit the total memory requirement. Setting the value to NULL disables sampling, so the full training data is used for prediction. Default is 5000.

optim_control

A list of parameters passed to the Adam optimizer when fast_computation is set to TRUE. The default values have been tested rigorously and tend to strike a balance between accuracy and speed.

batch_size: Number of training points sampled at each iteration of Adam.
learn_rate: The step size for the Adam optimizer.
max_iter: The maximum number of iterations to be performed by Adam.
tol: Gradient tolerance.
beta1: Decay rate for the first moment of the gradient.
beta2: Decay rate for the second moment of the gradient.
epsilon: A small number to avoid division by zero.
logfile: A string specifying the name of a file in which to store the hyperparameter values at each iteration.
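As an illustration of adjusting these settings, the sketch below starts from the defaults shown under Usage and overrides two of them. The override values are arbitrary examples, not recommendations. Passing the complete list is a cautious assumption, since this page does not say whether a partial list is merged with the defaults. The fit is guarded so that the snippet still runs when DSWE is not installed.

```r
# Defaults copied from the Usage section.
default_control <- list(
  batch_size = 100L, learn_rate = 0.05, max_iter = 5000L, tol = 1e-06,
  beta1 = 0.9, beta2 = 0.999, epsilon = 1e-08, logfile = NULL
)

# Override only the entries of interest; modifyList() leaves the rest untouched.
# The values below are illustrative assumptions.
custom_control <- modifyList(
  default_control,
  list(max_iter = 2000L, tol = 1e-05)
)

# Fit only if DSWE is available.
if (requireNamespace("DSWE", quietly = TRUE)) {
  traindata <- DSWE::data1[1:100, ]
  fit <- DSWE::tempGP(
    trainX = as.matrix(traindata[, 2]),
    trainY = as.numeric(traindata[, 7]),
    fast_computation = TRUE,
    optim_control = custom_control
  )
}
```

Building the list with modifyList() keeps every documented entry present, including logfile = NULL, so the call matches the shape of the default argument.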

References

Prakash, A., Tuo, R., & Ding, Y. (2022). "The temporal overfitting problem with applications in wind power curve modeling." Technometrics. doi:10.1080/00401706.2022.2069158.

Examples


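    library(DSWE) # provides tempGP() and the data1 dataset
    data = DSWE::data1
    trainindex = 1:100 # use the first 100 data points to train the model
    traindata = data[trainindex,]
    xCol = 2 # input variable column(s)
    yCol = 7 # response column
    trainX = as.matrix(traindata[,xCol]) # tempGP expects a matrix of inputs
    trainY = as.numeric(traindata[,yCol]) # and a numeric response vector
    tempGPObject = tempGP(trainX, trainY) # trainT defaults to 1, 2, ..., n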
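Once a model is fitted, the components listed under Value can be inspected. The sketch below assumes the returned object is a list whose elements can be read with `$`. The helper summarise_fit() is hypothetical and not part of DSWE. The fit is guarded so that the snippet still runs when DSWE is not installed.

```r
# Hypothetical helper (not part of DSWE): collect a few of the documented
# components of a tempGP object into a compact list.
summarise_fit <- function(fit) {
  list(
    n_train = length(fit$trainY),                 # size of the training response
    thinning_number = fit$thinningNumber,         # thinning number from the algorithm
    log_likelihood = fit$llval,                   # log-likelihood for f(x)
    n_points_f = nrow(fit$modelF$X),              # rows used by the f(x) model
    n_residuals = length(fit$modelG$residuals)    # residuals used by the g(t) model
  )
}

# Fit and summarise only if DSWE is available.
if (requireNamespace("DSWE", quietly = TRUE)) {
  traindata <- DSWE::data1[1:100, ]
  fit <- DSWE::tempGP(
    trainX = as.matrix(traindata[, 2]),
    trainY = as.numeric(traindata[, 7])
  )
  str(summarise_fit(fit))
}
```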