A Gaussian process based power curve model which explicitly models the temporal aspect of the power curve. The model consists of two parts: f(x) and g(t).
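Taken together with the description of the returned residuals (the response minus f(x), which is then used to predict g(t)), the two parts suggest an additive decomposition. The following is only a notational sketch: the noise term and its Gaussian form are assumptions and are not stated on this page.

```latex
y(x, t) = f(x) + g(t) + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \sigma^2)
```

Here f(x) depends only on the input variables in trainX, while g(t) depends only on the time indices in trainT.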
tempGP(
  trainX,
  trainY,
  trainT = NULL,
  fast_computation = TRUE,
  limit_memory = 5000L,
  max_thinning_number = 20L,
  vecchia = TRUE,
  optim_control = list(batch_size = 100L, learn_rate = 0.05, max_iter = 5000L,
    tol = 1e-06, beta1 = 0.9, beta2 = 0.999, epsilon = 1e-08, logfile = NULL)
)
An object of class tempGP with the following attributes:
trainX - same as the input matrix trainX.
trainY - same as the input vector trainY.
thinningNumber - the thinning number computed by the algorithm.
modelF - A list containing the details of the model for predicting function f(x):
    X - The input variable matrix for computing the cross-covariance for predictions, same as trainX unless the model is updated. See the updateData.tempGP method for details on updating the model.
    y - The response vector, again same as trainY unless the model is updated.
    weightedY - The weighted response, that is, the response left multiplied by the inverse of the covariance matrix.
modelG - A list containing the details of the model for predicting function g(t):
    residuals - The residuals after subtracting function f(x) from the response. Used to predict g(t). See the updateData.tempGP method for updating the residuals.
    time_index - The time indices of the residuals, same as trainT.
estimatedParams - Estimated hyperparameters for function f(x).
llval - log-likelihood value of the hyperparameter optimization for f(x).
gradval - gradient vector at the optimal log-likelihood value.
trainX - A matrix with each column corresponding to one input variable.
trainY - A vector with each element corresponding to the output at the corresponding row of trainX.
trainT - A vector of time indices of the data points. By default, the function assigns natural numbers starting from 1 as the time indices.
fast_computation - A Boolean that specifies whether to use the fast approximation (TRUE) or exact inference (FALSE). Default is TRUE.
limit_memory - An integer or NULL. The integer is used to sample training points during prediction to limit the total memory requirement. Setting the value to NULL results in no sampling, that is, the full training data is used for prediction. Default value is 5000.
max_thinning_number - An integer specifying the maximum lag used to compute the thinning number. If the partial autocorrelation function (PACF) does not become insignificant by lag max_thinning_number, then max_thinning_number is used for thinning.
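The PACF rule for the thinning number can be illustrated with base R. This is a minimal sketch, not the package's implementation: the helper name sketch_thinning_number, the use of a single series, and the approximate 95% white-noise band as the significance threshold are all assumptions made for illustration.

```r
# Hypothetical helper, not part of the DSWE API.
# Returns the first lag at which the sample PACF falls inside an
# approximate 95% significance band, capped at max_thinning_number.
sketch_thinning_number <- function(x, max_thinning_number = 20L) {
  n <- length(x)
  # Sample partial autocorrelations at lags 1..max_thinning_number
  pacf_vals <- stats::pacf(x, lag.max = max_thinning_number, plot = FALSE)$acf[, 1, 1]
  # Assumed threshold: approximate 95% band under white noise
  threshold <- stats::qnorm(0.975) / sqrt(n)
  insignificant <- which(abs(pacf_vals) <= threshold)
  if (length(insignificant) == 0L) {
    # PACF never became insignificant: fall back to the cap
    return(as.integer(max_thinning_number))
  }
  as.integer(min(insignificant))
}

# Toy autocorrelated series standing in for one input variable
set.seed(1)
x <- as.numeric(stats::arima.sim(model = list(ar = 0.8), n = 2000))
thinning_number <- sketch_thinning_number(x, max_thinning_number = 20L)
```

By construction the result always lies between 1 and max_thinning_number, which mirrors the fallback described for the argument.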
vecchia - A Boolean that specifies whether to use the Vecchia approximation (TRUE) or exact inference (FALSE). Default is TRUE.
optim_control - A list of parameters passed to the Adam optimizer when fast_computation is set to TRUE. The default values have been tested rigorously and tend to strike a balance between accuracy and speed.
    batch_size: Number of training points sampled at each iteration of Adam.
    learn_rate: The step size for the Adam optimizer.
    max_iter: The maximum number of iterations to be performed by Adam.
    tol: Gradient tolerance.
    beta1: Decay rate for the first moment of the gradient.
    beta2: Decay rate for the second moment of the gradient.
    epsilon: A small number to avoid division by zero.
    logfile: A string specifying a file name in which to store the hyperparameter values at each iteration.
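To show how the optim_control entries interact, here is a minimal sketch of a generic mini-batch Adam loop in base R. It is not the optimizer code used inside tempGP: the function adam_sketch, the toy least-squares objective, and the reading of tol as a stopping threshold on the largest absolute gradient component are all assumptions for illustration. The logfile entry is left out.

```r
# Hypothetical generic Adam loop, not the DSWE implementation.
adam_sketch <- function(grad_fn, theta, n_obs,
                        control = list(batch_size = 100L, learn_rate = 0.05,
                                       max_iter = 5000L, tol = 1e-06,
                                       beta1 = 0.9, beta2 = 0.999,
                                       epsilon = 1e-08)) {
  m <- numeric(length(theta))  # running first moment of the gradient
  v <- numeric(length(theta))  # running second moment of the gradient
  for (iter in seq_len(control$max_iter)) {
    # batch_size: number of points sampled at this iteration
    batch <- sample.int(n_obs, min(control$batch_size, n_obs))
    g <- grad_fn(theta, batch)
    # tol: assumed here to stop once the gradient is numerically zero
    if (max(abs(g)) < control$tol) break
    m <- control$beta1 * m + (1 - control$beta1) * g
    v <- control$beta2 * v + (1 - control$beta2) * g^2
    # Bias correction for the zero-initialized moments
    m_hat <- m / (1 - control$beta1^iter)
    v_hat <- v / (1 - control$beta2^iter)
    # learn_rate scales the step; epsilon guards against division by zero
    theta <- theta - control$learn_rate * m_hat / (sqrt(v_hat) + control$epsilon)
  }
  theta
}

# Toy problem: recover intercept 2 and slope 3 by least squares
set.seed(42)
x <- runif(1000)
y <- 2 + 3 * x + rnorm(1000, sd = 0.1)
grad_mse <- function(theta, idx) {
  res <- theta[1] + theta[2] * x[idx] - y[idx]
  c(mean(2 * res), mean(2 * res * x[idx]))  # gradient of the batch mean squared error
}
theta_hat <- adam_sketch(grad_mse, theta = c(0, 0), n_obs = length(x))
```

Because each gradient is computed on a random batch, the iterates keep fluctuating around the optimum, so with a constant learn_rate the loop typically runs until max_iter rather than stopping on tol.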
Prakash, A., Tuo, R., & Ding, Y. (2022). "The temporal overfitting problem with applications in wind power curve modeling." Technometrics. doi:10.1080/00401706.2022.2069158.
Katzfuss, M., & Guinness, J. (2021). "A General Framework for Vecchia Approximations of Gaussian Processes." Statistical Science. doi:10.1214/19-STS755.
Guinness, J. (2018). "Permutation and Grouping Methods for Sharpening Gaussian Process Approximations." Technometrics. doi:10.1080/00401706.2018.1437476.
predict.tempGP for computing predictions and updateData.tempGP for updating data in a tempGP object.
library(DSWE)  # makes tempGP() available without the DSWE:: prefix

data <- DSWE::data1
trainindex <- 1:50  # use the first 50 data points to train the model
traindata <- data[trainindex, ]
xCol <- 2  # input variable column
yCol <- 7  # response column
trainX <- as.matrix(traindata[, xCol])
trainY <- as.numeric(traindata[, yCol])
tempGPObject <- tempGP(trainX, trainY)
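As a further illustration, the sketch below passes an explicit optim_control list and then inspects some of the attributes described for the returned object. It assumes the DSWE package is installed and that the attributes can be reached with `$` (that is, that the object is list-like). The reduced max_iter is an arbitrary choice for illustration, not a recommendation.

```r
library(DSWE)

# Rebuild the training data used in the example
traindata <- DSWE::data1[1:50, ]
trainX <- as.matrix(traindata[, 2])
trainY <- as.numeric(traindata[, 7])

# Defaults from the usage signature, except for a smaller max_iter
ctrl <- list(batch_size = 100L, learn_rate = 0.05, max_iter = 1000L,
             tol = 1e-06, beta1 = 0.9, beta2 = 0.999, epsilon = 1e-08,
             logfile = NULL)
fit <- tempGP(trainX, trainY, optim_control = ctrl)

class(fit)           # the page documents the class as tempGP
fit$thinningNumber   # thinning number computed by the algorithm
fit$estimatedParams  # estimated hyperparameters for f(x)
fit$llval            # log-likelihood value at the estimated hyperparameters
```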