smartpred: Smart Prediction

Description

Data-dependent parameters in formula terms can cause problems when predicting. The smartpred package for R and S-PLUS saves the data-dependent parameters on the fitted object so that they can be reused at prediction time, which fixes the bug. The lm and glm functions have been fixed properly. Note that the VGAM package by T. W. Yee automatically comes with smart prediction.

Arguments

Value

Returns the usual object, but with one list/slot component called smart.prediction containing any data-dependent parameters.

Side Effects

The variables .max.smart, .smart.prediction and .smart.prediction.counter are created while the model is being fitted. In R they are created in a new environment called smartpredenv. In S-PLUS they are created in frame 1. These variables are deleted after the model has been fitted. However, in R, if there is an error in the model fitting function or the model fitting is killed (e.g., by typing control-C) then these variables will be left in smartpredenv. At the beginning of model fitting, these variables are deleted if present in smartpredenv.

During prediction, the variables .smart.prediction and .smart.prediction.counter are reconstructed and read by the smart functions when the model frame is re-evaluated. After prediction, these variables are deleted.
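The write-then-read life cycle described above can be mimicked with a toy sketch in base R. It is an illustration of the idea only: toy_env, toy_begin, toy_end and toy_center are hypothetical names invented here, and none of them belong to smartpred or VGAM.

```r
# Toy illustration only: these names are hypothetical, not the smartpred API.
toy_env <- new.env()

toy_begin <- function(mode, stored = list()) {
  # Clear any leftovers, e.g., from a fit that was interrupted
  rm(list = ls(toy_env, all.names = TRUE), envir = toy_env)
  toy_env$mode <- mode       # "write" while fitting, "read" while predicting
  toy_env$params <- stored   # data-dependent parameters, one entry per call
  toy_env$counter <- 0L      # which smart call is currently being evaluated
}

toy_end <- function() {
  out <- toy_env$params
  rm(list = ls(toy_env, all.names = TRUE), envir = toy_env)
  out
}

# A "smart" centring function: stores its mean when writing,
# and reuses the stored mean when reading
toy_center <- function(x) {
  toy_env$counter <- toy_env$counter + 1L
  i <- toy_env$counter
  if (toy_env$mode == "write") {
    toy_env$params[[i]] <- list(center = mean(x))
  }
  x - toy_env$params[[i]]$center
}

x <- c(1, 2, 3, 4, 5)
toy_begin("write")
fit_x <- toy_center(x)           # centres with mean(x)
saved <- toy_end()               # what would be kept on the fitted object

toy_begin("read", saved)
new_x <- toy_center(c(10, 20))   # reuses the stored centre, not mean(c(10, 20))
invisible(toy_end())
```

In this sketch the counter plays the role that .smart.prediction.counter is described as playing above: it matches each call at prediction time to the parameters stored by the corresponding call at fitting time.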

If the modelling function is used with argument smart=FALSE (e.g., vglm(..., smart=FALSE)) then smart prediction will not be used, and the results should match those of the original R or S-PLUS functions.

WARNING

In S-PLUS, if the "bigdata" library is loaded then it is detach()'ed. This is done because scale cannot be made smart if "bigdata" is loaded (it is loaded by default in the Windows version of S-PLUS 8.0, but not in Linux/Unix). The function search shows what is currently attached.

In R and S-PLUS the functions predict.bs and predict.ns are not smart. That is because they operate on objects that contain attributes only and do not have list components or slots. In R the function predict.poly is not smart.
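As a quick check of the "attributes only" remark, the following sketch uses the splines package that ships with R to inspect a natural spline basis. The variable names are arbitrary; the point is that the basis is a matrix carrying attributes, not a list with components.

```r
library(splines)

x <- seq(0, 1, length.out = 20)
basis <- ns(x, df = 5)

is.list(basis)            # the basis is not a list
is.matrix(basis)          # it is a matrix with extra attributes
names(attributes(basis))  # includes the knots and boundary knots
```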

Details

R version 1.6.0 introduced a partial fix for the prediction problem. It is only partial because it does not work all the time, e.g., for terms such as I(poly(x, 3)), poly(c(scale(x)), 3), bs(scale(x), 3), scale(scale(x)). See the examples below. Smart prediction, however, will always work. The basic idea is that the functions in the formula are now smart, and the modelling functions make use of these smart functions. Smart prediction works in two ways: using smart.expression, or using a combination of put.smart and get.smart.
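The following sketch shows the problem using base R only, without smartpred or VGAM loaded. It assumes that predicting at a subset of the original x values should reproduce the corresponding fitted values. This holds for poly(x, 3), but not for poly(c(scale(x)), 3), where the inner scale(x) is recomputed from the new data.

```r
set.seed(86)
n <- 20
x <- sort(runif(n))
y <- sort(runif(n))
sub <- data.frame(x = x[1:5])   # "new" data: the five smallest original x values

# Handled by the partial fix: the poly() coefficients are reused
fit_ok <- lm(y ~ poly(x, 3))
ok_match <- isTRUE(all.equal(unname(predict(fit_ok, sub)),
                             unname(fitted(fit_ok)[1:5])))

# Not handled: scale() is nested, so it is recentred on the new data
fit_bad <- lm(y ~ poly(c(scale(x)), 3))
bad_match <- isTRUE(all.equal(unname(predict(fit_bad, sub)),
                              unname(fitted(fit_bad)[1:5])))

ok_match    # predictions agree with the fitted values
bad_match   # predictions disagree with the fitted values
```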

Examples


# Create some data first
n = 20
set.seed(86) # For reproducibility of the random numbers
x = sort(runif(n)) 
y = sort(runif(n))
if(is.R()) library(splines)   # To get ns() in R

# This will work for R 1.6.0 and later, but fail for S-PLUS
fit = lm(y ~ ns(x, df=5))
plot(x, y)
lines(x, fitted(fit))
newx = seq(0, 1, len=n)
points(newx, predict(fit, data.frame(x=newx)), type="b", col=2, err=-1)

# The following fails for R 1.6.x and later but works with smart prediction
fit = lm(y ~ ns(scale(x), df=5))
fit$smart.prediction
plot(x, y)
lines(x, fitted(fit))
newx = seq(0, 1, len=n)
points(newx, predict(fit, data.frame(x=newx)), type="b", col=2, err=-1)

# The following requires the VGAM package to be loaded 
library(VGAM)
fit = vlm(y ~ ns(scale(x), df=5))
fit@smart.prediction
plot(x, y)
lines(x, fitted(fit))
newx = seq(0, 1, len=n)
points(newx, predict(fit, data.frame(x=newx)), type="b", col=2, err=-1)
