gendata: Generate Data Frame with Predictor Combinations

Description

If nobs is not specified, gendata lets the user specify predictor settings by e.g. age=50, sex="male". Any omitted predictors are set to reference values (default=median for continuous variables, first level for categorical ones - see datadist). If any predictor has more than one value given, expand.grid is called to generate all possible combinations of values, unless expand=FALSE.

If nobs is given, a data frame is first generated which has nobs copies of the adjust-to values. Then an editor window is opened which allows the user to subset the variable names down to the ones which she intends to vary (this streamlines the data.ed step). Then, if any predictors kept are discrete and viewvals=TRUE, a window (using page) is opened defining the possible values of this subset, to facilitate data editing. Then the data.ed function is invoked to allow interactive overriding of predictor settings in the nobs rows. The subset of variables is combined with the other predictors which were not displayed with data.ed, and a final full data frame is returned.

gendata is most useful for creating a newdata data frame to pass to predict.
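The difference between the default expansion and expand=FALSE can be illustrated with base R alone. This is a minimal sketch that does not call gendata; the settings list is a hypothetical stand-in for predictor values a user might supply.

```r
# Hypothetical predictor settings, for illustration only
settings <- list(age = c(50, 60), race = c("b", "a"))

# All combinations of the supplied values: 2 ages x 2 races = 4 rows
full <- expand.grid(settings)

# Plain conversion to a data frame: values are paired by position, 2 rows
paired <- as.data.frame(settings)

nrow(full)    # 4
nrow(paired)  # 2
```

With the default, every age is crossed with every race. The plain conversion keeps only the pairs (50, "b") and (60, "a"), which is the kind of result to expect when expand.grid is skipped.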

Usage

gendata(fit, ..., nobs, viewvals=FALSE, expand=TRUE, factors)

Value

A data frame with all predictors, and an attribute names.subset if nobs is specified. This attribute contains the vector of variable names for predictors which were passed to de and hence were allowed to vary. If neither nobs nor any predictor settings were given, a data frame with adjust-to values is returned.

Arguments

fit: a fit object created with rms in effect
...: predictor settings, if nobs is not given.
nobs: number of observations to create if doing it interactively using X-windows
viewvals: if nobs is given, set viewvals=TRUE to open a window displaying the possible values of categorical predictors
expand: set to FALSE to prevent expand.grid from being called, and to instead just convert to a data frame.
factors: a list containing predictor settings with their names. This is an alternative to specifying the variables separately in .... Unlike the usage of ..., variables getting default ranges in factors should have NA as their value.
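As a sketch of how the expand and factors arguments might be used, the following reuses the simulated data from the Examples section. It assumes the rms package is installed. The object names d_grid, d_pair, and d_list are illustrative only, and the equivalence of the factors call to the ... call is inferred from the argument descriptions above rather than verified here.

```r
library(rms)   # provides gendata, datadist, and lrm

set.seed(1)
age  <- rnorm(200, 50, 10)
sex  <- factor(sample(c('female','male'), 200, TRUE))
race <- factor(sample(c('a','b','c','d'), 200, TRUE))
y    <- sample(0:1, 200, TRUE)
dd <- datadist(age, sex, race)
options(datadist="dd")
f <- lrm(y ~ age*sex + race)

# Default: all combinations of the supplied values (4 obs.)
d_grid <- gendata(f, age=c(50,60), race=c("b","a"))

# expand=FALSE: skip expand.grid and just convert the settings to a data frame
d_pair <- gendata(f, age=c(50,60), race=c("b","a"), expand=FALSE)

# The same settings supplied as a named list through factors
d_list <- gendata(f, factors=list(age=c(50,60), race=c("b","a")))

options(datadist=NULL)
```

In each case sex is not mentioned, so it is left at its reference value.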

Side Effects

Optionally writes to the terminal, opens X-windows, and generates a temporary file using sink.

Author

Frank Harrell
Department of Biostatistics
Vanderbilt University
fh@fharrell.com

Details

If you have a variable in ... that is named n, no, or nob, add nobs=FALSE to the invocation to prevent that variable from being misrecognized as nobs.

Examples

library(rms)   # provides datadist, lrm, and gendata
set.seed(1)
age <- rnorm(200, 50, 10)
sex <- factor(sample(c('female','male'),200,TRUE))
race <- factor(sample(c('a','b','c','d'),200,TRUE))
y <- sample(0:1, 200, TRUE)
dd <- datadist(age,sex,race)
options(datadist="dd")
f <- lrm(y ~ age*sex + race)
gendata(f)
gendata(f, age=50)
d <- gendata(f, age=50, sex="female")  # leave race=reference category
d <- gendata(f, age=c(50,60), race=c("b","a"))  # 4 obs.
d$Predicted <- predict(f, d, type="fitted")
d      # Predicted column prints at the far right
options(datadist=NULL)
if (FALSE) {
d <- gendata(f, nobs=5, viewvals=TRUE)    # 5 interactively defined obs.
d[,attr(d,"names.subset")]             # print variables which varied
predict(f, d)
}