Newdata: Create a new data.frame for predict

Description

Generate a new data.frame or matrix from another, in which the column(s) selected by x take n values spanning range(data[, x]) and all other columns are held constant.

If canbeNumeric(x) is TRUE, the output has x adopting n values in range(x), all other numeric variables at their median, and all remaining variables at their most common values.

If canbeNumeric(x) is FALSE, the output has x adopting all possible values of x, with all other variables at the same constant values as when canbeNumeric(x) is TRUE (and n is ignored). If x has a levels attribute, the possible values are defined by that levels attribute. Otherwise, they are defined by unique(x).
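The rule for choosing the values of x can be sketched in base R. This is only an illustration under stated assumptions, not the package's code: isNumericLike() is a hypothetical stand-in for canbeNumeric() (assumed here to accept numeric, Date and POSIXct vectors, which may differ from the real function), and candidateValues() is a made-up helper name.

```r
# Hypothetical stand-in for canbeNumeric(); the real function may
# classify more (or fewer) classes as numeric-like.
isNumericLike <- function(v) {
  is.numeric(v) || inherits(v, c("Date", "POSIXct"))
}

# Hypothetical helper: candidate values for the selected column.
candidateValues <- function(v, n, na.rm = TRUE) {
  if (isNumericLike(v)) {
    # n equally spaced values spanning range(v)
    r <- range(v, na.rm = na.rm)
    return(seq(r[1], r[2], length.out = n))
  }
  # Non-numeric: n is ignored; use levels if present, else unique values
  lv <- levels(v)
  if (is.null(lv)) lv <- sort(unique(v))
  lv
}

candidateValues(c(1, 4, 2), n = 4)                # 1 2 3 4
candidateValues(factor(c("b", "a", "b")), n = 4)  # "a" "b"
candidateValues(c("b", "a", "b"), n = 4)          # "a" "b"
```
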

This is designed to create a new data.frame to be used as newdata for predict.

Usage

Newdata(data, x, n, na.rm=TRUE)

Arguments

data

a data.frame or matrix.

x

name of a column of data. If NA or NULL, select all columns of data.

n

an integer vector indicating the number of levels of data[, x] if canbeNumeric(data[, x]) is TRUE. If canbeNumeric(data[, x]) is FALSE, take at most n of the most popular levels.

Default is 2 if length(x) > 1 or if x is either NA or NULL.

If n = 1, use the median for canbeNumeric and the most popular level otherwise.

If n < 1, drop that variable.

na.rm

logical passed to range(x)

Value

A data.frame with n rows and columns matching those of data, as described above.

Details

1. Check data, x.

2. If canbeNumeric(x) is TRUE, let xNew be n values spanning range(x). Else, let xNew <- levels(x).

3. If is.null(xNew), set it to sort(unique(x)).

4. Let newDat <- data[rep(1, n), ], and replace x by xNew.

5. otherVars <- setdiff(colnames(data), x)

6. for(x2 in otherVars) replace newDat[, x2]: If canbeNumeric(data[, x2]) is TRUE, use median(data[, x2]). Otherwise, use its (first) most common value.
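The steps above can be sketched in base R for a single named column. This is a rough illustration, not the actual Newdata source: sketchNewdata(), isNumericLike() and mostCommon() are hypothetical names, isNumericLike() only approximates canbeNumeric(), and the handling of x = NULL, of multiple columns, and of n < 1 is omitted.

```r
# Hypothetical stand-in for canbeNumeric() (assumption, see lead-in).
isNumericLike <- function(v) is.numeric(v) || inherits(v, c("Date", "POSIXct"))

# First most common value in sorted order, keeping the class of v.
mostCommon <- function(v) {
  tab <- table(v)
  v[match(names(tab)[which.max(tab)], as.character(v))]
}

sketchNewdata <- function(data, x, n = 2, na.rm = TRUE) {
  # 1. Check data, x
  stopifnot(is.data.frame(data), length(x) == 1, x %in% colnames(data))
  v <- data[[x]]
  # 2-3. n values spanning range(x), or levels(x), or sort(unique(x))
  if (isNumericLike(v)) {
    r <- range(v, na.rm = na.rm)
    xNew <- seq(r[1], r[2], length.out = n)
  } else if (is.factor(v)) {
    xNew <- factor(levels(v), levels = levels(v), ordered = is.ordered(v))
  } else {
    xNew <- sort(unique(v))
  }
  # 4. One row per new value of x
  newDat <- data[rep(1, length(xNew)), , drop = FALSE]
  rownames(newDat) <- NULL
  newDat[[x]] <- xNew
  # 5-6. Hold every other column at its median or most common value
  for (x2 in setdiff(colnames(data), x)) {
    v2 <- data[[x2]]
    newDat[[x2]] <- if (isNumericLike(v2)) median(v2, na.rm = na.rm) else mostCommon(v2)
  }
  newDat
}

df <- data.frame(a = c(1, 2, 3, 10), g = c("u", "v", "v", "w"),
                 stringsAsFactors = FALSE)
sketchNewdata(df, "a", n = 3)  # a takes 3 values from 1 to 10; g is held at "v"
sketchNewdata(df, "g")         # g takes "u", "v", "w"; a is held at its median
```

Ties in mostCommon() are broken by the sorted order that table() uses, which is one plausible reading of "(first) most common value".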

Examples

# NOT RUN {
##
## 1.  A reasonable test with numerics, dates, 
##     an ordered factor and character variables
##
xDate <- as.Date('2001-02-03')+1:4
tstDF <- data.frame(x1=1:4, xDate=xDate, 
  xD2=as.POSIXct(xDate), 
  sex=ordered(c('M', 'F', 'M', 'F')), 
  huh=letters[c(1:3, 3)], stringsAsFactors=FALSE)

newDat <- Newdata(tstDF, 'xDate', n=5)

# check
newD <- data.frame(x1=2.5, 
  xDate=xDate[1]+seq(0, 3, length=5), 
  xD2=as.POSIXct(xDate[2]+0.5), 
  sex=ordered(c('M', 'F', 'M', 'F'))[2], 
  huh=letters[3], stringsAsFactors=FALSE)
attr(newD, 'out.attrs') <- attr(newDat, 'out.attrs')
# }
# NOT RUN {
all.equal(newDat, newD)
# }
# NOT RUN {
##
## 2.  Test with only one column 
##
newDat1 <- Newdata(tstDF[, 2, drop=FALSE], 'xDate', n=5)

# check 
newDat1. <- newD[, 2, drop=FALSE]
attr(newDat1., 'out.attrs') <- attr(newDat1, 'out.attrs')
# }
# NOT RUN {
all.equal(newDat1, newDat1.)
# }
# NOT RUN {
##
## 3.  Test with a factor 
##
newSex <- Newdata(tstDF, 'sex')

# check 
newS <- with(tstDF, data.frame(
  x1=2.5, xDate=xDate[1]+1.5, 
  xD2=as.POSIXct(xDate[1]+1.5), 
  sex=ordered(c('M', 'F'))[2:1], 
  huh=letters[3], stringsAsFactors=FALSE) )
attr(newS, 'out.attrs') <- attr(newSex, 'out.attrs')
# }
# NOT RUN {
all.equal(newSex, newS)
# }
# NOT RUN {
##
## 4.  Test with an integer column number 
##
newDat2 <- Newdata(tstDF, 2, n=5)

# check 
# }
# NOT RUN {
all.equal(newDat2, newD)
# }
# NOT RUN {
##
## 5.  Test with all
##
NewAll <- Newdata(tstDF)

# check 
tstLvls <- as.list(tstDF[c(1, 4), ])
tstLvls$sex <- tstDF$sex[2:1]
tstLvls$huh <- letters[c(3, 1)]
tstLvls$stringsAsFactors <- FALSE

NewA. <- do.call(expand.grid, tstLvls)
attr(NewA., 'out.attrs') <- attr(NewAll, 'out.attrs')
# }
# NOT RUN {
all.equal(NewAll, NewA.)
# }