set.seed(290875)
n <- 100
x1 <- rnorm(n)
x2 <- rnorm(n) + 0.25 * x1
x3 <- as.factor(sample(0:1, n, replace = TRUE))
x4 <- gl(4, 25)
y <- 3 * sin(x1) + x2^2 + rnorm(n)
weights <- drop(rmultinom(1, n, rep.int(1, n) / n))
### set up base-learners
spline1 <- bbs(x1, knots = 20, df = 4)
extract(spline1, "design")[1:10, 1:10]
extract(spline1, "penalty")
knots.x2 <- quantile(x2, c(0.25, 0.5, 0.75))
spline2 <- bbs(x2, knots = knots.x2, df = 5)
ols3 <- bols(x3)
extract(ols3)
ols4 <- bols(x4)
### compute base-models
drop(ols3$dpp(weights)$fit(y)$model) ## same as:
coef(lm(y ~ x3, weights = weights))
drop(ols4$dpp(weights)$fit(y)$model) ## same as:
coef(lm(y ~ x4, weights = weights))
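## what dpp() and fit() compute for an unpenalized bols() base-learner
## is ordinary weighted least squares; a minimal sketch by hand:
X3 <- extract(ols3)
drop(solve(crossprod(X3 * weights, X3), crossprod(X3, weights * y)))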
### fit model, component-wise
mod1 <- mboost_fit(list(spline1, spline2, ols3, ols4), y, weights)
### more convenient formula interface
mod2 <- mboost(y ~ bbs(x1, knots = 20, df = 4) +
bbs(x2, knots = knots.x2, df = 5) +
bols(x3) + bols(x4), weights = weights)
all.equal(coef(mod1), coef(mod2))
### grouped linear effects
# center x1 and x2 first
x1 <- scale(x1, center = TRUE, scale = FALSE)
x2 <- scale(x2, center = TRUE, scale = FALSE)
model <- gamboost(y ~ bols(x1, x2, intercept = FALSE) +
bols(x1, intercept = FALSE) +
bols(x2, intercept = FALSE),
control = boost_control(mstop = 50))
coef(model, which = 1) # one base-learner for x1 and x2
coef(model, which = 2:3) # two separate base-learners for x1 and x2
# zero because they were not (yet) selected.
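## selected() returns the index of the base-learner chosen in each of
## the 50 boosting iterations:
table(selected(model))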
### example for bspatial
x1 <- runif(250,-pi,pi)
x2 <- runif(250,-pi,pi)
y <- sin(x1) * sin(x2) + rnorm(250, sd = 0.4)
spline3 <- bspatial(x1, x2, knots = 12)
Xmat <- extract(spline3, "design")
## 12 inner knots + 4 boundary knots = 16 knots per direction
## THUS: 16 * 16 = 256 columns
dim(Xmat)
extract(spline3, "penalty")[1:10, 1:10]
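## the bivariate penalty is a Kronecker sum of the two marginal
## difference penalties; a sketch of this identity (the ordering of the
## Kronecker terms is an assumption):
Kx1 <- as.matrix(extract(bbs(x1, knots = 12), "penalty"))
Kx2 <- as.matrix(extract(bbs(x2, knots = 12), "penalty"))
all.equal(as.matrix(extract(spline3, "penalty")),
          kronecker(Kx1, diag(16)) + kronecker(diag(16), Kx2),
          check.attributes = FALSE)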
## specify number of knots separately
form1 <- y ~ bspatial(x1, x2, knots = list(x1 = 12, x2 = 14))
## decompose spatial effect into parametric part and
## deviation with one df
form2 <- y ~ bols(x1) + bols(x2) + bols(x1, by = x2, intercept = FALSE) +
bspatial(x1, x2, knots = 12, center = TRUE, df = 1)
mod1 <- gamboost(form1)
if (FALSE) {
plot(mod1)
}
mod2 <- gamboost(form2)
## automated plot function:
if (FALSE) {
plot(mod2)
}
## plot sum of linear and smooth effects:
library("lattice")
df <- expand.grid(x1 = unique(x1), x2 = unique(x2))
df$pred <- predict(mod2, newdata = df)
if (FALSE) {
levelplot(pred ~ x1 * x2, data = df)
}
## specify radial basis function base-learner for spatial effect
## and use data-adaptive effective range (theta = NULL, see 'args')
form3 <- y ~ brad(x1, x2)
## Now use different settings, e.g. 50 knots and theta fixed to 0.4
## (not really a good setting)
form4 <- y ~ brad(x1, x2, knots = 50, args = list(theta = 0.4))
mod3 <- gamboost(form3)
if (FALSE) {
plot(mod3)
}
dim(extract(mod3, what = "design", which = "brad")[[1]])
knots <- attr(extract(mod3, what = "design", which = "brad")[[1]], "knots")
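## the extracted knot locations could in principle be re-used to keep
## the radial basis fixed across models (hypothetical re-use; brad()
## also accepts knot locations via its 'knots' argument):
form5 <- y ~ brad(x1, x2, knots = knots)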
mod4 <- gamboost(form4)
dim(extract(mod4, what = "design", which = "brad")[[1]])
if (FALSE) {
plot(mod4)
}
### random intercept
id <- factor(rep(1:10, each = 5))
raneff <- brandom(id)
extract(raneff, "design")
extract(raneff, "penalty")
## random intercept with non-observed category
set.seed(1907)
y <- rnorm(50, mean = rep(rnorm(10), each = 5), sd = 0.1)
plot(y ~ id)
# category 10 not observed
obs <- c(rep(1, 45), rep(0, 5))
model <- gamboost(y ~ brandom(id), weights = obs)
coef(model)
fitted(model)[46:50] # just the grand mean as usual for
# random effects models
### random slope
z <- runif(50)
raneff <- brandom(id, by = z)
extract(raneff, "design")
extract(raneff, "penalty")
### specify simple interaction model (with main effect)
n <- 210
x <- rnorm(n)
X <- model.matrix(~ x)
z <- gl(3, n/3)
Z <- model.matrix(~z)
beta <- list(c(0,1), c(-3,4), c(2, -4))
y <- rnorm(length(x), mean = (X * Z[,1]) %*% beta[[1]] +
(X * Z[,2]) %*% beta[[2]] +
(X * Z[,3]) %*% beta[[3]])
plot(y ~ x, col = z)
## specify main effect and interaction
mod_glm <- gamboost(y ~ bols(x) + bols(x, by = z),
control = boost_control(mstop = 100))
nd <- data.frame(x, z)
nd <- nd[order(x),]
nd$pred_glm <- predict(mod_glm, newdata = nd)
for (i in seq_along(levels(z)))
with(nd[nd$z == i,], lines(x, pred_glm, col = z))
mod_gam <- gamboost(y ~ bbs(x) + bbs(x, by = z, df = 8),
control = boost_control(mstop = 100))
nd$pred_gam <- predict(mod_gam, newdata = nd)
for (i in seq_along(levels(z)))
with(nd[nd$z == i,], lines(x, pred_gam, col = z, lty = "dashed"))
### convenience function for plotting
if (FALSE) {
par(mfrow = c(1,3))
plot(mod_gam)
}
### remove intercept from base-learner
### and add explicit intercept to the model
tmpdata <- data.frame(x = 1:100, y = rnorm(100), int = rep(1, 100))
mod <- gamboost(y ~ bols(int, intercept = FALSE) +
bols(x, intercept = FALSE),
data = tmpdata,
control = boost_control(mstop = 1000))
cf <- unlist(coef(mod))
## add offset
cf[1] <- cf[1] + mod$offset
signif(cf, 3)
signif(coef(lm(y ~ x, data = tmpdata)), 3)
### much quicker and better with (mean-) centering
tmpdata$x_center <- tmpdata$x - mean(tmpdata$x)
mod_center <- gamboost(y ~ bols(int, intercept = FALSE) +
bols(x_center, intercept = FALSE),
data = tmpdata,
control = boost_control(mstop = 100))
cf_center <- unlist(coef(mod_center, which = 1:2))
## due to the shift in x direction we need to subtract
## beta_1 * mean(x) to get the correct intercept
cf_center[1] <- cf_center[1] + mod_center$offset -
cf_center[2] * mean(tmpdata$x)
signif(cf_center, 3)
signif(coef(lm(y ~ x, data = tmpdata)), 3)
if (FALSE) { ##########################################################
## Do not run and check these examples automatically as
## they take some time
### large data set with ties
nunique <- 100
xindex <- sample(1:nunique, 1000000, replace = TRUE)
x <- runif(nunique)
y <- rnorm(length(xindex))
w <- rep.int(1, length(xindex))
### brute force computations
op <- options()
options(mboost_indexmin = Inf, mboost_useMatrix = FALSE)
## data pre-processing
b1 <- bbs(x[xindex])$dpp(w)
## model fitting
c1 <- b1$fit(y)$model
options(op)
### automatic search for ties, faster
b2 <- bbs(x[xindex])$dpp(w)
c2 <- b2$fit(y)$model
### manual specification of ties, even faster
b3 <- bbs(x, index = xindex)$dpp(w)
c3 <- b3$fit(y)$model
all.equal(c1, c2)
all.equal(c1, c3)
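## a rough timing comparison of the variants (machine-dependent, shown
## as a sketch only):
system.time(bbs(x[xindex])$dpp(w))          # automatic search for ties
system.time(bbs(x, index = xindex)$dpp(w))  # ties supplied manually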
} ## End(Not run and test)
### cyclic P-splines
set.seed(781)
x <- runif(200, 0, 2 * pi)
y <- rnorm(200, mean = sin(x), sd = 0.2)
newX <- seq(0, 2 * pi, length.out = 100)
### model without cyclic constraints
mod <- gamboost(y ~ bbs(x, knots = 20))
### model with cyclic constraints
mod_cyclic <- gamboost(y ~ bbs(x, cyclic = TRUE, knots = 20,
                               boundary.knots = c(0, 2 * pi)))
par(mfrow = c(1, 2))
plot(x, y, main = "bbs (non-cyclic)", cex = 0.5)
lines(newX, sin(newX), lty = "dotted")
lines(newX + 2 * pi, sin(newX), lty = "dashed")
lines(newX, predict(mod, data.frame(x = newX)),
      col = "red", lwd = 1.5)
lines(newX + 2 * pi, predict(mod, data.frame(x = newX)),
      col = "blue", lwd = 1.5)
plot(x, y, main = "bbs (cyclic)", cex = 0.5)
lines(newX, sin(newX), lty = "dotted")
lines(newX + 2 * pi, sin(newX), lty = "dashed")
lines(newX, predict(mod_cyclic, data.frame(x = newX)),
      col = "red", lwd = 1.5)
lines(newX + 2 * pi, predict(mod_cyclic, data.frame(x = newX)),
      col = "blue", lwd = 1.5)
### use buser() to mimic p-spline base-learner:
set.seed(1907)
x <- rnorm(100)
y <- rnorm(100, mean = x^2, sd = 0.1)
mod1 <- gamboost(y ~ bbs(x))
## now extract design and penalty matrix
X <- extract(bbs(x), "design")
K <- extract(bbs(x), "penalty")
## use X and K in buser()
mod2 <- gamboost(y ~ buser(X, K))
max(abs(predict(mod1) - predict(mod2))) # same results
### use buser() to mimic penalized ordinal base-learner:
z <- as.ordered(sample(1:3, 100, replace = TRUE))
y <- rnorm(100, mean = as.numeric(z), sd = 0.1)
X <- extract(bols(z))
K <- extract(bols(z), "penalty")
index <- extract(bols(z), "index")
mod1 <- gamboost(y ~ buser(X, K, df = 1, index = index))
mod2 <- gamboost(y ~ bols(z, df = 1))
max(abs(predict(mod1) - predict(mod2))) # same results
### kronecker product for matrix-valued responses
data("volcano", package = "datasets")
layout(matrix(1:2, ncol = 2))
## estimate mean of image treating image as matrix
image(volcano, main = "data")
x1 <- 1:nrow(volcano)
x2 <- 1:ncol(volcano)
vol <- as.vector(volcano)
mod <- mboost(vol ~ bbs(x1, df = 3, knots = 10) %O%
bbs(x2, df = 3, knots = 10),
control = boost_control(nu = 0.25))
mod[250]
volf <- matrix(fitted(mod), nrow = nrow(volcano))
image(volf, main = "fitted")
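## %O% never materializes the full Kronecker design matrix; only the
## two small marginal designs are stored (a sketch of the memory
## argument; the dimensions follow from knots + degree + 1 = 14):
dim(extract(bbs(x1, df = 3, knots = 10), "design")) # 87 x 14
dim(extract(bbs(x2, df = 3, knots = 10), "design")) # 61 x 14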
if (FALSE) { ##########################################################
## Do not run and check these examples automatically as
## they take some time
## the old-fashioned way, a waste of space and time
x <- expand.grid(x1, x2)
modx <- mboost(vol ~ bbs(Var2, df = 3, knots = 10) %X%
bbs(Var1, df = 3, knots = 10), data = x,
control = boost_control(nu = 0.25))
modx[250]
max(abs(fitted(mod) - fitted(modx)))
} ## End(Not run and test)
### setting contrasts via contrasts.arg
x <- as.factor(sample(1:4, 100, replace = TRUE))
## compute base-learners with different reference categories
BL1 <- bols(x, contrasts.arg = contr.treatment(4, base = 1)) # default
BL2 <- bols(x, contrasts.arg = contr.treatment(4, base = 2))
## compute 'sum to zero contrasts' using character string
BL3 <- bols(x, contrasts.arg = "contr.sum")
## extract model matrices to check if it works
extract(BL1)
extract(BL2)
extract(BL3)
### setting contrasts using named lists in contrasts.arg
x2 <- as.factor(sample(1:4, 100, replace = TRUE))
BL4 <- bols(x, x2,
contrasts.arg = list(x = contr.treatment(4, base = 2),
x2 = "contr.helmert"))
extract(BL4)
### using special contrast: "contr.dummy":
BL5 <- bols(x, contrasts.arg = "contr.dummy")
extract(BL5)
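## "contr.dummy" codes all levels explicitly (no reference category),
## so every observation activates exactly one column; a quick check:
table(rowSums(extract(BL5)))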