Learn R Programming

partykit (version 1.2-22)

mob_control: Control Parameters for Model-Based Partitioning

Description

Various parameters that control aspects the fitting algorithm for recursively partitioned mob models.

Usage

mob_control(alpha = 0.05, bonferroni = TRUE, minsize = NULL, maxdepth = Inf,
  mtry = Inf, trim = 0.1, breakties = FALSE, parm = NULL, dfsplit = TRUE, prune = NULL,
  restart = TRUE, verbose = FALSE, caseweights = TRUE, ytype = "vector", xtype = "matrix",
  terminal = "object", inner = terminal, model = TRUE, numsplit = "left",
  catsplit = "binary", vcov = "opg", ordinal = "chisq", nrep = 10000,
  minsplit = minsize, minbucket = minsize, applyfun = NULL, cores = NULL)

Value

A list of class mob_control containing the control parameters.

Arguments

alpha

numeric significance level. A node is splitted when the (possibly Bonferroni-corrected) \(p\) value for any parameter stability test in that node falls below alpha (and the stopping criteria minsize and maxdepth are not fulfilled).

bonferroni

logical. Should \(p\) values be Bonferroni corrected?

minsize, minsplit, minbucket

integer. The minimum number of observations in a node. If NULL, the default is to use 10 times the number of parameters to be estimated (divided by the number of responses per observation if that is greater than 1). minsize is the recommended name and minsplit/minbucket are only included for backward compatibility with previous versions of mob and compatibility with ctree, respectively.

maxdepth

integer. The maximum depth of the tree.

mtry

integer. The number of partitioning variables randomly sampled as candidates in each node for forest-style algorithms. If mtry is greater than the number of partitioning variables, no random selection is performed. (Thus, by default all available partitioning variables are considered.)

trim

numeric. This specifies the trimming in the parameter instability test for the numerical variables. If smaller than 1, it is interpreted as the fraction relative to the current node size.

breakties

logical. Should ties in numeric variables be broken randomly for computing the associated parameter instability test?

parm

numeric or character. Number or name of model parameters included in the parameter instability tests (by default all parameters are included).

dfsplit

logical or numeric. as.integer(dfsplit) is the degrees of freedom per selected split employed when computing information criteria etc.

prune

character, numeric, or function for specifying post-pruning rule. If prune is NULL (the default), no post-pruning is performed. For likelihood-based mob() trees, prune can be set to "AIC" or "BIC" for post-pruning based on the corresponding information criteria. More general rules (also in scenarios that are not likelihood-based), can be specified by function arguments to prune, for details see below.

restart

logical. When determining the optimal split point in a numerical variable: Should model estimation be restarted with NULL starting values for each split? The default is TRUE. If FALSE, then the parameter estimates from the previous split point are used as starting values for the next split point (because in practice the difference are often not huge). (Note that in that case a for loop is used instead of the applyfun for fitting models across sample splits.)

verbose

logical. Should information about the fitting process of mob (such as test statistics, \(p\) values, selected splitting variables and split points) be printed to the screen?

caseweights

logical. Should weights be interpreted as case weights? If TRUE, the number of observations is sum(weights), otherwise it is sum(weights > 0).

ytype, xtype

character. Specification of how mob should preprocess y and x variables. Possible choice are: "vector" (for y only), i.e., only one variable; "matrix", i.e., the model matrix of all variables; "data.frame", i.e., a data frame of all variables.

terminal, inner

character. Specification of which additional information ("estfun", "object", or both) should be stored in each node. If NULL, no additional information is stored.

model

logical. Should the full model frame be stored in the resulting object?

numsplit

character indicating how splits for numeric variables should be justified. Because any splitpoint in the interval between the last observation from the left child segment and the first observation from the right child segment leads to the same observed split, two options are available in mob_control: Either, the split is "left"-justified (the default for backward compatibility) or "center"-justified using the midpoint of the possible interval.

catsplit

character indicating how (unordered) categorical variables should be splitted. By default the best "binary" split is searched (by minimizing the objective function). Alternatively, if set to "multiway", the node is simply splitted into all levels of the categorical variable.

vcov

character indicating which type of covariance matrix estimator should be employed in the parameter instability tests. The default is the outer product of gradients ("opg"). Alternatively, vcov = "info" employs the information matrix and vcov = "sandwich" the sandwich matrix (both of which are only sensible for maximum likelihood estimation).

ordinal

character indicating which type of parameter instability test should be employed for ordinal partitioning variables (i.e., ordered factors). This can be "chisq", "max", or "L2". If "chisq" then the variable is treated as unordered and a chi-squared test is performed. If "L2", then a maxLM-type test as for numeric variables is carried out but correcting for ties. This requires simulation of p-values via catL2BB and requires some computation time. For "max" a weighted double maximum test is used that computes p-values via pmvnorm.

nrep

numeric. Number of replications in the simulation of p-values for the ordinal "L2" statistic (if used).

applyfun

an optional lapply-style function with arguments function(X, FUN, ...). It is used for refitting the model across potential sample splits. The default is to use the basic lapply function unless the cores argument is specified (see below).

cores

numeric. If set to an integer the applyfun is set to mclapply with the desired number of cores.

Details

See mob for more details and references.

For post-pruning, prune can be set to a function(objfun, df, nobs) which either returns TRUE to signal that a current node can be pruned or FALSE. All supplied arguments are of length two: objfun is the sum of objective function values in the current node and its child nodes, respectively. df is the degrees of freedom in the current node and its child nodes, respectively. nobs is vector with the number of observations in the current node and the total number of observations in the dataset, respectively.

If the objective function employed in the mob() call is the negative log-likelihood, then a suitable function is set up on the fly by comparing (2 * objfun + penalty * df) in the current and the daughter nodes. The penalty can then be set via a numeric or character value for prune: AIC is used if prune = "AIC" or prune = 2 and BIC if prune = "BIC" or prune = log(n).

See Also

mob