friedman_data: Simulate data from the Friedman model
Description
Simulate draws from a Bernoulli distribution over c(-1,1), where the
log-odds are defined by:
$$\log\frac{p(y=1|x)}{p(y=-1|x)} = \gamma (1 - x_1 + x_2 - \cdots + x_6)(x_1 + x_2 + \cdots + x_6)$$
and \(x\) is distributed as \(N(0, I_{d \times d})\). See Friedman et al. (2000).
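The model above can be sketched in base R as follows. This is a minimal illustration, not the package source: the name simulate_friedman_sketch is hypothetical, the ellipsis in the first factor is read as alternating signs on \(x_1, \ldots, x_6\), and the check that d is at least 6 is an assumption implied by the formula.

```r
# Hypothetical re-implementation for illustration; not the package source.
simulate_friedman_sketch <- function(n = 500, d = 10, gamma = 10) {
  # The formula uses x_1, ..., x_6, so at least six predictors are needed
  # (assumed check).
  stopifnot(d >= 6)
  # n draws of x ~ N(0, I_d), one row per observation.
  X <- matrix(rnorm(n * d), nrow = n, ncol = d)
  X6 <- X[, 1:6, drop = FALSE]
  # Alternating signs -x_1 + x_2 - ... + x_6 (assumed reading of the ellipsis).
  signs <- c(-1, 1, -1, 1, -1, 1)
  log_odds <- gamma * (1 + drop(X6 %*% signs)) * rowSums(X6)
  # The logistic transform of the log-odds gives p(y = 1 | x).
  p <- plogis(log_odds)
  # Bernoulli draw in {0, 1}, recoded to {-1, 1}.
  y <- 2 * rbinom(n, size = 1, prob = p) - 1
  list(y = y, X = X, p = p)
}

set.seed(1)
dat <- simulate_friedman_sketch(n = 200, d = 10, gamma = 0.5)
str(dat)
```

Using plogis rather than exp(log_odds) / (1 + exp(log_odds)) avoids overflow when gamma is large and the log-odds are far from zero.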
Usage
friedman_data(n = 500, d = 10, gamma = 10)
Arguments
n
Number of points to simulate.
d
The dimension of the predictor variable \(x\). The log-odds formula uses
\(x_1, \ldots, x_6\), so d should be at least 6.
gamma
A parameter controlling the Bayes error, with higher values of
gamma corresponding to lower error rates.
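To illustrate how gamma relates to the Bayes error, the sketch below estimates \(E[\min(p, 1 - p)]\) by Monte Carlo for a few values of gamma. It is an illustration under assumptions, not part of the package: bayes_error_sketch is a hypothetical helper, the log-odds use the alternating-sign reading of the formula in the Description, and only six predictors are simulated because the remaining ones do not enter the log-odds.

```r
# Hypothetical helper: Monte Carlo estimate of the Bayes error
# E[min(p, 1 - p)] under the log-odds formula from the Description.
bayes_error_sketch <- function(gamma, n = 20000) {
  # Only x_1, ..., x_6 affect the log-odds, so six columns suffice here.
  X <- matrix(rnorm(n * 6), nrow = n, ncol = 6)
  # Alternating signs -x_1 + x_2 - ... + x_6 (assumed reading of the ellipsis).
  signs <- c(-1, 1, -1, 1, -1, 1)
  log_odds <- gamma * (1 + drop(X %*% signs)) * rowSums(X)
  p <- plogis(log_odds)
  # The Bayes classifier errs with probability min(p, 1 - p) at each x.
  mean(pmin(p, 1 - p))
}

set.seed(1)
gammas <- c(0.1, 0.5, 2, 10)
errs <- sapply(gammas, bayes_error_sketch)
print(data.frame(gamma = gammas, bayes_error = round(errs, 3)))
```

Larger gamma scales the log-odds away from zero at every \(x\), which pushes \(p\) toward 0 or 1 and shrinks \(\min(p, 1 - p)\).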
Value
Returns a list with the following components:
y
Vector of n simulated responses, each either -1 or 1.
X
An \(n \times d\) matrix of simulated predictors.
p
Vector of the true conditional probabilities \(p(y=1|x)\), one per simulated point.
References
Friedman, J., Hastie, T. and Tibshirani, R. (2000). Additive logistic
regression: a statistical view of boosting (with discussion), Annals of
Statistics 28: 337-407.