Returns the matrix H where yhat is approximately equal to H y, where yhat is the vector of predicted values for new_data. If new_data is unspecified, yhat will be the in-sample fits.
If BART were the same as OLS, H would be an orthogonal projection matrix. Here it is a projection matrix, but clearly non-orthogonal. Unfortunately, I cannot get
this function to work correctly, for three possible reasons: (1) BART does not work by averaging tree predictions: it is a sum-of-trees model where each tree sees the residuals
via backfitting; (2) the prediction in each node is a Bayesian posterior draw, which is close to ybar of the observations contained in the node if the noise is gauged to be small; and
(3) there are transformations of the original y variable. I believe I got close, and I think I'm off by a constant multiple which is a function of the number of trees. I can
use regression to estimate the constant multiple and correct for it. Set regression_kludge to TRUE for this. Note that the weights do not add up to one here.
The intuition is that backfitting causes multiple counting, but I'm not entirely sure.
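For comparison, the OLS case mentioned above can be checked directly in base R. This is only a sketch on simulated data; it does not use bartMachine, and every name in it (X, y, H, X_new, H_new) is made up for illustration. It shows the properties that the BART weights are not expected to have: an exact yhat = H y, a symmetric and idempotent H, and rows of weights that sum to one.

```r
# Simulated data; all names here are illustrative.
set.seed(1)
n <- 50
X <- cbind(1, rnorm(n), rnorm(n))   # design matrix with an intercept column
y <- as.numeric(X %*% c(2, 1, -1) + rnorm(n, sd = 0.5))

# OLS hat matrix: for OLS, yhat = H y holds exactly
H <- X %*% solve(crossprod(X)) %*% t(X)
yhat <- as.numeric(H %*% y)

# The projected values agree with the fits from lm()
fit <- lm(y ~ X - 1)
all.equal(yhat, unname(fitted(fit)))

# Orthogonal projection: H is symmetric and idempotent
isTRUE(all.equal(H, t(H)))
isTRUE(all.equal(H %*% H, H))

# With an intercept in the model, each row of weights sums to one
range(rowSums(H))

# Out-of-sample analogue: one row per new observation,
# one column per training observation
X_new <- cbind(1, rnorm(5), rnorm(5))
H_new <- X_new %*% solve(crossprod(X)) %*% t(X)
dim(H_new)
```

The last matrix has the same layout as the return value of this function: rows index the new observations and columns index the n training observations.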
get_projection_weights(bart_machine, new_data = NULL, regression_kludge = FALSE)
Returns a matrix of proportions with number of rows equal to the number of rows of new_data and number of columns equal to the number of rows of the original training data, n.
bart_machine: An object of class "bartMachine".
new_data: Data for which you wish to investigate the training sample projection / weights. If NULL, the original training data is used.
regression_kludge: See explanation in the description. Default is FALSE.
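The idea of estimating a constant multiple by regression can be sketched in base R. This is a toy illustration under stated assumptions, not necessarily what regression_kludge does internally: the "true" weights are taken from an OLS fit, the factor 0.25 is an arbitrary stand-in for the unknown multiple, and all names (H_true, H_raw, z, k, H_fixed) are hypothetical.

```r
# Simulated data; all names here are illustrative.
set.seed(2)
n <- 40
X <- cbind(1, rnorm(n))
y <- as.numeric(X %*% c(1, 3) + rnorm(n, sd = 0.3))

# Stand-in for the model's predictions, built from a known weight matrix
H_true <- X %*% solve(crossprod(X)) %*% t(X)
yhat <- as.numeric(H_true %*% y)

# Assumption: the weights were recovered only up to a constant multiple
H_raw <- 0.25 * H_true
z <- as.numeric(H_raw %*% y)

# Regress the predictions on the raw weighted sums, through the origin;
# the slope estimates the missing multiple
k <- unname(coef(lm(yhat ~ 0 + z)))
H_fixed <- k * H_raw

k                                # estimated multiple
max(abs(H_fixed %*% y - yhat))   # discrepancy after the correction
```

A single slope is enough here only because the toy weights are off by exactly one constant. If the discrepancy is not a pure constant multiple, the corrected weights will still reproduce yhat only approximately.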