Sieve
Reference: https://arxiv.org/abs/2206.02994
Sieve is an R package for nonparametric estimation by the method of sieves (estimation using multivariate orthogonal series). This type of estimator has been actively studied and applied in univariate feature settings, but it has not received the attention it deserves in multivariate cases.
Installing a package from GitHub can be tricky, but I have found that about 80% of the errors can be solved by restarting RStudio.
The current version can solve regression and classification problems. The algorithm returns the estimated conditional mean (regression) or the estimated conditional probability functions (classification). I plan to add support for time-to-event outcomes very soon.
Computationally tractable:
Time and memory costs both scale linearly in the sample size and in the number of basis functions specified by the user. The package can directly handle data science problems of size 10k x 100 (sample size x feature dimension).
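As a rough back-of-the-envelope illustration of the linear scaling, the sketch below computes the memory needed to hold a dense design matrix of basis-function values. It assumes that this matrix (sample size x number of basis functions, stored as 8-byte doubles) is the dominant memory cost, which is my assumption for illustration and not a statement about the package internals. The function name `design_matrix_bytes` is hypothetical.

```python
def design_matrix_bytes(n_samples, n_basis, bytes_per_value=8):
    """Bytes needed for a dense n_samples x n_basis matrix of doubles.

    Assumption: the design matrix of evaluated basis functions dominates
    memory use. The cost is linear in both arguments.
    """
    return n_samples * n_basis * bytes_per_value


# Memory for 10k samples at a few illustrative numbers of basis functions.
for n_basis in (100, 1_000, 10_000):
    megabytes = design_matrix_bytes(10_000, n_basis) / 1e6
    print(f"{n_basis:>6} basis functions: {megabytes:.0f} MB")
```

Doubling either the sample size or the number of basis functions doubles the figure, which is what "linear" means here.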
Theoretically guaranteed:
The estimator adapts to the number of features/predictors truly associated with the outcome. In many cases it achieves the information-theoretic lower bound (minimax rate) of estimation.
What is penalized sieve estimation?
Generate the proper basis functions (something like a multivariate Fourier basis), then put everything into a LASSO solver (thank you, glmnet!). That's it.
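The two steps can be sketched in a few lines. This is a minimal illustration in plain Python, not the package's code: the package relies on glmnet, whereas here a small coordinate-descent LASSO stands in for it. The cosine basis on [0, 1], the full tensor-product grid of multi-indices, and all function names (`cosine_basis`, `design_matrix`, `lasso_cd`) are assumptions made for this example.

```python
import itertools
import math
import random


def cosine_basis(x, j):
    """Univariate cosine basis on [0, 1]: 1 for j = 0, else sqrt(2)*cos(pi*j*x)."""
    return 1.0 if j == 0 else math.sqrt(2.0) * math.cos(math.pi * j * x)


def design_matrix(X, max_freq):
    """Step 1: tensor-product basis, one column per multi-index in {0..max_freq}^d."""
    d = len(X[0])
    indices = list(itertools.product(range(max_freq + 1), repeat=d))
    rows = [
        [math.prod(cosine_basis(x[k], idx[k]) for k in range(d)) for idx in indices]
        for x in X
    ]
    return rows, indices


def soft_threshold(z, t):
    """Shrink z toward zero by t (the LASSO proximal step)."""
    return math.copysign(max(abs(z) - t, 0.0), z)


def lasso_cd(Phi, y, lam, n_iter=100):
    """Step 2: coordinate descent for (1/2n)||y - Phi b||^2 + lam * sum_{j>=1} |b_j|.

    Column 0 (the constant basis function) is left unpenalized.
    """
    n, p = len(Phi), len(Phi[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - Phi @ beta, kept up to date
    col_sq = [sum(Phi[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual (beta_j removed).
            rho = sum(Phi[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = rho / col_sq[j] if j == 0 else soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * Phi[i][j]
                beta[j] = new
    return beta


# Toy regression: two features, but only the first one affects the outcome.
random.seed(0)
X = [[random.random(), random.random()] for _ in range(200)]
y = [1.0 + 2.0 * cosine_basis(x[0], 1) + random.gauss(0.0, 0.1) for x in X]

Phi, indices = design_matrix(X, max_freq=4)
beta = lasso_cd(Phi, y, lam=0.05)

# Print the basis functions the LASSO kept (nonzero coefficients).
for idx, b in zip(indices, beta):
    if abs(b) > 1e-8:
        print(idx, round(b, 3))
```

In this toy example the outcome depends only on the first feature, so the L1 penalty should keep the coefficient for multi-index (1, 0) large and shrink most of the others to zero or close to it. That sparsity is the mechanism behind the adaptivity to the truly relevant features.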
(Questions, suggestions, collaboration: shoot me an email at zty@uw.edu. Tianyu Zhang, Department of Biostatistics, University of Washington)