shapley.plot

This function applies different criteria to select and visualize SHAP contributions.

This R package introduces Weighted Mean SHapley Additive exPlanations (WMSHAP), a method for calculating SHAP values for a grid of fine-tuned base-learner machine learning models as well as for stacked ensembles. Such a method was not previously available because of the common reliance on a single best-performing model. By integrating the SHAP values of the individual base-learners that make up an ensemble or a tuning grid search, the package weights SHAP contributions according to each model's performance, assessed by R squared (for both regression and classification models). Alternatively, the software can weight SHAP values by the area under the precision-recall curve (AUCPR), the area under the curve (AUC), or the F2 measure for binary classifiers. It further extends this framework with weighted confidence intervals for the weighted mean SHAP values, offering a more comprehensive and robust evaluation of feature importance over a grid of machine learning models instead of computing SHAP values for the best model only. This methodology is particularly beneficial under severe class imbalance (class rarity), where there is no universal criterion for identifying the absolute best model: it provides a transparent, generalized measure of feature importance that mitigates the risk of reporting SHAP values for an overfitted or biased model. Furthermore, the package implements hypothesis testing to assess the statistical significance of SHAP values for individual features, as well as comparative significance testing of SHAP contributions between features.
Additionally, it addresses a gap in the feature selection literature by presenting criteria for automatically selecting the most important features across a grid of models or stacked ensembles, which removes the need to choose the number of top features arbitrarily. This is useful for researchers analyzing feature importance, particularly for severely imbalanced outcomes where conventional methods fall short. The package is also expected to report "democratic" feature importance across a grid of models, resulting in more comprehensive and generalizable feature selection. Finally, the package implements a novel method for visualizing SHAP values at both the subject level and the feature level, as well as a plot for feature selection based on the weighted mean SHAP ratios.
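The weighting idea described above can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the package's implementation: the numbers are made up, the variable names are hypothetical, and the interval uses a simple weighted variance with a normal approximation, which may differ from how the package computes its weighted confidence intervals.

```r
# Hypothetical per-model values for a single feature (illustrative numbers only).
shap_ratio  <- c(0.20, 0.25, 0.15, 0.30)  # the feature's SHAP ratio in each model
performance <- c(0.60, 0.80, 0.50, 0.70)  # e.g. AUCPR of each model, used as weight

# Normalise the performance metrics so that the weights sum to 1.
w <- performance / sum(performance)

# Weighted mean of the SHAP ratios across models.
wm <- sum(w * shap_ratio)

# Weighted variance around the weighted mean (simple, uncorrected form).
wvar <- sum(w * (shap_ratio - wm)^2)

# Approximate 95% interval via a normal approximation (an assumption made here).
se <- sqrt(wvar / length(shap_ratio))
ci <- c(lower = wm - 1.96 * se, upper = wm + 1.96 * se)

print(wm)
print(ci)
```

Better-performing models pull the weighted mean toward their own SHAP ratio, so a single poorly fitted model has limited influence on the reported importance.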

E. F. Haghish 

shapley

Weighted Mean SHAP and CI for Robust Feature Assessment in ML Grid

shapley.plot function

<dl><dt>shapley</dt>
<dd>object of class 'shapley', as returned by the 'shapley' function</dd>
<dt>plot</dt>
<dd>character, specifying the type of the plot, which can be one of
'bar', 'waffle', or 'shap'. The default is 'bar'.</dd>
<dt>method</dt>
<dd>Character. The column name in <code>summaryShaps</code> used
for feature selection. Default is <code>"mean"</code>, which
selects features whose weighted mean SHAP ratio (WMSHAP) is
higher than the specified cutoff. The alternative is
<code>"lowerCI"</code>, which selects features whose lower
confidence interval bound is higher than the cutoff.</dd>
<dt>cutoff</dt>
<dd>numeric, specifying the cutoff for the method used for selecting
the top features.</dd>
<dt>top_n_features</dt>
<dd>Integer. If specified, the top n features with the
highest weighted SHAP values will be selected, overriding
the 'cutoff' and 'method' arguments.</dd>
<dt>features</dt>
<dd>character vector, specifying the features to be plotted.</dd>
<dt>legendstyle</dt>
<dd>character, specifying the style of the plot legend, which
can be either 'continuous' (default) or 'discrete'. The
continuous legend is only applicable to 'shap' plots;
all other plots use the 'discrete' legend.</dd>
<dt>scale_colour_gradient</dt>
<dd>character vector for specifying the color gradients
for the plot.</dd></dl>
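How 'method', 'cutoff', and 'top_n_features' interact can be sketched with a small base R example. The data frame below is a made-up stand-in for summaryShaps (its 'feature' column name and all numbers are assumptions), and select_features() is a hypothetical helper written for illustration; it is not a function of the package and may not match its internal logic.

```r
# Hypothetical stand-in for the summaryShaps table; the `feature` column name
# and all numbers are assumptions made for illustration.
summaryShaps <- data.frame(
  feature = c("age", "psa", "vol", "gleason"),
  mean    = c(0.30, 0.12, 0.02, 0.25),
  lowerCI = c(0.22, 0.004, 0.00, 0.18),
  stringsAsFactors = FALSE
)

# select_features() is a hypothetical helper, not part of the package.
select_features <- function(tbl, method = "mean", cutoff, top_n_features = NULL) {
  if (!is.null(top_n_features)) {
    # top_n_features takes precedence over 'method' and 'cutoff':
    # rank by weighted mean and keep the first n features.
    ranked <- tbl[order(tbl$mean, decreasing = TRUE), ]
    return(head(ranked$feature, top_n_features))
  }
  # Otherwise keep features whose chosen column exceeds the cutoff.
  tbl$feature[tbl[[method]] > cutoff]
}

by_mean  <- select_features(summaryShaps, method = "mean",    cutoff = 0.01)
by_lower <- select_features(summaryShaps, method = "lowerCI", cutoff = 0.01)
top_two  <- select_features(summaryShaps, cutoff = 0.01, top_n_features = 2)

print(by_mean)
print(by_lower)
print(top_two)
```

With an existing object of class 'shapley' (called result here, a hypothetical name), the corresponding calls would look like shapley.plot(result, plot = "bar", method = "lowerCI", cutoff = 0.01) or shapley.plot(result, plot = "waffle", top_n_features = 5), where the cutoff and n values are illustrative only.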


Plot weighted SHAP contributions — shapley.plot

