Fair regression model based on nonconvex optimization from Komiyama et al. (2018).
nclm(response, predictors, sensitive, unfairness, covfun, lambda = 0,
save.auxiliary = FALSE)
nclm()
returns an object of class c("nclm", "fair.model")
.
a numeric vector, the response variable.
a numeric matrix or a data frame containing numeric and factor columns; the predictors.
a numeric matrix or a data frame containing numeric and factor columns; the sensitive attributes.
a positive number in [0, 1], how unfair is the model allowed
to be. A value of 0
means the model is completely fair, while a value
of 1
means the model is not constrained to be fair at all.
a function computing covariance matrices. It defaults to the
cov()
function from the stats package.
a non-negative number, a ridge-regression penalty coefficient. It defaults to zero.
a logical value, whether to save the fitted values and
the residuals of the auxiliary model that constructs the decorrelated
predictors. The default value is FALSE
.
Marco Scutari
nclm()
defines fairness as statistical parity. The model bounds the
proportion of the variance that is explained by the sensitive attributes over
the total explained variance.
The algorithm proposed by Komiyama et al. (2018) works like this:
regresses the predictors against the sensitive attributes;
constructs a new set of predictors that are decorrelated from the sensitive attributes using the residuals of this regression;
regresses the response against the decorrelated predictors and the sensitive attributes, while
bounding the proportion of variance the sensitive attributes can explain with respect to the overall explained variance of the model.
Both sensitive
and predictors
are standardized internally before
estimating the regression coefficients, which are then rescaled back to match
the original scales of the variables. response
is only standardized if
it has a variance smaller than 1
, as that seems to improve the
stability of the solutions provided by the optimizer (as far as the data
included in fairml are concerned).
The covfun
argument makes it possible to specify a custom function to
compute the covariance matrices used in the constrained optimization. Some
examples are the kernel estimators described in Komiyama et al. (2018) and
the shrinkage estimators in the corpcor package.
Komiyama J, Takeda A, Honda J, Shimao H (2018). "Nonconvex Optimization for
Regression with Fairness Constraints". Proceedints of the 35th International
Conference on Machine Learning (ICML), PMLR 80:2737--2746.
http://proceedings.mlr.press/v80/komiyama18a/komiyama18a.pdf
frrm, zlm