ndrlm: Generalized Network-based Dimensionality Reduction and Regression (GNDR)

Description

The main function of Generalized Network-based Dimensionality Reduction and Regression (GNDR) for supervised learning.

Usage

ndrlm(Y,X,latents="in",dircon=FALSE,optimize=TRUE,
                target="adj.r.square",rel_weight=FALSE,
                cor_method=1,
                cor_type=1,min_comm=2,Gamma=1,
                null_model_type=4,mod_mode=1,use_rotation=FALSE,
                rotation="oblimin",pareto=FALSE,fit_weights=NULL,
                lower.bounds.x = c(rep(-100,ncol(X))),
                upper.bounds.x = c(rep(100,ncol(X))),
                lower.bounds.latentx = c(0,0,0,0),
                upper.bounds.latentx = c(0.6,0.6,0.6,0.3),
                lower.bounds.y = c(rep(-100,ncol(Y))),
                upper.bounds.y = c(rep(100,ncol(Y))),
                lower.bounds.latenty = c(0,0,0,0),
                upper.bounds.latenty = c(0.6,0.6,0.6,0.3),
                popsize = 20, generations = 30, cprob = 0.7, cdist = 5,
                mprob = 0.2, mdist=10, seed=NULL)
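
The default weight bounds in the signature depend on the number of columns of X and Y. As a minimal sketch in base R (the names lb_x, ub_x, lb_y, ub_y and the narrower range of ±10 are illustrative assumptions, not package defaults), custom bounds of matching length could be prepared like this:

```r
# Illustrative sketch in base R; only the default datasets package is used.
X <- as.data.frame(freeny.x)                 # four input variables
Y <- data.frame(y = as.numeric(freeny.y))    # one output variable

# One weight bound per column, mirroring the defaults rep(-100, ncol(X)) etc.
# The range of +/-10 is an arbitrary choice for illustration.
lb_x <- rep(-10, ncol(X))
ub_x <- rep(10, ncol(X))
lb_y <- rep(-10, ncol(Y))
ub_y <- rep(10, ncol(Y))

# Sanity checks: lengths match the data and lower bounds lie below upper bounds.
stopifnot(length(lb_x) == ncol(X), length(ub_y) == ncol(Y),
          all(lb_x < ub_x), all(lb_y < ub_y))

# The vectors would then be passed by name, e.g.
# ndrlm(Y, X, lower.bounds.x = lb_x, upper.bounds.x = ub_x,
#       lower.bounds.y = lb_y, upper.bounds.y = ub_y)
```

The bounds only matter when optimize=TRUE, since they constrain the search of the optimizer.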

Value

fval: Objective function for fitting
target: Target performance measure. The possible target measures are "adj.r.square" = adjusted R-squared (default), "r.sqauare" = R-squared, "MAE" = mean absolute error, "MAPE" = mean absolute percentage error, "MASE" = mean absolute scaled error, "MSE" = mean squared error, "RMSE" = root mean squared error
hyperparams: optimized hyperparameters
pareto: In the case of multiple objectives, TRUE provides a Pareto-optimal solution, while FALSE (default) provides the weighted mean of the objective functions (see fit_weights)
Y: A numeric data frame of output variables
X: A numeric data frame of input variables
latents: Latent model: "in", "out", "both", "none"
NDAin: GNDA object, which is the result of model reduction and feature selection when latent-independent variables are employed
NDAin_weight: Weights of input variables (used in ndr)
NDAin_min_evalue: Optimized minimal eigenvector centrality value (used in ndr)
NDAin_min_communality: Optimized minimal communality value of indicators (used in ndr)
NDAin_com_communalities: Optimized minimal common communalities (used in ndr)
NDAin_min_R: Optimized minimal square correlation between indicators (used in ndr)
NDAout: GNDA object, which is the result of model reduction and feature selection when latent-dependent variables are employed
NDAout_weight: Weights of output variables (used in ndr)
NDAout_min_evalue: Optimized minimal eigenvector centrality value (used in ndr)
NDAout_min_communality: Optimized minimal communality value of indicators (used in ndr)
NDAout_com_communalities: Optimized minimal common communalities (used in ndr)
NDAout_min_R: Optimized minimal square correlation between indicators (used in ndr)
fits: List of linear regression models
otimized: Whether the fittings are optimized or not
NSGA: Output structure of the NSGA-II optimization (list), if optimize is TRUE (see mco::nsga2)
extra_vars.X: Logical variable. If direct connection is allowed (dircon=TRUE), not only the latent variables but also the excluded input variables are analyzed in the linear models as extra input variables.
extra_vars.Y: Logical variable. If direct connection is allowed (dircon=TRUE), not only the latent variables but also the excluded output variables are analyzed in the linear models as extra input variables.
dircon_X: The list of input variables which are directly connected to output variables.
dircon_Y: The list of output variables which are directly connected to output variables.
seed: applied seed value (default=NULL, no seed)
fn: Function (regression) name: NDRLM
Call: Callback function

Arguments

Y: A numeric data frame of output variables
X: A numeric data frame of input variables
latents: Use of latent variables: "in" employs latent-independent variables (default); "out" employs latent-dependent variables; "both" employs both latent-dependent and latent-independent variables; "none" employs no latent variables (= multiple regression)
dircon: Whether to enable direct connections between input and output variables (default=FALSE)
optimize: Optimization of fittings (default=TRUE)
target: Target performance measure. The possible target measures are "adj.r.square" = adjusted R-squared (default), "r.sqauare" = R-squared, "MAE" = mean absolute error, "MAPE" = mean absolute percentage error, "MASE" = mean absolute scaled error, "MSE" = mean squared error, "RMSE" = root mean squared error
rel_weight: Use relative weights. In this case, all weights should be non-negative. (default=FALSE)
cor_method: Correlation method (optional). '1' Pearson's correlation (default), '2' Spearman's correlation, '3' Kendall's correlation, '4' Distance correlation
cor_type: Correlation type (optional). '1' Bivariate correlation (default), '2' partial correlation, '3' semi-partial correlation
min_comm: Minimal number of indicators per community (default: 2).
Gamma: Gamma parameter in the multiresolution null model (default: 1).
null_model_type: '1' Differential Newman-Girvan's null model, '2' The null model is the mean of square correlations between indicators, '3' The null model is the specified minimal square correlation, '4' Newman-Girvan's model (default)
mod_mode: Community-based modularity calculation mode: '1' Louvain modularity (default), '2' Fast-greedy modularity, '3' Leading Eigen modularity, '4' Infomap modularity, '5' Walktrap modularity, '6' Leiden modularity
use_rotation: FALSE no rotation (default), TRUE the rotation is used.
rotation: "none", "varimax", "quartimax", "promax", "oblimin", "simplimax", and "cluster" are possible rotations/transformations of the solution. "oblimin" is the default, if use_rotation is TRUE.
pareto: In the case of multiple objectives, TRUE provides a Pareto-optimal solution, while FALSE (default) provides the weighted mean of the objective functions (see fit_weights)
fit_weights: weights of fitting the output variables (weights of means of objectives)
lower.bounds.x: Lower bounds of weights of independent variables in GNDA
upper.bounds.x: Upper bounds of weights of independent variables in GNDA
lower.bounds.latentx: Lower bounds of hyperparameters of GNDA for independent variables (values must be positive)
upper.bounds.latentx: Upper bounds of hyperparameters of GNDA for independent variables (values must be lower than one)
lower.bounds.y: Lower bounds of weights of dependent variables in GNDA
upper.bounds.y: Upper bounds of weights of dependent variables in GNDA
lower.bounds.latenty: Lower bounds of hyperparameters of GNDA for dependent variables (values must be positive)
upper.bounds.latenty: Upper bounds of hyperparameters of GNDA for dependent variables (values must be lower than one)
popsize: size of population of NSGA-II for fitting betas (default=20)
generations: number of generations to breed of NSGA-II for fitting betas (default=30)
cprob: crossover probability of NSGA-II for fitting betas (default=0.7)
cdist: crossover distribution index of NSGA-II for fitting betas (default=5)
mprob: mutation probability of NSGA-II for fitting betas (default=0.2)
mdist: mutation distribution index of NSGA-II for fitting betas (default=10)
seed: default seed value (default=NULL, no seed)
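
To make the target measures more concrete, the following base R sketch computes most of them for a plain linear model on the freeny data. It is an illustration under stated assumptions, not the package's internal code: the names dat, fit and measures are hypothetical, MAPE is expressed in percent, and MASE is left out because its scaling term depends on a definition not given here.

```r
# Base R sketch: error measures for an ordinary linear model on freeny.
# No latent variables are involved; this is not how ndrlm computes them internally.
dat <- data.frame(y = as.numeric(freeny.y), freeny.x)
fit <- lm(y ~ ., data = dat)

obs  <- dat$y
pred <- fitted(fit)
err  <- obs - pred

measures <- c(
  adj_R2 = summary(fit)$adj.r.squared,      # adjusted R-squared
  R2     = summary(fit)$r.squared,          # R-squared
  MAE    = mean(abs(err)),                  # mean absolute error
  MAPE   = mean(abs(err / obs)) * 100,      # percent; assumes no zero observations
  MSE    = mean(err^2),                     # mean squared error
  RMSE   = sqrt(mean(err^2))                # root mean squared error
)
print(round(measures, 4))
```

Note that the R-squared measures are maximized while the error measures are minimized, so the direction of optimization depends on the chosen target.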

Author

Zsolt T. Kosztyan*, Marcell T. Kurbucz, Attila I. Katona

e-mail*: kosztyan.zsolt@gtk.uni-pannon.hu

Details

NDRLM fits variables with feature selection by tuning the GNDA method, using the NSGA-II algorithm for parameter fitting.
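
The NSGA-II settings of ndrlm (popsize, generations, cprob, cdist, mprob, mdist) share their names with arguments of mco::nsga2. The toy run below is a sketch of what those settings control; the objective function two_objectives is a hypothetical example unrelated to GNDA, and the snippet assumes the mco package is installed.

```r
# Toy NSGA-II run with mco::nsga2; skipped if mco is not installed.
# The two objectives are illustrative only and have nothing to do with GNDA.
if (requireNamespace("mco", quietly = TRUE)) {
  set.seed(1)

  # Minimize the squared distance to (0, 0) and to (1, 1) simultaneously.
  two_objectives <- function(x) c(sum(x^2), sum((x - 1)^2))

  res <- mco::nsga2(
    two_objectives, idim = 2, odim = 2,
    lower.bounds = c(-2, -2), upper.bounds = c(2, 2),
    popsize = 20, generations = 30,          # same values as the ndrlm defaults
    cprob = 0.7, cdist = 5, mprob = 0.2, mdist = 10
  )

  # res$par: decision vectors; res$value: objective values;
  # res$pareto.optimal: logical flag per individual.
  print(head(res$value[res$pareto.optimal, , drop = FALSE]))
}
```

A larger popsize or more generations generally gives the search more room at the cost of run time; mco::nsga2 expects popsize to be a multiple of 4.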

References

Kosztyan, Z. T., Kurbucz, M. T., & Katona, A. I. (2022). Network-based dimensionality reduction of high-dimensional, low-sample-size datasets. Knowledge-Based Systems, 109180. doi:10.1016/j.knosys.2022.109180

Examples

# Using NDRLM without fitting optimization
X<-freeny.x
Y<-freeny.y
NDRLM<-ndrlm(Y,X,optimize=FALSE)
summary(NDRLM)
plot(NDRLM)

if (FALSE) {
# Using NDRLM with optimized fitting

NDRLM<-ndrlm(Y,X)
summary(NDRLM)

# Using Leiden's modularity for grouping variables

X<-freeny.x
Y<-freeny.y
NDRLM<-ndrlm(Y,X,mod_mode=6)
plot(NDRLM)

# Using relative weights

NDRLM<-ndrlm(Y,X,mod_mode=6,rel_weight=TRUE)
plot(NDRLM)

# Using Spearman's correlation

NDRLM<-ndrlm(Y,X,cor_method=2)
summary(NDRLM)

# Using greater population and generations

NDRLM<-ndrlm(Y,X,popsize=52,generations=40)
summary(NDRLM)

# No latent variables
NDRLM<-ndrlm(Y,X,latents="none")
plot(NDRLM)

# In-out model
library(lavaan)
df<-PoliticalDemocracy # Data of Political Democracy

dem<-PoliticalDemocracy[,c(1:8)]
ind60<-PoliticalDemocracy[,-c(1:8)]

NBSEM<-ndrlm(dem,ind60,latents = "both",seed = 2)
plot(NBSEM)
}
