pcat: Reduction for the Levels of a Factor.

Description

The function is trying to merged similar levels of a given factor. Its based on ideas given by Tutz (2013).

Usage

pcat(fac, df = NULL, lambda = NULL, method = c("ML", "GAIC"), start = 0.001, 
         Lp = 0, kappa = 1e-05, iter = 100, c.crit = 1e-04, k = 2)
gamlss.pcat(x, y, w, xeval = NULL, ...)
plotDF(y, factor = NULL, formula = NULL, data, along = seq(0, nlevels(factor)), 
         kappa = 1e-06, Lp = 0, ...)
plotLambda(y, factor = NULL, formula = NULL, data, along = seq(-2, 2, 0.1), 
         kappa = 1e-06, Lp = 0, ...)

Value

The function pcat reruns a vector endowed with a number of attributes. The vector itself is used in the construction of the model matrix, while the attributes are needed for the backfitting algorithms additive.fit(). The backfitting is done in gamlss.pcat.

Arguments

fac, factor: a factor to reduce its levels
df: the effective degrees of freedom df
lambda: the smoothing parameter
method: which method is used for the estimation of the smoothing parameter, "ML" or "GAIC" are allowed.
start: starting value for lambda if it estimated using "ML" or "GAIC"
Lp: The type of penalty required, Lp=0 is the default. Use Lp=1 for lasso type and different values for different required penalty.
kappa: a regulation parameters used for the weights in the penalties.
iter: the number of internal iteration allowed
c.crit: the convergent criterion
k: the penalty if "GAIC" method is used.
x: explanatory factor
y: the response or iterative response variable
w: iterative weights
xeval: indicator whether to predict
formula: A formula
data: A data frame
along: a sequence of values
...: for extra variables

Author

Mikis Stasinopoulos d.stasinopoulos@londonmet.ac.uk, Paul Eilers and Marco Enea

Details

The pcat() is used for the fitting of the factor. The function shrinks the levels of the categorical factor (not towards the overall mean as the function random() is doing) but towards each other. This results to a reduction of the number if levels of the factors. Different norms can be used for the shrinkage by specifying the argument Lp.

References

Tutz G. (2013) Regularization and Sparsity in Discrete Structures in the Proceedings of the 29th International Workshop on Statistical Modelling, Volume 1, p 29-42, Gottingen, Germany

Rigby, R. A., Stasinopoulos, D. M., Heller, G. Z., and De Bastiani, F. (2019) Distributions for modeling location, scale, and shape: Using GAMLSS in R, Chapman and Hall/CRC. An older version can be found in https://www.gamlss.com/.

Stasinopoulos D. M. Rigby R.A. (2007) Generalized additive models for location scale and shape (GAMLSS) in R. Journal of Statistical Software, Vol. 23, Issue 7, Dec 2007, https://www.jstatsoft.org/v23/i07/.

Stasinopoulos D. M., Rigby R.A., Heller G., Voudouris V., and De Bastiani F., (2017) Flexible Regression and Smoothing: Using GAMLSS in R, Chapman and Hall/CRC.

(see also https://www.gamlss.com/).

Examples

Run this code

# Simulate data 1
    n <- 10  # number of levels 
    m <- 200 # number of observations  
set.seed(2016)
level <-  as.factor(floor(runif(m) * n) + 1)
  a0  <-  rnorm(n)
sigma <-  0.4
   mu <-  a0[level]
   y <-  mu + sigma * rnorm(m)
plot(y~level)
points(1:10,a0, col="red")
 da1 <- data.frame(y, level)
#------------------
  mn <- gamlss(y~1,data=da1 ) # null model 
  ms <- gamlss(y~level-1, data=da1) # saturated model 
  m1 <- gamlss(y~pcat(level), data=da1) # calculating lambda ML
AIC(mn, ms, m1)
if (FALSE) {
m11 <- gamlss(y~pcat(level, method="GAIC", k=log(200)), data=da1) # GAIC
AIC(mn, ms, m1, m11) 
#gettng the fitted object -----------------------------------------------------
getSmo(m1)
coef(getSmo(m1))
fitted(getSmo(m1))[1:10]
plot(getSmo(m1)) # 
# After the fit a new factor is created  this factor has the reduced levels
 levels(getSmo(m1)$factor)
# -----------------------------------------------------------------------------
}

Run the code above in your browser using DataLab