
robCompositions (version 2.4.1)

smoothSplines: Estimate density from histogram

Description

Given raw (discretized) distributional observations, smoothSplines computes the density function that 'best' fits the data, as a trade-off between smoothness and least-squares fidelity, using B-spline basis functions.

Usage

smoothSplines(
  k,
  l,
  alpha,
  data,
  xcp,
  knots,
  weights = matrix(1, dim(data)[1], dim(data)[2]),
  num_points = 100,
  prior = "default",
  cores = 1,
  fast = 0
)

Value

An object of class smoothSpl containing, among others, the following components:

bspline

each row is the vector of B-spline coefficients

Y

the values of the smoothed curve, for the grid given

Y_clr

the values of the smoothed curve, in the clr setting, for the grid given

Arguments

k

degree of the smoothing spline

l

order of derivative in the penalization term

alpha

weight for penalization

data

an object of class "matrix" containing data to be smoothed, row by row

xcp

vector of control points

knots

either a vector of knots for the splines or an integer giving the number of equispaced knots

weights

matrix of weights. If not given, all data points will be weighted the same.

num_points

number of grid points at which the estimated density is evaluated

prior

prior used for zero-replacements. This must be one of "perks", "jeffreys", "bayes_laplace", "sq" or "default"

cores

number of cores for parallel execution, if the option was enabled before installing the package

fast

1 if maximal performance is required (print statements suppressed), 0 otherwise

Author

Alessia Di Blasi, Federico Pavone, Gianluca Zeni, Matthias Templ

Details

The original discretized densities are not directly smoothed, but instead the centred logratio transformation is first applied, to deal with the unit integral constraint related to density functions.
A constrained variational problem is then set up. Minimizing it yields a density that is a compromise between staying close to the given data at the corresponding control points xcp and being smooth. The roughness measure is based on the l-th derivative, while the fidelity term is weighted by alpha.
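Written out, the problem described above can be sketched as follows. The notation is ours and is not taken from the package source: s_k denotes a spline of degree k on an interval [a, b], (x_i, y_i) are the control points xcp paired with the clr-transformed data, and w_i are the weights.

```latex
\min_{s_k} \; J_l(s_k)
  = \int_a^b \left[ s_k^{(l)}(x) \right]^2 \, dx
  + \alpha \sum_{i=1}^{n} w_i \left[ y_i - s_k(x_i) \right]^2
\quad \text{subject to} \quad \int_a^b s_k(x) \, dx = 0 .
```

The zero-integral constraint is what keeps the spline in the clr space. Under this reading, a larger alpha pulls the curve towards the data, while a smaller alpha favours a smoother curve.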
The solution is a natural spline. Its coefficient vector is obtained as the minimum-norm solution of a linear system. The resulting splines can either be back-transformed to the original Bayes space of density functions (to provide smoothed counterparts for visualization and interpretation), or retained in the clr space for further statistical analysis.
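To make the clr step and its back-transformation concrete, here is a minimal base R sketch for a discretized density on equal-width bins. The helper names clr_discrete and clr_inverse, the toy density values and the bin width are assumptions made for illustration; this is not the package's internal code, and it assumes strictly positive densities (zero bins would first need a replacement such as the one selected by the prior argument).

```r
# Hypothetical helpers for illustration only (base R, equal-width bins assumed).

# Discrete centred logratio transform: log-densities centred to sum to zero.
clr_discrete <- function(dens) {
  stopifnot(all(dens > 0))      # logarithms need strictly positive values
  log_dens <- log(dens)
  log_dens - mean(log_dens)
}

# Back-transform: exponentiate and rescale so the step function integrates to 1.
clr_inverse <- function(z, width) {
  dens <- exp(z)
  dens / sum(dens * width)
}

# Toy histogram densities on six bins of width 0.2 (they integrate to 1).
width <- 0.2
dens  <- c(0.2, 0.6, 1.4, 1.8, 0.8, 0.2)

z    <- clr_discrete(dens)      # values in the clr space
back <- clr_inverse(z, width)   # recovered density values
```

In this sketch z sums to zero, which is the discrete analogue of the zero-integral constraint, and back reproduces dens up to floating-point error.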

References

J. Machalova, K. Hron & G.S. Monti (2016): Preprocessing of centred logratio transformed density functions using smoothing splines. Journal of Applied Statistics, 43:8, 1419-1435.

Examples

library(robCompositions)

# Sepal lengths of the first iris species (setosa)
SepalLengthCm <- iris$Sepal.Length
iris1 <- SepalLengthCm[iris$Species == levels(iris$Species)[1]]

# Histogram with about 12 classes; only bin midpoints and densities are used
h1 <- hist(iris1, nclass = 12, plot = FALSE)

midx1 <- h1$mids                                   # control points (xcp)
midy1 <- matrix(h1$density, nrow = 1, ncol = length(h1$density), byrow = TRUE)
knots <- 7                                         # number of equispaced knots

# Not run by default
if (FALSE) {
# Cubic spline (k = 3) with a second-derivative penalty (l = 2)
sol1 <- smoothSplines(k = 3, l = 2, alpha = 1000, data = midy1, xcp = midx1, knots = knots)
plot(sol1)

h1 <- hist(iris1, freq = FALSE, nclass = 12, xlab = "Sepal Length [cm]", main = "Iris setosa")
# black line: kernel method; red line: smoothSplines result
lines(density(iris1), col = "black", lwd = 1.5)
xx1 <- seq(sol1$Xcp[1], tail(sol1$Xcp, n = 1), length.out = sol1$NumPoints)
lines(xx1, sol1$Y[1, ], col = "red", lwd = 2)
}
