
HistDAWass

(Histogram-valued Data analysis using Wasserstein metric)

In this document we describe the main features of the HistDAWass package. The name is the acronym for Histogram-valued Data analysis using Wasserstein metric. The implemented classes and functions are related to the analysis of data tables containing histograms in each cell instead of the classical numeric values.

What is the L2 Wasserstein metric?

Given two probability density functions f and g, let F and G be their cumulative distribution functions and Qf and Qg the corresponding quantile functions (the inverses of the cumulative distribution functions). The L2 Wasserstein distance between f and g is

$$d_W(f,g)=\sqrt{\int_0^1 \left(Q_f(t)-Q_g(t)\right)^2\,dt}$$
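As a rough illustration of the definition, the distance can be approximated in base R by comparing the two quantile functions on a fine grid of probability levels. This is a minimal sketch, not the package's implementation: the helper names `quantile_hist` and `wasserstein_l2` are hypothetical, each histogram is assumed to be given by its bin bounds `x` and cumulative probabilities `p`, and the integral is replaced by a midpoint rule.

```r
# Quantile function of a histogram with bin bounds x and cumulative
# probabilities p, evaluated at levels t by linear interpolation.
# (Assumes p is strictly increasing, i.e. no empty bins.)
quantile_hist <- function(x, p, t) {
  approx(x = p, y = x, xout = t)$y
}

# Approximate L2 Wasserstein distance: root mean squared difference
# between the two quantile functions on a midpoint grid over (0, 1).
wasserstein_l2 <- function(x1, p1, x2, p2, n = 10000) {
  t <- (seq_len(n) - 0.5) / n
  q1 <- quantile_hist(x1, p1, t)
  q2 <- quantile_hist(x2, p2, t)
  sqrt(mean((q1 - q2)^2))
}

# Two uniform histograms, the second shifted by 2: the distance is 2.
d_shift <- wasserstein_l2(c(0, 1), c(0, 1), c(2, 3), c(0, 1))
d_shift
```

A pure shift changes every quantile by the same amount, so the distance equals the size of the shift. The package lists `WassSqDistH` for the squared distance between two distributions; the sketch above does not call it.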

The implemented classes are described in the following table.

| Class | Wrapper function for initializing | Description |
| --- | --- | --- |
| distributionH | distributionH(x, p) | A class describing a histogram distribution |
| MatH | MatH(x, nrows, ncols, rownames, varnames, by.row) | A class describing a matrix of distributions |
| TdistributionH | TdistributionH() | A class derived from distributionH equipped with a timestamp or a time window |
| HTS | HTS() | A class describing a histogram-valued time series |

library(HistDAWass)
# x: bin bounds, p: cumulative probabilities (bins [0,1] and [1,2] with mass 0.3 and 0.7)
mydist <- distributionH(x = c(0, 1, 2), p = c(0, 0.3, 1))

From raw data to histograms

The data2hist function builds a distributionH from raw numeric data.
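To make the step from raw data to a histogram concrete, here is a minimal base R sketch that derives bin bounds and cumulative probabilities from a sample. It is only an illustration under stated assumptions: the sample `raw` is simulated, equal-width bins from `hist()` are one possible binning choice, and nothing here is claimed to match what `data2hist` does internally.

```r
set.seed(1)
raw <- rnorm(1000, mean = 2, sd = 3)   # hypothetical raw sample

# Equal-width binning; plot = FALSE computes the bins without drawing.
h <- hist(raw, breaks = 10, plot = FALSE)

x <- h$breaks                                  # bin bounds
p <- c(0, cumsum(h$counts) / sum(h$counts))    # cumulative probabilities

# x and p have the same length: one cumulative value per bin bound.
data.frame(x = x, p = p)
```

The pair `(x, p)` has the same shape as the arguments of `distributionH(x, p)` shown earlier in this document.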

Basic statistics for a distributionH (a histogram)

  • mean

    • the mean of a histogram
  • standard deviation

    • the standard deviation of a histogram
  • skewness

    • the third standardized moment of a histogram
  • kurtosis

    • the fourth standardized moment of a histogram
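The four statistics above can be computed in closed form if each bin is treated as a uniform distribution weighted by its probability. The following base R sketch does that. The function name `hist_moments` is hypothetical, the uniform-within-bin assumption is ours, and kurtosis is reported as the plain fourth standardized moment (not excess kurtosis). The package lists `meanH`, `stdH`, `skewH` and `kurtH` for these quantities; they are not called here.

```r
# Moments of a histogram given bin bounds x and cumulative probabilities p.
# Each bin [a, b] is treated as a uniform distribution with weight w.
hist_moments <- function(x, p) {
  a <- head(x, -1)            # lower bin bounds
  b <- tail(x, -1)            # upper bin bounds
  w <- diff(p)                # bin probabilities
  m <- sum(w * (a + b) / 2)   # mean: weighted average of bin centers

  # k-th central moment: integrate (u - m)^k over each uniform bin.
  central <- function(k) {
    sum(w * ((b - m)^(k + 1) - (a - m)^(k + 1)) / ((k + 1) * (b - a)))
  }

  s <- sqrt(central(2))
  c(mean = m, sd = s,
    skewness = central(3) / s^3,
    kurtosis = central(4) / s^4)
}

# Same histogram as mydist above: bins [0,1] and [1,2], mass 0.3 and 0.7.
st <- hist_moments(x = c(0, 1, 2), p = c(0, 0.3, 1))
st
```

For this histogram the mean is 0.3 × 0.5 + 0.7 × 1.5 = 1.2.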

Basic statistics for a MatH (a matrix of histogram-valued data)

  • The average histogram of a column

    • It is the histogram that minimizes the sum of squared Wasserstein distances to the histograms of the column.

  • The standard deviation of a variable

    • It is a number that measures the dispersion of a set of histograms.

  • The covariance matrix of a MatH

    • It is a matrix that measures the covariances among a set of histogram variables.

  • The correlation matrix of a MatH

    • It is a matrix that measures the correlations among a set of histogram variables.
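For the L2 Wasserstein distance in one dimension, the histogram that minimizes the sum of squared distances is the one whose quantile function is the pointwise average of the individual quantile functions. The base R sketch below uses that fact to compute an average histogram and a dispersion measure for a small, made-up column of histograms. The helper `qhist`, the example data, and the choice to divide by n (rather than n - 1) are assumptions for illustration, not a description of the package internals.

```r
# Quantile function of a histogram (bounds x, cumulative probabilities p),
# evaluated at levels u by linear interpolation.
qhist <- function(x, p, u) approx(x = p, y = x, xout = u)$y

u <- (seq_len(1000) - 0.5) / 1000   # midpoint grid on (0, 1)

# A hypothetical column of three uniform histograms.
hists <- list(
  list(x = c(0, 1), p = c(0, 1)),
  list(x = c(2, 3), p = c(0, 1)),
  list(x = c(1, 2), p = c(0, 1))
)

# One row per histogram, one column per probability level.
Q <- t(sapply(hists, function(h) qhist(h$x, h$p, u)))

q_bar <- colMeans(Q)                    # quantile function of the average histogram
d2    <- rowMeans(sweep(Q, 2, q_bar)^2) # squared L2 Wasserstein distances to it
sd_w  <- sqrt(mean(d2))                 # dispersion of the column (divides by n)
sd_w
```

Here the average histogram is uniform on [1, 2], two histograms lie at distance 1 from it and one at distance 0, so the dispersion is sqrt(2/3).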

Visualization

  • Plot of a distributionH

  • Plot of a MatH

  • Plot of an HTS

Data Analysis methods

Clustering

  • K-means

  • Adaptive distance based K-means

  • Fuzzy c-means

  • Fuzzy c-means based on adaptive Wasserstein distances

  • Kohonen batch self-organizing maps

  • Kohonen batch self-organizing maps with Wasserstein adaptive distances

  • Hierarchical clustering
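To give an intuition for clustering histograms with the L2 Wasserstein distance, note that if every histogram is represented by its quantiles on a common grid, the Euclidean distance between those quantile vectors approximates the Wasserstein distance up to a constant factor. A plain `kmeans` on the quantile matrix is therefore a reasonable stand-in for illustration. The sketch below uses base R only, with made-up data and a hypothetical helper `qhist`; it does not call `WH_kmeans` and makes no claim about how that function is implemented.

```r
# Quantile function of a histogram (bounds x, cumulative probabilities p).
qhist <- function(x, p, u) approx(x = p, y = x, xout = u)$y

u <- (seq_len(200) - 0.5) / 200   # common grid of probability levels

# Six hypothetical uniform histograms of width 2:
# three located near 0 and three located near 10.
lower <- c(0, 0.5, 1, 10, 10.5, 11)
Q <- t(sapply(lower, function(a) qhist(c(a, a + 2), c(0, 1), u)))

# K-means on quantile vectors; each cluster center is an averaged
# quantile function, i.e. an average histogram of the cluster.
set.seed(42)
fit <- kmeans(Q, centers = 2, nstart = 10)
fit$cluster
```

The first three histograms end up in one cluster and the last three in the other, whatever labels `kmeans` assigns to them.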

Dimension reduction techniques

  • Principal components analysis of a single histogram variable

  • Principal components analysis of a set of histogram variables (using Multiple Factor Analysis)
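One way to picture a principal components analysis of a single histogram variable is to describe each histogram by a few quantiles and run an ordinary centered PCA on the resulting table. The sketch below does this in base R with `prcomp`. It is offered as an assumption about the general idea, not as the algorithm behind `WH.1d.PCA`; the data and the helper `qhist` are hypothetical.

```r
# Quantile function of a histogram (bounds x, cumulative probabilities p).
qhist <- function(x, p, u) approx(x = p, y = x, xout = u)$y

u <- c(0, 0.25, 0.5, 0.75, 1)   # minimum, quartiles and maximum

# Five hypothetical uniform histograms that differ in location and width.
lower <- c(0, 1, 2, 3, 4)
width <- c(1, 3, 2, 5, 4)
Q <- t(mapply(function(a, w) qhist(c(a, a + w), c(0, 1), u), lower, width))

# Centered, unscaled PCA on the quantile table (rows = histograms).
pca <- prcomp(Q, center = TRUE, scale. = FALSE)
round(pca$sdev, 6)
```

Because these histograms vary only in location and width, two components carry all the variability and the remaining standard deviations are numerically zero.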

Methods for Histogram time series

Smoothing

  • Moving averages

  • Exponential smoothing
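Smoothing a histogram time series can be pictured as smoothing its quantile functions level by level, since a weighted average of quantile functions with nonnegative weights is again a valid (nondecreasing) quantile function. The base R sketch below applies simple exponential smoothing in that way. The function `exp_smooth`, the smoothing constant, and the simulated series are hypothetical, and the sketch is not claimed to reproduce `HTS.exponential.smoothing`.

```r
# Rows of Q are quantile functions of a hypothetical histogram time series
# observed at 5 time points, evaluated at common probability levels u.
u <- c(0.1, 0.3, 0.5, 0.7, 0.9)
shift <- c(0, 2, 1, 3, 2)
Q <- outer(shift, qnorm(u), "+")   # normal-shaped quantiles, shifted over time

# Simple exponential smoothing applied to each quantile level:
# S[1, ] = Q[1, ], then S[i, ] = alpha * Q[i, ] + (1 - alpha) * S[i - 1, ].
exp_smooth <- function(Q, alpha) {
  S <- Q
  for (i in seq_len(nrow(Q))[-1]) {
    S[i, ] <- alpha * Q[i, ] + (1 - alpha) * S[i - 1, ]
  }
  S
}

S <- exp_smooth(Q, alpha = 0.5)
S[, 3]   # smoothed medians
```

With `alpha = 0.5` the smoothed series keeps the same shape at each time point while its location follows the sequence 0, 1, 1, 2, 2.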

Forecasting

  • KNN prediction of histogram time series

Linear regression

A two-component model for linear regression of histogram variables, estimated by least squares under the L2 Wasserstein distance.
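As a sketch of what "two components" usually means in this setting, each quantile function can be split into its mean and a centered part, and the two parts receive separate coefficients. The notation below is ours and is an assumption about the general form of such a model, not a transcription of the package's parametrization. Write $Q_{x_{ij}}(t) = \bar{x}_{ij} + Q^{c}_{x_{ij}}(t)$ for the quantile function of the $j$-th predictor of unit $i$, where $\bar{x}_{ij}$ is its mean and $Q^{c}_{x_{ij}}$ its centered quantile function. Then

$$Q_{y_i}(t) = \beta_0 + \sum_{j=1}^{p} \beta_j\,\bar{x}_{ij} + \sum_{j=1}^{p} \gamma_j\,Q^{c}_{x_{ij}}(t) + e_i(t), \qquad \gamma_j \ge 0,$$

and the least squares criterion minimizes the sum of squared L2 Wasserstein errors

$$\sum_{i=1}^{n} \int_0^1 e_i(t)^2\,dt .$$

The constraint $\gamma_j \ge 0$ keeps the fitted function nondecreasing in $t$, so that it is still a quantile function.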

Install

install.packages('HistDAWass')

Version: 1.0.8
License: GPL (>= 2)
Maintainer: Antonio Irpino
Last Published: January 24th, 2024
Monthly Downloads: 302

Functions in HistDAWass (1.0.8)

HTS.moving.averages

Smoothing with moving averages of a histogram time series
HTS.exponential.smoothing

Smoothing with exponential smoothing of a histogram time series
HTS-class

Class HTS
Agronomique

Agronomique data
Center.cell.MatH

Method Center.cell.MatH: centers all the cells of a matrix of distributions
HistDAWass-package

Histogram-Valued Data Analysis
MatH-class

Class MatH.
China_Month

A monthly climatic dataset of China
BLOOD

Blood dataset for Histogram data analysis
DouglasPeucker

Ramer-Douglas-Peucker algorithm for curve fitting with a PolyLine
WH.plot_multiple_Spanish.funs

Plotting Spanish fun plots for Multiple factor analysis of Histogram Variables
BloodBRITO

Blood dataset from Brito P. for Histogram data analysis
RetHTS

A histogram-valued dataset of returns
OzoneFull

Full Ozone dataset for Histogram data analysis
HTS.predict.knn

K-NN predictions of a histogram time series
WH.bind.row

Method WH.bind.row
WH.correlation

Method WH.correlation
WH.plot_multiple_indivs

Plot histograms of individuals after a Multiple factor analysis of Histogram Variables
WH.bind

Method WH.bind
WH.SSQ

Method WH.SSQ
ShortestDistance

Shortest distance from a point to a 2D segment
WH.bind.col

Method WH.bind.col
get.histo

Method get.histo: show the distribution with bins
WH.SSQ2

Method WH.SSQ2
WH.regression.two.components

Multiple regression analysis for histogram variables based on a two component model and L2 Wasserstein distance
WH.regression.GOF

Goodness of Fit indices for Multiple regression of histogram variables based on a two component model and L2 Wasserstein distance
skewH

Method skewH: computes the skewness of a distribution
WH.mat.sum

Method WH.mat.sum
WH.regression.two.components.predict

Multiple regression analysis for histogram variables based on a two component model and L2 Wasserstein distance
WH_fcmeans

Fuzzy c-means of a dataset of histogram-valued data
[

Method [: extract from a MatH
WH_MAT_DIST

L2 Wasserstein distance matrix
TMatH-class

Class TMatH
TdistributionH-class

Class TdistributionH
WH_adaptive.kmeans

K-means of a dataset of histogram-valued data using adaptive Wasserstein distances
WH_adaptive_fcmeans

Fuzzy c-means with adaptive distances for histogram-valued data
WH_2d_Adaptive_Kohonen_maps

Batch Kohonen self-organizing 2d maps using adaptive distances for histogram-valued data
WH_2d_Kohonen_maps

Batch Kohonen self-organizing 2d maps for histogram-valued data
get.MatH.main.info

Method get.MatH.main.info
WH.correlation2

Method WH.correlation2
get.MatH.rownames

Method get.MatH.rownames
WH.mat.prod

Method WH.mat.prod
WH.vec.mean

Method WH.vec.mean
get.MatH.ncols

Method get.MatH.ncols
WH.var.covar

Method WH.var.covar
OzoneH

Complete Ozone dataset for Histogram data analysis
WH.vec.sum

Method WH.vec.sum
get.MatH.stats

Method get.MatH.stats
plot-TdistributionH

plot for a TdistributionH object
get.distr

Method get.distr: show the distribution
plot-distributionH

plot for a distributionH object
WH.1d.PCA

Principal components analysis of a histogram variable based on Wasserstein distance
WH_hclust

Hierarchical clustering of histogram data
WH_kmeans

K-means of a dataset of histogram-valued data
distributionH-class

Class distributionH.
get.MatH.nrows

Method get.MatH.nrows
WH.var.covar2

Method WH.var.covar2
dotpW

Method dotpW
WH.MultiplePCA

Principal components analysis of a set of histogram variables based on Wasserstein distance
WassSqDistH

Method WassSqDistH
get.m

Method get.m: the mean of a distribution
get.s

Method get.s: the standard deviation of a distribution
show

Method show for distributionH
meanH

Method meanH: computes the mean of a distribution
minus

Method -
subsetHTS

Method subsetHTS: extract a subset of a histogram time series
summaryHTS

A function for summarizing an HTS
checkEmptyBins

Method checkEmptyBins
compP

Method compP
get.MatH.varnames

Method get.MatH.varnames
plot-HTS

Method plot for a histogram time series
get.cell.MatH

Method get.cell.MatH: returns the histogram in a cell of a matrix of distributions
compQ

Method compQ
plot-MatH

Method plot for a matrix of histograms
set.cell.MatH

Method set.cell.MatH: assigns a histogram to a cell of a matrix of histograms
show-MatH

Method show for MatH
plotPredVsObs

A function for comparing observed vs predicted histograms
stdH

Method stdH: computes the standard deviation of a distribution
stations_coordinates

Stations coordinates of China_Month and China_Seas datasets
plot_errors

A function for plotting functions of errors
data2hist

From real data to distributionH.
+

Method +
kurtH

Method kurtH: computes the kurtosis of a distribution
register

Method register
registerMH

Method registerMH
*-methods

Method *
rQQ

Method rQQ
crwtransform

Method crwtransform: returns the centers and the radii of bins of a distribution
is.registeredMH

Method is.registeredMH
China_Seas

A seasonal climatic dataset of China
Age_Pyramids_2014

Age pyramids of all the countries of the World in 2014