
HistDAWass

(Histogram-valued Data analysis using Wasserstein metric)

In this document we describe the main features of the HistDAWass package. The name is an acronym for Histogram-valued Data analysis using Wasserstein metric. The implemented classes and functions are related to the analysis of data tables that contain a histogram in each cell instead of a classical numeric value.

What is the L2 Wasserstein metric?

Given two probability density functions f and g, with cumulative distribution functions F and G and corresponding quantile functions Q_f and Q_g (the inverses of the cumulative distribution functions), the L2 Wasserstein distance is

$$d_W(f,g)=\sqrt{\int\limits_0^1{(Q_f(p) - Q_g(p))^2 dp}}$$
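As a minimal numerical sketch of this formula, using base R only and not the package itself, the integral can be approximated on a grid of probabilities. The helper name wass_l2 and the choice of two normal distributions are assumptions made for illustration. For two normal distributions the distance has the known closed form sqrt((m1 - m2)^2 + (s1 - s2)^2), which gives a simple check of the approximation.

```r
# Approximate the L2 Wasserstein distance between two distributions
# from their quantile functions, using a midpoint rule on (0, 1).
# wass_l2 is a hypothetical helper, not a HistDAWass function.
wass_l2 <- function(qf, qg, n = 100000) {
  p <- (seq_len(n) - 0.5) / n          # midpoints of n equal sub-intervals
  sqrt(mean((qf(p) - qg(p))^2))        # square root of the approximated integral
}

# Quantile functions of two normal distributions (illustrative choice).
qf <- function(p) qnorm(p, mean = 0, sd = 1)
qg <- function(p) qnorm(p, mean = 3, sd = 2)

d_numeric <- wass_l2(qf, qg)
d_closed  <- sqrt((0 - 3)^2 + (1 - 2)^2)  # closed form for two normals
```

The two values should agree up to the discretization error of the grid.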

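The implemented classes are distributionH (a histogram), MatH (a matrix of histogram-valued data), TdistributionH, TMatH and HTS (a histogram time series). For example, a distributionH object is created as follows: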

library(HistDAWass)
# x: bin boundaries; p: cumulative probabilities at those boundaries
mydist <- distributionH(x = c(0, 1, 2), p = c(0, 0.3, 1))

From raw data to histograms

data2hist functions
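The package's data2hist function is listed in the function index as "From real data to distributionH"; its arguments are not described in this document. The sketch below therefore uses only base R to illustrate the underlying idea: raw values are binned, and the bin boundaries and cumulative relative frequencies are collected. The helper name raw_to_xp and the simulated data are assumptions for illustration.

```r
# Build the (x, p) representation of a histogram from raw values,
# using base R only. raw_to_xp is a hypothetical helper,
# not a HistDAWass function.
raw_to_xp <- function(values, breaks = "Sturges") {
  h <- hist(values, breaks = breaks, plot = FALSE)
  x <- h$breaks                                # bin boundaries
  p <- c(0, cumsum(h$counts) / sum(h$counts))  # cumulative probabilities
  list(x = x, p = p)
}

set.seed(1)
raw <- rnorm(200, mean = 10, sd = 2)  # simulated raw data (illustrative)
xp <- raw_to_xp(raw)
```

The resulting vectors have the same form as the x and p arguments passed to distributionH in the example above.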

Basic statistics for a distributionH (A histogram)

  • mean
    • the mean of a histogram
  • standard deviation
    • the standard deviation of a histogram
  • skewness
    • the third standardized moment of a histogram
  • kurtosis
    • the fourth standardized moment of a histogram
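As a rough illustration of the first two statistics, the sketch below computes the mean and standard deviation of a histogram in base R, under the assumption that values are uniformly distributed within each bin. The helper name hist_mean_sd is hypothetical; in the package itself the corresponding methods appear in the function index as meanH and stdH.

```r
# Mean and standard deviation of a histogram given as bin boundaries x
# and cumulative probabilities p, assuming a uniform density in each bin.
# hist_mean_sd is a hypothetical helper, not a HistDAWass function.
hist_mean_sd <- function(x, p) {
  w       <- diff(p)                           # bin weights
  centers <- (head(x, -1) + tail(x, -1)) / 2   # bin centers
  radii   <- diff(x) / 2                       # bin half-widths
  m  <- sum(w * centers)                       # mean of the mixture of uniforms
  m2 <- sum(w * (centers^2 + radii^2 / 3))     # second moment of the mixture
  c(mean = m, sd = sqrt(m2 - m^2))
}

# Same histogram as in the distributionH example: bins [0,1] and [1,2]
# with weights 0.3 and 0.7.
stats <- hist_mean_sd(x = c(0, 1, 2), p = c(0, 0.3, 1))
```

For this histogram the mean is 0.3 * 0.5 + 0.7 * 1.5 = 1.2 and the variance is 26/15 - 1.44 = 22/75.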

Basic statistics for a MatH (A matrix of histogram-valued data)

  • The average histogram of a column

    • It is an average histogram that minimizes the sum of squared Wasserstein distances.
  • The standard deviation of a variable

    • It is a number that measures the dispersion of a set of histograms.
  • The covariance matrix of a MatH

    • It is a matrix that measures the covariances among a set of histogram variables.
  • The correlation matrix of a MatH

    • It is a matrix that measures the correlations among a set of histogram variables.
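For distributions on the real line, the distribution that minimizes the sum of squared L2 Wasserstein distances to a set of distributions has a quantile function equal to the pointwise average of their quantile functions. The base R sketch below illustrates this for three normal distributions; the parameters and the probability grid are assumptions for illustration, and the code does not use the package. For normal distributions the averaged quantile function is again that of a normal, with the mean of the means and the mean of the standard deviations.

```r
# (mean, sd) pairs of three normal distributions (illustrative choice).
params <- list(c(0, 1), c(2, 3), c(4, 2))
p <- seq_len(999) / 1000                 # probability grid inside (0, 1)

# One column of quantiles per distribution.
Q <- sapply(params, function(th) qnorm(p, mean = th[1], sd = th[2]))

# Pointwise average of the quantile functions:
# the quantile function of the average distribution.
Q_bar <- rowMeans(Q)

# For normals this equals the quantile function of
# N(mean of the means, mean of the sds) = N(2, 2).
Q_expected <- qnorm(p, mean = 2, sd = 2)
```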

Visualization

plot of a distributionH

plot of a MatH

plot of a HTS

Data Analysis methods

Clustering

  • Kmeans

  • Adaptive distance based Kmeans

  • Fuzzy cmeans

  • Fuzzy cmeans based on adaptive Wasserstein distances

  • Kohonen batch self organizing maps

  • Kohonen batch self organizing maps with Wasserstein adaptive distances

  • Hierarchical clustering

Dimension reduction techniques

  • Principal components analysis of a single histogram variable

  • Principal components analysis of a set of histogram variables (using Multiple Factor Analysis)

Methods for Histogram time series

Smoothing

  • Moving averages

  • Exponential smoothing

Predicting

  • KNN prediction of histogram time series

Linear regression

A two-component model for linear regression, estimated with the least squares method

Install

install.packages('HistDAWass')

Monthly Downloads: 302

Version: 1.0.4

License: GPL (>= 2)

Maintainer: Antonio Irpino

Last Published: February 19th, 2020

Functions in HistDAWass (1.0.4)

  • China_Month: A monthly climatic dataset of China
  • WH.mat.prod: Method WH.mat.prod
  • MatH-class: Class MatH.
  • WH.vec.mean: Method WH.vec.mean
  • WH.correlation2: Method WH.correlation2
  • WH.vec.sum: Method WH.vec.sum
  • HistDAWass-package: Histogram-Valued Data Analysis
  • checkEmptyBins: Method checkEmptyBins
  • dotpW: Method dotpW
  • [: extract from a MatH Method [
  • get.MatH.stats: Method get.MatH.stats
  • get.MatH.varnames: Method get.MatH.varnames
  • compP: Method compP
  • plot-MatH: Method plot for a matrix of histograms
  • China_Seas: A seasonal climatic dataset of China
  • DouglasPeucker: Ramer-Douglas-Peucker algorithm for curve fitting with a PolyLine
  • HTS.moving.averages: Smoothing with moving averages of a histogram time series
  • plot-TdistributionH: plot for a TdistributionH object
  • WH.bind: Method WH.bind
  • WH.var.covar: Method WH.var.covar
  • WH_2d_Adaptive_Kohonen_maps: Batch Kohonen self-organizing 2d maps using adaptive distances for histogram-valued data
  • HTS.predict.knn: K-NN predictions of a histogram time series
  • WH.var.covar2: Method WH.var.covar2
  • WH_2d_Kohonen_maps: Batch Kohonen self-organizing 2d maps for histogram-valued data
  • WH.bind.col: Method WH.bind.col
  • get.MatH.nrows: Method get.MatH.nrows
  • *-methods: Method *
  • get.cell.MatH: Method get.cell.MatH. Returns the histogram in a cell of a matrix of distributions
  • get.MatH.rownames: Method get.MatH.rownames
  • minus: Method -
  • get.distr: Method get.distr: show the distribution
  • HTS-class: Class HTS
  • BLOOD: Blood dataset for Histogram data analysis
  • HTS.exponential.smoothing: Smoothing with exponential smoothing of a histogram time series
  • plot-HTS: Method plot for a histogram time series
  • Age_Pyramids_2014: Age pyramids of all the countries of the World in 2014
  • TMatH-class: Class TMatH
  • BloodBRITO: Blood dataset from Brito P. for Histogram data analysis
  • WH.SSQ: Method WH.SSQ
  • TdistributionH-class: Class TdistributionH
  • Agronomique: Agronomique data
  • show: Method show for distributionH
  • show-MatH: Method show for MatH
  • OzoneFull: Full Ozone dataset for Histogram data analysis
  • OzoneH: Complete Ozone dataset for Histogram data analysis
  • data2hist: From real data to distributionH.
  • distributionH-class: Class distributionH.
  • WH.regression.two.components: Multiple regression analysis for histogram variables based on a two component model and L2 Wasserstein distance
  • WH_kmeans: K-means of a dataset of histogram-valued data
  • WH.regression.two.components.predict: Multiple regression analysis for histogram variables based on a two component model and L2 Wasserstein distance
  • WH.SSQ2: Method WH.SSQ2
  • WH.1d.PCA: Principal components analysis of a histogram variable based on Wasserstein distance
  • WassSqDistH: Method WassSqDistH
  • RetHTS: A histogram-valued dataset of returns
  • WH.bind.row: Method WH.bind.row
  • register: Method register
  • subsetHTS: Method subsetHTS: extract a subset of a histogram time series
  • rQQ: Method rQQ
  • kurtH: Method kurtH: computes the kurtosis of a distribution
  • WH.correlation: Method WH.correlation
  • ShortestDistance: Shortest distance from a point to a 2d segment
  • stdH: Method stdH: computes the standard deviation of a distribution
  • WH.mat.sum: Method WH.mat.sum
  • WH_fcmeans: Fuzzy c-means of a dataset of histogram-valued data
  • WH.plot_multiple_Spanish.funs: Plotting Spanish fun plots for Multiple factor analysis of Histogram Variables
  • WH.MultiplePCA: Principal components analysis of a set of histogram variables based on Wasserstein distance
  • WH_hclust: Hierarchical clustering of histogram data
  • meanH: Method meanH: computes the mean of a distribution
  • WH.plot_multiple_indivs: Plot histograms of individuals after a Multiple factor analysis of Histogram Variables
  • WH_adaptive.kmeans: K-means of a dataset of histogram-valued data using adaptive Wasserstein distances
  • WH_adaptive_fcmeans: Fuzzy c-means with adaptive distances for histogram-valued data
  • compQ: Method compQ
  • WH.regression.GOF: Goodness of Fit indices for Multiple regression of histogram variables based on a two component model and L2 Wasserstein distance
  • get.histo: Method get.histo: show the distribution with bins
  • crwtransform: Method crwtransform: returns the centers and the radii of bins of a distribution
  • get.m: Method get.m: the mean of a distribution
  • plot-distributionH: plot for a distributionH object
  • plotPredVsObs: A function for comparing observed vs predicted histograms
  • get.MatH.main.info: Method get.MatH.main.info
  • skewH: Method skewH: computes the skewness of a distribution
  • stations_coordinates: Stations coordinates of China_Month and China_Seas datasets
  • get.s: Method get.s: the standard deviation of a distribution
  • get.MatH.ncols: Method get.MatH.ncols
  • +: Method +
  • plot_errors: A function for plotting functions of errors
  • set.cell.MatH: Method set.cell.MatH: assign a histogram to a cell of a matrix of histograms
  • is.registeredMH: Method is.registeredMH
  • registerMH: Method registerMH
  • Center.cell.MatH: Method Center.cell.MatH. Centers all the cells of a matrix of distributions