pdfCluster (version 1.0-4)

oliveoil: Olive oil data

Description

This data set contains eight chemical measurements on different specimens of olive oil produced in various regions of Italy (northern Apulia, southern Apulia, Calabria, Sicily, inland Sardinia and coast Sardinia, eastern and western Liguria, Umbria), which are further classifiable into three macro-areas: Centre-North, South, Sardinia. The data set is used to evaluate the ability of pdfCluster to reconstruct the macro-area membership.

Usage

data(oliveoil)

Format

This data frame contains 572 rows, each corresponding to a different specimen of olive oil, and 10 columns. The first and the second column correspond to the macro-area and the region of origin of the olive oils respectively; here, the term "region" refers to a geographical area and only partially to administrative borders. Columns 3-10 represent the following eight chemical measurements on the acid components for the oil specimens: palmitic, palmitoleic, stearic, oleic, linoleic, linolenic, arachidic, eicosenoic.
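The layout described above can be inspected with a short sketch such as the following. It assumes the pdfCluster package is installed, and it selects columns by position rather than by name, since the column names are not stated here. The object names `labels` and `acids` are illustrative only.

```r
# Assumes the pdfCluster package is installed; the guard skips the example otherwise.
if (requireNamespace("pdfCluster", quietly = TRUE)) {
  data(oliveoil, package = "pdfCluster")

  # Columns 1-2: macro-area and region of origin (selected by position).
  labels <- oliveoil[, 1:2]
  # Columns 3-10: the eight chemical measurements.
  acids <- oliveoil[, 3:10]

  print(dim(oliveoil))            # rows and columns of the full data frame
  print(table(labels[[1]]))       # number of specimens per macro-area
  print(summary(rowSums(acids)))  # row totals of the measurements
}
```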

Details

Since the raw data are compositional, with each row ideally totalling 10000, some preliminary transformation of the data is advisable. In particular, Azzalini and Torelli (2007) adopt an additive log-ratio (ALR) transformation. If \(x_j\) denotes the \(j\)-th chemical measurement \((j=1,\ldots,8)\), the ALR transformation is \(y_j = \log(x_j/x_k)\), \(j \neq k\), where \(k\) is an arbitrary but fixed variable. However, in this data set the raw data do not always sum exactly to 10000, because of measurement errors. Moreover, some zeros are present in the data, corresponding to measurements below the instrument sensitivity level. It is therefore suggested to add 1 to all raw data and to normalize them by dividing each entry by the corresponding row sum \(\sum_j (x_j+1)\).
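A minimal sketch of the suggested preprocessing followed by the ALR transformation is given below. The helper `alr_transform` is hypothetical and not part of pdfCluster, the choice of the last column as the reference \(k\) is an arbitrary default, and the `toy` matrix holds made-up values used only to show the arithmetic.

```r
# Hypothetical helper (not a pdfCluster function): add 1 to every entry,
# normalize each row by its sum, then take log-ratios against component k.
alr_transform <- function(x, k = ncol(x)) {
  x <- as.matrix(x)
  x1 <- x + 1                    # shift by 1 so that zeros do not give log(0)
  comp <- x1 / rowSums(x1)       # each row now sums to 1
  # Divide every remaining component by the reference component, row by row.
  log(comp[, -k, drop = FALSE] / comp[, k])
}

# Made-up values, for illustration only (the first entry is a zero).
toy <- matrix(c(0,  9,  99,
                1, 19, 199), nrow = 2, byrow = TRUE)
y <- alr_transform(toy, k = 3)
print(y)

# With the package data (assuming pdfCluster is installed):
# data(oliveoil, package = "pdfCluster")
# y_oil <- alr_transform(oliveoil[, 3:10])
```

The result has one column fewer than the input, because the reference component is dropped. The row sum cancels in each ratio, so the normalization leaves the log-ratios unchanged; it only puts the shifted data on a common scale.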

References

Azzalini A., Torelli N. (2007). Clustering via nonparametric density estimation. Statistics and Computing, 17, 71-80.