Pollen: Pollen Data Challenge

Description

A synthetic dataset generated by David Coleman at RCA Laboratories in Princeton, N.J., and used at the 1986 American Statistical Association Joint Statistical Meetings (JSM) as a data challenge for the Statistical Graphics Section.

Usage

data("Pollen")

Format

A data frame with 3848 observations on the following 5 variables, representing fictitious measurements of grains of pollen.

ridge: along x, a numeric vector
nub: along y, a numeric vector
crack: along z, a numeric vector
weight: weight of pollen grain, a numeric vector
density: density of pollen grain, a numeric vector

Details

The first three variables are the lengths of geometric features observed on sampled pollen grains in the x, y, and z dimensions: a "ridge" along x, a "nub" in the y direction, and a "crack" along the z dimension. The fourth variable is pollen grain weight, and the fifth is density.

The description of the data challenge stated: "the data analyst is advised that there is more than one 'feature' to these data. Each feature can be observed through various graphical techniques, but analytic methods, as well, can help 'crack' the dataset."

There were several features embedded in this dataset: clusters of points, 5D ellipsoidal voids with no points, and finally, a collection of points which spelled out "EUREKA".

Papers by Becker et al. (1986) and Slomka (1986) describe their work on this problem.

Yihui Xie used these data as an illustration of the animation package, using rgl to zoom in on the magic word. See the video at https://vimeo.com/1982725.

References

Becker, R.A., Denby, L., McGill, R., and Wilks, A. (1986). Datacryptanalysis: A Case Study. Proceedings of the Section on Statistical Graphics, 92-97.

Slomka, M. (1986). The Analysis of a Synthetic Data Set. Proceedings of the Section on Statistical Graphics, 113-116.

Examples

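library(HistData)  # assumed to be the package that provides Pollen
data(Pollen)
# scatterplot matrix of all five variables
pairs(Pollen)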
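
The scatterplot matrix of the full data hides the embedded word described under Details, because the central points are swamped by the rest of the cloud. The following is a minimal sketch of one way to "zoom in" without rgl: keep only the points close to the centroid of the three shape variables and plot those. The helper central_subset() and the radius of 0.5 standard deviations are illustrative assumptions, not part of the dataset's package or of the original challenge, and the radius may need tuning before any structure becomes visible.

```r
# Hypothetical helper (not part of any package): keep the rows whose
# standardized coordinates lie within `radius` of the centroid.
central_subset <- function(df, radius = 0.5) {
  z <- scale(df)              # center each column and scale to unit sd
  d <- sqrt(rowSums(z^2))     # Euclidean distance from the centroid
  df[d < radius, , drop = FALSE]
}

# Assumes Pollen is available from the HistData package; skipped otherwise.
if (requireNamespace("HistData", quietly = TRUE)) {
  data("Pollen", package = "HistData")
  centre <- central_subset(Pollen[, c("ridge", "nub", "crack")], radius = 0.5)
  # Plot only if enough points remain; the radius is a guess, not a known value.
  if (nrow(centre) > 1) pairs(centre, pch = ".")
}
```

Shrinking or enlarging radius trades off how much of the surrounding cloud is removed against how many central points survive.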