A synthetic dataset generated by David Coleman at RCA Laboratories in Princeton, N.J., and used at the 1986 American Statistical Association JSM as a data challenge for the Statistical Graphics Section.
data("Pollen")
A data frame with 3848 observations on the following 5 variables, representing fictitious measurements of grains of pollen.
ridge
along x, a numeric vector
nub
along y, a numeric vector
crack
along z, a numeric vector
weight
weight of pollen grain, a numeric vector
density
density of pollen grain, a numeric vector
The first three variables are the lengths of geometric features observed on sampled pollen grains in the x, y, and z dimensions: a "ridge" along x, a "nub" in the y direction, and a "crack" along the z dimension. The fourth variable is pollen grain weight, and the fifth is density.
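As a first look at the five variables, the data frame can be loaded and plotted as a scatterplot matrix. This is a minimal sketch that assumes the dataset is provided by the HistData package (the package name is not stated above); the code does nothing if that package is not installed.

```r
# Assumption: "Pollen" is shipped with the HistData package.
has_histdata <- requireNamespace("HistData", quietly = TRUE)

if (has_histdata) {
  # Load the data frame into the global environment
  data("Pollen", package = "HistData")

  # Inspect the structure: five numeric columns are expected
  str(Pollen)

  # Scatterplot matrix of all pairs of variables;
  # a small plotting symbol reduces overplotting
  pairs(Pollen, pch = ".")
}
```

At this scale the pairwise views mostly show an elongated point cloud; the embedded features described in this page are not expected to stand out without further zooming or rotation.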
The description for the data challenge stated: "the data analyst is advised that there is more than one 'feature' to these data. Each feature can be observed through various graphical techniques, but analytic methods, as well, can help 'crack' the dataset."
Several features were embedded in this dataset: clusters of points, 5D ellipsoidal voids containing no points, and finally, a collection of points that spelled out "EUREKA".
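The idea of a small feature hidden inside a much larger point cloud can be illustrated with simulated data. The sketch below does not use the real Pollen measurements: the data frame `sim`, the sample sizes, and the window limits are all hypothetical choices made only to show why zooming in on a small region can reveal structure that a full-range plot hides.

```r
# Hypothetical stand-in data, NOT the real Pollen measurements.
set.seed(1)
n_cloud  <- 3000   # points in the broad background cloud
n_hidden <- 100    # points in the small embedded feature

# Broad Gaussian cloud in two of the dimensions
cloud <- data.frame(ridge = rnorm(n_cloud, sd = 5),
                    nub   = rnorm(n_cloud, sd = 5))

# Small, tightly packed band of points near the centre
hidden <- data.frame(ridge = runif(n_hidden, -0.5, 0.5),
                     nub   = runif(n_hidden, -0.1, 0.1))

sim <- rbind(cloud, hidden)

# "Zoom in": keep only the points inside a 2 x 2 window around the centre
zoom <- subset(sim, abs(ridge) < 1 & abs(nub) < 1)

# Points per unit area, over the full range and inside the window
full_density <- nrow(sim) / (diff(range(sim$ridge)) * diff(range(sim$nub)))
zoom_density <- nrow(zoom) / (2 * 2)

# Side-by-side views: the band is lost in the full view, visible when zoomed
op <- par(mfrow = c(1, 2))
plot(sim$ridge, sim$nub, pch = ".", main = "Full view",
     xlab = "ridge", ylab = "nub")
plot(zoom$ridge, zoom$nub, pch = 20, main = "Zoomed view",
     xlab = "ridge", ylab = "nub")
par(op)
```

The same subsetting approach could be applied to the real data frame, with window limits chosen after inspecting the data.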
Papers by Becker et al. (1986) and Slomka (1986) describe their work on this problem.
Yihui Xie used these data as an illustration of the animation package, using rgl to zoom in on the magic word. See the video at https://vimeo.com/1982725.
Becker, R.A., Denby, L., McGill, R., and Wilks, A. (1986). Datacryptanalysis: A Case Study. Proceedings of the Section on Statistical Graphics, 92-97.
Slomka, M. (1986). The Analysis of a Synthetic Data Set. Proceedings of the Section on Statistical Graphics, 113-116.