Consumption metrics gathered during an execution of the distributed machine learning algorithm Principal Component Analysis (PCA) on an eight-node cluster, using the Spark framework.
cpu.pca
A data frame containing 938 observations and four dimensions:
user: CPU usage by the algorithm
system: CPU usage by the operating system (OS)
iowait: waiting time for Input/Output (I/O) operations
softirq: CPU time spent by software interrupt requests
The values range from 0 to 100 in all dimensions. The dataset contains zero values; however, there are no missing or null values.
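The format described above (four named dimensions, values between 0 and 100, no missing values) can be expressed as a small consistency check. The following is a minimal sketch in Python, not part of the dataset's own tooling: the function name check_cpu_rows and the sample rows are hypothetical and are not taken from cpu.pca, and the rows are assumed to have been exported as a list of dictionaries.

```python
import math

# Column names as documented for cpu.pca.
COLUMNS = ("user", "system", "iowait", "softirq")


def check_cpu_rows(rows):
    """Return a list of problems; an empty list means the rows match the documented format."""
    problems = []
    for i, row in enumerate(rows):
        # Every row must carry exactly the four documented dimensions.
        if set(row) != set(COLUMNS):
            problems.append(f"row {i}: unexpected columns {sorted(row)}")
            continue
        for name in COLUMNS:
            value = row[name]
            # Missing values (None or NaN) are not expected in the dataset.
            if value is None or (isinstance(value, float) and math.isnan(value)):
                problems.append(f"row {i}: {name} is missing")
            # Zero is a valid value; anything outside 0..100 is not.
            elif not 0 <= value <= 100:
                problems.append(f"row {i}: {name}={value} outside [0, 100]")
    return problems


# Hypothetical stand-in rows for illustration only (NOT real observations).
sample = [
    {"user": 12.5, "system": 3.0, "iowait": 0.0, "softirq": 0.1},
    {"user": 97.0, "system": 2.5, "iowait": 0.5, "softirq": 0.0},
]
bad = [{"user": 120.0, "system": 1.0, "iowait": None, "softirq": 0.0}]

print(check_cpu_rows(sample))  # no problems expected for the valid rows
print(check_cpu_rows(bad))     # reports the out-of-range and missing values
```

Run against the full dataset, a check of this kind would be expected to return an empty list if the description above holds for all 938 observations.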
** A Spark cluster of N nodes has 1 (one) master node and N-1 slave nodes.
J. Shlens, A Tutorial on Principal Component Analysis, Epidemiology, vol. 2, no. c, pp. 223–228, 2005.
Jolliffe, I.T.: Principal Component Analysis, Second Edition. Encycl. Stat. Behav. Sci. 30, 487 (2002).
S. Huang, J. Huang, J. Dai, T. Xie, and B. Huang, The HiBench benchmark suite: Characterization of the MapReduce-based data analysis, in 2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010), 2010, pp. 41–51.