gama (version 1.0.3)

cpu.pca: CPU usage metrics for distributed PCA algorithm

Description

CPU consumption metrics gathered during an execution of Principal Component Analysis (PCA), a distributed machine learning algorithm, on an eight-node cluster running the Spark framework.

Usage

cpu.pca

Format

A data frame containing 938 observations and four dimensions:

  1. user: CPU usage by the algorithm

  2. system: CPU usage by the operating system (OS)

  3. iowait: waiting time for Input/Output (I/O) operations

  4. softirq: CPU time spent servicing software interrupt requests

All values lie in the range 0 to 100 for every dimension. The dataset contains zero values, but no missing or null values.
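The properties stated above (four named dimensions, values between 0 and 100, zeros present, nothing missing) can be verified with a few lines of base R. The sketch below is illustrative only: `check_cpu_metrics` is a hypothetical helper, not a function of the gama package, and it assumes the column names match the dimension names listed in this section.

```r
# Hypothetical helper (not part of gama): checks a data frame against the
# format documented for cpu.pca. Column names are assumed to be exactly
# "user", "system", "iowait" and "softirq".
check_cpu_metrics <- function(df) {
  expected <- c("user", "system", "iowait", "softirq")
  stopifnot(is.data.frame(df), all(expected %in% names(df)))
  vals <- df[expected]
  list(
    n_obs       = nrow(df),                                  # number of observations
    has_missing = any(is.na(vals)),                          # TRUE if any NA is present
    in_range    = all(vals >= 0 & vals <= 100, na.rm = TRUE), # all values within [0, 100]
    n_zero      = sum(vals == 0, na.rm = TRUE)               # count of zero values
  )
}

# Run the check on the real dataset only when gama is installed.
if (requireNamespace("gama", quietly = TRUE)) {
  data("cpu.pca", package = "gama")
  print(check_cpu_metrics(cpu.pca))
  print(summary(cpu.pca))
}
```

If the dataset matches this description, `n_obs` should be 938, `has_missing` should be `FALSE`, `in_range` should be `TRUE`, and `n_zero` should be greater than zero.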

Note: A Spark cluster of N nodes has one master node and N-1 slave nodes.

References

J. Shlens, A Tutorial on Principal Component Analysis, Epidemiology, vol. 2, no. c, pp. 223–228, 2005.

Jolliffe, I.T.: Principal Component Analysis, Second Edition. Encycl. Stat. Behav. Sci. 30, 487 (2002).

S. Huang, J. Huang, J. Dai, T. Xie, and B. Huang, The HiBench benchmark suite: Characterization of the MapReduce-based data analysis, in 2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010), 2010, pp. 41–51.