gama (version 1.0.3)

cpu.als: CPU usage metrics for distributed ALS algorithm

Description

CPU consumption metrics gathered during an execution of the distributed machine learning algorithm Alternating Least Squares (ALS) on an eight-node cluster, using the Spark framework.

Usage

cpu.als

Format

A data frame containing 308 observations and four dimensions:

  1. user: CPU usage by the algorithm

  2. system: CPU usage by the operating system (OS)

  3. iowait: waiting time for Input/Output (I/O) operations

  4. softirq: CPU time spent by software interrupt requests

All four dimensions take values in the range 0 to 100. The dataset contains zero values, but there are no missing or null values.
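The properties stated above (four named dimensions, values between 0 and 100, no missing values) can be checked with a few lines of base R. This is a minimal sketch, not part of the package: check_cpu_metrics is a hypothetical helper, and it assumes the column names of the data frame match the dimension names listed above. The dataset itself is only loaded if gama is installed.

```r
# Hypothetical helper (not part of gama): checks a data frame against
# the properties documented for cpu.als.
check_cpu_metrics <- function(df) {
  # Assumption: column names match the dimensions listed in "Format".
  expected <- c("user", "system", "iowait", "softirq")
  stopifnot(is.data.frame(df), all(expected %in% names(df)))

  # Flatten the four dimensions into one numeric vector.
  values <- unlist(df[expected], use.names = FALSE)

  list(
    n_obs      = nrow(df),                                   # number of observations
    no_missing = !anyNA(values),                             # TRUE if no NA values
    in_range   = all(values >= 0 & values <= 100, na.rm = TRUE),
    n_zeros    = sum(values == 0, na.rm = TRUE)              # count of zero values
  )
}

# Run the check on the real dataset only when gama is available.
if (requireNamespace("gama", quietly = TRUE)) {
  data("cpu.als", package = "gama")
  print(check_cpu_metrics(cpu.als))
}
```

If the dataset matches this page, n_obs should be 308 and both no_missing and in_range should be TRUE.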

Note: A Spark cluster of N nodes has one master node and N-1 slave nodes.

References

D. Goldberg, D. Nichols, B. M. Oki, and D. Terry, Using collaborative filtering to weave an information tapestry, Commun. ACM, vol. 35, no. 12, pp. 61–70, 1992.

Y. Koren, R. Bell, and C. Volinsky, Matrix factorization techniques for recommender systems, Computer (Long. Beach. Calif)., vol. 42, no. 8, 2009.

Y. Hu, Y. Koren, and C. Volinsky, Collaborative filtering for implicit feedback datasets, in Data Mining, 2008. ICDM'08. Eighth IEEE International Conference on, 2008, pp. 263–272.

S. Huang, J. Huang, J. Dai, T. Xie, and B. Huang, The HiBench benchmark suite: Characterization of the MapReduce-based data analysis, in 2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010), 2010, pp. 41–51.