gdalraster (version 1.11.1)

RunningStats-class: Class to calculate mean and variance in one pass

Description

RunningStats computes summary statistics on a data stream efficiently. Mean and variance are calculated with Welford's online algorithm (https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance). The min, max, sum and count are also tracked. The input data values are not stored in memory, so this class can be used to compute statistics for very large data streams.
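To make the one-pass idea concrete, the following is a minimal pure-R sketch of Welford's online algorithm. It is not the gdalraster implementation (which is in C++): the function name make_running_stats and its internals are hypothetical, chosen only for illustration, and the sketch does no NA handling. It shows that the mean and variance can be updated one value at a time while keeping just three numbers in memory.

```r
# Hypothetical illustration of Welford's online algorithm in plain R.
# This is NOT the gdalraster RunningStats class, only a sketch of the idea.
make_running_stats <- function() {
  n <- 0        # count of values seen so far
  mean_ <- 0    # running mean
  m2 <- 0       # running sum of squared deviations from the current mean

  update <- function(newvalues) {
    for (x in newvalues) {
      n <<- n + 1
      delta <- x - mean_                 # deviation from the previous mean
      mean_ <<- mean_ + delta / n        # updated mean
      m2 <<- m2 + delta * (x - mean_)    # uses previous and updated mean
    }
    invisible(NULL)
  }

  list(
    update = update,
    get_count = function() n,
    get_mean = function() mean_,
    # sample variance (denominator n - 1); undefined for fewer than 2 values
    get_var = function() if (n > 1) m2 / (n - 1) else NA_real_
  )
}

# Feed the data in two chunks and compare with the base R functions
set.seed(42)
x <- runif(1000)
acc <- make_running_stats()
acc$update(x[1:400])
acc$update(x[401:1000])
all.equal(acc$get_mean(), mean(x))
all.equal(acc$get_var(), var(x))
```

The chunk boundaries do not affect the result, because the state carried between calls (count, mean and sum of squared deviations) fully summarizes the values already seen.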

Value

An object of class RunningStats. A RunningStats object maintains the current minimum, maximum, mean, variance, sum and count of values that have been read from the stream. It can be updated repeatedly with new values (i.e., chunks of data read from the input stream), but its memory footprint is negligible. Class methods for updating with new values and retrieving current values of statistics are described in Details. RunningStats is a C++ class exposed directly to R (via RCPP_EXPOSED_CLASS). Methods of the class are accessed in R using the $ operator.

Arguments

na_rm

Logical scalar. TRUE to remove NA from the input data or FALSE to retain NA (defaults to TRUE).

Usage


## Constructor
rs <- new(RunningStats, na_rm)

## Methods (see Details)
rs$update(newvalues)
rs$get_count()
rs$get_mean()
rs$get_min()
rs$get_max()
rs$get_sum()
rs$get_var()
rs$get_sd()
rs$reset()

Details

new(RunningStats, na_rm) Constructor. Returns an object of class RunningStats.

$update(newvalues) Updates the RunningStats object with a numeric vector of newvalues (i.e., a chunk of values from the data stream). No return value, called for side effects.

$get_count() Returns the count of values received from the data stream.

$get_mean() Returns the mean of values received from the data stream.

$get_min() Returns the minimum value received from the data stream.

$get_max() Returns the maximum value received from the data stream.

$get_sum() Returns the sum of values received from the data stream.

$get_var() Returns the variance of values from the data stream (denominator n - 1).

$get_sd() Returns the standard deviation of values from the data stream (denominator n - 1).

$reset() Clears the RunningStats object to its initialized state (count = 0). No return value, called for side effects.
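For reference, the quantities behind $get_mean(), $get_var() and $get_sd() can be written in the standard textbook form of Welford's recurrences. This is a sketch of the algorithm named in the Description, not a transcription of the package's C++ code, which may organize the computation differently. The symbols are introduced here for illustration: x_n is the n-th value received, \bar{x}_n the running mean, and M_{2,n} the running sum of squared deviations, with \bar{x}_0 = 0 and M_{2,0} = 0.

```latex
\begin{aligned}
\bar{x}_n &= \bar{x}_{n-1} + \frac{x_n - \bar{x}_{n-1}}{n} \\
M_{2,n}   &= M_{2,n-1} + (x_n - \bar{x}_{n-1})(x_n - \bar{x}_n) \\
s_n^2     &= \frac{M_{2,n}}{n - 1}, \qquad s_n = \sqrt{s_n^2} \qquad (n \ge 2)
\end{aligned}
```

Here s_n^2 is the sample variance with the n - 1 denominator and s_n is the corresponding standard deviation.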

Examples

library(gdalraster)

set.seed(42)

rs <- new(RunningStats, na_rm=TRUE)
chunk <- runif(1000)
rs$update(chunk)
object.size(rs)

rs$get_count()
length(chunk)

rs$get_mean()
mean(chunk)

rs$get_min()
min(chunk)

rs$get_max()
max(chunk)

rs$get_var()
var(chunk)

rs$get_sd()
sd(chunk)

# \donttest{
## 10^9 values read in 10,000 chunks
## should take under 1 minute on most PC hardware
for (i in 1:1e4) {
  chunk <- runif(1e5)
  rs$update(chunk)
}
rs$get_count()
rs$get_mean()
rs$get_var()

object.size(rs)
# }
