Learn R Programming

caTools (version 1.10)

runmad: Median Absolute Deviation of Moving Windows

Description

Moving (aka running, rolling) Window MAD (Median Absolute Deviation) calculated over a vector

Usage

runmad(x, k, center = runmed(x,k), constant = 1.4826,
         endrule=c("mad", "NA", "trim", "keep", "constant", "func"),
         align = c("center", "left", "right"))

Arguments

x
numeric vector of length n or matrix with n rows. If x is a matrix than each column will be processed separately.
k
width of moving window; must be an integer between one and n. In case of even k's one will have to provide different center function, since runmed does not take even k's.
endrule
character string indicating how the values at the beginning and the end, of the data, should be treated. Only first and last k2 values at both ends are affected, where k2 is the half-bandwidth k2 = k %/% 2
center
moving window center. Defaults to running median (runmed function). Similar to center in mad function. For best acuracy at the edges use
constant
scale factor such that for Gaussian distribution X, mad(X) is the same as sd(X). Same as constant in mad
align
specifies whether result should be centered (default), left-aligned or right-aligned. If endrule="mad" then setting align to "left" or "right" will fall back on slower implementation equivalent to endrule

Value

  • Returns a numeric vector or matrix of the same size as x. Only in case of endrule="trim" the output vectors will be shorter and output matrices will have fewer rows.

concept

  • moving mad
  • rolling mad
  • running mad
  • running window

Details

Apart from the end values, the result of y = runmad(x, k) is the same as for(j=(1+k2):(n-k2)) y[j]=mad(x[(j-k2):(j+k2)], na.rm = TRUE). It can handle non-finite numbers like NaN's and Inf's (like mad(x, na.rm = TRUE)). The main incentive to write this set of functions was relative slowness of majority of moving window functions available in R and its packages. With the exception of runmed, a running window median function, all functions listed in "see also" section are slower than very inefficient apply(embed(x,k),1,FUN) approach. Functions runquantile and runmad are using insertion sort to sort the moving window, but gain speed by remembering results of the previous sort. Since each time the window is moved, only one point changes, all but one points in the window are already sorted. Insertion sort can fix that in O(k) time.

References

About insertion sort used in runmad function see: R. Sedgewick (1988): Algorithms. Addison-Wesley (page 99)

See Also

Links related to:

Examples

Run this code
# show runmed function
  k=25; n=200;
  x = rnorm(n,sd=30) + abs(seq(n)-n/4)
  col = c("black", "red", "green")
  m=runmed(x, k)
  y=runmad(x, k, center=m)
  plot(x, col=col[1], main = "Moving Window Analysis Functions")
  lines(m    , col=col[2])
  lines(m-y/2, col=col[3])
  lines(m+y/2, col=col[3])
  lab = c("data", "runmed", "runmed-runmad/2", "runmed+runmad/2")
  legend(0,0.9*n, lab, col=col, lty=1 )

  # basic tests against apply/embed
  eps = .Machine$double.eps ^ 0.5
  k=25 # odd size window
  a = runmad(x,k, center=runmed(x,k), endrule="trim")
  b = apply(embed(x,k), 1, mad)
  stopifnot(all(abs(a-b)<eps));
  k=24 # even size window
  a = runmad(x,k, center=runquantile(x,k,0.5,type=2), endrule="trim")
  b = apply(embed(x,k), 1, mad)
  stopifnot(all(abs(a-b)<eps));
  
  # test against loop approach
  # this test works fine at the R prompt but fails during package check - need to investigate
  k=24; n=200;
  x = rnorm(n,sd=30) + abs(seq(n)-n/4) # create random data
  x = rep(1:5,40)
  #x[seq(1,n,11)] = NaN;               # commented out for time beeing - on to do list
  #x[5] = NaN;                         # commented out for time beeing - on to do list
  k2 = k  k1 = k-k2-1
  ac = array(runquantile(x,k,0.5))
  a  = runmad(x, k, center=ac)
  bc = array(0,n)
  b  = array(0,n)
  for(j in 1:n) {
    lo = max(1, j-k1)
    hi = min(n, j+k2)
    bc[j] = median(x[lo:hi], na.rm = TRUE)
    b [j] = mad   (x[lo:hi], na.rm = TRUE, center=bc[j])
  }
  eps = .Machine$double.eps ^ 0.5
  #stopifnot(all(abs(ac-bc)<eps)); # commented out for time beeing - on to do list
  #stopifnot(all(abs(a-b)<eps));   # commented out for time beeing - on to do list
  
  # compare calculation at array ends
  k=25; n=200;
  x = rnorm(n,sd=30) + abs(seq(n)-n/4)
  c = runquantile(x,k,0.5,type=2)             # find the center
  a = runmad(x, k, center=c, endrule="mad" )  # fast C code
  b = runmad(x, k, center=c, endrule="func")  # slow R code
  stopifnot(all(abs(a-b)<eps));
  
  # test if moving windows forward and backward gives the same results
  k=51;
  a = runmad(x     , k)
  b = runmad(x[n:1], k)
  stopifnot(all(a[n:1]==b, na.rm=TRUE));

  # test vector vs. matrix inputs, especially for the edge handling
  nRow=200; k=25; nCol=10
  x = rnorm(nRow,sd=30) + abs(seq(nRow)-n/4)
  X = matrix(rep(x, nCol ), nRow, nCol) # replicate x in columns of X
  a = runmad(x, k, center = runquantile(x,k,0.5,type=2))
  b = runmad(X, k, center = runquantile(X,k,0.5,type=2))
  stopifnot(all(abs(a-b[,1])<eps));        # vector vs. 2D array
  stopifnot(all(abs(b[,1]-b[,nCol])<eps)); # compare rows within 2D array
  
  # speed comparison
  x=runif(1e5); k=51;                       # reduce vector and window sizes
  system.time(runmad( x,k,endrule="trim"))
  system.time(apply(embed(x,k), 1, mad))

Run the code above in your browser using DataLab