lrv: Long Run Variance

Description

Estimates the long run variance or, for multivariate data, the long run covariance matrix of the supplied time series.

Usage

lrv(x, method = c("kernel", "subsampling", "bootstrap", "none"), control = list())

Value

Long run variance $\sigma^2$ (numeric) or long run covariance matrix $\Sigma$ (numeric matrix).

Arguments

x: vector or matrix with each column representing a time series (numeric).
method: method of estimation. Options are kernel, subsampling, bootstrap and none.
control: a list of control parameters. See 'Details'.

Author

Sheila Görz

Details

The long run variance equals the limit of $n$ times the variance of the arithmetic mean of a short-range dependent time series, where $n$ is the length of the time series. It is used to standardize tests concerning the mean of dependent data.
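For a stationary short-range dependent series, this limit equals the sum of the autocovariances over all lags. The following sketch illustrates the difference from the ordinary variance for an AR(1) process; the coefficient phi, the innovation variance s2 and the truncation lag H are arbitrary choices made for illustration.

```r
# AR(1) process x_t = phi * x_{t-1} + e_t with Var(e_t) = s2 (illustrative values).
phi <- 0.5
s2  <- 1

# Autocovariance of a stationary AR(1) process at lag h.
acov <- function(h) s2 * phi^abs(h) / (1 - phi^2)

# Long run variance as the (truncated) sum of autocovariances over all lags.
H <- 200
lrv_sum <- sum(acov(-H:H))

# Closed form for AR(1): s2 / (1 - phi)^2.
lrv_closed <- s2 / (1 - phi)^2

ordinary_var <- acov(0)
c(ordinary = ordinary_var, long_run = lrv_sum, closed_form = lrv_closed)
```

With positive autocorrelation the long run variance exceeds the ordinary variance, which is why standardizing a test statistic by the ordinary variance alone would be too optimistic on such data.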

If method = "none", no long run variance estimation is performed and the value 1 is returned (i.e. the test statistic is not altered).

The control argument is a list that can supply any of the following components:

kFun: Kernel function (character string). More in 'Notes'.
b_n: Bandwidth (numeric > 0 and smaller than sample size).
gamma0: Replace a negative long run variance estimate by the estimated variance (autocovariance at lag 0)? Boolean. Default is TRUE.
l: Block length (numeric > 0 and smaller than sample size).
overlapping: Overlapping subsampling estimation? Boolean.
distr: Transform observations by their empirical distribution function? Boolean. Default is FALSE.
B: Bootstrap repetitions (integer).
seed: RNG seed (numeric).
version: What property does the CUSUM test test for? Character string, details below.
loc: Estimated location corresponding to version. Numeric value, details below.
scale: Estimated scale corresponding to version. Numeric value, details below.
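As an illustration, a control list for kernel-based estimation might be assembled as follows. The component names are those listed above; the values are arbitrary, and the call is guarded because it assumes that lrv() is provided by the robcp package.

```r
# Illustrative control list; the values are arbitrary choices, not recommendations.
ctrl <- list(kFun = "bartlett", b_n = 3, gamma0 = TRUE)

set.seed(1)
x <- rnorm(50)

# Assumption: lrv() is provided by the robcp package.
if (requireNamespace("robcp", quietly = TRUE)) {
  robcp::lrv(x, method = "kernel", control = ctrl)
}
```

Components that are not supplied keep the default values described in the following sections.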

Kernel-based estimation

The kernel-based long run variance estimation is available for various testing scenarios (set by control$version) and for both one- and multi-dimensional data. It uses the bandwidth $b_n = $ control$b_n and the kernel function $k(x) = $ control$kFun. For tests on certain properties, a corresponding location estimate control$loc ($m_n$) and scale estimate control$scale ($v_n$) must also be supplied. Supported testing scenarios are:

"mean"
- 1-dim. data: $$\hat{\sigma}^2 = \frac{1}{n} \sum_{i = 1}^n (x_i - \bar{x})^2 + \frac{2}{n} \sum_{h = 1}^{b_n} \sum_{i = 1}^{n - h} (x_i - \bar{x}) (x_{i + h} - \bar{x}) k(h / b_n).$$ If control$distr = TRUE, the long run variance is estimated on the observations transformed by their empirical distribution function. The resulting value is then multiplied by $\sqrt{\pi} / 2$.
  
  Default values: b_n = $0.9 n^{1/3}$, kFun = "bartlett".
- multivariate time series: The $k,l$-element of $\Sigma$ is estimated by $$\hat{\Sigma}^{(k,l)} = \frac{1}{n} \sum_{i,j = 1}^{n}(x_i^{(k)} - \bar{x}^{(k)}) (x_j^{(l)} - \bar{x}^{(l)}) k((i-j) / b_n),$$ $k, l = 1, ..., m$.
  
  Default values: b_n = $\log_{1.8 + m / 40}(n / 50)$, kFun = "bartlett".
"empVar" for tests on changes in the empirical variance. $$\hat{\sigma}^2 = \sum_{h = -(n-1)}^{n-1} W \left( \frac{|h|}{b_n} \right) \frac{1}{n} \sum_{i = 1}^{n - |h|} ((x_i - m_n)^2 - v_n)((x_{i+|h|} - m_n)^2 - v_n).$$

Default values: $m_n =$ mean(x), $v_n = $ var(x).
"MD" for tests on a change in the median deviation. $$\hat{\sigma}^2 = \sum_{h = -(n-1)}^{n-1} W \left( \frac{|h|}{b_n} \right) \frac{1}{n} \sum_{i = 1}^{n - |h|} (|x_i - m_n| - v_n)(|x_{i+|h|} - m_n| - v_n).$$

Default values: $m_n =$ median(x), $v_n = \frac{1}{n-1} \sum_{i = 1}^n |x_i - m_n|$.
"GMD" for tests on changes in Gini's mean difference. $$\hat{\sigma}^2 = 4 \sum_{h = -(n-1)}^{n-1} W \left( \frac{|h|}{b_n} \right) \frac{1}{n} \sum_{i = 1}^{n - |h|} \hat{\phi}_n(x_i)\hat{\phi}_n(x_{i+|h|})$$ with $\hat{\phi}_n(x) = n^{-1} \sum_{i = 1}^n |x - x_i| - v_n$.

Default value: $v_n =$ $\frac{2}{n(n-1)} \sum_{1 \leq i < j \leq n} |x_i - x_j|.$
"Qalpha" for tests on changes in Qalpha. $$\hat{\sigma}^2 = \frac{4}{\hat{u}(v_n)} \sum_{h = -(n-1)}^{n-1} W \left( \frac{|h|}{b_n} \right) \frac{1}{n} \sum_{i = 1}^{n - |h|} \hat{\phi}_n(x_i)\hat{\phi}_n(x_{i+|h|}),$$ where $\hat{\phi}_n(x) = n^{-1} \sum_{i = 1}^n 1_{\{|x - x_i| \leq v_n\}} - m_n$ and $$\hat{u}(t) = \frac{2}{n(n-1)h_n} \sum_{1 \leq i < j \leq n} K\left(\frac{|x_i - x_j| - t}{h_n}\right)$$ is the kernel density estimate of the density $u$ corresponding to the distribution function $U(t) = P(|X-Y| \leq t)$, $h_n =$ IQR(x)$n^{-\frac{1}{3}}$ and $K$ is the quadratic kernel function.

Default values: $m_n = \alpha = 0.5$, $v_n =$ Qalpha(x, m_n)[n-1].
"tau" for tests on changes in Kendall's tau.

Only available for bivariate data: assume that the given data x has the format $(x_i, y_i)_{i = 1, ..., n}$. $$\hat{\sigma}^2 = \sum_{h = -(n-1)}^{n-1} W \left( \frac{|h|}{b_n} \right) \frac{1}{n} \sum_{i = 1}^{n - |h|} \hat{\phi}_n((x_i, y_i))\hat{\phi}_n((x_{i+|h|}, y_{i+|h|})),$$ where $\hat{\phi}_n((x, y)) = 4 F_n(x, y) - 2F_{X,n}(x) - 2F_{Y,n}(y) + 1 - v_n$ and $F_n$, $F_{X,n}$ and $F_{Y,n}$ are the empirical distribution functions of $((X_i, Y_i))_{i = 1, ..., n}$, $(X_i)_{i = 1, ..., n}$ and $(Y_i)_{i = 1, ..., n}$.

Default value: $v_n = \hat{\tau}_n = \frac{2}{n(n-1)} \sum_{1 \leq i < j \leq n} sign\left((x_j - x_i)(y_j - y_i)\right)$.
"rho" for tests on changes in Spearman's rho.

Only available for $d$-variate data with $d > 1$: assume that the given data x has the format $(x_{i,j} | i = 1, ..., n; j = 1, ..., d)$. $$\hat{\sigma}^2 = a(d)^2 2^{2d} \left\{ \sum_{h = -(n-1)}^{n-1} K\left( \frac{|h|}{b_n} \right) \left( \sum_{i = 1}^{n-|h|} n^{-1} \prod_{j = 1}^d \hat{\phi}_n(x_i, x_j) \hat{\phi}_n(x_{i+|h|}, x_j) - M^2 \right) \right\} ,$$ where $a(d) = (d+1) / (2^d - d - 1)$, $M = n^{-1} \sum_{i = 1}^n \prod_{j = 1}^d \hat{\phi}_n(x_i, x_j)$ and $\hat{\phi}_n(x, y) = 1 - \hat{U}_n(x, y)$, $\hat{U}_n(x, y) = n^{-1}$ (rank of $x_{i,j}$ in $x_{1,j}, ..., x_{n,j}$).
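The default scale values for "GMD" and "tau" are U-statistics over all pairs of observations and can be written out in base R. The following is a minimal sketch of those two formulas, not the package's internal code; the function names gmd_sketch and tau_sketch are made up for illustration.

```r
# Gini's mean difference: 2 / (n (n - 1)) * sum over pairs i < j of |x_i - x_j|.
gmd_sketch <- function(x) {
  n <- length(x)
  # dist() on a numeric vector returns the pairwise absolute differences for i < j.
  2 / (n * (n - 1)) * sum(dist(x))
}

# Kendall's tau: 2 / (n (n - 1)) * sum over pairs i < j of
# sign((x_j - x_i) * (y_j - y_i)).
tau_sketch <- function(x, y) {
  n <- length(x)
  s <- sign(outer(x, x, "-") * outer(y, y, "-"))
  2 / (n * (n - 1)) * sum(s[upper.tri(s)])
}

gmd_sketch(c(1, 2, 4))
tau_sketch(c(1, 2, 3, 4), c(1, 3, 2, 4))
```

For data without ties, tau_sketch() agrees with cor(x, y, method = "kendall").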

When control$gamma0 = TRUE (default), negative estimates of the long run variance are replaced by the autocovariance at lag 0 (= ordinary variance of the data). In this case the function issues a warning.
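The univariate "mean" estimator together with the gamma0 fallback can be sketched in base R as follows. This is a minimal illustration of the formula, not the package's implementation; the names bartlett and lrv_kernel_sketch are hypothetical, and truncating the lag sum at floor(b_n) is an assumption about how a non-integer bandwidth is handled.

```r
# Bartlett kernel: k(x) = 1 - |x| for |x| < 1 and 0 otherwise.
bartlett <- function(x) pmax(1 - abs(x), 0)

lrv_kernel_sketch <- function(x, b_n = 0.9 * length(x)^(1 / 3), k = bartlett) {
  n  <- length(x)
  xc <- x - mean(x)

  # Lag-0 term: (1 / n) * sum (x_i - mean)^2.
  gamma0 <- sum(xc^2) / n

  # Assumption: the sum over h runs up to floor(b_n), capped at n - 1.
  H   <- min(floor(b_n), n - 1)
  est <- gamma0
  for (h in seq_len(H)) {
    lag_sum <- sum(xc[1:(n - h)] * xc[(1 + h):n])
    est <- est + 2 / n * lag_sum * k(h / b_n)
  }

  # Fallback corresponding to gamma0 = TRUE.
  if (est < 0) {
    warning("negative estimate replaced by the lag-0 autocovariance")
    est <- gamma0
  }
  est
}

lrv_kernel_sketch(c(1, 2, 3, 4), b_n = 2)
```

For a bandwidth below 1 no lag terms enter, and the sketch reduces to the lag-0 term alone.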

Subsampling estimation

For method = "subsampling" there are an overlapping and a non-overlapping version (parameter control$overlapping). It can also be specified whether the observations x are transformed by their empirical distribution function $F_n$ (parameter control$distr). The block length $l$ can be set via control$l.

If control$overlapping = TRUE and control$distr = TRUE: $$\hat{\sigma}_n = \frac{\sqrt{\pi}}{\sqrt{2l}(n - l + 1)} \sum_{i = 0}^{n-l} \left| \sum_{j = i+1}^{i+l} (F_n(x_j) - 0.5) \right|.$$

Otherwise, if control$distr = FALSE, the estimator is $$\hat{\sigma}^2 = \frac{1}{l (n - l + 1)} \sum_{i = 0}^{n-l} \left( \sum_{j = i + 1}^{i+l} x_j - \frac{l}{n} \sum_{j = 1}^n x_j \right)^2.$$

If control$overlapping = FALSE and control$distr = TRUE: $$\hat{\sigma} = \frac{1}{n/l} \sqrt{\pi/2} \sum_{i = 1}^{n/l} \frac{1}{\sqrt{l}} \left| \sum_{j = (i-1)l + 1}^{il} F_n(x_j) - \frac{l}{n} \sum_{j = 1}^n F_n(x_j) \right|.$$

Otherwise, if control$distr = FALSE, the estimator is $$\hat{\sigma}^2 = \frac{1}{n/l} \sum_{i = 1}^{n/l} \frac{1}{l} \left(\sum_{j = (i-1)l + 1}^{il} x_j - \frac{l}{n} \sum_{j = 1}^n x_j\right)^2.$$
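The two estimators for control$distr = FALSE can be sketched in base R as follows. This is an illustration of the formulas above rather than the package's code; the function name lrv_subsampling_sketch is hypothetical, and using floor(n / l) blocks when n is not a multiple of l is an assumption.

```r
lrv_subsampling_sketch <- function(x, l, overlapping = TRUE) {
  n <- length(x)
  centre <- l * mean(x)  # (l / n) * sum of all observations

  if (overlapping) {
    # Sums over the n - l + 1 overlapping blocks x_{i+1}, ..., x_{i+l}, i = 0, ..., n - l.
    cs <- c(0, cumsum(x))
    block_sums <- cs[(l + 1):(n + 1)] - cs[1:(n - l + 1)]
    sum((block_sums - centre)^2) / (l * (n - l + 1))
  } else {
    # Assumption: only the first floor(n / l) complete blocks are used.
    m <- floor(n / l)
    block_sums <- colSums(matrix(x[1:(m * l)], nrow = l))
    sum((block_sums - centre)^2 / l) / m
  }
}

lrv_subsampling_sketch(1:4, l = 2)
lrv_subsampling_sketch(1:4, l = 2, overlapping = FALSE)
```

The cumulative sum avoids recomputing each overlapping block sum from scratch.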

Default values: overlapping = TRUE, the block length is chosen adaptively: $$l_n = \max{\left\{ \left\lceil n^{1/3} \left( \frac{2 \rho}{1 - \rho^2} \right)^{(2/3)} \right\rceil, 1 \right\}}$$ where $\rho$ is the Spearman autocorrelation at lag 1.
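The adaptive block length can be sketched as follows. The function name block_length_sketch is hypothetical, and two details are assumptions: the Spearman autocorrelation is computed as the rank correlation between the series and its lag-1 shift, and the absolute value of $\rho$ is used so that the power $2/3$ stays real for negative $\rho$.

```r
block_length_sketch <- function(x) {
  n <- length(x)

  # Assumption: Spearman autocorrelation at lag 1 as the rank correlation
  # of (x_1, ..., x_{n-1}) with (x_2, ..., x_n).
  rho <- cor(x[-n], x[-1], method = "spearman")

  # Assumption: abs() keeps the fractional power real for negative rho.
  # A value of rho equal to +1 or -1 is not handled here.
  ratio <- abs(2 * rho / (1 - rho^2))
  max(ceiling(n^(1 / 3) * ratio^(2 / 3)), 1)
}

x <- c(1, 3, 2, 4, 6, 5, 7, 9, 8, 10)
block_length_sketch(x)
```

Stronger lag-1 dependence yields a longer block, while a value of $\rho$ near zero yields the minimum block length of 1.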

Bootstrap estimation

If method = "bootstrap", a dependent wild bootstrap with the parameters $B = $ control$B, $l = $ control$l and $k(x) = $ control$kFun is performed: $$ \hat{\sigma}^2 = \sqrt{n} Var(\bar{x^*_k} - \bar{x}), k = 1, ..., B.$$ A single bootstrap observation is generated by $x_i^* = \bar{x} + (x_i - \bar{x}) a_i$, where the $a_i$ are independent of the data x and are drawn from a multivariate normal distribution with $E(A_i) = 0$, $Var(A_i) = 1$ and $Cov(A_i, A_j) = k\left(\frac{i - j}{l}\right), i = 1, ..., n; j \neq i$. A seed can optionally be specified via control$seed (cf. set.seed). Only "bartlett", "parzen" and "QS" are supported as kernel functions. The function sqrtm from the package pracma is used.

Default values: B = 1000, kFun = "bartlett", l is the same as for subsampling.
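The resampling step can be sketched in base R as follows. This is a rough illustration and not the package's implementation: it uses a Cholesky factor from chol() in place of pracma's sqrtm to obtain a square root of the multiplier covariance matrix, it only covers the Bartlett kernel, and the function name dwb_means_sketch is hypothetical.

```r
# Bartlett kernel: k(x) = 1 - |x| for |x| < 1 and 0 otherwise.
bartlett <- function(x) pmax(1 - abs(x), 0)

dwb_means_sketch <- function(x, B = 1000, l = 5, k = bartlett, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  n  <- length(x)
  xc <- x - mean(x)

  # Covariance matrix of the multipliers: Cov(A_i, A_j) = k((i - j) / l).
  Sigma <- k(outer(1:n, 1:n, "-") / l)

  # Swapped-in technique: Cholesky factor R with t(R) %*% R = Sigma.
  R <- chol(Sigma)

  replicate(B, {
    a <- drop(crossprod(R, rnorm(n)))  # t(R) %*% z has covariance Sigma
    mean(mean(x) + xc * a)             # mean of one bootstrap sample x*
  })
}

set.seed(42)
x <- rnorm(40)
means <- dwb_means_sketch(x, B = 200, l = 5, seed = 1)
var(means)
```

The variance of the returned bootstrap means is the quantity that the formula above rescales to obtain the estimate.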

References

Andrews, D.W. "Heteroskedasticity and autocorrelation consistent covariance matrix estimation." Econometrica: Journal of the Econometric Society (1991): 817-858.

Dehling, H., et al. "Change-point detection under dependence based on two-sample U-statistics." Asymptotic laws and methods in stochastics. Springer, New York, NY, (2015). 195-220.

Dehling, H., Fried, R., and Wendler, M. "A robust method for shift detection in time series." Biometrika 107.3 (2020): 647-660.

Parzen, E. "On consistent estimates of the spectrum of a stationary time series." The Annals of Mathematical Statistics (1957): 329-348.

Shao, X. "The dependent wild bootstrap." Journal of the American Statistical Association 105.489 (2010): 218-235.

Examples


Z <- c(rnorm(20), rnorm(20, 2))

## kernel-based estimation (default method)
lrv(Z)

## non-overlapping subsampling on the transformed observations
lrv(Z, method = "subsampling", control = list(overlapping = FALSE, distr = TRUE, l = 5))

## dependent wild bootstrap estimation
lrv(Z, method = "bootstrap", control = list(l = 5, kFun = "parzen"))
