
highfrequency (version 1.0.1)

driftBursts: Inference on drift burst hypothesis

Description

Calculates the test-statistic for the drift burst hypothesis

Let the efficient log-price be defined as: $$ dX_{t} = \mu_{t}dt + \sigma_{t}dW_{t} + dJ_{t}, $$ where \(\mu_{t}\), \(\sigma_{t}\), and \(J_{t}\) are the spot drift, the spot volatility, and a jump process respectively. However, due to microstructure noise, the observed log-price is $$ Y_{t} = X_{t} + \varepsilon_{t} $$

In order to robustify the results to the presence of market microstructure noise, the pre-averaged returns are used: $$ \Delta_{i}^{n}\overline{Y} = \sum_{j=1}^{k_{n}-1}g_{j}^{n}\Delta_{i+j}^{n}Y, $$

where \(g(\cdot)\) is the weighting function \(g(x) = \min(x, 1-x)\), and \(k_{n}\) is the pre-averaging horizon.
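As a rough illustration of the pre-averaging step, the following base R sketch computes pre-averaged returns from a vector of log-prices. The helper name preAveragedReturns is hypothetical and not part of highfrequency, and the choice \(g_{j}^{n} = g(j/k_{n})\) is an assumption based on the usual pre-averaging convention; the package's C++ implementation may differ in indexing and scaling.

```r
# Weighting function g(x) = min(x, 1 - x), as stated above
g <- function(x) pmin(x, 1 - x)

# Hypothetical helper (not a highfrequency function): pre-averaged returns.
# Assumes g_j^n = g(j / k_n); the window index is shifted by one relative to
# the formula so that the first window starts at the first raw return.
preAveragedReturns <- function(logPrices, kn) {
  stopifnot(kn >= 2, length(logPrices) >= kn)
  r <- diff(logPrices)                  # raw returns
  w <- g(seq_len(kn - 1) / kn)          # weights g_1, ..., g_{kn - 1}
  nOut <- length(r) - kn + 2            # number of complete windows
  vapply(seq_len(nOut),
         function(i) sum(w * r[i:(i + kn - 2)]),
         numeric(1))
}

# Simulated log-prices, for illustration only
set.seed(1)
y <- log(100) + cumsum(rnorm(20, sd = 1e-3))
preAveragedReturns(y, kn = 5)
```

With kn = 2 there is a single weight, g(1/2) = 0.5, so each pre-averaged return is simply half of the corresponding raw return.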

The test statistic for the Drift Burst Hypothesis can then be calculated as

$$ \bar{T}_{t}^{n} = \sqrt{\frac{h_{n}}{K_{2}}}\frac{\hat{\bar{\mu}}_{t}^{n}}{\sqrt{\hat{\bar{\sigma}}_{t}^{n}}}, $$ where $$ \hat{\bar{\mu}}_{t}^{n} = \frac{1}{h_{n}}\sum_{i=1}^{n-k_{n}+2}K\left(\frac{t_{i-1}-t}{h_{n}}\right)\Delta_{i-1}^{n}\overline{Y}, $$ and

$$ \hat{\bar{\sigma}}_{t}^{n} = \frac{1}{h_{n}'}\bigg[\sum_{i=1}^{n-k_{n}+2}\left(K\left(\frac{t_{i-1}-t}{h_{n}'}\right)\Delta_{i-1}^{n}\overline{Y}\right)^{2} + 2\sum_{L=1}^{L_{n}}\omega\left(\frac{L}{L_{n}}\right)\sum_{i=1}^{n-k_{n}-L+2}K\left(\frac{t_{i-1}-t}{h_{n}'}\right)K\left(\frac{t_{i+L-1}-t}{h_{n}'}\right)\Delta_{i-1}^{n}\overline{Y}\Delta_{i-1+L}^{n}\overline{Y}\bigg], $$

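where \(\omega(\cdot)\) is a smooth kernel function, in this case the Parzen kernel, \(L_{n}\) is the lag length used to adjust for autocorrelation, and \(K(\cdot)\) is a kernel weighting function, in this case the left-sided exponential kernel. The bandwidths \(h_{n}\) and \(h_{n}'\) are those of the drift and variance estimators, set through the meanBandwidth and varianceBandwidth arguments respectively.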
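The two kernels named above can be sketched in base R as follows. The function names are hypothetical and not exported by highfrequency. The left-sided exponential kernel is taken here to be exp(-|x|) for x <= 0 and zero otherwise, the Parzen kernel uses its standard textbook definition, and \(K_{2}\) is assumed to denote the integral of \(K(x)^{2}\); treat all three as assumptions rather than a description of the package internals.

```r
# Left-sided exponential kernel: only observations at or before t (x <= 0)
# receive weight, decaying exponentially with distance from t.
leftExpKernel <- function(x) ifelse(x <= 0, exp(-abs(x)), 0)

# Parzen kernel (standard textbook definition)
parzenKernel <- function(x) {
  ax <- abs(x)
  ifelse(ax <= 0.5, 1 - 6 * ax^2 + 6 * ax^3,
         ifelse(ax <= 1, 2 * (1 - ax)^3, 0))
}

# Assumed meaning of K_2: the integral of K(x)^2 over the kernel's support
K2 <- integrate(function(x) leftExpKernel(x)^2, lower = -Inf, upper = 0)$value

# Weight of a trade 300 seconds before the test time with a bandwidth of 300
leftExpKernel((-300) / 300)

# Parzen weights for lags 1, ..., 4 when the lag length is 4
parzenKernel(seq_len(4) / 4)
```

Under these assumptions K2 evaluates numerically to about 0.5, and a trade one bandwidth before the test time receives weight exp(-1).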

Usage

driftBursts(
  pData,
  testTimes = seq(34260, 57600, 60),
  preAverage = 5,
  ACLag = -1L,
  meanBandwidth = 300L,
  varianceBandwidth = 900L,
  parallelize = FALSE,
  nCores = NA,
  warnings = TRUE
)

Value

An object of class DBH, which is a list containing the series of the drift burst hypothesis test-statistic as well as the estimated spot drift and variance series. The list also contains some information such as the variance and mean bandwidths, along with the pre-averaging setting and the number of observations. Additionally, the list will contain information on whether testing happened for all testTimes entries. Objects of class DBH have the methods print.DBH, plot.DBH, and getCriticalValues.DBH, which print, plot, and retrieve critical values for the test described in appendix B of Christensen, Oomen, and Reno (2020).

Arguments

pData

Either a data.table or an xts object. If pData is a data.table, columns DT and PRICE must be present, containing timestamps of the trades and the price of the trades (in levels) respectively. If pData is an xts object and the number of columns is greater than one, PRICE must be present.

testTimes

A numeric vector containing the times, in seconds after midnight, at which to calculate the tests. The default of seq(34260, 57600, 60) calculates the test-statistic once per minute, i.e. 390 times for a typical 6.5 hour trading day, from 9:31:00 to 16:00:00. See details. Additionally, testTimes can be set to 'all', in which case the test statistic will be calculated on each tick more than 5 seconds after opening.

preAverage

A positive integer denoting the length of the pre-averaging window for the log-prices. Default is 5.

ACLag

An integer denoting how many lags are to be used for the HAC estimator of the variance. It must be either a positive integer greater than 1 or -1, where -1 denotes using an automatic lag selection algorithm for each iteration. Default is -1L.

meanBandwidth

An integer denoting the bandwidth for the left-sided exponential kernel for the mean. Default is 300L.

varianceBandwidth

An integer denoting the bandwidth for the left-sided exponential kernel for the variance. Default is 900L.

parallelize

A logical to determine whether to parallelize the underlying C++ code (using OpenMP). Default is FALSE. Note that the parallelized code is not interruptible, while the non-parallel code is interruptible, with a check for interrupts every 100 iterations.

nCores

An integer denoting the number of cores to use when the code is parallelized. If this argument is not provided, sequential evaluation will be used even if parallelize is TRUE. Default is NA.

warnings

A logical denoting whether warnings should be shown. Default is TRUE.
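The default testTimes grid can be reproduced from clock times, which may help when building a grid for a market with different trading hours. The sketch below is base R only; toSeconds is a hypothetical helper, not a highfrequency function, and it assumes testTimes is measured in seconds after midnight, as the correspondence between 34260 and 9:31:00 suggests.

```r
# Hypothetical helper: convert a clock time "HH:MM:SS" to seconds after midnight
toSeconds <- function(hms) {
  parts <- as.numeric(strsplit(hms, ":", fixed = TRUE)[[1]])
  sum(parts * c(3600, 60, 1))
}

# One test per minute from 09:31:00 to 16:00:00
testTimes <- seq(toSeconds("09:31:00"), toSeconds("16:00:00"), by = 60)

length(testTimes)  # 390
range(testTimes)   # 34260 57600
```

The resulting vector matches seq(34260, 57600, 60) and can be passed directly as the testTimes argument.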

Author

Emil Sjoerup

Details

If the testTimes vector contains instructions to test before the first trade, or more than 15 minutes after the last trade, these entries will be deleted, as not doing so may cause crashes. The test statistic is unstable before max(meanBandwidth, varianceBandwidth) seconds have passed. The lags from the Newey-West algorithm are increased by 2 * (preAverage - 1): because of the pre-averaging, at least this many lags should be corrected for. The maximum of 20 lags is also increased by this amount for the same reason.
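The rules in this section can be mirrored in a few lines of base R, for instance to check in advance which test times would survive trimming. This is only a sketch of the behaviour described above: trimTestTimes and lagAdjustment are hypothetical helpers, the trade times are made up for illustration, and the package's internal handling of boundary cases may differ.

```r
# Hypothetical helper: drop test times before the first trade or more than
# 15 minutes after the last trade (all times in seconds after midnight).
trimTestTimes <- function(testTimes, tradeTimes) {
  keep <- testTimes >= min(tradeTimes) &
    testTimes <= max(tradeTimes) + 15 * 60
  testTimes[keep]
}

# Hypothetical helper: extra lags added because of pre-averaging
lagAdjustment <- function(preAverage) 2 * (preAverage - 1)

# Made-up trade times: first trade at 09:31:00, last trade at 15:50:00
tradeTimes <- c(34260, 45000, 57000)
grid <- seq(34200, 58800, by = 600)
kept <- trimTestTimes(grid, tradeTimes)
range(kept)            # 34800 57600

lagAdjustment(5)       # 8 extra lags with the default preAverage = 5
20 + lagAdjustment(5)  # adjusted maximum number of lags: 28

# With the default bandwidths, the statistic is unstable for the first
# max(meanBandwidth, varianceBandwidth) seconds
max(300L, 900L)        # 900
```

With the default bandwidths the unstable period is 900 seconds, i.e. 15 minutes.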

References

Christensen, K., Oomen, R., and Reno, R. (2020) The drift burst hypothesis. Journal of Econometrics. Forthcoming.

Examples

data.table::setDTthreads(2)
# Usage with data.table object
dat <- sampleTData[as.Date(DT) == "2018-01-02"]
# Testing every 60 seconds after 09:45:00
DBH1 <- driftBursts(dat, testTimes = seq(35100, 57600, 60), preAverage = 2, ACLag = -1L,
                    meanBandwidth = 300L, varianceBandwidth = 900L)
print(DBH1)

plot(DBH1, pData = dat)
# Usage with xts object (1 column)
library("xts")
dat <- xts(sampleTData[as.Date(DT) == "2018-01-03"]$PRICE, 
           order.by = sampleTData[as.Date(DT) == "2018-01-03"]$DT)
# Testing every 60 seconds after 09:45:00
DBH2 <- driftBursts(dat, testTimes = seq(35100, 57600, 60), preAverage = 2, ACLag = -1L,
                    meanBandwidth = 300L, varianceBandwidth = 900L)
plot(DBH2, pData = dat)

if (FALSE) { 
# This block takes some time
dat <- xts(sampleTDataEurope$PRICE, 
           order.by = sampleTDataEurope$DT)
# Testing every 60 seconds from 09:15:00 (32400 + 900 seconds) to 17:30:00
system.time({DBH4 <- driftBursts(dat, testTimes = seq(32400 + 900, 63000, 60), preAverage = 2, 
             ACLag = -1L, meanBandwidth = 300L, varianceBandwidth = 900L)})

system.time({DBH4 <- driftBursts(dat, testTimes = seq(32400 + 900, 63000, 60), preAverage = 2, 
                                 ACLag = -1L, meanBandwidth = 300L, varianceBandwidth = 900L,
                                 parallelize = TRUE, nCores = 8)})
plot(DBH4, pData = dat)

# The print method for DBH objects takes an argument alpha that determines the confidence level
# of the test performed
print(DBH4, alpha = 0.99)
# Additionally, criticalValue can be passed directly
print(DBH4, criticalValue = 3)
max(abs(DBH4$tStat)) > getCriticalValues(DBH4, 0.99)$quantile
}
