highfrequency (version 0.7.0.1)

quotesCleanup: Cleans quote data

Description

This is a wrapper function that cleans all quote data found in the folder dataSource. The result is saved in the folder dataDestination.

If you supply the argument qDataRaw, the on-disk functionality is skipped and the function returns the cleaned quotes as an xts or data.table object (see Examples).

The following cleaning steps are performed sequentially: noZeroQuotes, selectExchange, rmLargeSpread, mergeQuotesSameTimestamp, rmOutliersQuotes.
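To illustrate what "performed sequentially" means, here is a minimal base R sketch in which each step receives the output of the previous one and the number of remaining quotes is recorded along the way. The three filter functions are simplified stand-ins written for this illustration, not the package's implementations, and the column names EX, BID and OFR are assumptions. The last two steps (mergeQuotesSameTimestamp, rmOutliersQuotes) are left out.

```r
# Toy raw quotes; column names (EX, BID, OFR) are assumed for illustration.
raw <- data.frame(
  EX  = c("N", "N", "T", "N", "N", "N"),
  BID = c(100.00, 0.00, 100.01, 100.01, 100.02, 90.00),
  OFR = c(100.02, 100.02, 100.03, 100.03, 100.04, 110.00),
  stringsAsFactors = FALSE
)

# Simplified stand-ins for the first three cleaning steps (hypothetical code,
# not taken from the highfrequency package).
steps <- list(
  noZeroQuotes   = function(q) q[q$BID != 0 & q$OFR != 0, ],
  selectExchange = function(q) q[q$EX %in% "N", ],
  rmLargeSpread  = function(q) {
    spread <- q$OFR - q$BID
    q[spread <= 50 * median(spread), ]
  }
)

# Apply the steps in order and record how many quotes survive each one.
cleaned <- raw
remaining <- c(initial = nrow(raw))
for (name in names(steps)) {
  cleaned <- steps[[name]](cleaned)
  remaining[name] <- nrow(cleaned)
}
remaining
```

Because the steps are chained, their order matters: the median spread in the third step is computed only over the quotes that survived the first two.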

Usage

quotesCleanup(
  dataSource = NULL,
  dataDestination = NULL,
  exchanges,
  qDataRaw = NULL,
  report = TRUE,
  selection = "median",
  maxi = 50,
  window = 50,
  type = "advanced",
  rmoutliersmaxi = 10,
  saveAsXTS = TRUE,
  tz = "EST"
)

Arguments

dataSource

character indicating the folder in which the original data is stored.

dataDestination

character indicating the folder in which the cleaned data is stored.

exchanges

vector of stock exchange symbols for all data in dataSource, e.g. exchanges = c("T","N") retrieves all stock market data from both NASDAQ and NYSE. The possible exchange symbols are:

  • A: AMEX

  • N: NYSE

  • B: Boston

  • P: Arca

  • C: NSX

  • T/Q: NASDAQ

  • D: NASD ADF and TRF

  • X: Philadelphia

  • I: ISE

  • M: Chicago

  • W: CBOE

  • Z: BATS

qDataRaw

xts or data.table object containing raw quote data for ONE stock only. This argument is NULL by default. When it is supplied, the arguments dataSource and dataDestination are ignored. Only advisable for small chunks of data.

report

boolean, TRUE by default. If TRUE, the function also returns a vector indicating how many quotes remained after each cleaning step.

selection

argument to be passed on to the cleaning routine mergeQuotesSameTimestamp. The default is "median".

maxi

quotes whose spread is greater than maxi times the median spread of that day are excluded (cleaning step rmLargeSpread). The default is 50.

window

argument to be passed on to the cleaning routine rmOutliersQuotes. The default is 50.

type

argument to be passed on to the cleaning routine rmOutliersQuotes. The default is "advanced".

rmoutliersmaxi

argument to be passed on to the cleaning routine rmOutliersQuotes. The default is 10.

saveAsXTS

indicates whether data should be saved in xts format instead of data.table when using on-disk functionality. TRUE by default.

tz

character denoting the time zone to use. The default is "EST".
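The maxi rule described in the arguments above is applied per day, which matters when the data spans several days with different typical spreads. The following base R sketch shows the rule as stated (exclude quotes whose spread is greater than maxi times that day's median spread). It is an illustration under assumptions, not the package's code: the column names DT, BID and OFR and the toy values are made up.

```r
# Two days of toy quotes; column names (DT, BID, OFR) are assumed for illustration.
quotes <- data.frame(
  DT  = c(as.POSIXct("2018-01-02 09:30:00", tz = "EST") + 0:3,
          as.POSIXct("2018-01-03 09:30:00", tz = "EST") + 0:3),
  BID = c(100.00, 100.00, 100.00, 99.00, 100.00, 100.00, 100.00, 99.00),
  OFR = c(100.02, 100.02, 100.02, 101.00, 100.10, 100.10, 100.10, 101.00)
)

maxi   <- 1 * 50                               # same value as the default maxi = 50
spread <- quotes$OFR - quotes$BID
day    <- format(quotes$DT, "%Y-%m-%d", tz = "EST")

# Median spread of each quote's own day, repeated for every row of that day.
medianSpread <- ave(spread, day, FUN = median)

# Keep a quote unless its spread is greater than maxi times the daily median.
kept <- quotes[spread <= maxi * medianSpread, ]
table(format(kept$DT, "%Y-%m-%d", tz = "EST"))
```

In this toy data the same two-unit spread is dropped on the first day, where the median spread is about 0.02, but kept on the second day, where the median spread is about 0.10.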

Value

The function converts every csv file in dataSource into multiple xts or data.table files. In dataDestination, there will be one folder for each symbol containing .rds files with cleaned data stored either in data.table or xts format.

In case you supply the argument "qDataRaw", the on-disk functionality is ignored and the function returns a list with the cleaned quotes as an xts or data.table object depending on input (see examples).
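Once the on-disk cleaning has run, the results can be loaded back with base R. The sketch below first mimics the described layout (one folder per symbol, containing .rds files) in a temporary directory so that it is self-contained. The symbol "XXX", the file names and the toy contents are hypothetical; the actual file names written by quotesCleanup may differ.

```r
# Mimic the described layout in a temporary folder (names are hypothetical).
dataDestination <- file.path(tempdir(), "cleanedQuotes")
dir.create(file.path(dataDestination, "XXX"), recursive = TRUE, showWarnings = FALSE)
toy <- data.frame(BID = c(100.00, 100.01), OFR = c(100.02, 100.03))
saveRDS(toy, file.path(dataDestination, "XXX", "2018-01-02.rds"))
saveRDS(toy, file.path(dataDestination, "XXX", "2018-01-03.rds"))

# Collect every .rds file below dataDestination and group the paths by symbol folder.
files <- list.files(dataDestination, pattern = "\\.rds$",
                    recursive = TRUE, full.names = TRUE)
bySymbol <- split(files, basename(dirname(files)))

# Read each file; the result is a list per symbol with one object per file.
cleaned <- lapply(bySymbol, function(paths) lapply(paths, readRDS))
names(cleaned)
length(cleaned[["XXX"]])
```

readRDS restores whatever object was saved, so the same loop works whether the files hold xts or data.table objects, provided the corresponding package is installed.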

References

Barndorff-Nielsen, O. E., Hansen, P. R., Lunde, A., and Shephard, N. (2009). Realized kernels in practice: Trades and quotes. Econometrics Journal, 12, C1-C32.

Brownlees, C. T. and Gallo, G. M. (2006). Financial econometric analysis at ultra-high frequency: Data handling concerns. Computational Statistics & Data Analysis, 51, 2232-2245.

Falkenberry, T. N. (2002). High frequency data filtering. Unpublished technical report.

Examples

# NOT RUN {
library(highfrequency)

# Suppose you have raw quote data for one stock covering two days
head(sampleQDataRawMicroseconds)
dim(sampleQDataRawMicroseconds)

# Clean in memory, keeping only NYSE ("N") quotes
qDataAfterCleaning <- quotesCleanup(qDataRaw = sampleQDataRawMicroseconds, exchanges = "N")

# Number of quotes remaining after each cleaning step
qDataAfterCleaning$report

# Dimensions of the cleaned quote data
dim(qDataAfterCleaning$qData)

# For larger data sets, it is advised to use the on-disk functionality
# via the "dataSource" and "dataDestination" arguments

# }
