This is a wrapper function for cleaning the trade data of all stocks inside the folder dataSource. The result is saved in the folder dataDestination.
If you supply the argument tDataRaw, the on-disk functionality is ignored and the function returns an xts or data.table object. In this case, when report = TRUE, the function also returns a vector indicating how many trades remained after each cleaning step.
The following cleaning functions are performed sequentially: noZeroPrices, autoSelectExchangeTrades or selectExchange, tradesCondition, and mergeTradesSameTimestamp.
Since the function rmTradeOutliersUsingQuotes also requires cleaned quote data as input, it is not incorporated here; there is a separate wrapper called tradesCleanupUsingQuotes.
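The sequential structure above can be sketched in base R. This is a toy illustration, not the highfrequency implementation: the column names (DT, PRICE, SIZE, EX, COND), the fixed exchange "N" and the rule of summing SIZE when merging are assumptions, and each step is only a simplified stand-in for noZeroPrices, selectExchange, tradesCondition and mergeTradesSameTimestamp. The report vector records how many trades remain after each step.

```r
# Toy raw trades; column names are assumptions chosen for illustration.
trades <- data.frame(
  DT    = as.POSIXct("2018-01-02 09:30:00", tz = "UTC") + c(0, 0, 1, 2, 3, 3),
  PRICE = c(100.0, 100.2, 0, 100.1, 100.3, 100.5),
  SIZE  = c(100, 200, 50, 300, 100, 100),
  EX    = c("N", "N", "N", "T", "N", "N"),
  COND  = c("@", "", "@", "@", "Z", "@F"),
  stringsAsFactors = FALSE
)

# Same set of valid sales conditions as the tradesCleanup default.
valid_conds <- c("", "@", "E", "@E", "F", "FI", "@F", "@FI", "I", "@I")

# Hypothetical merge step: one row per timestamp, median price, summed size.
merge_same_ts <- function(d) {
  groups <- split(d, as.numeric(d$DT))
  out <- do.call(rbind, lapply(groups, function(g) {
    data.frame(DT = g$DT[1], PRICE = median(g$PRICE), SIZE = sum(g$SIZE))
  }))
  rownames(out) <- NULL
  out
}

# Each step maps trade data to cleaned trade data; order matters.
steps <- list(
  "no zero prices"       = function(d) d[d$PRICE != 0, ],
  "select exchange"      = function(d) d[d$EX == "N", ],
  "sales condition"      = function(d) d[d$COND %in% valid_conds, ],
  "merge same timestamp" = merge_same_ts
)

# Apply the steps in sequence and record the number of remaining trades.
report <- c("initial number" = nrow(trades))
cleaned <- trades
for (step in names(steps)) {
  cleaned <- steps[[step]](cleaned)
  report[step] <- nrow(cleaned)
}
print(report)
```

With this toy data the report reads 6, 5, 4, 3, 2: one zero price, one non-NYSE trade and one invalid condition are dropped, and the two remaining 09:30:00 trades are merged into one.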
tradesCleanup(
dataSource = NULL,
dataDestination = NULL,
exchanges = "auto",
tDataRaw = NULL,
report = TRUE,
selection = "median",
validConds = c("", "@", "E", "@E", "F", "FI", "@F", "@FI", "I", "@I"),
marketOpen = "09:30:00",
marketClose = "16:00:00",
printExchange = TRUE,
saveAsXTS = FALSE,
tz = NULL
)
For each day, an xts or data.table object containing the cleaned data is saved into the folder of that date. This procedure is performed for each stock in "ticker".
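A rough sketch of the per-day layout described above, assuming one RDS file per stock and day inside a folder named after the date. The helper save_per_day, the folder structure and the file name pattern are hypothetical; the actual on-disk format written by tradesCleanup may differ.

```r
# Hypothetical destination folder inside R's session temp directory.
dest <- file.path(tempdir(), "dataDestination")

trades <- data.frame(
  DT    = as.POSIXct(c("2018-01-02 09:30:00", "2018-01-02 15:59:59",
                       "2018-01-03 10:00:00"), tz = "UTC"),
  PRICE = c(100.1, 100.4, 101.0)
)

# Split the trades by calendar day and save each day into its own folder.
save_per_day <- function(d, dest, symbol) {
  days <- split(d, as.Date(d$DT, tz = "UTC"))
  for (day in names(days)) {
    folder <- file.path(dest, day)
    dir.create(folder, recursive = TRUE, showWarnings = FALSE)
    # Hypothetical file name: <symbol>_trades.rds
    saveRDS(days[[day]], file.path(folder, paste0(symbol, "_trades.rds")))
  }
  invisible(names(days))
}

saved_days <- save_per_day(trades, dest, "AAA")
print(saved_days)
```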
The function returns a vector indicating how many trades remained after each cleaning step.
If you supply the argument tDataRaw, the on-disk functionality is ignored and the function returns a list with the cleaned trades as an xts or data.table object (element tData) and the cleaning report (element report); see examples.
dataSource: character indicating the folder in which the original data is stored.
dataDestination: character indicating the folder in which the cleaned data is stored.
exchanges: vector of stock exchange symbols for all data in dataSource, e.g. exchanges = c("T","N") retrieves all stock market data from both NYSE and NASDAQ.
The possible exchange symbols are:
A: AMEX
N: NYSE
B: Boston
P: Arca
C: NSX
T/Q: NASDAQ
D: NASD ADF and TRF
X: Philadelphia
I: ISE
M: Chicago
W: CBOE
Z: BATS
The default value is "auto", which automatically selects the exchange for each stock and day independently using the autoSelectExchangeTrades function.
tDataRaw: xts or data.table object containing raw trade data. This argument is NULL by default. Supplying it means the arguments dataSource and dataDestination will be ignored (only advisable for small chunks of data).
report: boolean, TRUE by default. If TRUE, the function also returns a vector indicating how many trades remained after each cleaning step.
selection: argument to be passed on to the cleaning routine mergeTradesSameTimestamp. The default is "median".
validConds: character vector containing valid sales conditions. Passed through to tradesCondition.
marketOpen: character in the format "HH:MM:SS", specifying the opening time of the exchange(s).
marketClose: character in the format "HH:MM:SS", specifying the closing time of the exchange(s).
printExchange: argument passed to autoSelectExchangeTrades indicating whether the chosen exchange is printed on the console; default is TRUE. This is only used when exchanges is "auto".
saveAsXTS: indicates whether data should be saved in xts format instead of data.table when using the on-disk functionality. FALSE by default.
tz: fallback time zone used in case we are unable to identify the time zone of the data; by default tz = NULL.
Without the on-disk functionality (i.e. when tDataRaw is supplied), we attempt to extract the time zone from the DT column (or index) of the data, which may fail. In case of failure, we use tz if specified, and "UTC" otherwise.
With the on-disk functionality, if tz is not specified, the system default time zone is used.
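For exchanges = "auto", a minimal sketch of automatic exchange selection follows. It assumes, as an unverified simplification (check the autoSelectExchangeTrades documentation), that for each day the exchange with the largest total traded volume is kept. The helper pick_exchange_per_day and the columns DT, SIZE and EX are hypothetical.

```r
trades <- data.frame(
  DT   = as.POSIXct(c("2018-01-02 09:30:00", "2018-01-02 09:31:00",
                      "2018-01-02 09:32:00", "2018-01-03 09:30:00",
                      "2018-01-03 09:31:00"), tz = "UTC"),
  SIZE = c(100, 500, 200, 300, 100),
  EX   = c("N", "T", "N", "N", "T"),
  stringsAsFactors = FALSE
)

# For each day, sum the traded volume per exchange and return the largest.
pick_exchange_per_day <- function(d) {
  by_day <- split(d, as.Date(d$DT, tz = "UTC"))
  sapply(by_day, function(g) {
    vol <- tapply(g$SIZE, g$EX, sum)
    names(vol)[which.max(vol)]
  })
}

chosen <- pick_exchange_per_day(trades)   # named by date
print(chosen)

# Keep only the trades from the exchange chosen for their own day.
day_of_trade <- as.character(as.Date(trades$DT, tz = "UTC"))
selected <- trades[trades$EX == chosen[day_of_trade], ]
print(selected)
```

Here NASDAQ ("T") wins on the first day (500 vs 300 shares) and NYSE ("N") on the second, which illustrates why the choice is made per day rather than once per stock.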
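The marketOpen and marketClose arguments are "HH:MM:SS" strings. The sketch below shows one way such strings could be turned into a trading-hours filter. It assumes that timestamps are already expressed in the exchange's local time and that both bounds are inclusive; both are assumptions, and UTC is used only so the example behaves identically on every machine. The helper in_session is hypothetical.

```r
# TRUE for timestamps between marketOpen and marketClose (inclusive).
in_session <- function(dt, marketOpen = "09:30:00", marketClose = "16:00:00") {
  # Convert an "HH:MM:SS" string into seconds after midnight.
  to_sec <- function(hms) {
    parts <- as.numeric(strsplit(hms, ":", fixed = TRUE)[[1]])
    sum(parts * c(3600, 60, 1))
  }
  lt <- as.POSIXlt(dt)  # keeps the time zone stored in dt
  sec <- lt$hour * 3600 + lt$min * 60 + lt$sec
  sec >= to_sec(marketOpen) & sec <= to_sec(marketClose)
}

dt <- as.POSIXct(c("2018-01-02 09:29:59", "2018-01-02 09:30:00",
                   "2018-01-02 16:00:00", "2018-01-02 16:00:01"), tz = "UTC")
in_session(dt)
# → FALSE TRUE TRUE FALSE
```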
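The tz fallback order for the in-memory case (time zone found in the data, then tz, then "UTC") can be sketched as follows. resolve_tz is a hypothetical helper that only inspects the tzone attribute of a POSIXct vector; it does not cover the on-disk case, where the system default is used instead.

```r
# Return the time zone of dt if it carries one, else tz, else "UTC".
resolve_tz <- function(dt, tz = NULL) {
  found <- attr(dt, "tzone")
  if (!is.null(found) && nzchar(found[1])) return(found[1])
  if (!is.null(tz)) return(tz)
  "UTC"
}

resolve_tz(.POSIXct(0, tz = "Europe/Amsterdam"))    # time zone found in data
resolve_tz(.POSIXct(0), tz = "America/Chicago")     # fallback to tz
resolve_tz(.POSIXct(0))                             # final fallback
```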
Jonathan Cornelissen, Kris Boudt, Onno Kleen, and Emil Sjoerup
Using the on-disk functionality with .csv.zip files from the WRDS database will write temporary files on your machine in order to unzip them. We try to clean up afterwards, but cannot guarantee that no files slip through the cracks if the permission settings on your machine do not match ours.
If the input data.table does not contain a DT column but does contain DATE and TIME_M columns, we create the DT column by REFERENCE, altering the data.table that may be in the user's environment.
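The sketch below shows one way to build a DT column from DATE and TIME_M. It assumes DATE is stored as "YYYY-MM-DD" and TIME_M as "HH:MM:SS" with optional fractional seconds (WRDS exports may use other formats). It uses a base R data.frame, which has copy-on-modify semantics, so unlike the data.table case described above the caller's object is not altered. add_dt_column is a hypothetical helper.

```r
# Add a POSIXct DT column if it is missing but DATE and TIME_M are present.
add_dt_column <- function(d, tz = "UTC") {
  if (!"DT" %in% names(d) && all(c("DATE", "TIME_M") %in% names(d))) {
    # %OS accepts fractional seconds such as "09:30:00.123456".
    d$DT <- as.POSIXct(paste(d$DATE, d$TIME_M),
                       format = "%Y-%m-%d %H:%M:%OS", tz = tz)
  }
  d
}

raw <- data.frame(DATE   = c("2018-01-02", "2018-01-02"),
                  TIME_M = c("09:30:00.123456", "09:30:01.5"),
                  PRICE  = c(100.1, 100.2),
                  stringsAsFactors = FALSE)

with_dt <- add_dt_column(raw)
names(raw)      # unchanged: the original data.frame has no DT column
names(with_dt)  # the returned copy has DT
```

With data.table, the equivalent `:=` assignment would add the column to the caller's object in place, which is the side effect the note above warns about.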
Barndorff-Nielsen, O. E., Hansen, P. R., Lunde, A., and Shephard, N. (2009). Realized kernels in practice: Trades and quotes. Econometrics Journal, 12, C1-C32.
Brownlees, C.T. and Gallo, G.M. (2006). Financial econometric analysis at ultra-high frequency: Data handling concerns. Computational Statistics & Data Analysis, 51, 2232-2245.
library(highfrequency)
# Suppose you have raw trade data for one stock over two days
head(sampleTDataRaw)
dim(sampleTDataRaw)
# Clean the data in memory, keeping only NYSE ("N") trades
tDataAfterFirstCleaning <- tradesCleanup(tDataRaw = sampleTDataRaw,
                                         exchanges = list("N"))
# Number of trades remaining after each cleaning step
tDataAfterFirstCleaning$report
dim(tDataAfterFirstCleaning$tData)
# In case you have more data, it is advised to use the on-disk functionality
# via the "dataSource" and "dataDestination" arguments