This function returns a filtered data set, i.e. a reduced version of the user's data frame with the same columns and with rows limited by the criteria defined by filters.
data_filtering(
data,
start,
end,
filters = c(),
plimits = c(),
pquantiles = c(),
dplimits = c(),
lambda = 1.25,
interval = FALSE,
retailers = FALSE
)
This function returns a filtered data set (a reduced version of the user's data frame). If the set of filters is empty, the function returns the original data frame (given by the data parameter) limited to the considered months. If all filters are chosen, i.e. filters = c("extremeprices", "dumpprices", "lowsales"), they work independently and a summary result is returned. Please note that both variants of the extremeprices filter, i.e. plimits and pquantiles, can be chosen at the same time, and they also work independently.
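As a minimal sketch of the behaviour described above, the following compares a call with an empty filter set against a call that requests both variants of the extremeprices filter at once. It assumes the PriceIndices package (taken here to be the source of data_filtering and of the milk data set used in the examples on this page) is installed; the threshold values are simply reused from those examples and are not recommendations.

```r
# Assumption: data_filtering() and the milk data set come from the
# PriceIndices package; the block does nothing if it is not installed.
if (requireNamespace("PriceIndices", quietly = TRUE)) {
  library(PriceIndices)

  # Empty filter set: the data are only limited to the considered months.
  unfiltered <- data_filtering(milk, start = "2018-12", end = "2019-03")

  # Both variants of the extremeprices filter requested in one call.
  filtered <- data_filtering(
    milk,
    start = "2018-12", end = "2019-03",
    filters = c("extremeprices"),
    plimits = c(0.25, 2),
    pquantiles = c(0.01, 0.99)
  )

  # Compare how many rows each call returns.
  print(c(unfiltered = nrow(unfiltered), filtered = nrow(filtered)))
}
```

Comparing the two row counts is a quick way to see how much of the data a given filter configuration removes.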
data: The user's data frame with information about the products to be filtered. It must contain the columns time (as Date in the format year-month-day, e.g. '2020-12-01'), prices (as positive numeric) and quantities (as positive numeric).
start: The base period (as character) limited to the year and month, e.g. "2020-03".
end: The research period (as character) limited to the year and month, e.g. "2020-04".
filters: A vector of filter names (options are: "extremeprices", "dumpprices" and/or "lowsales").
plimits: A two-element vector of thresholds for the minimum and maximum price change (used only if extremeprices is one of the chosen filters).
pquantiles: A two-element vector of quantile levels for the minimum and maximum price change (used only if extremeprices is one of the chosen filters).
dplimits: A two-element vector of thresholds for the maximum price drop and the maximum drop in sales value (used only if dumpprices is one of the chosen filters).
lambda: The lambda parameter for the lowsales filter (see References below).
interval: A logical value indicating whether the filtering process concerns only the two periods defined by the start and end parameters (interval = FALSE) or whether the function is to filter products sold during the whole time interval <start, end>, in which case subsequent months are compared (interval = TRUE).
retailers: A logical value indicating whether filtering should be done for each outlet (retID) separately. If it is set to FALSE, there is no need to consider the retID column.
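The column requirements stated for the data argument can be checked before calling the function. The sketch below uses base R only; check_input and the toy data frame are hypothetical illustrations, not part of the package, and they cover only the three columns named above (the function itself may expect further columns, such as retID when filtering by outlet).

```r
# Hypothetical helper (not part of the package): validates the documented
# requirements for the time, prices and quantities columns.
check_input <- function(data) {
  required <- c("time", "prices", "quantities")
  missing_cols <- setdiff(required, names(data))
  if (length(missing_cols) > 0) {
    stop("missing columns: ", paste(missing_cols, collapse = ", "))
  }
  stopifnot(
    inherits(data$time, "Date"),
    is.numeric(data$prices), all(data$prices > 0),
    is.numeric(data$quantities), all(data$quantities > 0)
  )
  invisible(TRUE)
}

# Toy data invented purely for illustration.
toy <- data.frame(
  time = as.Date(c("2020-03-01", "2020-03-01", "2020-04-01", "2020-04-01")),
  prices = c(2.5, 3.1, 2.6, 3.0),
  quantities = c(10, 4, 12, 5)
)
check_input(toy)

# start and end are "year-month" strings; format() derives them from Dates.
months <- format(toy$time, "%Y-%m")
unique(months)
```

Deriving the period strings with format() keeps start and end consistent with the dates actually present in the data.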
Van Loon, K., Roels, D. (2018) Integrating big data in Belgian CPI. Meeting of the Group of Experts on Consumer Price Indices, Geneva.
data_filtering(milk,start="2018-12",end="2019-03",
filters=c("extremeprices"),pquantiles=c(0.01,0.99),interval=TRUE)
data_filtering(milk,start="2018-12",end="2019-03",
filters=c("extremeprices","lowsales"), plimits=c(0.25,2))