This function returns a prepared data frame based on the user's data set. The resulting data frame is ready for further data processing (such as data selecting, matching or filtering) and for price index calculations, provided that it contains the required columns.
data_preparing(
  data,
  time = NULL,
  prices = NULL,
  quantities = NULL,
  prodID = NULL,
  retID = NULL,
  description = NULL,
  codeIN = NULL,
  codeOUT = NULL,
  grammage = NULL,
  unit = NULL,
  additional = c(),
  zero_prices = FALSE,
  zero_quantities = TRUE
)
The resulting data frame is free from: missing values, negative prices (if zero_prices is set to TRUE), zero or negative prices (if zero_prices is set to FALSE), negative quantities (if zero_quantities is set to TRUE), and zero or negative quantities (if zero_quantities is set to FALSE). The time column is converted to Date type (in the format `Year-Month-01`), and the prices and quantities columns are converted to numeric. If the description column is selected, it is converted to character type. If any of the columns prodID, retID, codeIN or codeOUT are selected, they are converted to factor type.
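The type conversions described above can be illustrated with a minimal base R sketch. It is not the package's implementation: the `raw` data frame and the helper `to_month_start` are hypothetical names introduced only for illustration, and the sketch assumes that dates arrive as `Year-Month` or `Year-Month-Day` strings.

```r
# Hypothetical raw data; the values are illustrative only
raw <- data.frame(
  time = c("2020-03", "2020-12-28", "2021-01-15"),
  prices = c("1.50", "2.25", "3.00"),
  prodID = c(101, 102, 101),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the year and month, then pin the day
# to the first of the month (the Year-Month-01 format)
to_month_start <- function(x) {
  as.Date(paste0(substr(as.character(x), 1, 7), "-01"))
}

prepared <- data.frame(
  time = to_month_start(raw$time),   # Date type
  prices = as.numeric(raw$prices),   # numeric type
  prodID = as.factor(raw$prodID)     # factor type
)

str(prepared)
```

Truncating to the first seven characters is one simple way to treat both accepted input formats uniformly; the package itself may parse dates differently.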
data: The user's data frame to be prepared. The user must indicate the columns time (as Date or character type; allowed formats include, e.g., `2020-03` or `2020-12-28`), prices and quantities (as numeric). Optionally, the user may also indicate the columns prodID, codeIN, codeOUT, retID (as numeric, factor or character), description (as character), grammage (as numeric or character), unit (as character) and other columns specified by the additional parameter.
time: A character name of the column which provides transaction dates.
prices: A character name of the column which provides product prices.
quantities: A character name of the column which provides product quantities.
prodID: A character name of the column which provides product IDs. The prodID column should include unique product IDs used for product matching (as numeric or character). This column is not required for data preparation, but it is required for price index calculation (to obtain it, see the data_matching function).
retID: A character name of the column which provides outlet IDs (retailer sale points). The retID column should include unique outlet IDs used for aggregating subindices over outlets. This column is not required for data preparation, but it is required for the final price index calculation (see the final_index function).
description: A character name of the column which provides product descriptions. This column is not required for data preparation, but it is required for product selection (see the data_selecting function).
codeIN: A character name of the column which provides internal product codes (from the retailer). This column is not required for data preparation, but it may be required for product matching (see the data_matching function).
codeOUT: A character name of the column which provides external product codes (e.g. GTIN or SKU). This column is not required for data preparation, but it may be required for product matching (see the data_matching function).
grammage: A character name of the numeric column which provides the grammage of products.
unit: A character name of the column which provides the unit of the grammage of products.
additional: A character vector of names of additional columns to be considered during data preparation (records with missing values are deleted).
zero_prices: A logical parameter indicating whether zero prices are acceptable.
zero_quantities: A logical parameter indicating whether zero quantities are acceptable.
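The effect of the two logical flags can be sketched in base R. The helper `keep_rows` below is hypothetical and is not part of the package; it only mirrors the documented filtering rules, assuming that zero_prices governs the price filter and zero_quantities governs the quantity filter, and that records with missing values are always dropped.

```r
# Hypothetical helper mirroring the documented rules; not the package's code
keep_rows <- function(prices, quantities,
                      zero_prices = FALSE, zero_quantities = TRUE) {
  # Negative values are never kept; zeros are kept only if the flag allows it
  ok_price <- if (zero_prices) prices >= 0 else prices > 0
  ok_quantity <- if (zero_quantities) quantities >= 0 else quantities > 0
  # Records with missing prices or quantities are dropped
  complete <- !is.na(prices) & !is.na(quantities)
  complete & ok_price & ok_quantity
}

prices <- c(2.5, 0, -1, 3, NA)
quantities <- c(10, 5, 4, 0, 2)

# Defaults from the usage section: zero prices rejected, zero quantities kept
keep_rows(prices, quantities)
# TRUE FALSE FALSE TRUE FALSE

# Reversed flags: zero prices kept, zero quantities rejected
keep_rows(prices, quantities, zero_prices = TRUE, zero_quantities = FALSE)
# TRUE TRUE FALSE FALSE FALSE
```

Returning a logical mask rather than a filtered data frame keeps the sketch independent of any particular column names.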
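# milk and dataCOICOP are example data sets shipped with the PriceIndices package
library(PriceIndices)
data_preparing(milk, time = "time", prices = "prices", quantities = "quantities")
data_preparing(dataCOICOP, time = "time", prices = "prices",
               quantities = "quantities", additional = "coicop6")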