PriceIndices – a Package for Bilateral and Multilateral Price Index Calculations

author: Jacek Białek, University of Lodz, Statistics Poland

The goals of PriceIndices are as follows: a) data processing before price index calculations; b) bilateral and multilateral price index calculations; c) extending multilateral price indices. You can download the package documentation from here. To read more about the package, please see (and cite :)) the following papers:

Białek, J. (2021). PriceIndices – a New R Package for Bilateral and Multilateral Price Index Calculations, Statistika – Statistics and Economy Journal, Vol. 2/2021, 122-141, Czech Statistical Office, Prague.

Białek, J. (2022). Scanner data processing in a newest version of the PriceIndices package, Statistical Journal of the IAOS, 38 (4), 1369-1397, DOI: 10.3233/SJI-220963.

Białek, J. (2023). Scanner data processing and price index calculations in the PriceIndices R package, Slovak Statistics and Demography, 3, 7-20, ISSN: 1210-1095.

Installation

You can install the released version of PriceIndices from CRAN with:

install.packages("PriceIndices")

You can install the development version of PriceIndices from GitHub with:

library("remotes")
remotes::install_github("JacekBialek/PriceIndices")

The functionality of this package can be categorized as follows:

  1. Data sets included in the package and generating artificial scanner data sets
  2. Functions for data processing
  3. Functions providing dataset characteristics
  4. Functions for bilateral unweighted price index calculations
  5. Functions for bilateral weighted price index calculations
  6. Functions for chain price index calculations
  7. Functions for multilateral price index calculations
  8. Functions for extending multilateral price indices by using splicing methods
  9. Functions for extending multilateral price indices by using the FBEW method
  10. Functions for extending multilateral price indices by using the FBMW method
  11. General functions for price index calculations
  12. Functions for comparisons of price indices
  13. Functions for price and quantity indicator calculations

Data sets included in the package and generating artificial scanner data sets

This package includes eight data sets, both artificial and real.

1) dataAGGR

The first one, dataAGGR, can be used to demonstrate the data_aggregating function. This is a collection of artificial scanner data on milk products sold in three different months and it contains the following columns: time - dates of transactions (Year-Month-Day: 4 different dates); prices - prices of sold products (PLN); quantities - quantities of sold products (liters); prodID - unique product codes (3 different prodIDs); retID - unique codes identifying outlets/retailer sale points (4 different retIDs); description - descriptions of sold products (two subgroups: goat milk, powdered milk).

2) dataMATCH

The second one, dataMATCH, can be used to demonstrate the data_matching function and it will be described in the next part of the guidelines. Generally, this artificial data set contains the following columns: time - dates of transactions (Year-Month-Day); prices - prices of sold products; quantities - quantities of sold products; codeIN - internal product codes from the retailer; codeOUT - external product codes, e.g. GTIN or SKU in the real case; description - descriptions of sold products, e.g. 'product A', 'product B', etc.

3) dataCOICOP

The third one, dataCOICOP, is a collection of real scanner data on the sale of milk products sold in the period from December 2020 to February 2022. It is a data frame with 10 columns and 139600 rows. The variables used are as follows: time - dates of transactions (Year-Month-Day); prices - prices of sold products (PLN); quantities - quantities of sold products; description - descriptions of sold products (original: in Polish); codeID - retailer product codes; retID - IDs of retailer outlets; grammage - product grammages; unit - sales units, e.g. 'kg', 'ml', etc.; category - product categories (in English) corresponding to COICOP 6 levels; coicop6 - identifiers of local COICOP 6 groups (6 levels). Please note that this data set can serve as a training or testing set in product classification using machine learning methods (see the functions: model_classification and data_classifying).

4) data_DOWN_UP_SIZED

This data set, data_DOWN_UP_SIZED, is a collection of scanner data on the sale of coffee in the period from January 2024 to February 2024 and it contains downsized products (see the shrinkflation function). It is a data frame with 6 columns and 51 rows. The variables used are as follows: time - dates of transactions (Year-Month-Day); prices - prices of sold products (PLN); quantities - quantities of sold products (in units resulting from the product description); codeIN - unique internal product codes (retailer product codes); codeOUT - unique external product codes (e.g. GTIN, EAN, SKU); description - descriptions of sold coffee products.

5) milk

This data set, milk, is a collection of scanner data on the sale of milk in one of the Polish supermarkets in the period from December 2018 to August 2020. It is a data frame with 6 columns and 4386 rows. The variables used are as follows: time - dates of transactions (Year-Month-Day); prices - prices of sold products (PLN); quantities - quantities of sold products (liters); prodID - unique product codes obtained after product matching (the data set contains 68 different prodIDs); retID - unique codes identifying outlets/retailer sale points (the data set contains 5 different retIDs); description - descriptions of sold milk products (the data set contains 6 different product descriptions corresponding to subgroups of the milk group).

6) coffee

This data set, coffee, is a collection of scanner data on the sale of coffee in one of the Polish supermarkets in the period from December 2017 to October 2020. It is a data frame with 6 columns and 42561 rows. The variables used are as follows: time - dates of transactions (Year-Month-Day); prices - prices of sold products (PLN); quantities - quantities of sold products (kg); prodID - unique product codes obtained after product matching (the data set contains 79 different prodIDs); retID - unique codes identifying outlets/retailer sale points (the data set contains 20 different retIDs); description - descriptions of sold coffee products (the data set contains 3 different product descriptions corresponding to subgroups of the coffee group).

7) sugar

This data set, sugar, is a collection of scanner data on the sale of sugar in one of the Polish supermarkets in the period from December 2017 to October 2020. It is a data frame with 6 columns and 7666 rows. The variables used are as follows: time - dates of transactions (Year-Month-Day); prices - prices of sold products (PLN); quantities - quantities of sold products (kg); prodID - unique product codes obtained after product matching (the data set contains 11 different prodIDs); retID - unique codes identifying outlets/retailer sale points (the data set contains 20 different retIDs); description - descriptions of sold sugar products (the data set contains 3 different product descriptions corresponding to subgroups of the sugar group).

8) dataU

This data set, dataU, is a collection of artificial scanner data on 6 products sold in December 2018. Product descriptions contain information about their grammage and unit. It is a data frame with 5 columns and 6 rows. The variables used are as follows: time - dates of transactions (Year-Month-Day); prices - prices of sold products (PLN); quantities - quantities of sold products (items); prodID - unique product codes; description - descriptions of sold products (the data set contains 6 different product descriptions).

The set milk represents a typical data frame used in the package for most calculations and is organized as follows:

library(PriceIndices)
head(milk)
#>         time prices quantities prodID retID   description
#> 1 2018-12-01   8.78        9.0  14215  2210 powdered milk
#> 2 2019-01-01   8.78       13.5  14215  2210 powdered milk
#> 3 2019-02-01   8.78        0.5  14215  1311 powdered milk
#> 4 2019-02-01   8.78        8.0  14215  2210 powdered milk
#> 5 2019-03-01   8.78        0.5  14215  1311 powdered milk
#> 6 2019-03-01   8.78        1.5  14215  2210 powdered milk

The available subgroups of sold milk are:

unique(milk$description)
#> [1] "powdered milk"             "low-fat milk pasteurized" 
#> [3] "low-fat milk UHT"          "full-fat milk pasteurized"
#> [5] "full-fat milk UHT"         "goat milk"

Generating artificial scanner data sets in the package

The package includes the generate function, which provides an artificial scanner data set where prices and quantities are lognormally distributed. The characteristics of these lognormal distributions are set by the pmi, psigma, qmi and qsigma parameters. This function works for a fixed number of products and outlets (see the n and r parameters). The generated data set is ready for further price index calculations. For instance:

dataset<-generate(pmi=c(1.02,1.03,1.04),psigma=c(0.05,0.09,0.02),
                  qmi=c(3,4,4),qsigma=c(0.1,0.1,0.15),
                  start="2020-01")
head(dataset)
#>         time prices quantities prodID retID
#> 1 2020-01-01   3.03         18      1     1
#> 2 2020-01-01   2.92         19      2     1
#> 3 2020-01-01   2.80         20      3     1
#> 4 2020-01-01   2.81         20      4     1
#> 5 2020-01-01   2.88         20      5     1
#> 6 2020-01-01   2.69         19      6     1

On the other hand, you can use the tindex function to obtain the theoretical value of the unweighted price index for lognormally distributed prices (the month defined by the start parameter plays the role of the fixed base period). The characteristics of these lognormal distributions are set by the pmi and psigma parameters. The ratio parameter is a logical parameter indicating how the theoretical unweighted price index is defined. If it is set to TRUE, then the resulting value is a ratio of expected price values from the compared months; otherwise, the resulting value is the expected value of the ratio of prices from the compared months. The function returns a data frame consisting of dates and the corresponding expected values of the theoretical unweighted price index. For example:

tindex(pmi=c(1.02,1.03,1.04),psigma=c(0.05,0.09,0.02),start="2020-01",ratio=FALSE)
#>      date   tindex
#> 1 2020-01 1.000000
#> 2 2020-02 1.012882
#> 3 2020-03 1.019131

The user may also generate an artificial scanner data set where prices are lognormally distributed and quantities are calculated under the assumption that consumers have CES (Constant Elasticity of Substitution) preferences and their spending on all products is fixed (see the generate_CES function). Please see the following example:

#Generating an artificial dataset (the elasticity of substitution is 1.25)
df<-generate_CES(pmi=c(1.02,1.03),psigma=c(0.04,0.03),
elasticity=1.25,start="2020-01",n=100,days=TRUE)
head(df)
#>         time prices quantities prodID retID
#> 1 2020-01-01   2.79  3.8527116      1     1
#> 2 2020-01-10   2.74  0.7998498      2     1
#> 3 2020-01-17   2.76  4.5477729      3     1
#> 4 2020-01-13   2.83  2.0286100      4     1
#> 5 2020-01-11   2.89  2.4075804      5     1
#> 6 2020-01-09   2.75  7.7025230      6     1

Now, we can verify the value of elasticity of substitution using this generated dataset:

#Verifying the elasticity of substitution
elasticity(df, start="2020-01",end="2020-02")
#> [1] 1.25
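The CES setup can be illustrated with a short base-R sketch. This is only an assumption about the demand form (standard CES demand with elasticity sigma and fixed total expenditure E; the prices and E below are made up), not the package's exact implementation:

```r
# CES demand sketch: with elasticity sigma and fixed total expenditure E,
# quantities follow q_i = E * p_i^(-sigma) / sum_j p_j^(1 - sigma),
# so total spending sum(p * q) stays fixed regardless of prices.
sigma <- 1.25                  # elasticity of substitution
E     <- 100                   # fixed total expenditure (hypothetical)
p     <- c(2.79, 2.74, 2.76)   # hypothetical prices
q     <- E * p^(-sigma) / sum(p^(1 - sigma))
sum(p * q)  # -> 100 (spending is fixed by construction)
```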

Functions for data processing

data_preparing

This function returns a prepared data frame based on the user's data set (you can check whether your data set is suitable for further price index calculations by using the data_check function). The resulting data frame is ready for further data processing (such as data selecting, matching or filtering) and for price index calculations (provided it contains the required columns). The resulting data frame is free from missing values and from negative and (optionally) zero prices and quantities. As a result, the column time is set to the Date type (in the format 'Year-Month-01'), while the columns prices and quantities are set to be numeric. If the description parameter is set to TRUE, then the column description is set to the character type (otherwise it is deleted). Please note that the milk set is an already prepared data set, but let us assume for a moment that we want to make sure that it does not contain missing values and that we do not need the column description for further calculations. For this purpose, we use the data_preparing function as follows:

head(data_preparing(milk, time="time",prices="prices",quantities="quantities"))
#>         time prices quantities
#> 1 2018-12-01   8.78        9.0
#> 2 2019-01-01   8.78       13.5
#> 3 2019-02-01   8.78        0.5
#> 4 2019-02-01   8.78        8.0
#> 5 2019-03-01   8.78        0.5
#> 6 2019-03-01   8.78        1.5

data_imputing

This function imputes missing prices (unit values) and (optionally) zero prices by using carry forward/backward prices. The imputation can be done for each outlet separately or for aggregated data (see the outlets parameter). If a missing product has a previous price, then that previous price is carried forward until the next real observation. If there is no previous price, then the next real observation is found and carried backward. The quantities for imputed prices are set to zero. The function returns a data frame which is ready for price index calculations, for instance:

#Creating a data frame with zero prices (df)
data<-dplyr::filter(milk,time>=as.Date("2018-12-01") & time<=as.Date("2019-03-01"))
sample<-dplyr::sample_n(data, 100)
rest<-setdiff(data, sample)
sample$prices<-0
df<-rbind(sample, rest)
#The Fisher price index calculated for the original data set
fisher(df, "2018-12","2019-03")
#> [1] 0.9847432
#Zero price imputations:
df2<-data_imputing(df, start="2018-12", end="2019-03",
              zero_prices=TRUE,
              outlets=TRUE)
#The Fisher price index calculated for the data set with imputed prices:
fisher(df2, "2018-12","2019-03")
#> [1] 0.984159
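The carry-forward/backward rule can be sketched in base R on a toy price vector (an illustration of the rule described above, not the package's implementation):

```r
# Carry-forward/backward imputation sketch: an NA price takes the last
# observed price; a leading NA takes the next observed price.
impute_prices <- function(p) {
  for (i in seq_along(p)[-1])              # carry forward
    if (is.na(p[i])) p[i] <- p[i - 1]
  for (i in rev(seq_along(p))[-1])         # carry backward (leading NAs)
    if (is.na(p[i])) p[i] <- p[i + 1]
  p
}
impute_prices(c(NA, 4.2, NA, NA, 4.5))  # -> 4.2 4.2 4.2 4.2 4.5
```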

data_aggregating

The function aggregates the user's data frame over time and/or over outlets. Consequently, we obtain monthly data, where the unit value is calculated instead of a price for each prodID observed in each month (the time column gets the Date format: "Year-Month-01"). If the join_outlets parameter is TRUE, then the function also performs aggregation over outlets (retIDs) and the retID column is removed from the data frame. The main advantage of using this function is the ability to reduce the size of the data frame and the time needed to calculate the price index. For instance, let us consider the following data set:

dataAGGR
#>         time prices quantities prodID retID   description
#> 1 2018-12-01     10        100 400032  4313     goat milk
#> 2 2018-12-01     15        100 400032  1311     goat milk
#> 3 2018-12-01     20        100 400032  1311     goat milk
#> 4 2020-07-01     20        100 400050  1311     goat milk
#> 5 2020-08-01     30         50 400050  1311     goat milk
#> 6 2020-08-01     40         50 400050  2210     goat milk
#> 7 2018-12-01     15        200 403249  2210 powdered milk
#> 8 2018-12-01     15        200 403249  2210 powdered milk
#> 9 2018-12-01     15        300 403249  2210 powdered milk

After aggregating this data set over time and outlets we obtain:

data_aggregating(dataAGGR)
#> # A tibble: 4 x 4
#>   time       prodID prices quantities
#>   <date>      <int>  <dbl>      <int>
#> 1 2018-12-01 400032     15        300
#> 2 2018-12-01 403249     15        700
#> 3 2020-07-01 400050     20        100
#> 4 2020-08-01 400050     35        100
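The unit value computed for each prodID is total expenditure divided by total quantity; e.g. the three December 2018 transactions for prodID 400032 above aggregate as follows:

```r
# Unit value = sum(prices * quantities) / sum(quantities); the December
# 2018 transactions for prodID 400032 (prices 10, 15, 20 with quantity
# 100 each) aggregate to a unit value of 15.
prices     <- c(10, 15, 20)
quantities <- c(100, 100, 100)
sum(prices * quantities) / sum(quantities)  # -> 15
```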

data_unit

The function returns the user's data frame with two additional columns: grammage and unit (both of character type). The values of these columns are extracted from product descriptions on the basis of the provided units. Please note that the function takes the multiplication sign into consideration, e.g. if the product description contains '2x50 g', we obtain grammage: 100 and unit: g for that product (with multiplication set to 'x'). For example:

data_unit(dataU,units=c("g|ml|kg|l"),multiplication="x")
#>         time prices quantities prodID          description grammage unit
#> 1 2018-12-01   8.00        200  40033 drink 0,75l 3% corma     0.75    l
#> 2 2018-12-01   5.20        300  12333          sugar 0.5kg     0.50   kg
#> 3 2018-12-01  10.34        100  20345         milk 4x500ml  2000.00   ml
#> 4 2018-12-01   2.60        500  15700 xyz 3 4.34 xyz 200 g   200.00    g
#> 5 2018-12-01  12.00       1000  13022                  abc     1.00 item
#> 6 2019-01-01   3.87        250  10011  ABC 2A/45 350 g mnk   350.00    g
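The extraction idea can be sketched with a base-R regular expression. This is a simplified toy, not the package's parser; the pattern and intermediate variables below are illustrative only:

```r
# Toy sketch of grammage extraction: find "<number>[x<number>]<unit>" in
# a description and resolve the multiplication sign 'x'.
desc  <- "milk 4x500ml"
m     <- regmatches(desc, regexpr("([0-9]+x)?[0-9][0-9.,]* ?(g|ml|kg|l)", desc))
unit  <- sub("^[0-9x., ]+", "", m)             # trailing unit, e.g. "ml"
num   <- trimws(sub("(g|ml|kg|l)$", "", m))    # numeric part, e.g. "4x500"
value <- if (grepl("x", num)) {
  prod(as.numeric(strsplit(num, "x")[[1]]))    # resolve 4 x 500 = 2000
} else as.numeric(gsub(",", ".", num))         # decimal commas -> dots
c(grammage = value, unit = unit)  # -> grammage "2000", unit "ml"
```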

data_norm

The function returns the user's data frame with two transformed columns, grammage and unit, and two rescaled columns, prices and quantities. The transformation and rescaling take the user-defined rules into consideration. Recalculated prices and quantities refer to the grammage unit defined as the second element of the given rule. For instance:

# Preparing a data set
data<-data_unit(dataU,units=c("g|ml|kg|l"),multiplication="x")
# Normalization of grammage units
data_norm(data, rules=list(c("ml","l",1000),c("g","kg",1000)))
#>         time   prices quantities prodID          description grammage unit
#> 1 2018-12-01  5.17000      200.0  20345         milk 4x500ml     2.00    l
#> 2 2018-12-01 10.66667      150.0  40033 drink 0,75l 3% corma     0.75    l
#> 3 2018-12-01 13.00000      100.0  15700 xyz 3 4.34 xyz 200 g     0.20   kg
#> 4 2019-01-01 11.05714       87.5  10011  ABC 2A/45 350 g mnk     0.35   kg
#> 5 2018-12-01 10.40000      150.0  12333          sugar 0.5kg     0.50   kg
#> 6 2018-12-01 12.00000     1000.0  13022                  abc     1.00 item
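The rescaling behind this result is simple arithmetic; for the sugar row above (5.20 PLN for a 0.5 kg pack, 300 packs sold), normalization to kilograms gives:

```r
# Normalizing to the target grammage unit: the price becomes a price per
# kg and the quantity becomes total kilograms sold.
price <- 5.20; quantity <- 300; grammage <- 0.5   # "sugar 0.5kg"
price / grammage     # -> 10.4 (PLN per kg)
quantity * grammage  # -> 150 (kg sold)
```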

data_selecting

The function returns a subset of the user's data set obtained by selection based on keywords and phrases defined by the parameters: include, must and exclude (an additional column coicop is optional). When providing values of these parameters, please remember that the procedure distinguishes between uppercase and lowercase letters only when sensitivity is set to TRUE.

For instance, please use

subgroup1<-data_selecting(milk, include=c("milk"), must=c("UHT"))
head(subgroup1)
#>         time prices quantities prodID retID      description
#> 1 2018-12-01   2.97         78  17034  1311 low-fat milk uht
#> 2 2018-12-01   2.97        167  17034  2210 low-fat milk uht
#> 3 2018-12-01   2.97        119  17034  6610 low-fat milk uht
#> 4 2018-12-01   2.97         32  17034  7611 low-fat milk uht
#> 5 2018-12-01   2.97         54  17034  8910 low-fat milk uht
#> 6 2019-01-01   2.95         71  17034  1311 low-fat milk uht

to obtain the subset of milk limited to UHT category:

unique(subgroup1$description)
#> [1] "low-fat milk uht"  "full-fat milk uht"

You can use

subgroup2<-data_selecting(milk, must=c("milk"), exclude=c("past","goat"))
head(subgroup2)
#>         time prices quantities prodID retID   description
#> 1 2018-12-01   8.78        9.0  14215  2210 powdered milk
#> 2 2019-01-01   8.78       13.5  14215  2210 powdered milk
#> 3 2019-02-01   8.78        0.5  14215  1311 powdered milk
#> 4 2019-02-01   8.78        8.0  14215  2210 powdered milk
#> 5 2019-03-01   8.78        0.5  14215  1311 powdered milk
#> 6 2019-03-01   8.78        1.5  14215  2210 powdered milk

to obtain the subset of milk limited to products which are neither pasteurized nor goat milk:

unique(subgroup2$description)
#> [1] "powdered milk"     "low-fat milk uht"  "full-fat milk uht"

data_classifying

This function predicts product COICOP levels (or any other defined product levels) using the selected machine learning model (see the model parameter). It provides the indicated data set with an additional column, i.e. class_predicted. The selected model must be built previously (see the model_classification function) and after the training process it can be saved on your disk (see the save_model function) and then loaded at any time (see the load_model function). Please note that the machine learning process is based on the XGBoost algorithm (from the XGBoost package) which is an implementation of gradient boosted decision trees designed for speed and performance. For example, let us build a machine learning model

my.grid=list(eta=c(0.01,0.02,0.05),subsample=c(0.5,0.8))
data_train<-dplyr::filter(dataCOICOP,dataCOICOP$time<=as.Date("2021-10-01"))
data_test<-dplyr::filter(dataCOICOP,dataCOICOP$time==as.Date("2021-11-01"))
ML<-model_classification(data_train,
                         data_test,
                         class="coicop6",
                         grid=my.grid,
                         indicators=c("description","codeIN","grammage"),
                         key_words=c("uht"), 
                         rounds=60)

We can inspect the results of the whole training process:

ML$figure_training

or we can observe the importance of the used indicators:

ML$figure_importance

Now, let us save the model on the disk. After saving the model we can load it and use it at any time:

#Setting a temporary directory as a working directory
wd<-tempdir()
setwd(wd)
#Saving and loading the model
save_model(ML, dir="My_model")
ML_fromPC<-load_model("My_model")
#Prediction
data_predicted<-data_classifying(ML_fromPC, data_test)
head(data_predicted)
#>         time prices quantities                            description codeIN
#> 1 2021-11-01   3.03        379 g/wydojone mleko bez laktozyuht 3,2%1l  60001
#> 2 2021-11-01   3.03        856 g/wydojone mleko bez laktozyuht 3,2%1l  60001
#> 3 2021-11-01   3.03        369 g/wydojone mleko bez laktozyuht 3,2%1l  60001
#> 4 2021-11-01   3.03        617 g/wydojone mleko bez laktozyuht 3,2%1l  60001
#> 5 2021-11-01   3.03        613 g/wydojone mleko bez laktozyuht 3,2%1l  60001
#> 6 2021-11-01   3.03        261 g/wydojone mleko bez laktozyuht 3,2%1l  60001
#>   retID grammage unit       category coicop6 class_predicted
#> 1     2        1    l UHT whole milk 11411_1         11411_1
#> 2     3        1    l UHT whole milk 11411_1         11411_1
#> 3     4        1    l UHT whole milk 11411_1         11411_1
#> 4     5        1    l UHT whole milk 11411_1         11411_1
#> 5     6        1    l UHT whole milk 11411_1         11411_1
#> 6     7        1    l UHT whole milk 11411_1         11411_1

data_matching

If you have a data set with information about sold products but the products are not matched, you can use the data_matching function. In an optimal situation, your data frame contains the codeIN, codeOUT and description columns (see documentation), which in practice will contain retailer codes, GTIN or SKU codes and product labels, respectively. The data_matching function returns the data set defined in the first parameter (data) with an additional column (prodID). Two products are treated as matched if they have the same prodID value. The procedure for generating this additional column depends on the set of columns chosen for matching (see documentation for details). For instance, let us suppose you want to obtain matched products from the following artificial data set:

head(dataMATCH)
#>         time    prices quantities codeIN codeOUT retID description
#> 1 2018-12-01  9.416371        309      1       1     1   product A
#> 2 2019-01-01  9.881875        325      1       5     1   product A
#> 3 2019-02-01 12.611826        327      1       1     1   product A
#> 4 2018-12-01  9.598252        309      3       2     1   product A
#> 5 2019-01-01  9.684900        325      3       2     1   product A
#> 6 2019-02-01  9.358420        327      3       2     1   product A

Let us assume that products with two identical codes (codeIN and codeOUT) or one of the codes identical and an identical description are automatically matched. Products are also matched if they have one of the codes identical and the Jaro-Winkler similarity of their descriptions is bigger than the fixed precision value (see documentation - Case 1). Let us also suppose that you want to match all products sold in the interval: December 2018 - February 2019. If you use the data_matching function (as below), an additional column (prodID) will be added to your data frame:

data1<-data_matching(dataMATCH, start="2018-12",end="2019-02", codeIN=TRUE, codeOUT=TRUE, precision=.98, interval=TRUE)
head(data1)
#>         time    prices quantities codeIN codeOUT retID description prodID
#> 1 2018-12-01  9.416371        309      1       1     1   product A      4
#> 2 2019-01-01  9.881875        325      1       5     1   product A      4
#> 3 2019-02-01 12.611826        327      1       1     1   product A      4
#> 4 2018-12-01  9.598252        309      3       2     1   product A      8
#> 5 2019-01-01  9.684900        325      3       2     1   product A      8
#> 6 2019-02-01  9.358420        327      3       2     1   product A      8

Let us now suppose you do not want to consider codeIN while matching and that products with an identical description are to be matched too:

data2<-data_matching(dataMATCH, start="2018-12",end="2019-02", 
                     codeIN=FALSE, onlydescription=TRUE, interval=TRUE)
head(data2)
#>         time    prices quantities codeIN codeOUT retID description prodID
#> 1 2018-12-01  9.416371        309      1       1     1   product A      7
#> 2 2019-01-01  9.881875        325      1       5     1   product A      7
#> 3 2019-02-01 12.611826        327      1       1     1   product A      7
#> 4 2018-12-01  9.598252        309      3       2     1   product A      7
#> 5 2019-01-01  9.684900        325      3       2     1   product A      7
#> 6 2019-02-01  9.358420        327      3       2     1   product A      7

Now, having a prodID column, your datasets are ready for further price index calculations, e.g.:

fisher(data1, start="2018-12", end="2019-02")
#> [1] 1.018419
jevons(data2, start="2018-12", end="2019-02")
#> [1] 1.074934
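For the simplest rule (two identical codes imply a match), the grouping can be sketched in base R; the Jaro-Winkler comparison of descriptions used by the package is not reproduced here:

```r
# Rows sharing both codeIN and codeOUT receive the same prodID
# (a sketch of the strictest matching rule only, on made-up codes).
df <- data.frame(codeIN  = c(1, 1, 3, 3),
                 codeOUT = c(2, 2, 5, 5))
key <- paste(df$codeIN, df$codeOUT)          # combined code key
df$prodID <- match(key, unique(key))         # same key -> same prodID
df$prodID  # -> 1 1 2 2
```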

data_filtering

This function returns a filtered data set, i.e. a reduced user's data frame with the same columns and rows limited by a criterion defined by the filters parameter (see documentation). If the set of filters is empty, then the function returns the original data frame (defined by the data parameter). On the other hand, if all filters are chosen, i.e. filters=c(extremeprices, dumpprices, lowsales), then these filters work independently and a summary result is returned. Please note that both variants of the extremeprices filter can be chosen at the same time, i.e. plimits and pquantiles, and they also work independently. For example, let us consider four filters: filter1 rejects 1% of the lowest and 1% of the highest price changes comparing March 2019 to December 2018; filter2 rejects products with a price ratio less than 0.5 or bigger than 2 over the same period; filter3 rejects the same products as filter2 and also products with relatively low sales in the compared months; filter4 rejects products with a price ratio less than 0.9 and an expenditure ratio less than 0.8 over the same period.

filter1<-data_filtering(milk,start="2018-12",end="2019-03",
                        filters=c("extremeprices"),pquantiles=c(0.01,0.99))
filter2<-data_filtering(milk,start="2018-12",end="2019-03",
                        filters=c("extremeprices"),plimits=c(0.5,2))
filter3<-data_filtering(milk,start="2018-12",end="2019-03",
                        filters=c("extremeprices","lowsales"),plimits=c(0.5,2))
filter4<-data_filtering(milk,start="2018-12",end="2019-03",
                        filters=c("dumpprices"),dplimits=c(0.9,0.8))

These four filters differ from each other with regard to the level of data reduction:

data_without_filters<-data_filtering(milk,start="2018-12",end="2019-03",filters=c())
nrow(data_without_filters)
#> [1] 413
nrow(filter1)
#> [1] 378
nrow(filter2)
#> [1] 381
nrow(filter3)
#> [1] 170
nrow(filter4)
#> [1] 374

You can also use data_filtering for each pair of subsequent months from the considered time interval under the condition that this filtering is done for each outlet (retID) separately, e.g.

filter1B<-data_filtering(milk,start="2018-12",end="2019-03",
                         filters=c("extremeprices"),pquantiles=c(0.01,0.99),
                         interval=TRUE, retailers=TRUE)
nrow(filter1B)
#> [1] 773
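The pquantiles variant of the extremeprices filter amounts to trimming the tails of the distribution of price ratios; a base-R sketch with simulated ratios is:

```r
# Reject price changes below the 1st or above the 99th percentile
# (simulated price ratios; sketches the pquantiles filter idea).
set.seed(1)
ratios <- exp(rnorm(500, mean = 0, sd = 0.1))   # toy price ratios
q      <- quantile(ratios, probs = c(0.01, 0.99))
kept   <- ratios[ratios >= q[1] & ratios <= q[2]]
length(kept)  # roughly 2% of observations are rejected
```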

Two more useful functions are included for scanner data processing. The first, data_reducing, returns a data set containing sufficiently numerous matched products in the indicated groups (see documentation). It reduces the data set to a representative set of products that have appeared in sufficient numbers in the sales offer:

sugar.<-dplyr::filter(sugar, time==as.Date("2018-12-01") | time==as.Date("2019-12-01"))
nrow(sugar.)
#> [1] 435
sugar_<-data_reducing(sugar., start="2018-12", end="2019-12",by="description", minN=5)
nrow(sugar_)
#> [1] 275

The second function, shrinkflation, detects and summarises downsized and upsized products. The function detects phenomena such as shrinkflation, shrinkdeflation, sharkflation, unshrinkdeflation, unshrinkflation and sharkdeflation (see the type parameter). It returns a list containing the following objects: df_changes - a data frame with detailed information on downsized and upsized products with the whole history of size changes; df_type - a data frame with the recognized type of each product; df_overview - a table with a basic summary of all detected products grouped by the type parameter; products_detected - prodIDs of products indicated by the type parameter; df_detected - a subset of the data frame with only the detected products; df_reduced - the difference between the input data frame and the data frame containing the detected products; and df_summary - basic statistics for all detected downsized and upsized products (including their share in the total number of products and mean price and size changes). For instance:

#Data matching over time
df<-data_matching(data=data_DOWN_UP_SIZED, start="2024-01", end="2024-02", 
                  codeIN=TRUE,codeOUT=TRUE,description=TRUE,
                  onlydescription=FALSE,precision=0.9,interval=FALSE)
# Extraction of information about grammage
df<-data_unit(df,units=c("g|ml|kg|l"),multiplication="x")
# Price standardization
df<-data_norm(df, rules=list(c("ml","l",1000),c("g","kg",1000)))
# Downsized and upsized products detection
result<-shrinkflation(data=df, start="2024-01", end="2024-02", prec=3, interval=FALSE, type="shrinkflation")
# result$df_changes
result$df_type
#>    IDs size_change price_orig_change price_norm_change     detected_type
#> 1    7     -20.000           -18.824             1.471     shrinkflation
#> 2   10     -15.000           -10.000             5.882     shrinkflation
#> 3   10     -19.048           -10.000            11.176     shrinkflation
#> 4   10     -14.286           -10.000             5.000     shrinkflation
#> 5   11      -2.500             1.040             3.632      sharkflation
#> 6   12      -4.000            -0.794             3.340     shrinkflation
#> 7   14     -10.000           -40.000           -33.333   shrinkdeflation
#> 8   16     -15.000            15.000            35.294      sharkflation
#> 9   18      20.000             5.000           -12.500 unshrinkdeflation
#> 10  20      25.000             5.557           -15.556 unshrinkdeflation
#> 11  22      12.500            37.777            22.469   unshrinkflation
#> 12  24      33.333            50.000            12.500   unshrinkflation
#> 13  26       5.000           -12.500           -16.666    sharkdeflation
#>                                                             descriptions
#> 1           coffee super 0,4 l ; coffee super 0,5 l , coffee super 0,4 l
#> 2  coffee ABC 200g ; coffee ABC 210g , coffee ABC 170g ; coffee ABC 180g
#> 3  coffee ABC 200g ; coffee ABC 210g , coffee ABC 170g ; coffee ABC 180g
#> 4  coffee ABC 200g ; coffee ABC 210g , coffee ABC 170g ; coffee ABC 180g
#> 5                              coffee GHI 2 x 400g , coffee GHI 2 x 390g
#> 6              coffee JKL 250 ml , coffee JKL 240 ml ; coffee JKL 250 ml
#> 7                                          coffee F 200g , coffee F 180g
#> 8                                          coffee G 200g , coffee G 170g
#> 9                                          coffee H 200g , coffee H 240g
#> 10                                         coffee M 400g , coffee M 500g
#> 11                                         coffee K 400g , coffee K 450g
#> 12                                         coffee L 300g , coffee L 400g
#> 13                                       coffee LX 200g , coffee LX 210g
#>                dates
#> 1  2024-01 , 2024-02
#> 2  2024-01 , 2024-02
#> 3  2024-01 , 2024-02
#> 4  2024-01 , 2024-02
#> 5  2024-01 , 2024-02
#> 6  2024-01 , 2024-02
#> 7  2024-01 , 2024-02
#> 8  2024-01 , 2024-02
#> 9  2024-01 , 2024-02
#> 10 2024-01 , 2024-02
#> 11 2024-01 , 2024-02
#> 12 2024-01 , 2024-02
#> 13 2024-01 , 2024-02
result$df_overview
#> # A tibble: 6 x 3
#>   `type of phenomenon detected` number of detected prod~1 shares [%] of detect~2
#>   <chr>                                             <int>                  <dbl>
#> 1 sharkdeflation                                        1                   7.69
#> 2 sharkflation                                          2                  15.4 
#> 3 shrinkdeflation                                       1                   7.69
#> 4 shrinkflation                                         3                  23.1 
#> 5 unshrinkdeflation                                     2                  15.4 
#> 6 unshrinkflation                                       2                  15.4 
#> # i abbreviated names: 1: `number of detected products`,
#> #   2: `shares [%] of detected products`
# result$products_detected
# result$df_detected
# result$df_reduced
result$df_summary
#>                                            stats           value
#> 1                       Detected product shares: ---------------
#> 2                         number of all products              13
#> 3                    number of detected products               3
#> 4                     share of detected products        23.077 %
#> 5                       turnover of all products          289430
#> 6                  turnover of detected products           73380
#> 7            turnover share of detected products        25.353 %
#> 8                              Average measures: ---------------
#> 9          mean size change of detected products       -14.467 %
#> 10        mean price change of detected products        -9.923 %
#> 11   mean unit price change of detected products         5.374 %
#> 12       median size change of detected products           -15 %
#> 13      median price change of detected products           -10 %
#> 14 median unit price change of detected products             5 %
#> 15                          Volatility measures: ---------------
#> 16             standard deviation of size change         6.354 %
#> 17            standard deviation of price change         6.375 %
#> 18       standard deviation of unit price change         3.655 %
#> 19         volatility coefficient of size change          -0.439
#> 20        volatility coefficient of price change          -0.642
#> 21   volatility coefficient of unit price change            0.68

Functions providing dataset characteristics

available

The function returns all values from the indicated column (defined by the type parameter) which occur at least once in one of the compared periods or in a given time interval. Possible values of the type parameter are: retID, prodID, codeIN, codeOUT or description (see documentation). If the interval parameter is set to FALSE, then the function compares only the periods defined by period1 and period2. Otherwise the whole time period between period1 and period2 is considered. For example:

available(milk, period1="2018-12", period2="2019-12", type="retID",interval=TRUE)
#> [1] 2210 1311 6610 7611 8910

matched

The function returns all values from the indicated column (defined by the type parameter) which occur simultaneously in the compared periods or in a given time interval. Possible values of the type parameter are: retID, prodID, codeIN, codeOUT or description (see documentation). If the interval parameter is set to FALSE, then the function compares only the periods defined by period1 and period2. Otherwise the whole time period between period1 and period2 is considered. For example:

matched(milk, period1="2018-12", period2="2019-12", type="prodID",interval=TRUE)
#>  [1]  14216  15404  17034  34540  60010  70397  74431  82827  82830  82919
#> [11]  94256 400032 400033 400189 400194 400195 400196 401347 401350 402263
#> [21] 402264 402293 402569 402570 402601 402602 402609 403249 404004 404005
#> [31] 405419 405420 406223 406224 406245 406246 406247 407219 407220 407669
#> [41] 407670 407709 407859 407860 400099

matched_index

The function returns a ratio of values from the indicated column that occur simultaneously in the compared periods or in a given time interval to all available values from the above-mentioned column (defined by the type parameter) at the same time. Possible values of the type parameter are: retID, prodID, codeIN, codeOUT or description (see documentation). If the interval parameter is set to FALSE, then the function compares only periods defined by period1 and period2. Otherwise the whole time period between period1 and period2 is considered. The returned value is from 0 to 1. For example:

matched_index(milk, period1="2018-12", period2="2019-12", type="prodID",interval=TRUE)
#> [1] 0.7258065
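The returned ratio is, in effect, the number of matched values divided by the number of available values, so it can be cross-checked against the two previous functions (a sketch; assumes the milk data set shipped with the package):

```r
library(PriceIndices)

# Cross-check: matched_index should equal the share of matched prodIDs
# among all prodIDs available in the considered time interval.
m <- matched(milk, period1 = "2018-12", period2 = "2019-12",
             type = "prodID", interval = TRUE)
a <- available(milk, period1 = "2018-12", period2 = "2019-12",
               type = "prodID", interval = TRUE)
length(m) / length(a)
```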

matched_fig

The function returns a data frame or a figure presenting the matched_index function calculated for the column defined by the type parameter and for each month from the considered time interval. The interval is set by the start and end parameters. The returned object (data frame or figure) depends on the value of the figure parameter. Examples:

matched_fig(milk, start="2018-12", end="2019-12", type="prodID")
matched_fig(milk, start="2018-12", end="2019-04", type="prodID", figure=FALSE)
#>      date  fraction
#> 1 2018-12 1.0000000
#> 2 2019-01 0.9629630
#> 3 2019-02 0.9444444
#> 4 2019-03 0.9074074
#> 5 2019-04 0.8727273

products

This function detects and summarises available, matched, new and disappearing products on the basis of their prodIDs. It compares products from the base period (start) with products from the current period (end). It returns a list containing the following objects: details - prodIDs of available, matched, new and disappearing products; statistics - basic statistics for them; and figure - a pie chart describing the contribution of matched, new and disappearing products to the set of available products. Please see the following example:

list<-products(milk, "2018-12","2019-12")
list$statistics
#>       products volume shares
#> 1    available     61 100.00
#> 2      matched     47  77.05
#> 3          new      8  13.11
#> 4 disappearing      6   9.84
list$figure

products_fig

This function returns a figure with plots of volume (or contributions) of available, matched, new as well as disappearing products. The user may control which groups of products are to be taken into consideration. Available options are: available, matched, new and disappearing. Please follow the example:

products_fig(milk, "2018-12","2019-12", 
fixed_base=TRUE, contributions=FALSE,
show=c("new","disappearing","matched","available"))

prices

The function returns prices (unit values) of products with a given ID (prodID column) sold in the time period indicated by the period parameter. The set parameter means a set of unique product IDs to be used for determining prices of sold products. If the set is empty, the function returns prices of all products available in the period. Please note that the function returns the price values for sorted prodIDs, and in the absence of a given prodID in the data set the function returns nothing (it does not return zero). To get prices (unit values) of all available milk products sold in June 2019, please use:

prices(milk, period="2019-06")
#>  [1]  8.700000  8.669455  1.890000  2.950000  1.990000  2.990000  2.834464
#>  [8]  4.702051  2.163273  2.236250  2.810000  2.860000  2.400000  2.588644
#> [15]  3.790911  7.980000 64.057143  7.966336 18.972121 12.622225  9.914052
#> [22]  7.102823  3.180000  2.527874  1.810000  1.650548  2.790000  2.490000
#> [29]  2.590000  7.970131  9.901111 15.266667 19.502286  2.231947  2.674401
#> [36]  2.371819  2.490000  6.029412  6.441176  2.090000  1.990000  1.890000
#> [43]  1.450000  2.680000  2.584184  2.683688  2.390000  3.266000  2.813238
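To restrict the result to selected products, pass their prodIDs via the set parameter (a sketch; the IDs below appear in the milk data set):

```r
# Unit values of three selected milk products in June 2019
prices(milk, period = "2019-06", set = c(400032, 71772, 82919))
```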

quantities

The function returns quantities of products with a given ID (prodID column) sold in the time period indicated by the period parameter. The set parameter means a set of unique product IDs to be used for determining quantities of sold products. If the set is empty, the function returns quantities of all products available in the period. Please note that the function returns the quantity values for sorted prodIDs, and in the absence of a given prodID in the data set the function returns nothing (it does not return zero). To get a data frame containing quantities of milk products with prodIDs: 400032, 71772 and 82919, and sold in June 2019, please use:

quantities(milk, period="2019-06", set=c(400032, 71772, 82919), ID=TRUE)
#> # A tibble: 3 x 2
#>       by     q
#>    <int> <dbl>
#> 1  71772  117 
#> 2  82919  102 
#> 3 400032  114.

sales

The function returns values of sales of products with a given ID (prodID column) sold in the time period indicated by the period parameter. The set parameter means a set of unique product IDs to be used for determining values of sales of sold products. If the set is empty, the function returns values of sales of all products available in the period (see also the expenditures function, which returns the expenditure values for sorted prodIDs). To get values of sales of milk products with prodIDs: 400032, 71772 and 82919, and sold in June 2019, please use:

sales(milk, period="2019-06", set=c(400032, 71772, 82919))
#> [1] 913.71 550.14 244.80
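The companion expenditures function mentioned above works analogously but returns expenditure values for sorted prodIDs (a sketch, assuming it accepts the same period and set parameters as sales):

```r
# Expenditures of the same three milk products in June 2019
expenditures(milk, period = "2019-06", set = c(400032, 71772, 82919))
```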

sales_groups

The function returns values of sales of products from one or more datasets or the corresponding barplot for these sales (if barplot is set to TRUE). Alternatively, it calculates the sale shares (if the shares parameter is set to TRUE). Please see also the sales_groups2 function. As an example, let us create 3 subgroups of milk products and let us find out their sale shares for the time interval: April, 2019 - July, 2019. We can obtain precise values for the given period:

ctg<-unique(milk$description)
categories<-c(ctg[1],ctg[2],ctg[3])
milk1<-dplyr::filter(milk, milk$description==categories[1])
milk2<-dplyr::filter(milk, milk$description==categories[2])
milk3<-dplyr::filter(milk, milk$description==categories[3])
sales_groups(datasets=list(milk1,milk2,milk3),start="2019-04", end="2019-07")
#> [1]  44400.76 152474.55 101470.76
sales_groups(datasets=list(milk1,milk2,milk3),start="2019-04", end="2019-07", shares=TRUE)
#> [1] 0.1488230 0.5110661 0.3401109

or a barplot presenting these results:

sales_groups(datasets=list(milk1,milk2,milk3),start="2019-04", end="2019-07", 
             barplot=TRUE, shares=TRUE, names=categories)

pqcor

The function returns Pearson's correlation coefficient for price and quantity of products with given IDs (defined by the set parameter) and sold in the period. If the set is empty, the function works for all products being available in the period. The figure parameter indicates whether the function returns a figure with a correlation coefficient (TRUE) or just a correlation coefficient (FALSE). For instance:

pqcor(milk, period="2019-05")
#> [1] -0.2047
pqcor(milk, period="2019-05",figure=TRUE)

pqcor_fig

The function returns Pearson's correlation coefficients between price and quantity of products with given IDs (defined by the set parameter) and sold in the time interval defined by the start and end parameters. If the set is empty the function works for all available products. Correlation coefficients are calculated for each month separately. Results are presented in tabular or graphical form depending on the figure parameter. Both cases are presented below:

pqcor_fig(milk, start="2018-12", end="2019-06", figure=FALSE)
#>      date correlation
#> 1 2018-12     -0.1835
#> 2 2019-01     -0.1786
#> 3 2019-02     -0.1805
#> 4 2019-03     -0.1956
#> 5 2019-04     -0.1972
#> 6 2019-05     -0.2047
#> 7 2019-06     -0.2037
pqcor_fig(milk, start="2018-12", end="2019-06")

dissimilarity

This function returns a value of the relative price (dSP) and/or quantity (dSQ) dissimilarity measure. In a special case, when the type parameter is set to pq, the function provides the value of the dSPQ measure (the relative price and quantity dissimilarity measure calculated as min(dSP, dSQ)). For instance:

dissimilarity(milk, period1="2018-12",period2="2019-12",type="pq")
#> [1] 0.00004175192
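The price and quantity dissimilarities can also be obtained separately; a sketch, assuming the type parameter accepts "p" and "q" for the dSP and dSQ measures respectively:

```r
# Relative price dissimilarity (dSP)
dissimilarity(milk, period1 = "2018-12", period2 = "2019-12", type = "p")
# Relative quantity dissimilarity (dSQ)
dissimilarity(milk, period1 = "2018-12", period2 = "2019-12", type = "q")
```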

dissimilarity_fig

This function presents values of the relative price and/or quantity dissimilarity measure over time. The user can choose a benchmark period (defined by benchmark) and the type of dissimilarity measure to be calculated (defined by type). The obtained dissimilarities over time can be presented in a data frame or via a figure (the default value of figure is TRUE, which results in a figure). For instance:

dissimilarity_fig(milk, start="2018-12",end="2019-12",type="pq",benchmark="start")
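To obtain the underlying values as a data frame instead of a figure, set figure to FALSE (a sketch):

```r
# Dissimilarities versus the fixed benchmark (here: the start period)
dissimilarity_fig(milk, start = "2018-12", end = "2019-12",
                  type = "pq", benchmark = "start", figure = FALSE)
```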

elasticity

This function returns a value of the elasticity of substitution. If the method parameter is set to lm (the default value), the estimation procedure numerically solves the equation: LM(sigma)-CW(sigma)=0, where LM denotes the Lloyd-Moulton price index, CW denotes a current weight counterpart of the Lloyd-Moulton price index, and sigma is the elasticity of substitution parameter to be estimated. If the method parameter is set to f, t, w or sv, then the Fisher, Tornqvist, Walsh or Sato-Vartia price index formula, respectively, is used instead of the CW price index. The procedure continues as long as the absolute value of this difference is greater than the value of the precision parameter. For example:

elasticity(coffee, start = "2018-12", end = "2019-01")
#> [1] 4.241791
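The alternative estimation methods described above are selected via the method parameter; a sketch using the Fisher-based variant (the precision value below is illustrative):

```r
# Elasticity of substitution estimated with the Fisher index
# used instead of the CW price index
elasticity(coffee, start = "2018-12", end = "2019-01",
           method = "f", precision = 0.000001)
```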

elasticity_fig

The function provides a data frame or a figure presenting elasticities of substitution calculated for a time interval (see the figure parameter). The elasticities of substitution can be calculated for subsequent months or for a fixed base month (see the start parameter) and the rest of the months from the given time interval (depending on the fixedbase parameter). The presented function is based on the elasticity function. For instance, to get elasticities of substitution calculated for milk products for subsequent months, we run:

elasticity_fig (milk,start="2018-12",end="2019-04",figure=TRUE, 
method=c("lm","f","sv"),names=c("LM","Fisher", "SV"))

Functions for bilateral unweighted price index calculation

This package includes 7 functions for calculating the following bilateral unweighted price indices:

Price Index - Function
BMW (2007) - bmw
Carli (1804) - carli
CSWD (1980,1992) - cswd
Dutot (1738) - dutot
Jevons (1865) - jevons
Harmonic - harmonic
Dikhanov (2021, 2024) - dikhanov

Each of these functions returns a value (or a vector of values) of the chosen unweighted bilateral price index, depending on the interval parameter. If the interval parameter is set to TRUE, the function returns a vector of price index values without dates. To get information about both price index values and corresponding dates, please see the general functions: price_indices or final_index. None of these functions takes into account aggregating over outlets or product subgroups (to consider these types of aggregating, please use the final_index function). Below are examples of calculations for the Jevons index (in the second case a fixed base month is set to December 2018):

jevons(milk, start="2018-12", end="2020-01")
#> [1] 1.028223
jevons(milk, start="2018-12", end="2020-01", interval=TRUE)
#>  [1] 1.0000000 1.0222661 1.0300191 1.0353857 1.0075504 1.0395393 0.9853148
#>  [8] 1.0053100 1.0033727 1.0177604 1.0243906 1.0086291 1.0249373 1.0282234
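The remaining unweighted formulas follow the same interface; for instance, for the Dutot and Carli indices (a sketch):

```r
dutot(milk, start = "2018-12", end = "2020-01")
carli(milk, start = "2018-12", end = "2020-01")
```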

Functions for bilateral weighted price index calculation

This package includes 30 functions for calculating the following bilateral weighted price indices:

Price Index - Function
AG Mean (2009) - agmean
Banajree (1977) - banajree
Bialek (2012,2013) - bialek
Davies (1924) - davies
Drobisch (1871) - drobisch
Fisher (1922) - fisher
Geary-Khamis (1958,1970) - geary_khamis
Geo-Laspeyres - geolaspeyres
Geo-Lowe - geolowe
Geo-Paasche - geopaasche
Geo-Young - geoyoung
Geo-hybrid (2020) - geohybrid
Hybrid (2020) - hybrid
Laspeyres (1871) - laspeyres
Lehr (1885) - lehr
Lloyd-Moulton (1975,1996) - lloyd_moulton
Lowe - lowe
Marshall-Edgeworth (1887) - marshall_edgeworth
Paasche (1874) - paasche
Palgrave (1886) - palgrave
Sato-Vartia (1976) - sato_vartia
Stuvel (1957) - stuvel
Tornqvist (1936) - tornqvist
Vartia (1976) - vartia
Walsh (1901) - walsh
Young - young
Quadratic mean of order r price index - QMp
Implicit quadratic mean of order r price index - IQMp
Value Index - value_index
Unit Value Index - unit_value_index

and the general quadratic mean of order r quantity index: QMq.

Each of these functions returns a value (or a vector of values) of the chosen weighted bilateral price index, depending on the interval parameter. If the interval parameter is set to TRUE, the function returns a vector of price index values without dates. To get information about both price index values and corresponding dates, please see the general functions: price_indices or final_index. None of these functions takes into account aggregating over outlets or product subgroups (to consider these types of aggregating, please use the final_index function). Below are examples of calculations for the Fisher, the Lloyd-Moulton and the Lowe indices (in the last case, the fixed base month is set to December 2019 and the prior period is December 2018):

fisher(milk, start="2018-12", end="2020-01")
#> [1] 0.9615501
lloyd_moulton(milk, start="2018-12", end="2020-01", sigma=0.9)
#> [1] 0.9835069
lowe(milk, start="2019-12", end="2020-02", base="2018-12", interval=TRUE)
#> [1] 1.0000000 0.9880546 1.0024443
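The other weighted formulas use the same call pattern; e.g. for the Tornqvist and Walsh indices (a sketch):

```r
tornqvist(milk, start = "2018-12", end = "2020-01")
walsh(milk, start = "2018-12", end = "2020-01")
```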

Functions for chain price index calculation

This package includes 36 functions for calculating the following chain indices (weighted and unweighted):

Price Index - Function
Chain BMW - chbmw
Chain Carli - chcarli
Chain CSWD - chcswd
Chain Dutot - chdutot
Chain Jevons - chjevons
Chain Harmonic - chharmonic
Chain Dikhanov - chdikhanov
Chain AG Mean - chagmean
Chain Banajree - chbanajree
Chain Bialek - chbialek
Chain Davies - chdavies
Chain Drobisch - chdrobisch
Chain Fisher - chfisher
Chain Geary-Khamis - chgeary_khamis
Chain Geo-Laspeyres - chgeolaspeyres
Chain Geo-Lowe - chgeolowe
Chain Geo-Paasche - chgeopaasche
Chain Geo-Young - chgeoyoung
Chain Geo-hybrid - chgeohybrid
Chain Hybrid - chhybrid
Chain Laspeyres - chlaspeyres
Chain Lehr - chlehr
Chain Lloyd-Moulton - chlloyd_moulton
Chain Lowe - chlowe
Chain Marshall-Edgeworth - chmarshall_edgeworth
Chain Paasche - chpaasche
Chain Palgrave - chpalgrave
Chain Sato-Vartia - chsato_vartia
Chain Stuvel - chstuvel
Chain Tornqvist - chtornqvist
Chain Vartia - chvartia
Chain Walsh - chwalsh
Chain Young - chyoung
Chain quadratic mean of order r price index - chQMp
Chain implicit quadratic mean of order r price index - chIQMp
Chain quadratic mean of order r quantity index - chQMq

Each time, the interval parameter has a logical value indicating whether the function is to compare the research period defined by end to the base period defined by start (then interval is set to FALSE, which is the default value) or whether all fixed base indices are to be calculated. In the second case, all months from the time interval <start, end> are considered and start defines the base period (interval is set to TRUE). Here are examples for the Fisher chain index:

chfisher(milk, start="2018-12", end="2020-01")
#> [1] 0.9618094
chfisher(milk, start="2018-12", end="2020-01", interval=TRUE)
#>  [1] 1.0000000 1.0021692 1.0004617 0.9862756 0.9944042 0.9915704 0.9898026
#>  [8] 0.9876325 0.9981591 0.9968851 0.9786428 0.9771951 0.9874251 0.9618094

Functions for multilateral price index calculation

This package includes 22 functions for calculating multilateral price indices and one additional, general function (QU) which calculates the quality adjusted unit value index, i.e.:

Price Index - Function
CCDI - ccdi
GEKS - geks
WGEKS - wgeks
GEKS-J - geksj
GEKS-W - geksw
GEKS-L - geksl
WGEKS-L - wgeksl
GEKS-GL - geksgl
WGEKS-GL - wgeksgl
GEKS-AQU - geksaqu
WGEKS-AQU - wgeksaqu
GEKS-AQI - geksaqi
WGEKS-AQI - wgeksaqi
GEKS-GAQI - geksgaqi
GEKS-IQM - geksiqm
GEKS-QM - geksqm
GEKS-LM - gekslm
WGEKS-GAQI - wgeksgaqi
Geary-Khamis - gk
Quality Adjusted Unit Value - QU
Time Product Dummy - tpd
Unweighted Time Product Dummy - utpd
SPQ - SPQ

The above-mentioned 21 multilateral formulas (the SPQ index is an exception) consider the time window defined by the wstart and window parameters, where window is the length of the time window (typically, multilateral methods are based on a 13-month time window). Each formula measures price dynamics by comparing the end period to the start period (both start and end must be inside the considered time window). To get information about both price index values and corresponding dates, please see the functions: price_indices or final_index. These functions do not take into account aggregating over outlets or product subgroups (to consider these types of aggregating, please use the final_index function). Here are examples for the GEKS formula (see documentation):

geks(milk, start="2019-01", end="2019-04",window=10)
#> [1] 0.9912305
geksl(milk, wstart="2018-12", start="2019-03", end="2019-05")
#> [1] 1.002251
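Other multilateral formulas from the table share the same interface; for instance, the Geary-Khamis and Time Product Dummy methods over a 13-month window (a sketch):

```r
# Multilateral Geary-Khamis index over the window 2018-12 .. 2019-12
gk(milk, start = "2018-12", end = "2019-12", window = 13)
# Time Product Dummy index over the same window
tpd(milk, start = "2018-12", end = "2019-12", window = 13)
```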

The user may decompose the GEKS-type indices. The m_decomposition function returns multiplicative decompositions of the selected GEKS-type indices. For instance:

milk.<-milk
milk.$prodID<-milk.$description
m_decomposition(milk., start="2018-12", end="2019-12",
                formula=c("geks","ccdi"))$multiplicative
#>                     product      GEKS      CCDI
#> 1         full-fat milk UHT 0.9901247 0.9901350
#> 2 full-fat milk pasteurized 0.9966904 0.9966895
#> 3                 goat milk 0.9999800 0.9999800
#> 4          low-fat milk UHT 1.0018916 1.0018830
#> 5  low-fat milk pasteurized 1.0035170 1.0035052
#> 6             powdered milk 1.0064838 1.0065987
#> 7     index value (product) 0.9986050 0.9987082

The QU function returns a value of the quality adjusted unit value index (QU index) for the given set of adjustment factors. An additional v parameter is a data frame with adjustment factors for at least all matched prodIDs. It must contain two columns: prodID with unique product IDs and values with the corresponding adjustment factors (see documentation). The following example starts from creating a data frame which includes sample adjustment factors:

prodID<-base::unique(milk$prodID)
values<-stats::runif(length(prodID),1,2)
v<-data.frame(prodID,values)
head(v)
#>   prodID   values
#> 1  14215 1.696448
#> 2  14216 1.998574
#> 3  15404 1.811070
#> 4  17034 1.034246
#> 5  34540 1.176572
#> 6  51583 1.415943

and the next step is calculating the QU index which compares December 2019 to December 2018:

QU(milk, start="2018-12", end="2019-12", v)
#> [1] 0.9824469

Functions for extending multilateral price indices by using splicing methods

This package includes 21 functions for calculating splice indices:

Price Index - Function
Splice CCDI - ccdi_splice
Splice GEKS - geks_splice
Splice weighted GEKS - wgeks_splice
Splice GEKS-J - geksj_splice
Splice GEKS-W - geksw_splice
Splice GEKS-L - geksl_splice
Splice weighted GEKS-L - wgeksl_splice
Splice GEKS-GL - geksgl_splice
Splice weighted GEKS-GL - wgeksgl_splice
Splice GEKS-AQU - geksaqu_splice
Splice weighted GEKS-AQU - wgeksaqu_splice
Splice GEKS-AQI - geksaqi_splice
Splice weighted GEKS-AQI - wgeksaqi_splice
Splice GEKS-GAQI - geksgaqi_splice
Splice weighted GEKS-GAQI - wgeksgaqi_splice
Splice GEKS-IQM - geksiqm_splice
Splice GEKS-QM - geksqm_splice
Splice GEKS-LM - gekslm_splice
Splice Geary-Khamis - gk_splice
Splice Time Product Dummy - tpd_splice
Splice unweighted Time Product Dummy - utpd_splice

These functions return a value (or values) of the selected multilateral price index extended by using window splicing methods (defined by the splice parameter).
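For example, a sketch for the splice GEKS index, assuming the splice parameter accepts values such as "movement" (see the package documentation for the full list of splicing variants):

```r
# GEKS index extended beyond the initial 13-month window
# via the movement splice
geks_splice(milk, start = "2018-12", end = "2020-01",
            window = 13, splice = "movement")
```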

Install

install.packages('PriceIndices')

Monthly Downloads

386

Version

0.2.3

License

GPL-3

Maintainer

Jacek Białek

Last Published

January 24th, 2025

Functions in PriceIndices (0.2.3)

bialek

Calculating the bilateral Bialek price index
ccdi_fbew

Extending the multilateral CCDI price index by using the FBEW method.
chQMp

Calculating the monthly chained quadratic mean of order r price index
ccdi_fbmw

Extending the multilateral CCDI price index by using the FBMW method.
chQMq

Calculating the monthly chained quadratic mean of order r quantity index
carli

Calculating the unweighted Carli price index
chIQMp

Calculating the monthly chained implicit quadratic mean of order r price index
chgeary_khamis

Calculating the monthly chained Geary-Khamis price index
chdutot

Calculating the monthly chained Dutot price index
chbmw

Calculating the monthly chained BMW price index
chcarli

Calculating the monthly chained Carli price index
chgeolaspeyres

Calculating the monthly chained geo-logarithmic Laspeyres price index
chdrobisch

Calculating the monthly chained Drobisch price index
chfisher

Calculating the monthly chained Fisher price index
chgeohybrid

Calculating the monthly chained geohybrid price index
chgeopaasche

Calculating the monthly chained geo-logarithmic Paasche price index
chbialek

Calculating the monthly chained Bialek price index
chcswd

Calculating the monthly chained CSWD price index
chgeolowe

Calculating the monthly chained geometric Lowe price index
chagmean

Calculating the monthly chained AG Mean price index
chlaspeyres

Calculating the monthly chained Laspeyres price index
chgeoyoung

Calculating the monthly chained geometric Young price index
chpalgrave

Calculating the monthly chained Palgrave price index
chdikhanov

Calculating the monthly chained Dikhanov price index
chlehr

Calculating the monthly chained Lehr price index
chstuvel

Calculating the monthly chained Stuvel price index
chlloyd_moulton

Calculating the monthly chained Lloyd-Moulton price index
chsato_vartia

Calculating the monthly chained Vartia-II (Sato-Vartia) price index
chdavies

Calculating the monthly chained Davies price index
chharmonic

Calculating the monthly chained harmonic price index
chtornqvist

Calculating the monthly chained Tornqvist price index
chlowe

Calculating the monthly chained Lowe price index
dataAGGR

A small artificial scanner data set for a demonstration of data aggregation
compare_indices_jk

A general function to compare indices by using the jackknife method
data_preparing

Preparing a data set for further data processing or price index calculations
data_aggregating

Aggregating the user's data frame
compare_indices_list

A general function for graphical comparison of price indices
data_DOWN_UP_SIZED

An artificial data set on sold coffee
chvartia

Calculating the monthly chained Vartia-I price index
chwalsh

Calculating the monthly chained Walsh price index
data_reducing

Reducing products
chjevons

Calculating the monthly chained Jevons price index
chhybrid

Calculating the monthly chained hybrid price index
data_classifying

Predicting product classes via the machine learning model
dissimilarity

Calculating the relative price and/or quantity dissimilarity measure between periods
chyoung

Calculating the monthly chained Young price index
dataCOICOP

A real scanner data set for the product classification
data_check

Checking the user's data frame
geks

Calculating the multilateral GEKS price index
dissimilarity_fig

Presenting the relative price and/or quantity dissimilarity measure over time
elasticity

Calculating the elasticity of substitution
cswd

Calculating the unweighted CSWD price index
compare_to_target

Calculating distances between considered price indices and the target price index
data_selecting

Selecting products from the user's data set for further price index calculations
dataMATCH

An artificial scanner data set for product matching
coffee

A real data set on sold coffee
elasticity_fig

Presenting elasticities of substitution for time interval
geksaqu_fbmw

Extending the multilateral GEKS-AQU price index by using the FBMW method.
geks_fbew

Extending the multilateral GEKS price index by using the FBEW method.
dataU

An artificial, small scanner data set
geks_splice

Extending the multilateral GEKS price index by using window splicing methods.
dikhanov

Calculating the unweighted Dikhanov price index
davies

Calculating the bilateral Davies price index
geksaqi_splice

Extending the multilateral GEKS-AQI price index by using window splicing methods.
geksj

Calculating the multilateral GEKS price index based on the Jevons formula (typical notation: GEKS-J)
data_unit

Providing information about the grammage and unit of products
expenditures

Providing expenditures of sold products
drobisch

Calculating the bilateral Drobisch price index
final_index

A general function to compute a final price index
geksgaqi_fbew

Extending the multilateral GEKS-GAQI price index by using the FBEW method.
geksgaqi

Calculating the multilateral GEKS-GAQI price index
- `geksaqi_fbmw`: Extending the multilateral GEKS-AQI price index by using the FBMW method
- `geksl_fbmw`: Extending the multilateral GEKS-L price index by using the FBMW method
- `geks_fbmw`: Extending the multilateral GEKS price index by using the FBMW method
- `geksj_fbew`: Extending the multilateral GEKS-J price index by using the FBEW method
- `chpaasche`: Calculating the monthly chained Paasche price index
- `geksiqm`: Calculating the multilateral GEKS-IQM price index
- `chbanajree`: Calculating the monthly chained Banajree price index
- `dutot`: Calculating the unweighted Dutot price index
- `geksl_splice`: Extending the multilateral GEKS-L price index by using window splicing methods
- `geksaqu_splice`: Extending the multilateral GEKS-AQU price index by using window splicing methods
- `geksgl`: Calculating the multilateral GEKS-GL price index
- `chmarshall_edgeworth`: Calculating the monthly chained Marshall-Edgeworth price index
- `data_norm`: Normalizing grammage units and recalculating prices and quantities with respect to these units
- `gekslm_fbew`: Extending the multilateral GEKS-LM price index by using the FBEW method
- `geksiqm_fbew`: Extending the multilateral GEKS-IQM price index by using the FBEW method
- `geksj_splice`: Extending the multilateral GEKS-J price index by using window splicing methods
- `geksj_fbmw`: Extending the multilateral GEKS-J price index by using the FBMW method
- `compare_distances`: Calculating distances between price indices
- `geksw`: Calculating the multilateral GEKS price index based on the Walsh formula (GEKS-W)
- `gk_splice`: Extending the multilateral Geary-Khamis price index by using window splicing methods
- `generate_CES`: Generating an artificial scanner data set in the CES model
- `geksl`: Calculating the multilateral GEKS-L price index
- `gekslm`: Calculating the multilateral GEKS-LM price index
- `geksgl_fbew`: Extending the multilateral GEKS-GL price index by using the FBEW method
- `geksqm`: Calculating the multilateral GEKS-QM price index
- `generate`: Generating an artificial scanner data set
- `geksl_fbew`: Extending the multilateral GEKS-L price index by using the FBEW method
- `harmonic`: Calculating the unweighted harmonic price index
- `geksw_fbew`: Extending the multilateral GEKS-W price index by using the FBEW method
- `mmontgomery`: Calculating the multilateral Montgomery price and quantity indicators
- `geoyoung`: Calculating the bilateral geometric Young price index
- `gk`: Calculating the multilateral Geary-Khamis price index
- `model_classification`: Building the machine learning model for product classification
- `geksqm_fbew`: Extending the multilateral GEKS-QM price index by using the FBEW method
- `geohybrid`: Calculating the bilateral geohybrid price index
- `palgrave`: Calculating the bilateral Palgrave price index
- `hybrid`: Calculating the bilateral hybrid price index
- `geolaspeyres`: Calculating the bilateral geo-logarithmic Laspeyres price index
- `pqcor`: Providing the correlation coefficient between prices and quantities of sold products
- `jevons`: Calculating the unweighted Jevons price index
- `matched_fig`: Providing time-dependent values of the `matched_index()` function
- `lloyd_moulton`: Calculating the bilateral Lloyd-Moulton price index
- `load_model`: Loading the machine learning model from disk
- `mbennet`: Calculating the multilateral Bennet price and quantity indicators
- `matched_index`: Providing the ratio of the number of matched values from the indicated column to the number of all available values from that column
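Most bilateral index functions listed here share one calling pattern. A minimal sketch, assuming the `data, start, end` convention shown in the package README (months given as `"YYYY-MM"`, applied to the built-in `milk` data set); check `?jevons` in your installed version for the exact signature:

```r
library(PriceIndices)

# Unweighted Jevons and Dutot indices for Dec 2018 -> Dec 2019,
# computed on the built-in `milk` scanner data set:
jevons(milk, start = "2018-12", end = "2019-12")
dutot(milk, start = "2018-12", end = "2019-12")
```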
- `stuvel`: Calculating the bilateral Stuvel price index
- `compare_indices_df`: A function for graphical comparison of price indices
- `quantities`: Providing quantities of sold products
- `sales_groups2`: Providing information about sales of products
- `products_fig`: A function for graphical comparison of available, matched, new, and disappearing products
- `sato_vartia`: Calculating the bilateral Vartia-II (Sato-Vartia) price index
- `data_filtering`: Filtering a data set for further price index calculations
- `tpd`: Calculating the multilateral TPD price index
- `sugar`: A real data set on sold sugar
- `geksaqi`: Calculating the multilateral GEKS-AQI price index
- `data_matching`: Matching products
- `geksgaqi_fbmw`: Extending the multilateral GEKS-GAQI price index by using the FBMW method
- `geksaqi_fbew`: Extending the multilateral GEKS-AQI price index by using the FBEW method
- `geksaqu`: Calculating the multilateral GEKS-AQU price index
- `montgomery`: Calculating the Montgomery price and quantity indicators
- `tpd_fbew`: Extending the multilateral TPD price index by using the FBEW method
- `tpd_fbmw`: Extending the multilateral TPD price index by using the FBMW method
- `tpd_splice`: Extending the multilateral TPD price index by using window splicing methods
- `milk`: A real data set on sold milk
- `fisher`: Calculating the bilateral Fisher price index
- `data_imputing`: Imputing missing and (optionally) zero prices
- `geary_khamis`: Calculating the bilateral Geary-Khamis price index
- `geksgaqi_splice`: Extending the multilateral GEKS-GAQI price index by using window splicing methods
- `wgeks`: Calculating the multilateral weighted GEKS (WGEKS) price index
- `paasche`: Calculating the bilateral Paasche price index
- `wgeks_fbew`: Extending the multilateral weighted GEKS price index by using the FBEW method
- `gk_fbmw`: Extending the multilateral Geary-Khamis price index by using the FBMW method
- `wgeks_fbmw`: Extending the multilateral weighted GEKS price index by using the FBMW method
- `save_model`: Saving the machine learning model to disk
- `wgeksaqu_splice`: Extending the multilateral weighted GEKS-AQU price index by using window splicing methods
- `utpd_fbew`: Extending the unweighted multilateral TPD price index by using the FBEW method
- `wgeksaqi`: Calculating the multilateral weighted GEKS-AQI (WGEKS-AQI) price index
- `geksiqm_fbmw`: Extending the multilateral GEKS-IQM price index by using the FBMW method
- `gekslm_fbmw`: Extending the multilateral GEKS-LM price index by using the FBMW method
- `prices`: Providing prices (unit values) of sold products
- `wgeksaqu_fbmw`: Extending the multilateral weighted GEKS-AQU price index by using the FBMW method
- `geolowe`: Calculating the bilateral geometric Lowe price index
- `utpd_fbmw`: Extending the unweighted multilateral TPD price index by using the FBMW method
- `shrinkflation`: Detecting and summarising downsized and upsized products
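The multilateral functions and their `_splice`, `_fbew`, and `_fbmw` extensions also follow a common pattern. A hedged sketch using the TPD index, assuming the `window` and `splice` arguments documented in the package README (see `?tpd_splice` in your installed version):

```r
library(PriceIndices)

# Multilateral TPD index over a 13-month window:
tpd(milk, start = "2018-12", end = "2019-12", window = 13)

# Extending the index beyond the window by movement splicing:
tpd_splice(milk, start = "2018-12", end = "2020-02",
           window = 13, splice = "movement")
```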
- `wgeksaqi_fbew`: Extending the multilateral weighted GEKS-AQI price index by using the FBEW method
- `gekslm_splice`: Extending the multilateral GEKS-LM price index by using window splicing methods
- `geksiqm_splice`: Extending the multilateral GEKS-IQM price index by using window splicing methods
- `tindex`: Calculating theoretical (expected) values of the unweighted price index
- `wgeks_splice`: Extending the multilateral weighted GEKS price index by using window splicing methods
- `geopaasche`: Calculating the bilateral geo-logarithmic Paasche price index
- `geksaqu_fbew`: Extending the multilateral GEKS-AQU price index by using the FBEW method
- `wgeksaqi_fbmw`: Extending the multilateral weighted GEKS-AQI price index by using the FBMW method
- `geksgl_splice`: Extending the multilateral GEKS-GL price index by using window splicing methods
- `vartia`: Calculating the bilateral Vartia-I price index
- `wgeksaqi_splice`: Extending the multilateral weighted GEKS-AQI price index by using window splicing methods
- `geksgl_fbmw`: Extending the multilateral GEKS-GL price index by using the FBMW method
- `gk_fbew`: Extending the multilateral Geary-Khamis price index by using the FBEW method
- `marshall_edgeworth`: Calculating the bilateral Marshall-Edgeworth price index
- `wgeksgl_fbmw`: Extending the multilateral weighted GEKS-GL price index by using the FBMW method
- `wgeksaqu`: Calculating the multilateral weighted GEKS-AQU (WGEKS-AQU) price index
- `wgeksl_fbmw`: Extending the multilateral weighted GEKS-L price index by using the FBMW method
- `wgeksaqu_fbew`: Extending the multilateral weighted GEKS-AQU price index by using the FBEW method
- `wgeksgl`: Calculating the multilateral weighted GEKS-GL (WGEKS-GL) price index
- `wgeksgl_fbew`: Extending the multilateral weighted GEKS-GL price index by using the FBEW method
- `geksqm_splice`: Extending the multilateral GEKS-QM price index by using window splicing methods
- `geksqm_fbmw`: Extending the multilateral GEKS-QM price index by using the FBMW method
- `geksw_splice`: Extending the multilateral GEKS-W price index by using window splicing methods
- `geksw_fbmw`: Extending the multilateral GEKS-W price index by using the FBMW method
- `tornqvist`: Calculating the bilateral Törnqvist price index
- `wgeksl_splice`: Extending the multilateral weighted GEKS-L price index by using window splicing methods
- `sales_groups`: Providing information about sales of products from one or more data sets
- `laspeyres`: Calculating the bilateral Laspeyres price index
- `wgeksgl_splice`: Extending the multilateral weighted GEKS-GL price index by using window splicing methods
- `lowe`: Calculating the bilateral Lowe price index
- `lehr`: Calculating the bilateral Lehr price index
- `utpd_splice`: Extending the unweighted multilateral TPD price index by using window splicing methods
- `wgeksgaqi_splice`: Extending the multilateral weighted GEKS-GAQI price index by using window splicing methods
- `matched`: Providing values from the indicated column that occur simultaneously in the compared periods or in a given time interval
- `young`: Calculating the bilateral Young price index
- `price_indices`: A general function to compute one or more price indices
- `pqcor_fig`: Providing correlations between prices and quantities of sold products
- `value_index`: Calculating the value index
- `m_decomposition`: Multiplicative decomposition of GEKS-type indices
- `wgeksgaqi_fbmw`: Extending the multilateral weighted GEKS-GAQI price index by using the FBMW method
- `walsh`: Calculating the bilateral Walsh price index
- `wgeksgaqi`: Calculating the multilateral weighted GEKS-GAQI (WGEKS-GAQI) price index
- `products`: Detecting and summarising available, matched, new, and disappearing products
- `sales`: Providing values of product sales
- `unit_value_index`: Calculating the unit value index
- `wgeksgaqi_fbew`: Extending the multilateral weighted GEKS-GAQI price index by using the FBEW method
- `utpd`: Calculating the unweighted multilateral TPD price index
- `wgeksl`: Calculating the multilateral weighted GEKS-L (WGEKS-L) price index
- `wgeksl_fbew`: Extending the multilateral weighted GEKS-L price index by using the FBEW method
- `SPQ`: Calculating the multilateral SPQ price index
- `QU`: Calculating the quality-adjusted unit value index (QU index)
- `IQMp`: Calculating the implicit quadratic mean of order r price index
- `available`: Providing values from the indicated column that occur at least once in one of the compared periods or in a given time interval
- `QMp`: Calculating the quadratic mean of order r price index
- `agmean`: Calculating the bilateral AG Mean price index
- `ccdi_splice`: Extending the multilateral CCDI price index by using window splicing methods
- `bmw`: Calculating the unweighted BMW price index
- `bennet`: Calculating the Bennet price and quantity indicators
- `banajree`: Calculating the bilateral Banajree price index
- `QMq`: Calculating the quadratic mean of order r quantity index
- `ccdi`: Calculating the multilateral GEKS price index based on the Törnqvist formula (typical notation: GEKS-T or CCDI)
- `PriceIndices`: The list of package functions and their demonstration
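Several of the indices above can be computed in one call with the general `price_indices()` wrapper. A sketch assuming the `formula`/`window`/`interval` interface from a recent README example (older package versions expose a different argument set, so consult `?price_indices` for your installed version):

```r
library(PriceIndices)

# Several indices in one call; `interval = TRUE` is assumed to
# return a monthly series rather than a single index value:
price_indices(milk,
              start = "2018-12", end = "2019-12",
              formula = c("fisher", "jevons", "geks"),
              window = 13, interval = TRUE)
```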