regions

Installation

You can install the development version from GitHub with:

devtools::install_github("rOpenGov/regions")

or the released version from CRAN:

install.packages("regions")

You can review the complete package documentation on regions.dataobservatory.eu. If you find any problems with the code, please raise an issue on GitHub. Pull requests are welcome if you agree with the Contributor Code of Conduct.

If you use regions in your work, please cite the package.

Working with Sub-national Statistics

In international comparisons, nationally aggregated indicators often have many disadvantages, which result from the very different levels of homogeneity, but also from the often very limited number of observations in a cross-sectional analysis. When comparing European countries, a few missing cases can limit the cross-section to around 20 countries, which rules out many analytical methods. Working with sub-national statistics has many advantages: the similar aggregation level and the high number of observations allow more precise control of model parameters and errors, and the number of observations grows from 20 to 200-300.

Yet the change from the national to the sub-national level comes with a huge data processing price. National boundaries are relatively stable, with only a handful of changes in each recent decade, because changing a national boundary requires a more-or-less global consensus. But states are free to change their internal administrative boundaries, and they do so frequently. This means that the names, identification codes and boundary definitions of sub-national regions change very often, and joining data from different sources and different years can be very difficult.

Switching from a national to a sub-national level of analysis has numerous advantages, but it comes with a huge price in data processing, validation and imputation. The regions package aims to help with this process.

This package is an offspring of the eurostat package on rOpenGov. It started as a tool to validate and re-code regional Eurostat statistics, but it aims to be a general solution for all sub-national statistics. It will be developed in parallel with other rOpenGov packages.

Sub-national Statistics Have Many Challenges

Frequent boundary changes: as opposed to national boundaries, territorial units and typologies often change, which makes the validation and recoding of observations necessary across time. For example, in the European Union, sub-national typologies change about every three years, and you have to make sure that you compare the right French region over time, or check whether a time-wise comparison is possible at all.

library(regions)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
example_df <- data.frame(
  geo    = c("FR", "DEE32", "UKI3",
             "HU12", "DED",
             "FRK"),
  values = runif(6, 0, 100),
  stringsAsFactors = FALSE
)

# Recode to the NUTS 2013 typology and keep the columns of interest
recode_nuts(dat = example_df,
            nuts_year = 2013) %>%
  select(geo, values, code_2013) %>%
  knitr::kable()

|geo   |    values|code_2013 |
|:-----|---------:|:---------|
|FR    | 25.502412|FR        |
|UKI3  |  2.070591|UKI3      |
|DED   | 40.306681|DED       |
|FRK   | 42.378169|FR7       |
|HU12  | 48.944542|NA        |
|DEE32 | 98.442229|NA        |
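The idea behind this kind of recoding can be sketched in base R as a lookup against a correspondence table. This is only an illustration, not how recode_nuts() is implemented: the `correspondence` table is a hypothetical, hand-made stand-in that contains just the four mappings visible in the output above, and the values are made up.

```r
# Observations with mixed typology codes (values are made up for illustration).
observations <- data.frame(
  geo    = c("FR", "DEE32", "UKI3", "HU12", "DED", "FRK"),
  values = c(25.5, 98.4, 2.1, 48.9, 40.3, 42.4),
  stringsAsFactors = FALSE
)

# Hypothetical correspondence table: only the mappings shown in the
# output above; a real table would cover every region and typology year.
correspondence <- data.frame(
  geo       = c("FR", "UKI3", "DED", "FRK"),
  code_2013 = c("FR", "UKI3", "DED", "FR7"),
  stringsAsFactors = FALSE
)

# match() returns NA for codes that are missing from the table,
# so unmatched rows are kept with an NA code instead of being dropped.
observations$code_2013 <- correspondence$code_2013[
  match(observations$geo, correspondence$geo)
]

# Rows that need attention before any time-wise comparison.
unmatched <- observations[is.na(observations$code_2013), ]

print(observations)
print(unmatched$geo)
```

Keeping the unmatched rows with an explicit NA, rather than using an inner join, makes it visible which observations cannot yet be compared across typology versions.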

Hierarchical aggregation and special imputation: missingness is very frequent in sub-national statistics, because they are created with a serious time lag compared to national ones, and because they are often not back-casted after boundary changes. You cannot use standard imputation algorithms because the observations are not similarly aggregated or averaged. Often the information only seems to be missing: it is present, but under an obsolete typology code. This is a basic example which shows you how to impute data from a larger territorial unit, such as a national statistic, to lower territorial units:

library(regions)

upstream <- data.frame ( 
   country_code =  rep("AU", 2),
   year         = c(2019:2020),
   my_var       = c(10,12)
   )

downstream <- australia_states

imputed <- impute_down ( 
   upstream_data  = upstream,
   downstream_data = downstream,
   country_var = "country_code",
   regional_code = "geo_code",
   values_var = "my_var",
   time_var = "year" )

knitr::kable(imputed)

|geo_code | year|geo_name                               |country_code | my_var|method                 |
|:--------|----:|:--------------------------------------|:------------|------:|:----------------------|
|AU-NSW   | 2019|New South Wales state                  |AU           |     10|imputed from AU actual |
|AU-QLD   | 2019|Queensland state                       |AU           |     10|imputed from AU actual |
|AU-SA    | 2019|South Australia state                  |AU           |     10|imputed from AU actual |
|AU-TAS   | 2019|Tasmania state                         |AU           |     10|imputed from AU actual |
|AU-VIC   | 2019|Victoria state                         |AU           |     10|imputed from AU actual |
|AU-WA    | 2019|Western Australia state                |AU           |     10|imputed from AU actual |
|AU-ACT   | 2019|Australian Capital Territory territory |AU           |     10|imputed from AU actual |
|AU-NT    | 2019|Northern Territory territory           |AU           |     10|imputed from AU actual |
|AU-NSW   | 2020|New South Wales state                  |AU           |     12|imputed from AU actual |
|AU-QLD   | 2020|Queensland state                       |AU           |     12|imputed from AU actual |
|AU-SA    | 2020|South Australia state                  |AU           |     12|imputed from AU actual |
|AU-TAS   | 2020|Tasmania state                         |AU           |     12|imputed from AU actual |
|AU-VIC   | 2020|Victoria state                         |AU           |     12|imputed from AU actual |
|AU-WA    | 2020|Western Australia state                |AU           |     12|imputed from AU actual |
|AU-ACT   | 2020|Australian Capital Territory territory |AU           |     12|imputed from AU actual |
|AU-NT    | 2020|Northern Territory territory           |AU           |     12|imputed from AU actual |
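Conceptually, imputing down repeats each higher-level observation for every unit below it and labels the result as imputed. The following base R sketch shows that idea with a plain merge(); it is an assumption-laden illustration rather than the internals of impute_down(), and the three-row `downstream` table is a hypothetical subset standing in for the australia_states dataset.

```r
# National-level values (same toy numbers as in the example above).
upstream <- data.frame(
  country_code = c("AU", "AU"),
  year         = c(2019, 2020),
  my_var       = c(10, 12),
  stringsAsFactors = FALSE
)

# Hypothetical subset of sub-national units; the real dataset lists
# all eight states and territories.
downstream <- data.frame(
  geo_code     = c("AU-NSW", "AU-QLD", "AU-TAS"),
  country_code = "AU",
  stringsAsFactors = FALSE
)

# Merging on the country code repeats every national observation
# for each region of that country: 3 regions x 2 years = 6 rows.
imputed <- merge(downstream, upstream, by = "country_code")

# Record where each value came from, so that imputed figures are
# never mistaken for actual regional observations.
imputed$method <- paste("imputed from", imputed$country_code, "actual")

imputed <- imputed[order(imputed$year, imputed$geo_code), ]
print(imputed)
```

The `method` column mirrors the labelling seen in the package output; keeping such a provenance column is what allows imputed rows to be filtered out or down-weighted later.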

Package functionality

  • Generic vocabulary translation and joining functions for geographically coded data
  • Keeping track of the boundary changes within the European Union between 1999 and 2024
  • Vocabulary translation and joining functions for standardized European Union statistics
  • Vocabulary translation for the ISO-3166-2 based Google data and the European Union
  • Imputation functions from higher aggregation hierarchy levels to lower ones, for example from NUTS1 to NUTS2 or from ISO-3166-1 to ISO-3166-2 (impute down)
  • Imputation functions from lower hierarchy levels to higher ones (impute up)
  • Aggregation function from lower hierarchy levels to higher ones, for example from NUTS3 to NUTS1 or from ISO-3166-2 to ISO-3166-1 (aggregate; under development)
  • Disaggregation functions from higher hierarchy levels to lower ones, again, for example from NUTS1 to NUTS2 or from ISO-3166-1 to ISO-3166-2 (disaggregate; under development)
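Aggregation from a lower to a higher level relies on the nesting of the codes. As a minimal sketch in base R, and not the package's own aggregation function (which the list above marks as under development), NUTS2 values can be summed to NUTS1 by truncating the codes. The values below are made up, and the object names are hypothetical.

```r
# Made-up NUTS2-level counts; only the structure of the codes matters here.
nuts2 <- data.frame(
  geo    = c("HU11", "HU12", "DED2", "DED4", "DED5"),
  values = c(10, 20, 1, 2, 3),
  stringsAsFactors = FALSE
)

# NUTS codes are nested: country (2 characters), NUTS1 (3), NUTS2 (4),
# NUTS3 (5), so a NUTS1 parent is the first three characters of a NUTS2 code.
nuts2$parent <- substr(nuts2$geo, 1, 3)

# Sum the children per parent. Summing is only meaningful for additive
# variables such as counts; rates or averages would need weights.
nuts1_totals <- aggregate(values ~ parent, data = nuts2, FUN = sum)
print(nuts1_totals)
```

This only works when all codes belong to the same typology version, which is why recoding and validation come before any aggregation step.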

We have started building experimental APIs that run regions regularly on known statistical data sources in order to improve them. See: Digital Music Observatory, Green Deal Data Observatory, Economy Data Observatory.

Vignettes / Articles

Contributors

Thanks to @KKulma for the improved continuous integration on GitHub.

Code of Conduct

Please note that the regions project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

Monthly Downloads

2,951

Version

0.1.8

License

GPL-3

Maintainer

Last Published

June 21st, 2021

Functions in regions (0.1.8)

  • nuts_recoded: European Union: Recoded NUTS units 1995-2021.
  • validate_typology: Validate typology Parameter
  • validate_parameters: Assertion for Correct Function Calls
  • nuts_exceptions: NUTS Coding Exceptions
  • nuts_lau_2019: European Union: NUTS And LAU Correspondence
  • validate_data_frame: Validate Parameter 'dat'
  • google_nuts_matchtable: Google Mobility Report European Correspondence Table
  • regions: A package for working with regional statistics.
  • validate_geo_code: Validate Conformity with NUTS Geo Codes (vector)
  • validate_nuts_regions: Validate Conformity With NUTS Geo Codes
  • validate_nuts_countries: Validate Conformity with NUTS Country Codes
  • %>%: Pipe operator
  • recode_nuts: Recode Region Codes From Source To Target NUTS Typology
  • regional_rd_personnel: R&D Personnel by NUTS 2 Regions
  • validate_param: Validate Mandatory Parameters
  • get_country_code: Get Country Code Of Regions
  • australia_states: Australia: States And Territories
  • all_valid_nuts_codes: European Union: All Valid NUTS Codes
  • impute_down: Imputing Data From Larger To Smaller Units
  • impute_down_nuts: Imputing Data From Larger To Smaller Units in the EU NUTS
  • nuts_changes: European Union: Recoded NUTS units 1995-2021.
  • mixed_nuts_example: Example Data Frame: Mixed EU Typologies.
  • daily_internet_users: Daily Internet Users
  • create_nuts_lau_2019: Create the nuts_lau_2019 correspondence table. May be used to create similar historical correspondence tables.