pop.aggregate: Aggregation of Population Projections

Description

Aggregation of existing countries' population projections into projections of given regions, and accessing such aggregations.

Usage

pop.aggregate(pop.pred, regions, 
    input.type = c("country", "region"), name = input.type,
    inputs = list(e0F.sim.dir = NULL, e0M.sim.dir = "joint_", tfr.sim.dir = NULL),
    my.location.file = NULL, verbose = FALSE, ...)
    
get.pop.aggregation(sim.dir = NULL, pop.pred = NULL, name = NULL, 
    write.to.cache = TRUE)
    
pop.aggregate.subnat(pop.pred, regions, locations, ..., verbose = FALSE)

Value

Object of class bayesPop.prediction containing the aggregated results. In addition it contains elements aggregation.method giving the input.type used, and aggregated.countries which is a list of countries aggregated for each region.

Arguments

pop.pred

Object of class bayesPop.prediction containing country-specific population projections.

regions

Vector of numerical codes of regions. It should correspond to values in the column “country_code” in the UNlocations dataset or in my.location.file (see below). For pop.aggregate.subnat it is a numerical code of a country over which subregions are aggregated.

input.type

There are two methods for aggregating projections, depending on the type of inputs: “country”-based and “region”-based. See Details.

name

Name of the aggregation. It becomes a part of a directory name where aggregation results are stored.

inputs

This argument is only used when the “region”-based method is selected. It is a list of inputs of probabilistic components of the projection:

e0F.sim.dir

Simulation directory with projections of female life expectancy (generated using bayesLife). It must contain projections for the given regions (see functions run.e0.mcmc.extra, e0.predict.extra). If it is not given, the same e0 directory is taken which was used for generating the pop.pred object, in which case the e0 projections are re-loaded from disk.

e0M.sim.dir

Simulation directory with projections of male life expectancy. By default (value NULL or “joint_”) the function assumes joint female-male projections of life expectancy and thus tries to load the male projections from the female projection object created using the e0F.sim.dir argument.

tfr.sim.dir

Simulation directory with projections of total fertility rate (generated using bayesTFR). It must contain projections for the given regions (see functions run.tfr.mcmc.extra, tfr.predict.extra). If it is not given, the same TFR directory is taken which was used for generating the pop.pred object, in which case the TFR projections are re-loaded from disk.

my.location.file

User-defined location file that can contain aggregation groups other than those in the default UN location file. It should have the same structure as the UNlocations dataset, see below.

verbose

Logical switching log messages on and off.

sim.dir

Simulation directory where aggregation is stored. It is the same directory used for creating the pop.pred object. Alternatively, pop.pred can be used. Either sim.dir or pop.pred must be given.

write.to.cache

Logical controlling if functions operating on this object are allowed to write into its cache (see Details of get.pop.prediction).

locations

Name of a tab-delimited file that contains definitions of the sub-regions. It should be the same file as used for the locations argument in pop.predict.subnat.

...

Additional arguments. For a country-type aggregation, it can be logical use.kannisto which determines if the Kannisto method should be used for old ages when aggregating mortality rates. A logical argument keep.vital.events determines if vital events should be computed for aggregations. Argument adjust determines if country-level population numbers should be adjusted to the WPP values.

Author

Hana Sevcikova, Adrian Raftery

Details

Function pop.aggregate triggers an aggregation over countries, while function pop.aggregate.subnat is used for aggregating sub-regions to a country. The following details refer to the use of pop.aggregate. For sub-national aggregation, see the Example in pop.predict.subnat.

The dataset UNlocations or my.location.file is used to determine the countries to be aggregated, in particular the field “location_type” of the entries whose “country_code” is given in the regions argument. One can aggregate over the following location types: type 0 aggregates all countries of the world (or in the file), type 2 aggregates over continents, type 3 aggregates over regions within continents, and any other integer (except 4) corresponds to a user-defined aggregation. Type 4 is reserved as the location type of countries, so all aggregations are performed over entries of this type. For type 2, countries are matched using the “area_code” column; for type 3, the matching uses the “reg_code” column of the UNlocations dataset. E.g., if regions=908 (Europe), which has location type 2 in the default UNlocations dataset, all countries with the value 908 in the “area_code” column are aggregated. If the location type is other than 0, 2, 3 and 4, the file must contain a column called “agcode_\(x\)”, with \(x\) being the location type. This column is then used to match the countries to be aggregated.

Consider the following example. Say we want to pair four countries (Germany [DE], France [FR], Netherlands [NL], Italy [IT]) in two different ways, so we have two overlapping groupings, each of which has two groups (A,B):

group A = (DE, FR), group B = (NL, IT)
group A = (DE, NL), group B = (FR, IT)

Then, my.location.file should have the following entries:

country_code	name	location_type	agcode_98	agcode_99
1001	grouping1_groupA	98	-1	-1
1002	grouping1_groupB	98	-1	-1
1003	grouping2_groupA	99	-1	-1
1004	grouping2_groupB	99	-1	-1
276	Germany	4	1001	1003
250	France	4	1001	1004
528	Netherlands	4	1002	1003
380	Italy	4	1002	1004
1005	all	0	-1	-1

The “country_code” of the groups is user-specific, but it must be unique within the file. Values of “country_code” for countries must match those in the prediction object. To run the aggregation for the four groups above, set regions=1001:1004. Since “location_type” is 98 and 99, the file is expected to have columns “agcode_98” and “agcode_99” containing the assignments of countries to groups in each of the two groupings. Values in these columns in rows that correspond to groups are not used and can therefore have any value. To aggregate over all four countries, set regions=1005, which has “location_type” equal to 0; the aggregation is then performed over all entries with “location_type” equal to 4.
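The matching rule described above can be sketched in base R. The snippet is an illustration only, not bayesPop's internal code: it builds the example table, writes it as a tab-delimited file, and uses a hypothetical helper, countries.in.region, to look up which countries would be aggregated for a given region code. The helper covers only type 0 and user-defined types, not types 2 and 3.

```r
# Illustrative sketch only; 'countries.in.region' is a hypothetical helper,
# not a bayesPop function.
locs <- data.frame(
    country_code  = c(1001, 1002, 1003, 1004, 276, 250, 528, 380, 1005),
    name          = c("grouping1_groupA", "grouping1_groupB",
                      "grouping2_groupA", "grouping2_groupB",
                      "Germany", "France", "Netherlands", "Italy", "all"),
    location_type = c(98, 98, 99, 99, 4, 4, 4, 4, 0),
    agcode_98     = c(-1, -1, -1, -1, 1001, 1001, 1002, 1002, -1),
    agcode_99     = c(-1, -1, -1, -1, 1003, 1004, 1003, 1004, -1)
)

# Write the table as a tab-delimited file that could serve as my.location.file
loc.file <- tempfile(fileext = ".txt")
write.table(locs, file = loc.file, sep = "\t", row.names = FALSE, quote = FALSE)

# Return the codes of countries (location_type 4) that belong to a region.
# Handles type 0 and user-defined types only (not types 2 and 3).
countries.in.region <- function(region, locations) {
    type <- locations$location_type[locations$country_code == region]
    is.country <- locations$location_type == 4
    if (type == 0)   # type 0: all countries in the file
        return(locations$country_code[is.country])
    column <- paste0("agcode_", type)   # e.g. "agcode_98"
    locations$country_code[is.country & locations[[column]] == region]
}

countries.in.region(1001, locs)  # Germany and France
countries.in.region(1003, locs)  # Germany and Netherlands
countries.in.region(1005, locs)  # all four countries
```

With bayesPop loaded and a prediction object pred that contains these four countries, the corresponding call would be pop.aggregate(pred, regions = 1001:1004, my.location.file = loc.file).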

There are two methods available for generating aggregations of population projections:

Country-based Method

Aggregations are created by summing trajectories over countries of the given region.
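As a minimal sketch of this idea, with made-up numbers rather than real projections: if each country's total-population trajectories are held in a matrix of time periods by trajectories, the regional trajectories are the element-wise sum of those matrices. The object names are hypothetical, the sketch assumes that trajectories with the same index are paired across countries, and it does not reproduce bayesPop's internal data structures.

```r
# Toy trajectories: rows are time periods, columns are trajectories
# (made-up numbers, in thousands).
periods <- c("2020", "2025", "2030")
traj <- list(
    countryA = matrix(c(100, 102, 105,
                        100, 101, 103), nrow = 3,
                      dimnames = list(periods, NULL)),
    countryB = matrix(c(50, 51, 53,
                        50, 53, 56), nrow = 3,
                      dimnames = list(periods, NULL))
)

# Regional trajectory i is the sum of the countries' trajectories i
region.traj <- Reduce(`+`, traj)

# Summaries such as the median are taken over the summed trajectories
region.median <- apply(region.traj, 1, median)
```

Because the sum is taken trajectory by trajectory, quantiles of the region come from the summed trajectories and are in general not the sum of the country quantiles.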

Region-based Method

The aggregation is generated using the same algorithm as population projections for single countries (function pop.predict), but it operates on aggregated input components. These are created as follows. Here \(c\) denotes the countries over which we aggregate a region \(R\); \(s \in \{m, f\}\), \(a\), and \(t\) denote sex, age category and time, respectively. \(t=P\) denotes the present year of the prediction. \(N_{s,a,t}^c\) and \(M_{s,a,t}^c\) denote the historical population count and the Bayesian predictive median of population, respectively, of sex \(s\) in age category \(a\) at time \(t\) for country \(c\) (refer to the links in parentheses for description of the data):

Initial sex and age-specific population (popM, popF):

\(N_{s,a,t=P}^R = \sum_c N_{s,a,t=P}^c\)

Sex and age-specific death rates (mxM, mxF):

\(mx_{s,a,t}^R = \frac{\sum_c(mx_{s,a,t}^c \cdot N_{s,a,t}^c)}{\sum_c N_{s,a,t}^c}\)

Sex ratio at birth (srb):

\(SRB_t^R = \frac{\sum_c M_{s=m,a=1,t}^c}{\sum_c M_{s=f,a=1,t}^c}\)

Percentage age-specific fertility rate (pasfr):

\(PASFR_{a,t}^R = \frac{\sum_c(PASFR_{a,t}^c \cdot M_{s=f,a,t}^c)}{\sum_c M_{s=f,a,t}^c}\)

Migration code and start year (mig.type):

Aggregated migration code is the code of maximum counts over aggregated countries weighted by \(N_{t=P}^c\). Migration start year is the maximum of start years over aggregated countries.

Sex and age-specific migration (migM, migF):

\(mig_{s,a,t}^R = \sum_c mig_{s,a,t}^c\)

Probabilistic projection of life expectancy:

We assume an aggregation of life expectancy for the given regions was generated prior to this call, using the run.e0.mcmc.extra and e0.predict.extra functions of the bayesLife package.

Probabilistic projection of total fertility rate:

We assume an aggregation of total fertility for the given regions was generated prior to this call, using the run.tfr.mcmc.extra and tfr.predict.extra functions of the bayesTFR package.
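The aggregation formulas for death rates, migration and sex ratio at birth can be checked on a toy case. The sketch below uses made-up numbers for two countries and a single sex, age group and time period; all object names are hypothetical and nothing here is taken from the package source.

```r
# Made-up inputs for two countries (one sex, one age group, one time period)
N   <- c(c1 = 1000, c2 = 3000)    # population counts N^c
mx  <- c(c1 = 0.010, c2 = 0.020)  # death rates mx^c
mig <- c(c1 = 5, c2 = -2)         # net migration counts mig^c

# Death rate of the region: population-weighted mean of the country rates
mx.region <- sum(mx * N) / sum(N)

# Migration of the region: plain sum over countries
mig.region <- sum(mig)

# Sex ratio at birth: summed male over summed female predictive medians
# in the youngest age group (made-up values)
M.male   <- c(c1 = 52, c2 = 155)
M.female <- c(c1 = 50, c2 = 150)
srb.region <- sum(M.male) / sum(M.female)
```

The regional death rate lies between the two country rates and is pulled towards the rate of the more populous country, which is the point of weighting by population.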

Results of the aggregations are stored in the same top directory as the pop.pred object, in a subdirectory called ‘aggregations_name’, where name is the value of the name argument. They can be accessed using the function get.pop.aggregation. Note that multiple runs of pop.aggregate with the same name will overwrite previous aggregation results of that name.

References

H. Sevcikova, A. E. Raftery (2016). bayesPop: Probabilistic Population Projections. Journal of Statistical Software, 75(5), 1-29. doi:10.18637/jss.v075.i05

Examples


if (FALSE) {
    # Not run: the projection step is time-consuming
    library(bayesPop)
    sim.dir <- tempfile()
    # project three countries (Netherlands, Ecuador, Madagascar)
    pred <- pop.predict(countries=c(528,218,450), output.dir=sim.dir)
    # aggregate World (i.e. all countries available in pred)
    aggr <- pop.aggregate(pred, 900)
    pop.trajectories.plot(aggr, 900, sum.over.ages=TRUE)
    # countries over which we aggregated:
    subset(UNlocations, country_code %in% aggr$aggregated.countries[["900"]])
    # remove the simulation directory
    unlink(sim.dir, recursive=TRUE)
}
