Aggregation of existing countries' population projections into projections of given regions, and accessing such aggregations.
pop.aggregate(pop.pred, regions,
input.type = c("country", "region"), name = input.type,
inputs = list(e0F.sim.dir = NULL, e0M.sim.dir = "joint_", tfr.sim.dir = NULL),
my.location.file = NULL, verbose = FALSE, ...)
get.pop.aggregation(sim.dir = NULL, pop.pred = NULL, name = NULL,
write.to.cache = TRUE)
pop.aggregate.subnat(pop.pred, regions, locations, ..., verbose = FALSE)
The result is an object of class bayesPop.prediction containing the aggregated results. In addition, it contains the element aggregation.method, giving the input.type used, and aggregated.countries, a list of the countries aggregated for each region.
pop.pred: Object of class bayesPop.prediction containing country-specific population projections.
regions: Vector of numerical codes of regions. It should correspond to values in the column “country_code” in the UNlocations dataset or in my.location.file (see below). For pop.aggregate.subnat, it is the numerical code of the country over which sub-regions are aggregated.
input.type: There are two methods for aggregating projections, depending on the type of inputs: “country”-based and “region”-based (see below).
name: Name of the aggregation. It becomes part of the name of the directory where the aggregation results are stored.
inputs: This argument is only used when the “region”-based method is selected. It is a list of inputs for the probabilistic components of the projection:
e0F.sim.dir: Simulation directory with projections of female life expectancy (generated using bayesLife). It must contain projections for the given regions (see functions run.e0.mcmc.extra, e0.predict.extra). If it is not given, the e0 directory that was used to generate the pop.pred object is taken, in which case the e0 projections are re-loaded from disk.
e0M.sim.dir: Simulation directory with projections of male life expectancy. By default (value NULL or “joint_”), the function assumes joint female-male projections of life expectancy and thus tries to load the male projections from the female projection object created using the e0F.sim.dir argument.
tfr.sim.dir: Simulation directory with projections of total fertility rate (generated using bayesTFR). It must contain projections for the given regions (see functions run.tfr.mcmc.extra, tfr.predict.extra). If it is not given, the TFR directory that was used to generate the pop.pred object is taken, in which case the TFR projections are re-loaded from disk.
my.location.file: User-defined location file that can contain aggregation groups other than those in the default UN location file. It should have the same structure as the UNlocations dataset (see below).
verbose: Logical switching log messages on and off.
sim.dir: Simulation directory where the aggregation is stored. It is the same directory that was used for creating the pop.pred object. Alternatively, pop.pred can be used. Either sim.dir or pop.pred must be given.
write.to.cache: Logical controlling whether functions operating on this object are allowed to write into its cache (see Details of get.pop.prediction).
locations: Name of a tab-delimited file that contains definitions of the sub-regions. It should be the same file as used for the locations argument in pop.predict.subnat.
... (additional arguments): For a country-type aggregation, these can include the logical use.kannisto, which determines whether the Kannisto method should be used for old ages when aggregating mortality rates. The logical keep.vital.events determines whether vital events should be computed for aggregations. The argument adjust determines whether country-level population numbers should be adjusted to the WPP values.
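The defaulting rules for the inputs argument (used only by the “region”-based method) can be summarized in a short sketch. resolve_inputs() and its pred.e0.dir and pred.tfr.dir arguments are hypothetical stand-ins for the directories recorded in the pop.pred object; the sketch only mirrors the rules stated above and does not load any projections.

```r
# Hypothetical sketch of how the `inputs` defaults are resolved; not bayesPop code.
# pred.e0.dir / pred.tfr.dir stand in for the directories used to create pop.pred.
resolve_inputs <- function(inputs, pred.e0.dir, pred.tfr.dir) {
  # [[ ]] avoids the partial name matching that `$` performs on lists
  e0F <- inputs[["e0F.sim.dir"]]
  if (is.null(e0F)) e0F <- pred.e0.dir     # re-use the e0 directory of pop.pred
  tfr <- inputs[["tfr.sim.dir"]]
  if (is.null(tfr)) tfr <- pred.tfr.dir    # re-use the TFR directory of pop.pred
  e0M <- inputs[["e0M.sim.dir"]]
  # NULL or "joint_": male e0 comes from the joint female-male projection in e0F
  joint <- is.null(e0M) || identical(e0M, "joint_")
  if (joint) e0M <- e0F
  list(e0F = e0F, e0M = e0M, e0M.joint = joint, tfr = tfr)
}

defaults <- list(e0F.sim.dir = NULL, e0M.sim.dir = "joint_", tfr.sim.dir = NULL)
str(resolve_inputs(defaults, pred.e0.dir = "e0_dir", pred.tfr.dir = "tfr_dir"))
```

With the default list, all three components fall back to the directories of pop.pred, and the male projections are read from the female (joint) projection.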
Hana Sevcikova, Adrian Raftery
The function pop.aggregate aggregates over countries, while pop.aggregate.subnat aggregates sub-regions into a country. The following details refer to pop.aggregate. For sub-national aggregation, see the examples of pop.predict.subnat.
The dataset UNlocations or my.location.file is used to determine the countries to be aggregated, in particular the field “location_type” of the entries whose “country_code” is given in the regions argument. One can aggregate over the following location types: type 0 aggregates all countries of the world (or all countries in the file), type 2 aggregates over continents, type 3 aggregates over regions within continents, and any other integer (except 4) corresponds to a user-defined aggregation. Note that type 4 is reserved as the location type of countries, and thus all aggregations are performed over entries of this type. For type 2, countries are matched using the “area_code” column; for type 3, the matching is done using the “reg_code” column of the UNlocations dataset. For example, if regions=908 (Europe), which has location type 2 in the default UNlocations dataset, all countries with the value 908 in the “area_code” column are aggregated. If the location type is other than 0, 2, 3 and 4, the file must contain a column called “agcode_\(x\)”, with \(x\) being the location type. This column is then used to match the countries to be aggregated.
Consider the following example. Say we want to pair four countries (Germany [DE], France [FR], Netherlands [NL], Italy [IT]) in two different ways, which gives two overlapping groupings, each with two groups (A, B):
Grouping 1: group A = (DE, FR), group B = (NL, IT)
Grouping 2: group A = (DE, NL), group B = (FR, IT)
Then my.location.file should have the following entries:
country_code | name | location_type | agcode_98 | agcode_99 |
1001 | grouping1_groupA | 98 | -1 | -1 |
1002 | grouping1_groupB | 98 | -1 | -1 |
1003 | grouping2_groupA | 99 | -1 | -1 |
1004 | grouping2_groupB | 99 | -1 | -1 |
276 | Germany | 4 | 1001 | 1003 |
250 | France | 4 | 1001 | 1004 |
528 | Netherlands | 4 | 1002 | 1003 |
380 | Italy | 4 | 1002 | 1004 |
1005 | all | 0 | -1 | -1 |
The “country_code” values of the groups are user-defined, but they must be unique within the file. Values of “country_code” for countries must match those in the prediction object. To run the aggregation for the four groups above, set regions=1001:1004. Since the groups have “location_type” 98 and 99, the file is expected to have columns “agcode_98” and “agcode_99” containing the assignments of countries to each of the two groupings. The values in these columns for the group entries themselves are not used and can be arbitrary. To aggregate over all four countries, set regions=1005, which has “location_type” equal to 0; the aggregation is then performed over all entries with “location_type” equal to 4.
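As an illustration, the example file above can be built as an R data frame and written to disk. This sketch assumes that my.location.file is a tab-delimited text file with a header row, and it includes only the columns shown in the table; a real file should follow the structure of UNlocations. The helper members() is hypothetical and only mirrors the matching rule described above.

```r
# Example location file from the table above (only the columns shown there).
loc <- data.frame(
  country_code  = c(1001, 1002, 1003, 1004, 276, 250, 528, 380, 1005),
  name          = c("grouping1_groupA", "grouping1_groupB",
                    "grouping2_groupA", "grouping2_groupB",
                    "Germany", "France", "Netherlands", "Italy", "all"),
  location_type = c(98, 98, 99, 99, 4, 4, 4, 4, 0),
  agcode_98     = c(-1, -1, -1, -1, 1001, 1001, 1002, 1002, -1),
  agcode_99     = c(-1, -1, -1, -1, 1003, 1004, 1003, 1004, -1),
  stringsAsFactors = FALSE
)

# Write it as tab-delimited text (assumed format, see above).
path <- tempfile(fileext = ".txt")
write.table(loc, file = path, sep = "\t", quote = FALSE, row.names = FALSE)

# Hypothetical helper: names of the countries (location_type 4) in a group.
members <- function(code, locations = loc) {
  countries <- locations[locations$location_type == 4, ]
  type <- locations$location_type[locations$country_code == code]
  if (type == 0) return(countries$name)   # all countries in the file
  col <- paste0("agcode_", type)          # e.g. "agcode_98"
  countries$name[countries[[col]] == code]
}

groups <- setNames(lapply(1001:1005, members), 1001:1005)
print(groups)
```

With a prediction object pred that covers these four countries, the file could then be passed as pop.aggregate(pred, regions = 1001:1004, my.location.file = path).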
There are two methods available for generating aggregations of population projections:
“country”-based: Aggregations are created by summing trajectories over the countries of the given region.
“region”-based: The aggregation is generated using the same algorithm as population projections for single countries (function pop.predict), but it operates on aggregated input components, created as follows. Here \(c\) denotes the countries over which a region \(R\) is aggregated, and \(s \in \{m, f\}\), \(a\) and \(t\) denote sex, age category and time, respectively; \(t=P\) denotes the present year of the prediction. \(N_{s,a,t}^c\) and \(M_{s,a,t}^c\) denote, respectively, the historical population count and the Bayesian predictive median of the population of sex \(s\) in age category \(a\) at time \(t\) for country \(c\).
Population in the present year: \(N_{s,a,t=P}^R = \sum_c N_{s,a,t=P}^c\)
Mortality rates: \(mx_{s,a,t}^R = \frac{\sum_c (mx_{s,a,t}^c \cdot N_{s,a,t}^c)}{\sum_c N_{s,a,t}^c}\)
Sex ratio at birth: \(SRB_t^R = \frac{\sum_c M_{s=m,a=1,t}^c}{\sum_c M_{s=f,a=1,t}^c}\)
Percentage age-specific fertility rates: \(PASFR_{a,t}^R = \frac{\sum_c (PASFR_{a,t}^c \cdot M_{s=f,a,t}^c)}{\sum_c M_{s=f,a,t}^c}\)
Migration: The aggregated migration code is the code with the maximum count over the aggregated countries, where each country is weighted by \(N_{t=P}^c\). The migration start year is the maximum of the start years over the aggregated countries. Migration counts are summed: \(mig_{s,a,t}^R = \sum_c mig_{s,a,t}^c\)
Life expectancy: We assume that an aggregation of life expectancy for the given regions was generated prior to this call, using the run.e0.mcmc.extra and e0.predict.extra functions of the bayesLife package.
Total fertility rate: We assume that an aggregation of total fertility for the given regions was generated prior to this call, using the run.tfr.mcmc.extra and tfr.predict.extra functions of the bayesTFR package.
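The two methods can be compared on toy numbers. The base R sketch below uses made-up values for two hypothetical countries c1 and c2: a [country, time, trajectory] array for the “country”-based sum, and one sex at a single time point for the “region”-based components. These layouts are illustrative assumptions, not bayesPop's internal data structures, and the migration-code step reflects one reading of the rule above (pick the code with the largest population-weighted total).

```r
# Toy illustration of the two aggregation methods; all numbers are made up.

## "Country"-based: sum trajectories over the countries of the region.
# traj[country, time, trajectory]: 2 countries, 3 time points, 2 trajectories
traj <- array(c(10, 20, 11, 21, 12, 22,   # trajectory 1
                 9, 19, 10, 20, 11, 21),  # trajectory 2
              dim = c(2, 3, 2))
traj_R <- apply(traj, c(2, 3), sum)       # region trajectories [time, trajectory]

## "Region"-based: aggregate the input components (one sex shown).
N  <- rbind(c1 = c(100, 80), c2 = c(300, 120))           # N^c at t = P, 2 age groups
mx <- rbind(c1 = c(0.010, 0.050), c2 = c(0.020, 0.030))  # mx^c
N_R  <- colSums(N)                      # sum of present-year population
mx_R <- colSums(mx * N) / colSums(N)    # population-weighted mortality rates

# Sex ratio at birth: ratio of summed medians in the first age group
M_m1 <- c(c1 = 52, c2 = 105)
M_f1 <- c(c1 = 50, c2 = 100)
srb_R <- sum(M_m1) / sum(M_f1)

# PASFR weighted by the female median population in the same age group
pasfr <- rbind(c1 = c(20, 50, 30), c2 = c(30, 40, 30))
M_f   <- rbind(c1 = c(100, 100, 100), c2 = c(300, 300, 300))
pasfr_R <- colSums(pasfr * M_f) / colSums(M_f)

# Migration counts are summed
mig   <- rbind(c1 = c(5, -2), c2 = c(-1, 3))
mig_R <- colSums(mig)

# Migration code: code with the largest total weight N^c_{t=P}
mig_code <- c(c1 = "code1", c2 = "code2")       # hypothetical code labels
w <- rowSums(N)                                 # total population per country
code_R <- names(which.max(tapply(w, mig_code, sum)))
start_year_R <- max(c(c1 = 2020, c2 = 2025))    # hypothetical start years
```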
Results of the aggregations are stored in the same top directory as the pop.pred object, in a subdirectory called ‘aggregations_name’, where name is the value of the name argument. They can be accessed using the function get.pop.aggregation. Note that multiple runs of pop.aggregate with the same name overwrite previous aggregation results of the same name.
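A minimal sketch of the directory naming, assuming the aggregation subdirectory sits directly under the simulation directory (sim.dir) used for the pop.pred object. aggregation_dir() is a hypothetical helper, not a bayesPop function. Because name defaults to input.type, the default subdirectories are ‘aggregations_country’ and ‘aggregations_region’, and reusing a name points to the same directory, which is why a second run overwrites the first.

```r
# Hypothetical helper (not part of bayesPop): path of the subdirectory that
# would hold an aggregation called `name`, assuming it lives directly in sim.dir.
aggregation_dir <- function(sim.dir, name) {
  file.path(sim.dir, paste0("aggregations_", name))
}

sim.dir <- tempfile()
aggregation_dir(sim.dir, "country")  # default name when input.type = "country"
aggregation_dir(sim.dir, "region")   # default name when input.type = "region"

# Stored results are read back with the documented accessor, e.g.
# get.pop.aggregation(sim.dir = sim.dir, name = "region")
```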
H. Sevcikova, A. E. Raftery (2016). bayesPop: Probabilistic Population Projections. Journal of Statistical Software, 75(5), 1-29. doi:10.18637/jss.v075.i05
See also: pop.predict, tfr.predict.extra, e0.predict.extra
if (FALSE) {
  library(bayesPop)
  sim.dir <- tempfile()
  pred <- pop.predict(countries=c(528,218,450), output.dir=sim.dir)
  aggr <- pop.aggregate(pred, 900) # aggregating World (i.e. all countries available in pred)
  pop.trajectories.plot(aggr, 900, sum.over.ages=TRUE)
  # countries over which we aggregated:
  subset(UNlocations, country_code %in% aggr$aggregated.countries[["900"]])
  unlink(sim.dir, recursive=TRUE)
}