GAfit
estimates the specified reduced form STVAR model using a genetic algorithm.
It is designed to find starting values for gradient based methods and NOT to obtain final estimates
constituting a local maximum.
GAfit(
data,
p,
M,
weight_function = c("relative_dens", "logistic", "mlogit", "exponential", "threshold",
"exogenous"),
weightfun_pars = NULL,
cond_dist = c("Gaussian", "Student", "ind_Student", "ind_skewed_t"),
parametrization = c("intercept", "mean"),
AR_constraints = NULL,
mean_constraints = NULL,
weight_constraints = NULL,
ngen = 200,
popsize,
smart_mu = min(100, ceiling(0.5 * ngen)),
initpop = NULL,
mu_scale,
mu_scale2,
omega_scale,
B_scale,
weight_scale,
ar_scale = 0.2,
upper_ar_scale = 1,
ar_scale2 = 1,
regime_force_scale = 1,
penalized,
penalty_params = c(0.05, 0.5),
allow_unstab,
red_criteria = c(0.05, 0.01),
bound_by_weights,
pre_smart_mu_prob = 0,
to_return = c("alt_ind", "best_ind"),
minval,
fixed_params = NULL,
fixed_params_in_smart_mu = TRUE,
seed = NULL
)
Returns the estimated parameter vector which has the form described in initpop
,
with the exception that for models with cond_dist == "ind_Student"
or
"ind_skewed_t"
, the parameter vector is parametrized with \(B_1,B_2^*,...,B_M^*\)
instead of \(B_1,B_2,...,B_M\), where \(B_m^* = B_m - B_1\). Use the function change_parametrization
to change back to the original parametrization if desired.
a matrix or class 'ts'
object with d>1
columns. Each column is taken to represent
a univariate time series. Missing values are not supported.
a positive integer specifying the autoregressive order
a positive integer specifying the number of regimes
What type of transition weights \(\alpha_{m,t}\) should be used?
"relative_dens"
:\(\alpha_{m,t}= \frac{\alpha_mf_{m,dp}(y_{t-1},...,y_{t-p+1})}{\sum_{n=1}^M\alpha_nf_{n,dp}(y_{t-1},...,y_{t-p+1})}\), where \(\alpha_m\in (0,1)\) are weight parameters that satisfy \(\sum_{m=1}^M\alpha_m=1\) and \(f_{m,dp}(\cdot)\) is the \(dp\)-dimensional stationary density of the \(m\)th regime corresponding to \(p\) consecutive observations. Available for Gaussian conditional distribution only.
"logistic"
:\(M=2\), \(\alpha_{1,t}=1-\alpha_{2,t}\), and \(\alpha_{2,t}=[1+\exp\lbrace -\gamma(y_{it-j}-c) \rbrace]^{-1}\), where \(y_{it-j}\) is the lag \(j\) observation of the \(i\)th variable, \(c\) is a location parameter, and \(\gamma > 0\) is a scale parameter.
"mlogit"
:\(\alpha_{m,t}=\frac{\exp\lbrace \gamma_m'z_{t-1} \rbrace} {\sum_{n=1}^M\exp\lbrace \gamma_n'z_{t-1} \rbrace}\), where \(\gamma_m\) are coefficient vectors, \(\gamma_M=0\), and \(z_{t-1}\) \((k\times 1)\) is the vector containing a constant and the (lagged) switching variables.
"exponential"
:\(M=2\), \(\alpha_{1,t}=1-\alpha_{2,t}\), and \(\alpha_{2,t}=1-\exp\lbrace -\gamma(y_{it-j}-c) \rbrace\), where \(y_{it-j}\) is the lag \(j\) observation of the \(i\)th variable, \(c\) is a location parameter, and \(\gamma > 0\) is a scale parameter.
"threshold"
:\(\alpha_{m,t} = 1\) if \(r_{m-1}<y_{it-j}\leq r_{m}\) and \(0\) otherwise, where \(-\infty\equiv r_0<r_1<\cdots <r_{M-1}<r_M\equiv\infty\) are thresholds \(y_{it-j}\) is the lag \(j\) observation of the \(i\)th variable.
"exogenous"
:Exogenous nonrandom transition weights, specify the weight series in weightfun_pars
.
See the vignette for more details about the weight functions.
weight_function == "relative_dens"
:Not used.
weight_function %in% c("logistic", "exponential", "threshold")
:a numeric vector with the switching variable \(i\in\lbrace 1,...,d \rbrace\) in the first and the lag \(j\in\lbrace 1,...,p \rbrace\) in the second element.
weight_function == "mlogit"
:a list of two elements:
$vars
:a numeric vector containing the variables that should used as switching variables in the weight function in an increasing order, i.e., a vector with unique elements in \(\lbrace 1,...,d \rbrace\).
$lags
:an integer in \(\lbrace 1,...,p \rbrace\) specifying the number of lags to be used in the weight function.
weight_function == "exogenous"
:a size (nrow(data) - p
x M
) matrix containing the exogenous
transition weights as [t, m]
for time \(t\) and regime \(m\). Each row needs to sum to one and only weakly positive
values are allowed.
specifies the conditional distribution of the model as "Gaussian"
, "Student"
, "ind_Student"
,
or "ind_skewed_t"
, where "ind_Student"
the Student's \(t\) distribution with independent components, and
"ind_skewed_t"
is the skewed \(t\) distribution with independent components (see Hansen, 1994).
"intercept"
or "mean"
determining whether the model is parametrized with intercept
parameters \(\phi_{m}\) or regime means \(\mu_{m}\), m=1,...,M.
a size \((Mpd^2 \times q)\) constraint matrix \(C\) specifying linear constraints
to the autoregressive parameters. The constraints are of the form
\((\varphi_{1},...,\varphi_{M}) = C\psi\), where \(\varphi_{m} = (vec(A_{m,1}),...,vec(A_{m,p})) \ (pd^2 \times 1),\ m=1,...,M\),
contains the coefficient matrices and \(\psi\) \((q \times 1)\) contains the related parameters.
For example, to restrict the AR-parameters to be the identical across the regimes, set \(C =\)
[I:...:I
]' \((Mpd^2 \times pd^2)\) where I = diag(p*d^2)
.
Restrict the mean parameters of some regimes to be identical? Provide a list of numeric vectors
such that each numeric vector contains the regimes that should share the common mean parameters. For instance, if
M=3
, the argument list(1, 2:3)
restricts the mean parameters of the second and third regime to be
identical but the first regime has freely estimated (unconditional) mean. Ignore or set to NULL
if mean parameters
should not be restricted to be the same among any regimes. This constraint is available only for mean parametrized models;
that is, when parametrization="mean"
.
a list of two elements, \(R\) in the first element and \(r\) in the second element, specifying linear constraints on the transition weight parameters \(\alpha\). The constraints are of the form \(\alpha = R\xi + r\), where \(R\) is a known \((a\times l)\) constraint matrix of full column rank (\(a\) is the dimension of \(\alpha\)), \(r\) is a known \((a\times 1)\) constant, and \(\xi\) is an unknown \((l\times 1)\) parameter. Alternatively, set \(R=0\) to constrain the weight parameters to the constant \(r\) (in this case, \(\alpha\) is dropped from the constrained parameter vector).
a positive integer specifying the number of generations to be ran through in the genetic algorithm.
a positive even integer specifying the population size in the genetic algorithm.
Default is 10*n_params
.
a positive integer specifying the generation after which the random mutations in the genetic algorithm are "smart". This means that mutating individuals will mostly mutate fairly close (or partially close) to the best fitting individual (which has the least regimes with time varying mixing weights practically at zero) so far.
a list of parameter vectors from which the initial population of the genetic algorithm will be generated from. The parameter vectors hould have the form \(\theta = (\phi_{1},...,\phi_{M},\varphi_1,...,\varphi_M,\sigma,\alpha,\nu)\), where (see exceptions below):
\(\phi_{m} = \) the \((d \times 1)\) intercept (or mean) vector of the \(m\)th regime.
\(\varphi_m = (vec(A_{m,1}),...,vec(A_{m,p}))\) \((pd^2 \times 1)\).
cond_dist="Gaussian"
or "Student"
:\(\sigma = (vech(\Omega_1),...,vech(\Omega_M))\) \((Md(d + 1)/2 \times 1)\).
cond_dist="ind_Student"
or "ind_skewed_t"
:\(\sigma = (vec(B_1),...,vec(B_M)\) \((Md^2 \times 1)\).
\(\alpha = \) the \((a\times 1)\) vector containing the transition weight parameters (see below).
cond_dist = "Gaussian")
:Omit \(\nu\) from the parameter vector.
cond_dist="Student"
:\(\nu > 2\) is the single degrees of freedom parameter.
cond_dist="ind_Student"
:\(\nu = (\nu_1,...,\nu_d)\) \((d \times 1)\), \(\nu_i > 2\).
cond_dist="ind_skewed_t"
:\(\nu = (\nu_1,...,\nu_d,\lambda_1,...,\lambda_d)\) \((2d \times 1)\), \(\nu_i > 2\) and \(\lambda_i \in (0, 1)\).
For models with...
weight_function="relative_dens"
:\(\alpha = (\alpha_1,...,\alpha_{M-1})\) \((M - 1 \times 1)\), where \(\alpha_m\) \((1\times 1), m=1,...,M-1\) are the transition weight parameters.
weight_function="logistic"
:\(\alpha = (c,\gamma)\) \((2 \times 1)\), where \(c\in\mathbb{R}\) is the location parameter and \(\gamma >0\) is the scale parameter.
weight_function="mlogit"
:\(\alpha = (\gamma_1,...,\gamma_M)\) \(((M-1)k\times 1)\), where \(\gamma_m\) \((k\times 1)\), \(m=1,...,M-1\) contains the multinomial logit-regression coefficients of the \(m\)th regime. Specifically, for switching variables with indices in \(I\subset\lbrace 1,...,d\rbrace\), and with \(\tilde{p}\in\lbrace 1,...,p\rbrace\) lags included, \(\gamma_m\) contains the coefficients for the vector \(z_{t-1} = (1,\tilde{z}_{\min\lbrace I\rbrace},...,\tilde{z}_{\max\lbrace I\rbrace})\), where \(\tilde{z}_{i} =(y_{it-1},...,y_{it-\tilde{p}})\), \(i\in I\). So \(k=1+|I|\tilde{p}\) where \(|I|\) denotes the number of elements in \(I\).
weight_function="exponential"
:\(\alpha = (c,\gamma)\) \((2 \times 1)\), where \(c\in\mathbb{R}\) is the location parameter and \(\gamma >0\) is the scale parameter.
weight_function="threshold"
:\(\alpha = (r_1,...,r_{M-1})\) \((M-1 \times 1)\), where \(r_1,...,r_{M-1}\) are the threshold values.
weight_function="exogenous"
:Omit \(\alpha\) from the parameter vector.
Replace \(\varphi_1,...,\varphi_M\) with \(\psi\) as described in the argument AR_constraints
.
Replace \(\phi_{1},...,\phi_{M}\) with \((\mu_{1},...,\mu_{g})\) where \(\mu_i, \ (d\times 1)\) is the mean parameter for group \(i\) and \(g\) is the number of groups.
If linear constraints are imposed, replace \(\alpha\) with \(\xi\) as described in the
argument weigh_constraints
. If weight functions parameters are imposed to be fixed values, simply drop \(\alpha\)
from the parameter vector.
Above, \(\phi_{m}\) is the intercept parameter, \(A_{m,i}\) denotes the \(i\)th coefficient matrix of the \(m\)th
regime, \(\Omega_{m}\) denotes the positive definite error term covariance matrix of the \(m\)th regime, and \(B_m\)
is the invertible \((d\times d)\) impact matrix of the \(m\)th regime. \(\nu_m\) is the degrees of freedom parameter
of the \(m\)th regime.
If parametrization=="mean"
, just replace each \(\phi_{m}\) with regimewise mean \(\mu_{m}\).
\(vec()\) is vectorization operator that stacks columns of a given matrix into a vector. \(vech()\) stacks columns
of a given matrix from the principal diagonal downwards (including elements on the diagonal) into a vector.
a size \((dx1)\) vector defining means of the normal distributions from which each
mean parameter \(\mu_{m}\) is drawn from in random mutations. Default is colMeans(data)
. Note that
mean-parametrization is always used for optimization in GAfit
- even when parametrization=="intercept"
.
However, input (in initpop
) and output (return value) parameter vectors can be intercept-parametrized.
a size \((dx1)\) strictly positive vector defining standard deviations of the normal
distributions from which each mean parameter \(\mu_{m}\) is drawn from in random mutations.
Default is vapply(1:d, function(i1) sd(data[,i1]), numeric(1))
.
a size \((dx1)\) strictly positive vector specifying the scale and variability of the
random covariance matrices in random mutations. The covariance matrices are drawn from (scaled) Wishart
distribution. Expected values of the random covariance matrices are diag(omega_scale)
. Standard
deviations of the diagonal elements are sqrt(2/d)*omega_scale[i]
and for non-diagonal elements they are sqrt(1/d*omega_scale[i]*omega_scale[j])
.
Note that for d>4
this scale may need to be chosen carefully. Default in GAfit
is
var(stats::ar(data[,i], order.max=10)$resid, na.rm=TRUE), i=1,...,d
. This argument is ignored if
cond_dist == "ind_Student"
.
a size \((d \times 1)\) strictly positive vector specifying the mean and variability of the
random impact matrices in random mutations. In Regime 1, the mean of the error term covariance matrix
implied by the random impact matrix will be 0.95*diag(B_scale)
and in the rest of the regimes diag(B_scale)
,
whereas the variability increases with B_scale
.
Default in GAfit
is var(stats::ar(data[,i], order.max=10)$resid, na.rm=TRUE), i=1,...,d
.
This argument is ignored if cond_dist != "ind_Student"
.
For...
weight_function %in% c("relative_dens", "exogenous")
:not used.
weight_function %in% c("logistic", "exponential")
:length three vector with the mean (in the first element) and standard deviation (in the second element) of the normal distribution the location parameter is drawn from in random mutations. The third element is the standard deviation of the normal distribution from whose absolute value the location parameter is drawn from.
weight_function == "mlogit"
:length two vector with the mean (in the first element) and standard deviation (in the second element) of the normal distribution the coefficients of the logit sub model's constant terms are drawn from in random mutations. The third element is the standard deviation of the normal distribution from which the non-constant regressors' coefficients are drawn from.
weight_function == "threshold"
:a lenght two vector with the lower bound, in the first element and the upper bound, in the second element, of the uniform distribution threshold parameters are drawn from in random mutations.
a positive real number between zero and one adjusting how large AR parameter values are typically
proposed in construction of the initial population: larger value implies larger coefficients (in absolute value).
After construction of the initial population, a new scale is drawn from (0, upper_ar_scale)
uniform
distribution in each iteration.
the upper bound for ar_scale
parameter (see above) in the random mutations. Setting
this too high might lead to failure in proposing new parameters that are well enough inside the parameter space,
and especially with large p
one might want to try smaller upper bound (e.g., 0.5). With large p
or
d
, upper_ar_scale
is restricted from above, see the details section.
a positive real number adjusting how large AR parameter values are typically proposed in some random mutations (if AR constraints are employed, in all random mutations): larger value implies smaller coefficients (in absolute value). Values larger than 1 can be used if the AR coefficients are expected to be very small. If set smaller than 1, be careful as it might lead to failure in the creation of parameter candidates that satisfy the stability condition.
a non-negative real number specifying how much should natural selection favor individuals
with less regimes that have almost all mixing weights (practically) at zero. Set to zero for no favoring or large
number for heavy favoring. Without any favoring the genetic algorithm gets more often stuck in an area of the
parameter space where some regimes are wasted, but with too much favouring the best genes might never mix into
the population and the algorithm might converge poorly. Default is 1
and it gives \(2x\) larger surviving
probability weights for individuals with no wasted regimes compared to individuals with one wasted regime.
Number 2
would give \(3x\) larger probability weights etc.
Perform penalized LS estimation that minimizes penalized RSS in which estimates close to breaking or not satisfying the
usual stability condition are penalized? If TRUE
, the tuning parameter is set by the argument penalty_params[2]
,
and the penalization starts when the eigenvalues of the companion form AR matrix are larger than 1 - penalty_params[1]
.
a numeric vector with two positive elements specifying the penalization parameters: the first element determined how far from the boundary of the stability region the penalization starts (a number between zero and one, smaller number starts penalization closer to the boundary) and the second element is a tuning parameter for the penalization (a positive real number, a higher value penalizes non-stability more).
If TRUE
, estimates not satisfying the stability condition are allowed. Always FALSE
if
weight_function="relative_dens"
.
a length 2 numeric vector specifying the criteria that is used to determine whether a regime is
redundant (or "wasted") or not.
Any regime m
which satisfies sum(transitionWeights[,m] > red_criteria[1]) < red_criteria[2]*n_obs
will
be considered "redundant". One should be careful when adjusting this argument (set c(0, 0)
to fully disable
the 'redundant regime' features from the algorithm).
should the parameter space be constrained to areas where the transition weights do allocate
enough weights to each regime compared to the number of observations in the regime? See the source code of
the function loglikelihood
for details.
A number in \([0,1]\) giving a probability of a "smart mutation" occuring randomly in each
iteration before the iteration given by the argument smart_mu
.
should the genetic algorithm return the best fitting individual which has "positive enough" mixing
weights for as many regimes as possible ("alt_ind"
) or the individual which has the highest log-likelihood
in general ("best_ind"
) but might have more wasted regimes?
a real number defining the minimum value of the log-likelihood function that will be considered.
Values smaller than this will be treated as they were minval
and the corresponding individuals will
never survive. The default is -(10^(ceiling(log10(n_obs)) + d) - 1)
.
a vector containing fixed parameter values for intercept, autoregressive, and weight parameters that should be fixed in the initial population. Should have the form: \((\phi_{1},...,\phi_{M},\varphi_1,...,\varphi_M,\alpha\), where
\((\phi_{m} = \) the \((d \times 1)\) intercept vector of the \(m\)th regime.
\(\varphi_m = (vec(A_{m,1}),...,vec(A_{m,p}))\) \((pd^2 \times 1)\).
\(\alpha\) vector of the weight parameters.
For models with...
Replace \(\varphi_1,...,\varphi_M\) with \(\psi\) as described in the argument AR_constraints
.
If linear constraints are imposed, replace \(\alpha\) with \(\xi\) as described in the
argument weigh_constraints
. If weight functions parameters are imposed to be fixed values, simply drop \(\alpha\)
from the parameter vector.
Note that fixed_params
should always be in the intercept parametrization (and parametrization="intercept"
should always be used).
Passing this argument from fitSTVAR in does not do anything, as it is designed to be used with the three-phase estimation
procedure only. Also, this argument does not do anything if the initial population is specified in the argument initpop.
should the fixed parameters be fixed in the smart mutation phase as well? If TRUE
, the fixed
parameters stay fixed throughout the whole GA estimation.
a single value, interpreted as an integer, or NULL, that sets seed for the random number generator in
the beginning of the function call. If calling GAfit
from fitSTVAR
, use the argument seeds
instead of passing the argument seed
.
Only reduced form models are supported!
The core of the genetic algorithm is mostly based on the description by Dorsey and Mayer (1995). It utilizes a slightly modified version of the individually adaptive crossover and mutation rates described by Patnaik and Srinivas (1994) and employs (50%) fitness inheritance discussed by Smith, Dike and Stegmann (1995).
By "redundant" or "wasted" regimes we mean regimes that have the time varying mixing weights practically at zero for almost all t. A model including redundant regimes would have about the same log-likelihood value without the redundant regimes and there is no purpose to have redundant regimes in a model.
Some of the AR coefficients are drawn with the algorithm by Ansley and Kohn (1986). However,
when using large ar_scale
with large p
or d
, numerical inaccuracies caused
by the imprecision of the float-point presentation may result in errors or nonstationary AR-matrices.
Using smaller ar_scale
facilitates the usage of larger p
or d
. Therefore, we bound
upper_ar_scale
from above by \(1-pd/150\) when p*d>40
and by \(1\) otherwise.
Structural models are not supported here, as they are best estimated based on reduced form parameter estimates
using the function fitSSTVAR
.
Ansley C.F., Kohn R. 1986. A note on reparameterizing a vector autoregressive moving average model to enforce stationarity. Journal of statistical computation and simulation, 24:2, 99-106.
Dorsey R. E. and Mayer W. J. 1995. Genetic algorithms for estimation problems with multiple optima, nondifferentiability, and other irregular features. Journal of Business & Economic Statistics, 13, 53-66.
Patnaik L.M. and Srinivas M. 1994. Adaptive Probabilities of Crossover and Mutation in Genetic Algorithms. Transactions on Systems, Man and Cybernetics 24, 656-667.
Smith R.E., Dike B.A., Stegmann S.A. 1995. Fitness inheritance in genetic algorithms. Proceedings of the 1995 ACM Symposium on Applied Computing, 345-350.