- fn
the function to be maximized. As the objective function values are not
directly used for optimization, this argument is optional, provided
grad is supplied. It must have the parameter vector as the first
argument, and it must have an argument index to specify the integer
index of the selected observations. It must return either a single
number or a numeric vector (which is summed internally). If the
parameters are out of range, fn should return NA. See details for
constant parameters. fn may also return attributes "gradient" and/or
"hessian". If these attributes are set, the algorithm uses the
corresponding values as gradient and Hessian.
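For illustration, a minimal sketch of such a function, assuming a
normal log-likelihood and a data vector x in the calling environment
(both are assumptions made for this example, not part of the
interface):

```r
## per-observation log-likelihood of N(mu, sd) data;
## 'x' is an assumed data vector
logLikFn <- function(theta, index, ...) {
   mu <- theta[1]
   sd <- theta[2]
   if (sd <= 0) return(NA)  # parameters out of range
   ## observation-wise values for the selected batch,
   ## summed internally by the optimizer
   dnorm(x[index], mean = mu, sd = sd, log = TRUE)
}
```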
- grad
gradient of the objective function. It must have the parameter vector
as the first argument, and it must have an argument index to specify
the integer index of the selected observations. It must return either
the gradient vector of the objective function, or a matrix whose
columns correspond to individual parameters; the column sums are
treated as the gradient components. If fn returns an object with
attribute gradient, this argument is ignored. If grad is NULL, the
gradient is computed by the finite-difference method using fn.
However, this is only advisable for small-scale tests, not for
production runs; obviously, fn must be correctly defined in that
case.
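A matching sketch of an observation-wise gradient for the assumed
normal log-likelihood above (one row per selected observation, one
column per parameter):

```r
gradFn <- function(theta, index, ...) {
   mu <- theta[1]
   sd <- theta[2]
   xi <- x[index]
   cbind((xi - mu)/sd^2,            # d logL / d mu
         (xi - mu)^2/sd^3 - 1/sd)   # d logL / d sd
}
```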
- hess
Hessian matrix of the function. Mainly for compatibility reasons; it
is only used for computing the final Hessian if requested by setting
finalHessian to TRUE. It must have the parameter vector as the first
argument and it must return the Hessian matrix of the objective
function. If missing and the final Hessian is requested, either a
finite-difference Hessian based on gradient, or the BHHH approach, is
used.
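A corresponding sketch for the assumed normal log-likelihood,
evaluated on the full (assumed) data vector x; whether hess also
receives the index argument is not specified here, so this sketch
ignores it:

```r
## analytic Hessian of the summed normal log-likelihood
hessFn <- function(theta, ...) {
   mu <- theta[1]
   sd <- theta[2]
   n <- length(x)
   matrix(c(-n/sd^2,
            -2*sum(x - mu)/sd^3,
            -2*sum(x - mu)/sd^3,
            n/sd^2 - 3*sum((x - mu)^2)/sd^4),
          nrow = 2)
}
```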
- start
initial parameter values. If these have names, the
names are also used for results.
- nObs
number of observations. This is used to partition the data into
individual batches. The resulting batch indices are forwarded to the
grad function through the argument index.
- constraints
either NULL for unconstrained optimization or a list with two
components. The components may be either eqA and eqB for
equality-constrained optimization \(A \theta + B = 0\), or ineqA and
ineqB for inequality constraints \(A \theta + B > 0\). More than one
row in ineqA and ineqB corresponds to more than one linear
constraint; in that case all of these must be zero (equality) or
positive (inequality constraints). The equality-constrained problem
is forwarded to sumt, the inequality-constrained case to
constrOptim2.
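For example, a single equality constraint
\(\theta_1 + \theta_2 - 1 = 0\) (an illustrative assumption)
corresponds to \(A = (1\ 1)\) and \(B = -1\):

```r
## one linear equality constraint: theta[1] + theta[2] = 1
constraints <- list(eqA = matrix(c(1, 1), nrow = 1),
                    eqB = -1)
```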
- finalHessian
how (and if) to calculate the final Hessian. Either FALSE (do not
calculate), TRUE (use an analytic/finite-difference Hessian), or
"bhhh"/"BHHH" for the information-equality approach. The latter
approach is only suitable when working with a log-likelihood
function, and it requires the gradient/log-likelihood to be supplied
by individual observations. A Hessian matrix is not often used for
optimization problems where one applies SGA, but even if one is not
interested in standard errors, it may provide useful information
about the model performance. If computed by the finite-difference
method, the Hessian computation may be very slow.
- fixed
parameters to be treated as constants at their start values. If
present, it is treated as an index vector of the start parameters.
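For example, to keep the second parameter at its start value (the
name "sd" is an illustrative assumption):

```r
fixed <- 2      # by position
fixed <- "sd"   # by name, if start is named
```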
- control
list of control parameters. The ones used by these optimizers are:
- SGA_momentum
0, numeric momentum parameter for SGA. Must lie in the interval
\([0,1]\). See details.
Adam-specific parameters:
- Adam_momentum1
0.9, numeric in the interval \((0,1)\), the first-moment momentum
- Adam_momentum2
0.999, numeric in the interval \((0,1)\), the second-moment momentum
General stochastic gradient parameters:
- SG_learningRate
step size the SGA algorithm takes in the gradient direction. If 1,
the step equals the gradient value. A good value is often in the
range 0.01--0.3.
- SG_batchSize
SGA batch size, an integer between 1 and nObs. If NULL (default),
the full-batch gradient is computed.
- SG_clip
NULL, gradient clipping threshold. The algorithm ensures that
\(||g(\theta)||_2^2 \le \kappa\), where \(\kappa\) is the SG_clip
value. If the actual norm of the gradient exceeds (the square root
of) \(\kappa\), the gradient is scaled back accordingly while
preserving its direction. NULL means no clipping.
Stopping conditions:
- gradtol
stopping condition. Stop if the norm of the gradient is less than
gradtol. Default 0, i.e. this condition is not used. The condition
is useful if the aim is to drive the full-batch gradient to zero on
the training data. That is not a good aim in the case of stochastic
gradients, nor if the objective is to be optimized on validation
data.
- SG_patience
NULL, or integer. Stopping condition: the algorithm counts how many
times the objective function has been worse than its best value so
far, and if this count exceeds SG_patience, the algorithm stops.
- SG_patienceStep
1L, integer. After how many epochs to check
the patience value. 1
means to check at each epoch, and hence to compute the
objective function. This may be undesirable if the objective
function is costly to compute.
- iterlim
stopping condition. Stop if more than iterlim epochs have been
performed; return code=4. An epoch is a set of iterations that
cycles through all observations. In the case of full batch,
iterations and epochs are equivalent. If iterlim = 0, no learning is
done and the initial values are returned unchanged.
- printLevel
this argument determines the level of printing done during the
optimization process. The default value 0 means that no printing
occurs, 1 prints the initial and final details, and 2 prints the
main tracing information for every epoch. Higher values result in
even more output.
- storeParameters
logical, whether to store and return the parameter values at each
epoch. If TRUE, the stored values can be retrieved with the
storedParameters method. The parameters are stored as a matrix with
rows corresponding to the epochs and columns to the parameter
components. There are iterlim + 1 rows, where the first one
corresponds to the initial parameters. Default FALSE.
- storeValues
logical, whether to store and return the objective function values
at each epoch. If TRUE, the stored values can be retrieved with the
storedValues method. There are iterlim + 1 values, where the first
one corresponds to the value at the initial parameters. Default
FALSE.
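Putting these pieces together, a minimal usage sketch (assuming the
maxSGA optimizer from the maxLik package and the logLikFn/gradFn
sketches above; the simulated data are purely illustrative):

```r
library(maxLik)
set.seed(1)
x <- rnorm(1000, mean = 2, sd = 1.5)   # illustrative data

res <- maxSGA(logLikFn, grad = gradFn,
              start = c(mu = 0, sd = 1),
              nObs = length(x),
              control = list(SG_batchSize = 100,
                             SG_learningRate = 0.05,
                             SG_clip = 1e4,   # scale back if ||g||^2 > 1e4
                             iterlim = 50,
                             storeValues = TRUE,
                             storeParameters = TRUE))
coef(res)
head(storedValues(res))      # value at initial parameters + each epoch
dim(storedParameters(res))   # (iterlim + 1) x number of parameters
```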