NCV estimates smoothing parameters by optimizing the average ability of a model to predict subsets of the data when those subsets are omitted from fitting. Usually the predicted subset is a subset of (or equal to) the omitted subset. If both subsets consist of the same single datapoint, and the average is taken over all datapoints, then NCV is leave-one-out cross validation. QNCV is a quadratic approximation to NCV, guaranteed to be finite for any family-link combination.
In detail, suppose that a model is estimated by minimizing a penalized loss
$$\sum_i D(y_i,\theta_i) + \sum_j \lambda_j \beta^{\sf T} {S}_j \beta $$
where \(D\) is a loss (such as a negative log likelihood), dependent on response \(y_i\) and parameter vector \(\theta_i\), which in turn depends on covariates via one or more smooth linear predictors with coefficients \(\beta\). The quadratic penalty terms penalize model complexity: \(S_j\) is a known matrix and \(\lambda_j\) an unknown smoothing parameter. Given smoothing parameters the penalized loss is readily minimized to estimate \(\beta\).
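For intuition, the Gaussian (quadratic loss) case with a single penalty makes the structure concrete: the penalized loss is a penalized sum of squares, minimized in closed form. A minimal R sketch, in which the design matrix X, penalty matrix S, response y and smoothing parameter lambda are all illustrative stand-ins rather than anything produced by mgcv:

```r
## Penalized least squares: minimize ||y - X b||^2 + lambda * b' S b
set.seed(1)
n <- 100; p <- 10
X <- cbind(1, poly(runif(n), p - 1))   ## example design matrix
S <- diag(c(0, 0, rep(1, p - 2)))      ## example penalty; first two coefficients unpenalized
y <- X %*% rnorm(p) + rnorm(n)         ## example response
lambda <- 2
beta.hat <- solve(crossprod(X) + lambda * S, crossprod(X, y))  ## penalized loss minimizer
```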
The smoothing parameters also have to be estimated. To this end, choose \(m\) subsets \(\alpha(k)\subset \{1,\ldots,n\}\) and \(\delta(k)\subset \{1,\ldots,n\}\), for \(k = 1,\ldots,m\), where \(\alpha(k)\) indexes the points omitted from fitting and \(\delta(k)\) the points to be predicted. Usually \(\delta(k)\) is a subset of (or equal to) \(\alpha(k)\). Let \(\theta_i^{\alpha(k)}\) denote the estimate of \(\theta_i\) when the points indexed by \(\alpha(k)\) are omitted from fitting. Then the NCV criterion
$$V = \sum_{k=1}^m \sum_{i \in \delta(k)} D(y_i,\theta_i^{\alpha(k)})$$
is minimized w.r.t. the smoothing parameters, \(\lambda_j\). If \(m=n\) and \(\alpha(k)=\delta(k)=\{k\}\) then ordinary leave-one-out cross validation is recovered. This formulation covers many of the variants of cross validation reviewed in Arlot and Celisse (2010), for example.
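As a check on the definition, \(V\) can always be computed by brute force: refit with each \(\alpha(k)\) omitted and sum the losses of the points in \(\delta(k)\). A sketch for the leave-one-out case (\(\alpha(k)=\delta(k)=\{k\}\), squared error loss), continuing the illustrative penalized least squares example above:

```r
## Brute-force leave-one-out NCV: refit with point k dropped, then predict point k
V <- 0
for (k in 1:n) {
  bk <- solve(crossprod(X[-k, ]) + lambda * S, crossprod(X[-k, ], y[-k]))
  V <- V + (y[k] - drop(X[k, ] %*% bk))^2   ## D is squared error here
}
V
```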
Except in the case of a quadratic loss, \(V\) cannot be computed exactly, but it can be computed to \(O(n^{-2})\) accuracy (for fixed basis size) by taking single Newton optimization steps from the full-data estimate of \(\beta\) to its equivalent when each \(\alpha(k)\) is dropped. This is what mgcv
does. The Newton steps require updating the full-model Hessian to its equivalent when each datum is dropped. This can be achieved at \(O(p^2)\) cost, where \(p\) is the dimension of \(\beta\). Hence, for example, the ordinary cross validation criterion is computable at the \(O(np^2)\) cost of estimating the model given smoothing parameters.
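The mechanics are easiest to see in the quadratic loss case, where a single Newton step from the full-data estimate is exact: the leave-out gradient is evaluated at the full-data \(\hat\beta\) and a step is taken using the correspondingly downdated Hessian. A rough R sketch for the leave-one-out, single-penalty example above; the explicit solves below cost \(O(p^3)\) per drop and are for clarity only, whereas the \(O(p^2)\) update described above requires working with an updated matrix factorization instead:

```r
## One Newton step per dropped point, starting from the full-data estimate.
## For a quadratic loss the step is exact, so this reproduces the brute-force V.
A <- crossprod(X) + lambda * S            ## full-data penalized Hessian (up to a factor of 2)
beta.full <- solve(A, crossprod(X, y))
V.newton <- 0
for (k in 1:n) {
  xk <- X[k, ]
  Ak <- A - tcrossprod(xk)                ## Hessian with point k removed (rank-one downdate)
  gk <- Ak %*% beta.full - (crossprod(X, y) - xk * y[k])  ## leave-out gradient at beta.full
  bk <- beta.full - solve(Ak, gk)         ## single Newton step
  V.newton <- V.newton + (y[k] - drop(xk %*% bk))^2
}
V.newton   ## matches the brute-force V above in this quadratic loss case
```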
The NCV score computed in this way is optimized using a BFGS quasi-Newton method, adapted to the case in which smoothing parameters tending to infinity may cause indefiniteness.
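In mgcv this machinery is invoked through the smoothing parameter selection method of gam. A minimal usage sketch, assuming a recent mgcv release (1.9-0 or later) in which gam accepts method="NCV"; with the nei argument left at its default the criterion reduces to leave-one-out cross validation:

```r
library(mgcv)
set.seed(2)
dat <- gamSim(1, n = 200, dist = "poisson", scale = 0.2)  ## standard mgcv simulated test data
## Smoothing parameters selected by NCV rather than REML or GCV; non-default
## neighbourhood structures would be supplied via the 'nei' argument of gam.
b <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), family = poisson,
         data = dat, method = "NCV")
summary(b)
```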