xsample: Randomly samples an underdetermined problem with linear equality and inequality constraints

Description

Random sampling of inverse linear problems with linear equality and inequality constraints. Uses either a "hit and run" algorithm (random or coordinate directions) or a mirroring technique for sampling.

The Markov Chain Monte Carlo method produces a sample solution for $$Ex=f$$ $$Ax\simeq B$$ $$Gx>=h$$ where $Ex=F$ have to be met exactly, and x is distributed according to $ p(\mathbf{x})\propto e^{-\frac{1}{2}(\mathbf{Ax-b})^T\mathbf{W}^2(\mathbf{Ax-b})} $

Usage

xsample(A = NULL, B = NULL, E = NULL, F =NULL, 
        G = NULL, H = NULL, sdB = NULL, W = 1, 
        iter = 3000, outputlength = iter, burninlength = NULL, 
        type = "mirror", jmp = NULL, tol = sqrt(.Machine$double.eps), 
        x0 = NULL, fulloutput = FALSE, test = TRUE, verbose=TRUE,
        lower = NULL, upper = NULL)

Value

a list containing:

X: matrix whose rows contain the sampled values of x.
acceptedratio: ratio of acceptance (i.e. the ratio of the accepted runs / total iterations).
Q: only returned if fulloutput is TRUE: the transformed samples Q.
p: only returned if fulloutput is TRUE: probability vector for all samples (e.g. one value for each row of X).
jmp: the jump length used for the random walk. Can be used to check the automated jump length.

Arguments

A

numeric matrix containing the coefficients of the (approximate) equality constraints, $Ax\simeq B$.

B

numeric vector containing the right-hand side of the (approximate) equality constraints.

E

numeric matrix containing the coefficients of the (exact) equality constraints, $Ex=F$.

F

numeric vector containing the right-hand side of the (exact) equality constraints.

G

numeric matrix containing the coefficients of the inequality constraints, $Gx>=H$.

H

numeric vector containing the right-hand side of the inequality constraints.

sdB

vector with standard deviation on B. Defaults to NULL.

W

weighting for $Ax\simeq B$. Only used if sdB=NULL and the problem is overdetermined. In that case, the error of B around the model Ax is estimated based on the residuals of $Ax\simeq B$. This error is made proportional to 1/W. If sdB is not NULL, $W=diag(sdB^-1)$.

iter

integer determining the number of iterations.

outputlength

number of iterations kept in the output; at most equal to iter.

burninlength

a number of extra iterations, performed at first, to "warm up" the algorithm.

type

type of algorithm: one of: "mirror", (mirroring algorithm), "rda" (random directions algorithm) or "cda" (coordinates directions algorithm).

jmp

jump length of the transformed variables q: $x=x0+Zq$ (only if type=="mirror"); if jmp is NULL, a reasonable value is determined by xsample, depending on the size of the NULL space.

tol

tolerance for equality and inequality constraints; numbers whose absolute value is smaller than tol are set to zero.

x0

initial (particular) solution.

fulloutput

if TRUE, also outputs the transformed variables q.

test

if TRUE, xsample will test for hidden equalities (see details). This may be necessary for large problems, but slows down execution a bit.

verbose

logical to print warnings and messages.

upper, lower

vector containing upper and lower bounds on the unknowns. If one value, it is assumed to apply to all unknowns. If a vector, it should have a length equal to the number of unknowns; this vector can contain NA for unbounded variables. The upper and lower bounds are added to the inequality conditions G*x>=H.

Author

Karel Van den Meersche

Karline Soetaert <karline.soetaert@nioz.nl>

Details

The algorithm proceeds in two steps.

the equality constraints $Ex=F$ are eliminated, and the system $Ex=f$, $Gx>=h$ is rewritten as $G(p+Zq)>= h$, i.e. containing only inequality constraints and where Z is a basis for the null space of E.
the distribution of $q$ is sampled numerically using a random walk (based on the Metropolis algorithm).

There are three algorithms for selecting new samples: rda, cda (two hit-and-run algorithms) and a novel mirror algorithm.

In the rda algorithm first a random direction is selected, and the new sample obtained by uniformly sampling the line connecting the old sample and the intersection with the planes defined by the inequality constraints.
the cda algorithm is similar, except that the direction is chosen along one of the coordinate axes.
the mirror algorithm is yet unpublished; it uses the inequality constraints as "reflecting planes" along which jumps are reflected. In contrast to cda and rda, this algorithm also works with unbounded problems (i.e. for which some of the unknowns can attain Inf).

For more information, see the package vignette vignette(xsample) or the file xsample.pdf in the packages docs subdirectory.

Raftery and Lewis (1996) suggest a minimum of 3000 iterations to reach the extremes.

If provided, then x0 should be a valid particular solution (i.e. $E*x0=b$ and $G*x0>=h$), else the algorithm will fail.

For larger problems, a central solution may be necessary as a starting point for the rda and cda algorithms. A good starting value is provided by the "central" value when running the function xranges with option central equal to TRUE.

If the particular solution (x0) is not provided, then the parsimonious solution is sought, see ldei.

This may however not be the most efficient way to start the algorithm. The parsimonious solution is usually located near the edges, and the rda and cda algorithms may not get out of this corner. The mirror algorithm is insensitive to that. Here it may be even better to start in a corner (as this position will always never be reached by random sampling).

The algorithm will fail if there are hidden equalities. For instance, two inequalities may together impose an equality on an unknown, or, inequalities may impose equalities on a linear combination of two or more unknowns.

In this case, the basis of the null space Z will be deficient. Therefore, xsample starts by checking if such hidden equalities exist. If it is suspected that this is NOT the case, set test to FALSE. This will speed up execution slightly.

It is our experience that for small problems either the rda and cda algorithms are often more efficient. For really large problems, the mirror algorithm is usually much more efficient; select a jump length (jmp) that ensures good random coverage, while still keeping the number of reflections reasonable. If unsure about the size of jmp, the default will do.

See E_coli for an example where a relatively large problem is sampled.

References

Van den Meersche K, Soetaert K, Van Oevelen D (2009). xsample(): An R Function for Sampling Linear Inverse Problems. Journal of Statistical Software, Code Snippets, 30(1), 1-15.

https://www.jstatsoft.org/v30/c01/

Examples

Run this code

#-------------------------------------------------------------------------------
# A simple problem
#-------------------------------------------------------------------------------
# Sample the probability density function of x1,...x4
# subject to:
# x1 + x2       + x4 = 3
#      x2  -x3  + x4 = -1
# xi   > 0

E <- matrix(nrow = 2, byrow = TRUE, data = c(1, 1, 0,  1,
                                             0, 1, -1, 1))
F   <- c(3, -1)

xs  <- xsample(E = E, F = F, lower = 0)
pairs(xs)

#-------------------------------------------------------------------------------
# Sample the underdetermined Mink diet problem
#-------------------------------------------------------------------------------
E <- rbind(Minkdiet$Prey, rep(1, 7))
F <- c(Minkdiet$Mink, 1)

# Here the Requirement x > 0 is been inposed in G and H.
pairs(xsample(E = E, F = F, G = diag(7), H = rep(0, 7), iter = 5000,
      output = 1000, type = "cda")$X,
      main = "Minkdiet 1000 solutions, - cda")

Run the code above in your browser using DataLab