scg_group: SCG Problem Solver

Description

This function solves the Spectral Coarse Graining (SCG) problem; either exactly, or approximately but faster.

Usage

scg_group(
  V,
  nt,
  mtype = c("symmetric", "laplacian", "stochastic"),
  algo = c("optimum", "interv_km", "interv", "exact_scg"),
  p = NULL,
  maxiter = 100
)

Arguments

A numeric matrix of (eigen)vectors to be preserved by the coarse graining (the vectors are to be stored column-wise in V).

A vector of positive integers of length one or equal to length(ev). When algo = “optimum”, nt contains the number of groups used to partition each eigenvector separately. When algo is equal to “interv\_km” or “interv”, nt contains the number of intervals used to partition each eigenvector. The same partition size or number of intervals is used for each eigenvector if nt is a single integer. When algo = “exact\_cg” this parameter is ignored.

mtype

The type of semi-projectors used in the SCG. For now “symmetric”, “laplacian” and “stochastic” are available.

algo

The algorithm used to solve the SCG problem. Possible values are “optimum”, “interv\_km”, “interv” and “exact\_scg”.

A probability vector of length equal to nrow(V). p is the stationary probability distribution of a Markov chain when mtype = “stochastic”. This parameter is ignored in all other cases.

maxiter

A positive integer giving the maximum number of iterations of the k-means algorithm when algo = “interv\_km”. This parameter is ignored in all other cases.

Value

A vector of nrow(V) integers giving the group label of each object (vertex) in the partition.

Details

The algorithm “optimum” solves exactly the SCG problem for each eigenvector in V. The running time of this algorithm is \(O(\max nt \cdot m^2)\) for the symmetric and laplacian matrix problems (i.e. when mtype is “symmetric” or “laplacian”. It is \(O(m^3)\) for the stochastic problem. Here \(m\) is the number of rows in V. In all three cases, the memory usage is \(O(m^2)\).

The algorithms “interv” and “interv\_km” solve approximately the SCG problem by performing a (for now) constant binning of the components of the eigenvectors, that is nt[i] constant-size bins are used to partition V[,i]. When algo = “interv\_km”, the (Lloyd) k-means algorithm is run on each partition obtained by “interv” to improve accuracy.

Once a minimizing partition (either exact or approximate) has been found for each eigenvector, the final grouping is worked out as follows: two vertices are grouped together in the final partition if they are grouped together in each minimizing partition. In general the size of the final partition is not known in advance when ncol(V)>1.

Finally, the algorithm “exact\_scg” groups the vertices with equal components in each eigenvector. The last three algorithms essentially have linear running time and memory load.

References

D. Morton de Lachapelle, D. Gfeller, and P. De Los Rios, Shrinking Matrices while Preserving their Eigenpairs with Application to the Spectral Coarse Graining of Graphs. Submitted to SIAM Journal on Matrix Analysis and Applications, 2008. http://people.epfl.ch/david.morton

Examples

Run this code

# NOT RUN {

## We are not running these examples any more, because they
## take a long time to run and this is against the CRAN repository
## policy. Copy and paste them by hand to your R prompt if
## you want to run them.

# }
# NOT RUN {
# eigenvectors of a random symmetric matrix
M <- matrix(rexp(10^6), 10^3, 10^3)
M <- (M + t(M))/2
V <- eigen(M, symmetric=TRUE)$vectors[,c(1,2)]

# displays size of the groups in the final partition
gr <- scg_group(V, nt=c(2,3))
col <- rainbow(max(gr))
plot(table(gr), col=col, main="Group size", xlab="group", ylab="size")

## comparison with the grouping obtained by kmeans
## for a partition of same size
gr.km <- kmeans(V,centers=max(gr), iter.max=100, nstart=100)$cluster
op <- par(mfrow=c(1,2))
plot(V[,1], V[,2], col=col[gr],
	main = "SCG grouping",
	xlab = "1st eigenvector",
	ylab = "2nd eigenvector")
plot(V[,1], V[,2], col=col[gr.km],
	main = "K-means grouping",
	xlab = "1st eigenvector",
	ylab = "2nd eigenvector")
par(op)
## kmeans disregards the first eigenvector as it
## spreads a much smaller range of values than the second one

### comparing optimal and k-means solutions
### in the one-dimensional case.
x <- rexp(2000, 2)
gr.true <- scg_group(cbind(x), 100)
gr.km <- kmeans(x, 100, 100, 300)$cluster
scg_eps(cbind(x), gr.true)
scg_eps(cbind(x), gr.km)
# }
# NOT RUN {
# }

Run the code above in your browser using DataLab