Learn R Programming

NAC (version 0.1.0)

CAclustering: Covariate Assisted Spectral Clustering.

Description

CAclustering clusters graph nodes by applying spectral clustering with assistance from node covariates.

Usage

CAclustering(Adj, Covariate, K, alphan = 5, itermax = 100, startn = 10)

Value

estall

A factor indicating the nodes' community labels. Nodes sharing the same label belong to the same community.

Arguments

Adj

An \(n \times n\) symmetric 0/1 adjacency matrix with zero diagonal entries.

Covariate

An \(n \times p\) covariate matrix. The rows correspond to nodes and the columns correspond to covariates.

K

A positive integer which is no larger than \(n\). This is the predefined number of communities.

alphan

The number of candidate values of the tuning parameter \(\alpha\) to try within the range \((\alpha_{min}, \alpha_{max})\) given in Binkiewicz et al. (2017). The optimal \(\alpha\) is expected to balance the contributions of \(L_{\tau}\) and \(X\). The default value is 5.

itermax

k-means parameter, indicating the maximum number of iterations allowed. The default value is 100.

startn

k-means parameter. The number of times the algorithm should be run with different initial centroids. The default value is 10.

Details

CAclustering is an algorithm designed for community detection in networks with node covariates, introduced in Binkiewicz et al. (2017), Covariate-assisted spectral clustering. CAclustering applies k-means to the \(K\) leading eigenvectors of \(L_{\tau}+\alpha XX^{\prime}\), where \(L_{\tau}\) is the regularized graph Laplacian, \(X\) is the covariate matrix, and \(\alpha\) is a tuning parameter.
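For a fixed \(\alpha\), the procedure described above can be sketched in base R as follows. This is an illustrative sketch, not the package implementation: the choice of regularizer \(\tau\) (the mean degree) and the k-means settings mirroring the `itermax` and `startn` defaults are assumptions.

```r
# Minimal sketch of covariate-assisted spectral clustering for a fixed alpha.
ca_cluster_sketch <- function(Adj, X, K, alpha) {
  deg <- rowSums(Adj)
  tau <- mean(deg)                   # common regularizer choice (assumed here)
  d_tau <- deg + tau
  # Regularized graph Laplacian: L_tau = D_tau^{-1/2} A D_tau^{-1/2}
  L_tau <- Adj / sqrt(outer(d_tau, d_tau))
  M <- L_tau + alpha * tcrossprod(X) # L_tau + alpha * X X'
  # K leading eigenvectors, then k-means on their rows
  U <- eigen(M, symmetric = TRUE)$vectors[, 1:K, drop = FALSE]
  kmeans(U, centers = K, iter.max = 100, nstart = 10)$cluster
}
```

CAclustering additionally searches over `alphan` candidate values of \(\alpha\) rather than requiring one to be supplied.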

References

Binkiewicz, N., Vogelstein, J. T., & Rohe, K. (2017). Covariate-assisted spectral clustering. Biometrika, 104(2), 361-377.
doi:10.1093/biomet/asx008

Examples


# Simulate a network with n nodes, K communities, and p covariates
set.seed(2022)
n <- 10; K <- 2; p <- 5; prob1 <- 0.9
theta <- 0.4 + (0.45 - 0.05) * (seq_len(n) / n)^2   # degree heterogeneity
Theta <- diag(theta)
P <- matrix(c(0.8, 0.2, 0.2, 0.8), byrow = TRUE, nrow = K)  # block connectivity
l <- sample(1:K, n, replace = TRUE)  # true node labels
Pi <- matrix(0, n, K)                # membership matrix
for (k in 1:K) {
  Pi[l == k, k] <- 1
}
Omega <- Theta %*% Pi %*% P %*% t(Pi) %*% Theta  # edge probability matrix
Adj <- 1 * (Omega - matrix(runif(n * n), nrow = n) >= 0)
diag(Adj) <- 0
Adj[lower.tri(Adj)] <- t(Adj)[lower.tri(Adj)]    # symmetrize

# Simulate covariates whose means depend on community membership
Q <- 0.1 * matrix(sign(runif(p * K) - 0.5), nrow = p)
for (i in 1:K) {
  Q[floor((i - 1) * p / K) + seq_len(floor(p / K)), i] <- 0.3
}
W <- matrix(0, nrow = n, ncol = K)
for (jj in 1:n) {
  pp <- rep(1 / (K - 1), K); pp[l[jj]] <- 0
  if (runif(1) <= prob1) {
    W[jj, ] <- Pi[jj, ]                  # covariate signal matches the true label
  } else {
    W[jj, sample(K, 1, prob = pp)] <- 1  # mismatched with probability 1 - prob1
  }
}
W <- t(W)
D0 <- Q %*% W
D <- matrix(0, n, p)
for (i in 1:n) {
  D[i, ] <- rnorm(p, mean = D0[, i], sd = 1)
}

CAclustering(Adj, D, 2)
