clusters: Identify clusters in a collection of positions or intervals

Description

This function uses tools in the intervals package to quickly identify clusters -- contiguous collections of positions or intervals which are separated by no more than a given distance from their neighbors to either side.

Usage

# S4 method for numeric
clusters(x, w, which = FALSE, check_valid = TRUE)
# S4 method for Intervals_virtual
clusters(x, w, which = FALSE, check_valid = TRUE)

Value

A list whose components are the clusters. Each component is thus a subset of x, or, if which == TRUE, a vector of indices into the x object. (The indices correspond to row numbers when x is of class "Intervals_virtual".)

Arguments

x: A numeric vector of positions, or an object of class "Intervals_virtual".
w: Maximum permitted distance between a cluster member and its neighbors to either side.
which: Should indices into the x object be returned instead of actual subsets?
check_valid: Should validObject be called before passing to compiled code? Also see interval_overlap and reduce.

Details

A cluster is defined to be a maximal collection, with at least two members, of components of x which are separated by no more than w. Note that when x represents intervals, an interval must actually contain a point at distance w or less from a neighboring interval to be assigned to the same cluster. If the ends of both intervals in question are open and exactly at distance w, they will not be deemed to be cluster co-members. See the example below.
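For the numeric case, the definition above can be sketched in base R. This is only an illustration: clusters_sketch is a hypothetical helper, not a function from the intervals package, and it does not use the package's compiled code. It also does not attempt the interval case, where open and closed endpoints matter.

```r
# Hypothetical helper (NOT part of the intervals package): a base-R sketch of
# the cluster definition for numeric positions only.
clusters_sketch <- function(x, w, which = FALSE) {
  if (length(x) == 0) return(list())
  ord <- order(x)                  # indices of x in increasing order of position
  xs <- x[ord]
  # Start a new group wherever the gap to the previous position exceeds w
  group <- cumsum(c(TRUE, diff(xs) > w))
  idx <- unname(split(ord, group)) # indices into x, one component per group
  idx <- idx[lengths(idx) >= 2]    # a cluster needs at least two members
  if (which) idx else lapply(idx, function(i) x[i])
}

x <- c(1, 50, 3, 100, 52, 5)
clusters_sketch(x, w = 2)                # subsets of x
clusters_sketch(x, w = 2, which = TRUE)  # indices into x
```

Here the position 100 has no neighbor within w = 2, so it is dropped rather than returned as a cluster of one. Sorting the positions first means only adjacent gaps have to be checked.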

Examples

# Load the package that provides clusters(), Intervals() and reduce()
library( intervals )

# Numeric method
w <- 20
x <- sample( 1000, 100 )
c1 <- clusters( x, w )

# Check results
sapply( c1, function( x ) all( diff(x) <= w ) )
d1 <- diff( sort(x) )
all.equal(
          as.numeric( d1[ d1 <= w ] ),
          unlist( sapply( c1, diff ) )
          )

# Intervals method, starting with a reduced object so we know that all
# intervals are disjoint and sorted.
B <- 100
left <- runif( B, 0, 1e4 )
right <- left + rexp( B, rate = 1/10 )
y <- reduce( Intervals( cbind( left, right ) ) )

gaps <- function(x) x[-1,1] - x[-nrow(x),2]
hist( gaps(y), breaks = 30 )

w <- 200
c2 <- clusters( y, w )
head( c2 )
sapply( c2, function(x) all( gaps(x) <= w ) )

# Clusters and open end points. See "Details".
z <- Intervals(
               matrix( 1:4, 2, 2, byrow = TRUE ),
               closed = c( TRUE, FALSE )
               )
z
clusters( z, 1 )
closed(z)[1] <- FALSE
z
clusters( z, 1 )