imageClusterPipeline: Generate and plot a color distance matrix from a set of images

Description

Takes a set of images, computes color clusters for each image, and calculates a distance matrix and dendrogram from those clusters.

Usage

imageClusterPipeline(
  images,
  cluster.method = "hist",
  distance.method = "emd",
  lower = c(0, 140/255, 0),
  upper = c(60/255, 1, 60/255),
  hist.bins = 3,
  kmeans.bins = 27,
  bin.avg = TRUE,
  norm.pix = FALSE,
  plot.bins = FALSE,
  pausing = TRUE,
  color.space = "rgb",
  ref.white,
  from = "sRGB",
  bounds = c(0, 1),
  sample.size = 20000,
  iter.max = 50,
  nstart = 5,
  img.type = FALSE,
  ordering = "default",
  size.weight = 0.5,
  color.weight = 0.5,
  plot.heatmap = TRUE,
  return.distance.matrix = TRUE,
  save.tree = FALSE,
  save.distance.matrix = FALSE,
  a.bounds = c(-127, 128),
  b.bounds = c(-127, 128)
)

Arguments

images

Character vector of directories, image paths, or both.

cluster.method

Which method for getting color clusters from each image should be used? Must be either "hist" (predetermined bins generated by dividing each channel with equidistant bounds; calls getHistList) or "kmeans" (determine clusters using kmeans fitting on pixels; calls getKMeansList).

distance.method

One of four possible comparison methods for calculating the color distances: "emd" (uses EMDistance, recommended), "chisq" (uses chisqDistance), "color.dist" (uses colorDistance; not appropriate if bin.avg=FALSE), or "weighted.pairs" (uses weightedPairsDistance).

lower

RGB or HSV triplet specifying the lower bounds for background pixels. Default upper and lower bounds are set to values that work well for a bright green background (RGB [0, 1, 0]).

upper

RGB or HSV triplet specifying the upper bounds for background pixels. Default upper and lower bounds are set to values that work well for a bright green background (RGB [0, 1, 0]). Determining these bounds may take some trial and error, but the following bounds may work for certain common background colors:

Black: lower=c(0, 0, 0); upper=c(0.1, 0.1, 0.1)
White: lower=c(0.8, 0.8, 0.8); upper=c(1, 1, 1)
Green: lower=c(0, 0.55, 0); upper=c(0.24, 1, 0.24)
Blue: lower=c(0, 0, 0.55); upper=c(0.24, 0.24, 1)

If no background filtering is needed, set bounds to some non-numeric value (NULL, FALSE, "off", etc); any non-numeric value is interpreted as NULL.
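To make the role of lower and upper concrete, here is a minimal base R sketch of range-based background masking. It assumes a pixel is treated as background only when every channel falls inside its bounds; the pixel matrix and the names pix, is_background, and foreground are illustrative and are not taken from the package.

```r
# Hypothetical pixel matrix: one row per pixel, columns are R, G, B on a 0-1 scale.
pix <- rbind(
  c(0.05, 0.95, 0.10),  # bright green: inside the bounds
  c(0.80, 0.20, 0.20),  # red: outside the bounds
  c(0.20, 0.60, 0.20),  # darker green: still inside the bounds
  c(0.30, 0.90, 0.10)   # red channel above its upper bound: outside
)
colnames(pix) <- c("r", "g", "b")

# Default bounds from the Usage section (bright green background).
lower <- c(0, 140/255, 0)
upper <- c(60/255, 1, 60/255)

# Assumption: a pixel is background only if all three channels are in range.
# sweep() compares each column against the matching element of lower / upper.
in_range <- sweep(pix, 2, lower, ">=") & sweep(pix, 2, upper, "<=")
is_background <- rowSums(in_range) == 3

# Pixels that would be kept for clustering.
foreground <- pix[!is_background, , drop = FALSE]
```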

hist.bins

Only applicable if cluster.method="hist". Number of bins for each channel OR a vector of length 3 with bins for each channel. Bins=3 will result in 3^3 = 27 bins; bins=c(2, 2, 3) will result in 2*2*3=12 bins (2 red, 2 green, 3 blue), etc. Passed to getHistList.
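The bin arithmetic above can be checked with a few lines of base R. The helper n_bins is a hypothetical name used only for this illustration, and the break points assume channels on a 0-1 scale divided at equidistant bounds.

```r
# Total number of bins is the product of the per-channel bin counts.
# n_bins() is an illustrative helper, not a package function.
n_bins <- function(hist.bins) {
  per_channel <- if (length(hist.bins) == 1) rep(hist.bins, 3) else hist.bins
  prod(per_channel)
}

total_default <- n_bins(3)           # 3 * 3 * 3
total_uneven  <- n_bins(c(2, 2, 3))  # 2 * 2 * 3

# Equidistant breaks for one channel with 3 bins, then bin assignment
# for three example red-channel values.
breaks <- seq(0, 1, length.out = 3 + 1)
red <- c(0.05, 0.40, 0.99)
bin_index <- findInterval(red, breaks, rightmost.closed = TRUE)
```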

kmeans.bins

Only applicable if cluster.method="kmeans". Number of KMeans clusters to fit. Unlike getImageHist, this represents the actual final number of bins, rather than the number of breaks in each channel.

bin.avg

Logical. Should the color clusters used for the distance matrix be the average of the pixels in that bin (bin.avg=TRUE) or the center of the bin (FALSE)? If a bin is empty, the center of the bin is returned as the cluster color regardless. Only applicable if cluster.method="hist", since kmeans clusters are at the center of their assigned pixel clouds by definition.
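The difference between the two settings can be illustrated for a single channel. This is a sketch under assumed values: the bin bounds, the pixel values, and the helper cluster_value are hypothetical and only mirror the behavior described above, including the fallback to the bin center for an empty bin.

```r
# Hypothetical red-channel values of the pixels that fell into the bin [0, 1/3].
bin_bounds <- c(0, 1/3)
pixels_in_bin <- c(0.02, 0.05, 0.08, 0.05)

bin_center  <- mean(bin_bounds)     # what bin.avg = FALSE would report
bin_average <- mean(pixels_in_bin)  # what bin.avg = TRUE would report

# Illustrative helper: average the pixels when requested and available,
# otherwise fall back to the center of the bin.
cluster_value <- function(values, bounds, bin.avg = TRUE) {
  if (bin.avg && length(values) > 0) mean(values) else mean(bounds)
}

empty_bin_value <- cluster_value(numeric(0), bin_bounds, bin.avg = TRUE)
```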

norm.pix

Logical. Should RGB or HSV cluster values be normalized using normalizeRGB?

plot.bins

Logical. Should the bins for each image be plotted as they are calculated?

pausing

Logical. If plot.bins=TRUE, pause and wait for user keystroke before plotting bins for next image?

color.space

The color space ("rgb", "hsv", or "lab") in which to cluster and plot pixels.

ref.white

The reference white passed to convertColorSpace; must be specified if using color.space = "lab".

from

Display color space of image if clustering in CIE Lab space, probably either "sRGB" or "Apple RGB", depending on your computer.

bounds

Upper and lower limits for the channels; R reads in images with intensities on a 0-1 scale, but 0-255 is common.

sample.size

Only applicable if cluster.method="kmeans". Number of pixels to be randomly sampled from filtered pixel array for performing fit. If set to FALSE, all pixels are fit, but this can be time-consuming, especially for large images. Passed to getKMeansList.

iter.max

Only applicable if cluster.method="kmeans". Inherited from kmeans. The maximum number of iterations allowed during kmeans fitting. Passed to getKMeansList.

nstart

Only applicable if cluster.method="kmeans". Inherited from kmeans. How many random sets should be chosen? Passed to getKMeansList.
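The three k-means arguments map onto a subsample-then-fit pattern, sketched here with stats::kmeans on synthetic data. The random pixel matrix, the small values of sample.size and kmeans.bins, and the Pct column name are assumptions chosen for the illustration; the package's own getKMeansList may differ in its details.

```r
set.seed(1)  # only so the sketch is reproducible

# Hypothetical filtered pixel array: 5000 pixels, 3 channels on a 0-1 scale.
pix <- matrix(runif(5000 * 3), ncol = 3,
              dimnames = list(NULL, c("r", "g", "b")))

sample.size <- 2000
kmeans.bins <- 5

# Randomly subsample pixels before fitting, unless sample.size is FALSE
# or the image has fewer pixels than requested.
if (is.numeric(sample.size) && sample.size < nrow(pix)) {
  pix <- pix[sample(nrow(pix), sample.size), , drop = FALSE]
}

# iter.max and nstart are passed straight through to kmeans.
fit <- stats::kmeans(pix, centers = kmeans.bins, iter.max = 50, nstart = 5)

# One row per cluster: center color plus the proportion of pixels assigned to it.
clusters <- cbind(fit$centers, Pct = fit$size / sum(fit$size))
```

Increasing nstart reruns the fit from more random starting sets and keeps the best result, which costs time but makes the clusters less dependent on initialization.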

img.type

Logical. Should file extensions be retained with labels?

ordering

Logical if not left as "default". Should the color clusters in the list be reordered to minimize the distances between the pairs? If left as default, ordering depends on the distance method: "emd" and "chisq" do not order clusters ("emd" orders on a case-by-case basis in the EMDistance function itself, and reordering by size similarity would make chi-squared meaningless); "color.dist" and "weighted.pairs" use ordering. To override the defaults, set to either TRUE (for ordering) or FALSE (for no ordering).
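The default rule described above can be written as a small lookup. The function resolve_ordering is a hypothetical name used for this sketch and is not part of the package; it only restates the documented defaults.

```r
# Illustrative helper: returns TRUE if clusters would be reordered.
resolve_ordering <- function(ordering = "default", distance.method = "emd") {
  # An explicit TRUE / FALSE overrides the method-specific default.
  if (is.logical(ordering)) return(ordering)
  # Otherwise only "color.dist" and "weighted.pairs" use ordering.
  distance.method %in% c("color.dist", "weighted.pairs")
}

default_emd   <- resolve_ordering("default", "emd")
default_pairs <- resolve_ordering("default", "weighted.pairs")
forced_emd    <- resolve_ordering(TRUE, "emd")
```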

size.weight

Weight of size similarity in determining overall score and ordering (if ordering=TRUE).

color.weight

Weight of color similarity in determining overall score and ordering (if ordering=TRUE). Color and size weights do not necessarily have to sum to 1.

plot.heatmap

Logical. Should a heatmap of the distance matrix be plotted?

return.distance.matrix

Logical. Should the distance matrix be returned to the R environment or just plotted?

save.tree

Either logical or a filepath for saving the tree; default if set to TRUE is to save in current working directory as "ColorTree.newick".

save.distance.matrix

Either logical or a filepath for saving the distance matrix; default if set to TRUE is to save in current working directory as "ColorDistanceMatrix.csv".

a.bounds, b.bounds

Passed to getLabHistList. Numeric ranges for the a (green-red) and b (blue-yellow) channels of Lab color space. Technically, a and b have infinite range, but in practice nearly all values fall between -127 and 128 (the default). Many images will have an even narrower range than this, depending on the lighting conditions and conversion; setting narrower ranges will result in finer-scale binning, without generating empty bins at the edges of the channels.
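The effect of narrower bounds on bin resolution is simple arithmetic, shown here for one channel. The helper bin_width and the narrower range c(-20, 40) are hypothetical values picked for illustration, assuming equal-width bins across the stated range.

```r
# Width of each bin along one channel, given its bounds and the number of bins.
bin_width <- function(bounds, bins) diff(bounds) / bins

width_default <- bin_width(c(-127, 128), bins = 3)  # default a.bounds
width_narrow  <- bin_width(c(-20, 40), bins = 3)    # hypothetical narrower range
```

With the same number of bins, the narrower range gives bins roughly four times finer, so colors that would share one default bin can be separated.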

Value

Color distance matrix, heatmap, and saved distance matrix and tree files if saving is TRUE.
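As a rough sketch of how a returned distance matrix can be used afterwards, the following builds a dendrogram with base R and writes the matrix to a CSV file. The 3 x 3 matrix and image labels are made up, and average-linkage hclust is an assumption; this page does not state which tree-building method the pipeline itself uses.

```r
# Hypothetical symmetric distance matrix standing in for the returned value.
labels <- c("img_a", "img_b", "img_c")
dm <- matrix(c(0.0, 0.2, 0.7,
               0.2, 0.0, 0.6,
               0.7, 0.6, 0.0),
             nrow = 3, byrow = TRUE,
             dimnames = list(labels, labels))

# Hierarchical clustering on the distances; plot(tree) would draw the dendrogram.
tree <- stats::hclust(stats::as.dist(dm), method = "average")

# Save the matrix to a temporary location rather than the working directory.
out_file <- file.path(tempdir(), "ColorDistanceMatrix.csv")
write.csv(dm, out_file)
```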

Examples

# NOT RUN {
colordistance::imageClusterPipeline(
  dir(system.file("extdata", "Heliconius/", package = "colordistance"),
      full.names = TRUE),
  color.space = "hsv", lower = rep(0.8, 3), upper = rep(1, 3),
  cluster.method = "hist", distance.method = "emd", hist.bins = 3,
  plot.bins = TRUE, save.tree = "example_tree.newick",
  save.distance.matrix = "example_DM.csv"
)
# }