Takes images, computes color clusters for each image, and calculates a distance matrix and dendrogram from those clusters.
imageClusterPipeline(
  images,
  cluster.method = "hist",
  distance.method = "emd",
  lower = c(0, 140/255, 0),
  upper = c(60/255, 1, 60/255),
  hist.bins = 3,
  kmeans.bins = 27,
  bin.avg = TRUE,
  norm.pix = FALSE,
  plot.bins = FALSE,
  pausing = TRUE,
  color.space = "rgb",
  ref.white,
  from = "sRGB",
  bounds = c(0, 1),
  sample.size = 20000,
  iter.max = 50,
  nstart = 5,
  img.type = FALSE,
  ordering = "default",
  size.weight = 0.5,
  color.weight = 0.5,
  plot.heatmap = TRUE,
  return.distance.matrix = TRUE,
  save.tree = FALSE,
  save.distance.matrix = FALSE,
  a.bounds = c(-127, 128),
  b.bounds = c(-127, 128)
)
images: Character vector of directories, image paths, or both.
cluster.method: Which method should be used to get color clusters from each image? Must be either "hist" (predetermined bins generated by dividing each channel into equidistant intervals; calls getHistList) or "kmeans" (clusters determined by k-means fitting on pixels; calls getKMeansList).
distance.method: One of four comparison methods for calculating the color distances: "emd" (uses EMDistance; recommended), "chisq" (uses chisqDistance), "color.dist" (uses colorDistance; not appropriate if bin.avg = FALSE), or "weighted.pairs" (uses weightedPairsDistance).
lower: RGB or HSV triplet specifying the lower bounds for background pixels. The default lower and upper bounds work well for a bright green background (RGB [0, 1, 0]).
upper: RGB or HSV triplet specifying the upper bounds for background pixels. The default lower and upper bounds work well for a bright green background (RGB [0, 1, 0]). Finding suitable bounds may take some trial and error, but the following bounds may work for some common background colors:
Black: lower=c(0, 0, 0); upper=c(0.1, 0.1, 0.1)
White: lower=c(0.8, 0.8, 0.8); upper=c(1, 1, 1)
Green: lower=c(0, 0.55, 0); upper=c(0.24, 1, 0.24)
Blue: lower=c(0, 0, 0.55); upper=c(0.24, 0.24, 1)
If no background filtering is needed, set the bounds to a non-numeric value (NULL, FALSE, "off", etc.); any non-numeric value is interpreted as NULL.
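To make the role of the bounds concrete, here is a minimal base-R sketch of background filtering, assuming a pixel is treated as background when every channel lies inside [lower, upper]. The helper remove_background() is hypothetical and not part of colordistance; the package's own filtering code may differ in detail.

```r
# Hypothetical helper, not part of colordistance: drop every pixel whose
# channels all fall inside [lower, upper], i.e. pixels treated as background.
remove_background <- function(pixels, lower, upper) {
  # Non-numeric bounds (NULL, FALSE, "off", ...) mean "no filtering"
  if (!is.numeric(lower) || !is.numeric(upper)) {
    return(pixels)
  }
  # pixels: n x 3 matrix of channel values on a 0-1 scale
  above_lower <- sweep(pixels, 2, lower, `>=`)
  below_upper <- sweep(pixels, 2, upper, `<=`)
  is_background <- rowSums(above_lower & below_upper) == ncol(pixels)
  pixels[!is_background, , drop = FALSE]
}

px <- rbind(c(0.05, 0.90, 0.10),  # bright green: inside the default bounds
            c(0.80, 0.20, 0.10),  # red: kept
            c(0.50, 0.50, 0.50))  # gray: kept
kept <- remove_background(px, lower = c(0, 140/255, 0), upper = c(60/255, 1, 60/255))
nrow(kept)  # 2
```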
hist.bins: Only applicable if cluster.method = "hist". Number of bins for each channel, OR a vector of length 3 giving the number of bins for each channel. hist.bins = 3 results in 3^3 = 27 bins; hist.bins = c(2, 2, 3) results in 2*2*3 = 12 bins (2 red, 2 green, 3 blue), etc. Passed to getHistList.
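The bin arithmetic above can be checked directly in R. The sketch below also shows one way to split a 0-1 channel into equidistant intervals with findInterval(); n_bins() is a hypothetical helper, and this is only an illustration of equidistant binning, not necessarily how getHistList assigns pixels internally.

```r
# Total number of histogram bins implied by a hist.bins value
n_bins <- function(bins) {
  # A single number is recycled across all three channels
  if (length(bins) == 1) bins <- rep(bins, 3)
  prod(bins)
}
n_bins(3)           # 27
n_bins(c(2, 2, 3))  # 12

# Equidistant breaks for one channel on a 0-1 scale (3 bins)
breaks <- seq(0, 1, length.out = 3 + 1)
findInterval(c(0.1, 0.5, 1), breaks, rightmost.closed = TRUE)  # 1 2 3
```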
kmeans.bins: Only applicable if cluster.method = "kmeans". Number of k-means clusters to fit. Unlike getImageHist, this represents the actual final number of bins, rather than the number of breaks in each channel.
bin.avg: Logical. Should the color clusters used for the distance matrix be the average of the pixels in each bin (bin.avg = TRUE) or the center of the bin (FALSE)? If a bin is empty, the center of the bin is returned as the cluster color regardless. Only applicable if cluster.method = "hist", since k-means clusters are at the center of their assigned pixel clouds by definition.
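The rule for choosing a bin's representative color can be sketched as follows. bin_color() and its arguments are hypothetical names used for illustration; the sketch only encodes the behavior described above (mean of the bin's pixels, or the bin center when bin.avg = FALSE or the bin is empty).

```r
# Hypothetical helper: representative color of one histogram bin.
# pixels_in_bin: n x 3 matrix of the pixels in the bin (n may be 0)
# bin_lower, bin_upper: the bin's edges in each channel
bin_color <- function(pixels_in_bin, bin_lower, bin_upper, bin.avg = TRUE) {
  center <- (bin_lower + bin_upper) / 2
  # An empty bin falls back to its center regardless of bin.avg
  if (!bin.avg || nrow(pixels_in_bin) == 0) {
    return(center)
  }
  colMeans(pixels_in_bin)
}

in_bin <- rbind(c(0.1, 0.2, 0.3), c(0.3, 0.4, 0.1))
bin_color(in_bin, rep(0, 3), rep(0.5, 3))                        # 0.2 0.3 0.2
bin_color(in_bin, rep(0, 3), rep(0.5, 3), bin.avg = FALSE)       # 0.25 0.25 0.25
bin_color(matrix(numeric(0), ncol = 3), rep(0, 3), rep(0.5, 3))  # 0.25 0.25 0.25
```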
norm.pix: Logical. Should RGB or HSV cluster values be normalized using normalizeRGB?
plot.bins: Logical. Should the bins for each image be plotted as they are calculated?
pausing: Logical. If plot.bins = TRUE, pause and wait for a user keystroke before plotting the bins for the next image?
color.space: The color space ("rgb", "hsv", or "lab") in which to plot pixels.
ref.white: The reference white passed to convertColorSpace; must be specified if using color.space = "lab".
from: Display color space of the image if clustering in CIE Lab space; probably either "sRGB" or "Apple RGB", depending on your computer.
bounds: Lower and upper limits for the channels. R reads in images with intensities on a 0-1 scale, but 0-255 is also common.
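When channel values or bounds come from a 0-255 source, converting them to the 0-1 scale is a simple linear rescaling, which is also why the default lower bound is written as 140/255. rescale_to_unit() is a hypothetical helper; this documentation does not say whether the pipeline performs such a conversion itself.

```r
# Map channel values from the scale given by `bounds` onto 0-1
rescale_to_unit <- function(x, bounds = c(0, 1)) {
  (x - bounds[1]) / (bounds[2] - bounds[1])
}

# 0-255 values, e.g. 140 corresponds to the default lower green bound 140/255
rescale_to_unit(c(0, 140, 255), bounds = c(0, 255))
```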
sample.size: Only applicable if cluster.method = "kmeans". Number of pixels to randomly sample from the filtered pixel array for performing the fit. If set to FALSE, all pixels are fit, but this can be time-consuming, especially for large images. Passed to getKMeansList.
iter.max: Only applicable if cluster.method = "kmeans". Inherited from kmeans: the maximum number of iterations allowed during k-means fitting. Passed to getKMeansList.
nstart: Only applicable if cluster.method = "kmeans". Inherited from kmeans: how many random sets of initial cluster centers should be chosen? Passed to getKMeansList.
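The k-means options map onto base R's stats::kmeans(). The sketch below subsamples a stand-in pixel array and fits clusters with the same iter.max and nstart values; the random data and the smaller sizes are assumptions chosen to keep it fast, and getKMeansList may prepare pixels differently.

```r
set.seed(1)
# Stand-in for a background-filtered pixel array (5000 RGB pixels, 0-1 scale)
pixels <- matrix(runif(5000 * 3), ncol = 3)

sample.size <- 2000  # in the pipeline, FALSE would mean "fit all pixels"
idx <- sample(nrow(pixels), min(sample.size, nrow(pixels)))

# k-means on the subsample; iter.max and nstart are stats::kmeans arguments
fit <- stats::kmeans(pixels[idx, ], centers = 27, iter.max = 50, nstart = 5)

centers <- fit$centers                   # one RGB color per cluster
proportions <- fit$size / sum(fit$size)  # share of sampled pixels per cluster
```

Here centers = 27 plays the role of kmeans.bins: it is the final number of clusters, not a per-channel break count.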
img.type: Logical. Should file extensions be retained with labels?
ordering: Logical if not left as "default". Should the color clusters in the list be reordered to minimize the distances between the pairs? If left as "default", ordering depends on the distance method: "emd" and "chisq" do not reorder clusters ("emd" orders on a case-by-case basis in the EMDistance function itself, and reordering by size similarity would make chi-squared meaningless), while "color.dist" and "weighted.pairs" do use ordering. To override the default, set to either TRUE (for ordering) or FALSE (for no ordering).
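The default rule for ordering can be written as a small lookup. resolve_ordering() is a hypothetical helper that only encodes the behavior described above; it is not a function exported by colordistance.

```r
# Hypothetical helper: resolve ordering = "default" per the rules above
resolve_ordering <- function(ordering, distance.method) {
  # An explicit TRUE/FALSE always overrides the default
  if (is.logical(ordering)) return(ordering)
  if (identical(ordering, "default")) {
    # Only the pairwise methods reorder clusters by default
    return(distance.method %in% c("color.dist", "weighted.pairs"))
  }
  stop("ordering must be \"default\", TRUE, or FALSE")
}

resolve_ordering("default", "emd")             # FALSE
resolve_ordering("default", "weighted.pairs")  # TRUE
resolve_ordering(TRUE, "chisq")                # TRUE
```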
size.weight: Weight of size similarity in determining the overall score and ordering (if ordering = TRUE).
color.weight: Weight of color similarity in determining the overall score and ordering (if ordering = TRUE). Color and size weights do not necessarily have to sum to 1.
plot.heatmap: Logical. Should a heatmap of the distance matrix be plotted?
return.distance.matrix: Logical. Should the distance matrix be returned to the R environment, or only plotted?
save.tree: Either a logical or a file path for saving the tree; if set to TRUE, the tree is saved in the current working directory as "ColorTree.newick".
save.distance.matrix: Either a logical or a file path for saving the distance matrix; if set to TRUE, the matrix is saved in the current working directory as "ColorDistanceMatrix.csv".
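Both save arguments accept either TRUE or a path. A minimal sketch of how such a value can be turned into an output file, assuming TRUE maps to the documented default file name in the working directory; resolve_save_path() and the toy matrix dm are hypothetical, and the sketch writes to tempdir() to avoid touching the working directory.

```r
# Hypothetical helper: turn a save.tree / save.distance.matrix value into a path
resolve_save_path <- function(save, default_name) {
  if (isTRUE(save)) return(file.path(getwd(), default_name))
  if (is.character(save)) return(save)
  NULL  # FALSE (or any other value): do not save
}

# Toy 2 x 2 distance matrix standing in for the pipeline's output
dm <- matrix(c(0, 0.3, 0.3, 0), nrow = 2,
             dimnames = list(c("a", "b"), c("a", "b")))
path <- resolve_save_path(file.path(tempdir(), "example_DM.csv"),
                          "ColorDistanceMatrix.csv")
if (!is.null(path)) write.csv(dm, path)
```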
a.bounds, b.bounds: Passed to getLabHistList. Numeric ranges for the a (green-red) and b (blue-yellow) channels of Lab color space. Technically, a and b have infinite range, but in practice nearly all values fall within roughly -128 to 127; the default for both is c(-127, 128). Many images will have an even narrower range than this, depending on the lighting conditions and conversion; setting narrower ranges results in finer-scale binning without generating empty bins at the edges of the channels.
Value: Color distance matrix (if return.distance.matrix = TRUE) and heatmap, plus the saved distance matrix and tree files if save.distance.matrix or save.tree is set.
# Not run:
colordistance::imageClusterPipeline(
  dir(system.file("extdata", "Heliconius/", package = "colordistance"),
      full.names = TRUE),
  color.space = "hsv",
  lower = rep(0.8, 3), upper = rep(1, 3),
  cluster.method = "hist", distance.method = "emd",
  hist.bins = 3, plot.bins = TRUE,
  save.tree = "example_tree.newick",
  save.distance.matrix = "example_DM.csv"
)