'ZIClass' objects are key items in ZooImage. They contain everything required to automatically classify plankton from .zid files. They can be used as black boxes by all users, although building them requires users trained in machine learning techniques. Hence, ZooImage is kept very simple for biologists who just want to use classifiers without worrying about the complexities of what happens inside the engine.
ZIClass(formula, data, method = getOption("ZI.mlearning", "mlRforest"),
    calc.vars = getOption("ZI.calcVars", calcVars), drop.vars = NULL,
    drop.vars.def = dropVars(), cv.k = 10, cv.strat = TRUE,
    ..., subset, na.action = na.omit)
# S3 method for ZIClass
print(x, ...)
# S3 method for ZIClass
summary(object, sort.by = "Fscore", decreasing = TRUE,
    na.rm = FALSE, ...)
# S3 method for ZIClass
predict(object, newdata, calc = TRUE, class.only = TRUE,
    type = "class", ...)
# S3 method for ZIClass
confusion(x, y = response(x), labels = c("Actual", "Predicted"),
    useNA = "ifany", prior, use.cv = TRUE, ...)
formula: a formula whose left-hand side is the class variable and whose right-hand side lists the predictor variables separated by '+' signs. Since data is supposed to have been filtered beforehand using calc.vars, and the class variable in a 'ZITrain' object is always named Class, the formula almost always reduces to Class ~ .
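As a small illustration of why Class ~ . is usually enough, the sketch below uses base R only to show how the dot expands to every column other than the response. The data frame and its column names (Area, ECD) are hypothetical stand-ins for a filtered 'ZITrain' object.

```r
# Hypothetical toy training table: two measurements plus the manual class
train <- data.frame(
  Area  = c(1.2, 3.4, 2.2, 0.8),
  ECD   = c(1.1, 2.0, 1.6, 0.9),
  Class = factor(c("copepod", "egg", "copepod", "egg"))
)

# Expand the dot against the data: every column except the response is used
predictors <- attr(terms(Class ~ ., data = train), "term.labels")
print(predictors)
```

Any column left in the data after filtering therefore becomes a predictor, which is why unwanted variables need to be dropped before the formula is evaluated.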
data: a data frame (usually a 'ZITrain' object) containing both the measurements and the manual classification (a factor variable, usually named 'Class').
method: the machine learning method to use. It should produce results compatible with mlearning objects as returned by the various mlXXX() functions in the mlearning package. By default, the random forest algorithm is used (it is among those that give the best results with plankton).
calc.vars: a function used to calculate variables from the original data frame.
drop.vars: a character vector with the names of variables to drop for the classification, or NULL (by default) to keep them all.
drop.vars.def: a second list of variables to drop, contained in a character vector. That list is supposed to match the names of variables that are obviously non-informative and are dropped by default. It can be gathered automatically using dropVars(). See ?calcVars for more details.
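The effect of the two drop lists can be pictured with plain column filtering. This is only a sketch of the idea, not the package's internal code: the data frame, its column names, and the stand-in value used for drop.vars.def are all hypothetical (the real default comes from dropVars()).

```r
# Hypothetical measurement table with identifier-like and numeric columns
measures <- data.frame(
  Id    = c("s1_1", "s1_2"),
  Area  = c(1.2, 3.4),
  ECD   = c(1.1, 2.0),
  Label = c("s1", "s1")
)

drop.vars     <- "ECD"             # user-supplied names (NULL keeps them all)
drop.vars.def <- c("Id", "Label")  # hypothetical stand-in for dropVars()

# Remove every column named in either list; union() also copes with NULL
to_drop <- union(drop.vars, drop.vars.def)
kept <- measures[, !(names(measures) %in% to_drop), drop = FALSE]
print(names(kept))
```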
cv.k: the number of folds (k) to use for cross-validation.
cv.strat: should stratified sampling be used for cross-validation? (Recommended.)
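To make the role of cv.k and cv.strat concrete, here is a minimal base R sketch of stratified k-fold assignment. The helper stratified_folds() is hypothetical and is not claimed to be how mlearning builds its folds; it only shows the principle that each class is spread as evenly as possible over the k folds.

```r
# Hypothetical helper: assign each item to one of k folds, class by class
stratified_folds <- function(classes, k = 10) {
  classes <- as.factor(classes)
  folds <- integer(length(classes))
  for (lev in levels(classes)) {
    idx <- which(classes == lev)
    # Shuffle positions within the class (safe even for a single item)
    idx <- idx[sample.int(length(idx))]
    # Deal the shuffled items out to folds 1..k in turn
    folds[idx] <- rep_len(seq_len(k), length(idx))
  }
  folds
}

set.seed(42)
cls <- factor(rep(c("A", "B", "C"), times = c(50, 30, 20)))
folds <- stratified_folds(cls, k = 10)
fold_table <- table(cls, folds)
print(fold_table)
```

With stratification every fold holds the same class proportions as the full training set, so rare groups are not left out of some folds by chance.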
...: further arguments to pass to the classification algorithm (see the help of that particular function).
subset: an expression for subsetting the original data frame.
na.action: the function used to filter the initial data frame for missing values. Although the default in R is na.fail, leading to failure if at least one NA is found in the data frame, the default here is na.omit, which eliminates all lines containing at least one NA. Check how many items remain if you encounter many NAs in your dataset!
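The difference between the two behaviours, and the check on how many items remain, can be tried on a toy data frame with base R alone. The column names below are hypothetical.

```r
# Hypothetical toy data with one missing measurement and one missing class
df <- data.frame(
  Area  = c(1.2, NA, 2.2, 0.8),
  Class = factor(c("copepod", "egg", NA, "egg"))
)

# na.omit() silently drops every row that contains at least one NA
kept <- na.omit(df)
dropped <- nrow(df) - nrow(kept)
cat("rows kept:", nrow(kept), "- rows dropped:", dropped, "\n")

# na.fail() stops with an error as soon as one NA is present
failed <- inherits(try(na.fail(df), silent = TRUE), "try-error")
```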
x: a 'ZIClass' object.
object: a 'ZIClass' object.
newdata: a 'ZIDat' object, or a 'data.frame', to use for prediction.
sort.by: the statistic used to sort the table (by default, the F-score).
decreasing: should the table be sorted in decreasing (TRUE) or increasing (FALSE) order?
na.rm: should entries with missing data be eliminated first (using na.omit())?
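For readers unfamiliar with the default sorting statistic, the following base R sketch computes per-class recall, precision and F-score (the harmonic mean of precision and recall) from a confusion matrix and sorts on it. The counts and class names are invented, and the column layout is an assumption; the actual summary() output may contain other statistics.

```r
# Hypothetical confusion matrix: rows are actual classes, columns predictions
groups <- c("copepod", "egg", "larva")
conf <- matrix(c(8, 2, 0,
                 1, 9, 0,
                 3, 1, 6),
               nrow = 3, byrow = TRUE,
               dimnames = list(Actual = groups, Predicted = groups))

tp        <- diag(conf)            # correctly classified items per class
recall    <- tp / rowSums(conf)    # fraction of each actual class recovered
precision <- tp / colSums(conf)    # fraction of each prediction that is right
fscore    <- 2 * precision * recall / (precision + recall)

stats <- data.frame(Recall = recall, Precision = precision, Fscore = fscore,
                    row.names = groups)

# Equivalent in spirit to sort.by = "Fscore", decreasing = TRUE
sorted <- stats[order(stats$Fscore, decreasing = TRUE), ]
print(sorted)
```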
calc: a logical indicating whether variables have to be recalculated before running the prediction.
class.only: if TRUE, return just a vector with the classification; otherwise, return the 'ZIDat' object with a 'Predicted' column appended to it.
type: the type of result to return, "class" by default. No other value is permitted if class.only is FALSE.
y: a factor with reference classes.
labels: labels to use for, respectively, the reference class and the predicted class.
useNA: should NAs be kept as a separate category? The default "ifany" creates this category only if there are missing values. Other possibilities are "no" or "always". The default is suitable for test sets because unclassified items (those in the "_" directory or one of its subdirectories) get NA for Class.
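The same three values are accepted by the useNA argument of base R's table(), which gives a quick way to see their effect on a cross-tabulation. The vectors below are hypothetical; unclassified items are represented by NA in the reference classes.

```r
# Hypothetical reference classes (NA = unclassified) and predictions
actual    <- factor(c("copepod", "egg", NA, "egg", NA))
predicted <- factor(c("copepod", "egg", "egg", "copepod", "copepod"))

# "ifany": an NA row appears because 'actual' contains missing values
tab_ifany <- table(Actual = actual, Predicted = predicted, useNA = "ifany")

# "no": items with a missing class are left out of the table entirely
tab_no <- table(Actual = actual, Predicted = predicted, useNA = "no")

print(tab_ifany)
print(tab_no)
```

With "ifany" all five items are counted, while "no" keeps only the three items that have a reference class.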
prior: class frequencies to use for the first classifier, which is tabulated in the rows of the confusion matrix. This is either a single positive number to set all class frequencies to this value (use 1 for relative frequencies and 100 for relative frequencies in percent), or a vector of positive numbers of the same length as the levels in the object. If the vector is named, names must match levels. Alternatively, providing NULL or an object of null length resets the row class frequencies to their initial values.
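One way to read this description is that each row of the confusion matrix is rescaled so that its total equals the requested frequency. The base R sketch below follows that reading; rescale_rows() is a hypothetical helper, not the mlearning implementation, and it does not handle matching a named vector to levels or rows whose total is zero.

```r
# Hypothetical confusion matrix: rows are actual classes, columns predictions
groups <- c("copepod", "egg")
conf <- matrix(c(8, 2,
                 1, 9),
               nrow = 2, byrow = TRUE,
               dimnames = list(Actual = groups, Predicted = groups))

# Hypothetical helper: rescale rows so that row sums equal 'prior'
rescale_rows <- function(conf, prior) {
  prior <- rep_len(prior, nrow(conf))  # a single value applies to all rows
  conf * (prior / rowSums(conf))       # the vector recycles down the rows
}

rel <- rescale_rows(conf, 1)           # relative frequencies per actual class
pct <- rescale_rows(conf, 100)         # the same, expressed in percent
print(rel)
print(pct)
```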
use.cv: the predicted values extracted from the 'ZIClass' object can be either the predictions for the training set or the cross-validated predictions (the default). Most of the time, you want the cross-validated predictions, which give an unbiased (or at least less biased) evaluation of the classifier's performance. If in doubt, keep the default value.
ZIClass() is the constructor that builds the 'ZIClass' object. print(), summary() and predict() are the methods to print the object, to calculate statistics on this classifier based on the confusion matrix, and to predict groups for ZooImage samples using a 'ZIClass' object.