ZIClass: Create and manipulate 'ZIClass' objects

Description

'ZIClass' objects are key items in ZooImage. They contain everything required to automatically classify plankton from .zid files. They can be used as black boxes by all users, although building them requires users trained in machine learning techniques. Hence, ZooImage is kept very simple for biologists who just want to use classifiers without worrying about the complexities of what is done inside the engine.

Usage

ZIClass(formula, data, method = getOption("ZI.mlearning", "mlRforest"),
    calc.vars = getOption("ZI.calcVars", calcVars), drop.vars = NULL,
    drop.vars.def = dropVars(), cv.k = 10, cv.strat = TRUE,
    ..., subset, na.action = na.omit)
# S3 method for ZIClass
print(x, ...)
# S3 method for ZIClass
summary(object, sort.by = "Fscore", decreasing = TRUE,
    na.rm = FALSE, ...)
# S3 method for ZIClass
predict(object, newdata, calc = TRUE, class.only = TRUE,
    type = "class", ...)
# S3 method for ZIClass
confusion(x, y = response(x), labels = c("Actual", "Predicted"),
    useNA = "ifany", prior, use.cv = TRUE, ...)

Arguments

formula

a formula whose left-hand side is the class variable and whose right-hand side is a list of predictor variables separated by '+' signs. Since the data are supposed to be filtered beforehand using calc.vars, and the class variable in a 'ZITrain' object is always named Class, the formula almost always reduces to Class ~ .
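
As a minimal base-R illustration of what the dot in Class ~ . stands for (the data frame below is made up and is not a real 'ZITrain' object), terms() can be asked to expand the formula against the data:

```r
# Made-up training data: two measurements plus the manual classification
train <- data.frame(
  Area  = c(1.2, 0.8, 2.5, 1.9),
  Mean  = c(0.4, 0.5, 0.7, 0.6),
  Class = factor(c("A", "B", "A", "B"))
)

# With 'data' supplied, terms() expands the dot into the remaining columns
predictors <- attr(terms(Class ~ ., data = train), "term.labels")
predictors  # "Area" "Mean"
```

Every column other than Class becomes a predictor, which is why variables that should not be used have to be dropped before the formula is evaluated.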

data

a data frame (usually a 'ZITrain' object), containing both the measurements and the manual classification (a factor variable usually named 'Class').

method

the machine learning method to use. It should produce results compatible with mlearning objects as returned by the various mlXXX() functions in the mlearning package. By default, the random forest algorithm is used (it is among those that give the best results with plankton).

calc.vars

a function to use to calculate variables from the original data frame.

drop.vars

a character vector with names of variables to drop for the classification, or NULL (by default) to keep them all.

drop.vars.def

a second character vector with names of variables to drop. This list is meant to match the names of variables that are obviously non-informative and are dropped by default. It can be gathered automatically using dropVars(). See ?calcVars for more details.

cv.k

the number of folds (k) to use for k-fold cross-validation.

cv.strat

should stratified sampling be used for cross-validation (recommended)?
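
To make the idea of stratified folds concrete, here is a minimal base-R sketch. It is not the code that ZooImage or mlearning run internally; the helper stratified_folds() and the class names are hypothetical and serve only as an illustration.

```r
# Hypothetical helper: assign each item to one of k folds, class by class,
# so that every fold gets roughly the same class proportions.
stratified_folds <- function(classes, k = 10) {
  folds <- integer(length(classes))
  for (idx in split(seq_along(classes), classes)) {
    # Shuffle the items of this class, then deal them out round-robin
    shuffled <- idx[sample.int(length(idx))]
    folds[shuffled] <- rep_len(seq_len(k), length(shuffled))
  }
  folds
}

set.seed(42)
Class <- factor(rep(c("Copepod", "Chaetognath", "Marine snow"),
                    times = c(60, 20, 20)))
folds <- stratified_folds(Class, k = 10)

# Each fold holds 6 copepods, 2 chaetognaths and 2 marine snow items
table(Class, folds)
```

Without stratification, a rare class could end up missing from some folds entirely, which makes the per-class statistics less stable.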

...

further arguments to pass to the classification algorithm (see help of that particular function).

subset

an expression for subsetting the original data frame.

na.action

the function used to filter the initial data frame for missing values. Although the default in R is na.fail, leading to failure if at least one NA is found in the data frame, the default here is na.omit, which eliminates all lines containing at least one NA. If your dataset contains many NAs, check how many items remain after filtering!
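
The difference between the two filters can be checked with base R alone. The data frame below is made up for illustration and is not a real training set:

```r
# Toy measurements with a few missing values (made-up data)
train <- data.frame(
  Area  = c(1.2, NA, 0.8, 2.5, 1.9),
  Mean  = c(0.4, 0.5, NA, 0.7, 0.6),
  Class = factor(c("A", "B", "A", "B", "A"))
)

kept <- na.omit(train)
nrow(train)  # 5 items before filtering
nrow(kept)   # 3 items left: rows 2 and 3 each contain an NA

# na.fail() stops with an error instead of silently dropping rows
res <- try(na.fail(train), silent = TRUE)
inherits(res, "try-error")  # TRUE
```

Comparing nrow() before and after filtering is a quick way to see how many items were lost.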

x

a 'ZIClass' object.

object

a 'ZIClass' object.

newdata

a 'ZIDat' object, or a 'data.frame' to use for prediction.

sort.by

the statistic to use to sort the table (by default, F-score).

decreasing

should the table be sorted in decreasing order (TRUE, the default) or in increasing order (FALSE)?

na.rm

should entries with missing data be eliminated first (using na.omit())?
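
As a rough illustration of sorting per-class statistics by F-score, the following base-R sketch computes recall, precision and F-score from a made-up confusion matrix. It is not the summary() method itself, and the actual output of that method may contain other columns.

```r
# Made-up confusion matrix: rows = actual classes, columns = predicted
conf <- matrix(c(45,  5,  0,
                  8, 12,  0,
                  2,  3, 25),
               nrow = 3, byrow = TRUE,
               dimnames = list(Actual = c("A", "B", "C"),
                               Predicted = c("A", "B", "C")))

recall    <- diag(conf) / rowSums(conf)  # correct items / actual items
precision <- diag(conf) / colSums(conf)  # correct items / predicted items
fscore    <- 2 * precision * recall / (precision + recall)

stats <- data.frame(Recall = recall, Precision = precision,
                    Fscore = fscore, row.names = rownames(conf))

# Best-classified groups first, as with decreasing = TRUE
sorted <- stats[order(stats$Fscore, decreasing = TRUE), ]
sorted
```

Here class C comes first (F-score 25/27.5, about 0.91), followed by A (about 0.86) and B (0.60).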

calc

a logical indicating whether variables have to be recalculated before running the prediction.

class.only

if TRUE, return just a vector with the classification; otherwise, return the 'ZIDat' object with a 'Predicted' column appended to it.

type

the type of result to return, "class" by default. No other value is permitted if class.only is FALSE.

y

a factor with reference classes.

labels

labels to use for, respectively, the reference class and the predicted class.

useNA

should NAs be kept as a separate category? The default "ifany" creates this category only if there are missing values. Other possibilities are "no" or "always". The default is suitable for test sets because unclassified items (those in the "_" directory or one of its subdirectories) get NA for Class.

prior

class frequencies to use for the first classifier, which is tabulated in the rows of the confusion matrix. This is either a single positive number to set all class frequencies to this value (use 1 for relative frequencies and 100 for relative frequencies in percent), or a vector of positive numbers of the same length as the levels in the object. If the vector is named, names must match levels. Alternatively, providing NULL or an object of null length resets row class frequencies to their initial values.
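
The arithmetic behind such a rescaling can be sketched in base R, assuming that each row of the confusion matrix is rescaled so that it sums to the requested frequency. The helper rescale_rows() is hypothetical and the matrix is made up; the actual implementation behind confusion() may differ, for instance in how it rounds.

```r
# Made-up confusion matrix: rows = actual classes, columns = predicted
conf <- matrix(c(45,  5,
                  8, 12),
               nrow = 2, byrow = TRUE,
               dimnames = list(Actual = c("A", "B"),
                               Predicted = c("A", "B")))

# Hypothetical helper: rescale each row so that it sums to the given prior
rescale_rows <- function(m, prior) {
  prior <- rep_len(prior, nrow(m))  # a single value applies to all rows
  m / rowSums(m) * prior
}

rel <- rescale_rows(conf, 1)    # relative frequencies: each row sums to 1
pct <- rescale_rows(conf, 100)  # percentages: each row sums to 100
pct
```

With a prior of 100, row A reads 90 and 10, and row B reads 40 and 60, so classes of very different abundance become directly comparable.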

use.cv

the predicted values extracted from the 'ZIClass' object can be either the predictions for the training set or the cross-validated predictions (the default). Most of the time, you want the cross-validated predictions, which give an unbiased (or at least less biased) evaluation of the classifier. If in doubt, keep the default value.

Value

ZIClass() is the constructor that builds the 'ZIClass' object. print(), summary() and predict() are the methods to print the object, to calculate statistics on this classifier based on the confusion matrix, and to predict groups for ZooImage samples using a 'ZIClass' object.

Examples

# NOT RUN {
##TODO...
# }