This is the abstract base class for TaskSupervised and TaskUnsupervised. TaskClassif and TaskRegr inherit from TaskSupervised. More supervised tasks are implemented in mlr3proba, unsupervised cluster tasks in package mlr3cluster.
Tasks serve two purposes:
Tasks wrap a DataBackend, an object to transparently interface different data storage types.
Tasks store meta-information, such as the role of the individual columns in the DataBackend. For example, for a classification task a single column must be marked as target column, and others as features.
Predefined (toy) tasks are stored in the dictionary mlr_tasks, e.g. iris or boston_housing.
More toy tasks can be found in the dictionary after loading mlr3data.
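As a minimal sketch of the dictionary lookup described above, assuming the mlr3 package is installed and that the sugar function tsk() is available in the installed version:

```r
library(mlr3)

# Keys of all tasks currently registered in the dictionary
mlr_tasks$keys()

# Retrieve a predefined task by its key; tsk() is a short form of mlr_tasks$get()
task = tsk("iris")
task_same = mlr_tasks$get("iris")

task$id
```

Both calls construct a fresh task object, so mutating one does not affect the other.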
as.data.table(t)
Task -> data.table::data.table()
Returns the complete data as data.table::data.table().
The following methods change the task in-place:
Any modification of the lists $col_roles or $row_roles. This provides a different "view" on the data without altering the data itself.
Modification of column or row roles via $set_col_roles() or $set_row_roles(), respectively.
$filter() and $select() subset the set of active rows or features in $row_roles or $col_roles, respectively. This provides a different "view" on the data without altering the data itself.
rbind() and cbind() change the task in-place by binding rows or columns to the data, but without modifying the original DataBackend. Instead, the methods first create a new DataBackendDataTable from the provided new data, and then merge both backends into an abstract DataBackend which merges the results on-demand.
rename() wraps the DataBackend of the Task in an additional DataBackend which deals with the renaming. Also updates $col_roles and $col_info.
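The in-place semantics of the mutators can be illustrated with a short sketch. It assumes the predefined iris task and uses $clone() to keep an untouched copy; the variable name backup is chosen here for illustration only.

```r
library(mlr3)

task = tsk("iris")

# Keep a copy before mutating, because the mutators modify the task by reference
backup = task$clone()

task$filter(1:10)            # restrict the active rows
task$select("Sepal.Length")  # restrict the active features

task$nrow          # number of active rows in the mutated task
backup$nrow        # the clone still sees all rows
task$backend$nrow  # the underlying DataBackend is not altered
```

Only the "view" stored in the row and column roles changes; the data in the backend stays as it was.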
id
(character(1)
)
Identifier of the object.
Used in tables, plot and text output.
task_type
(character(1)
)
Task type, e.g. "classif"
or "regr"
.
For a complete list of possible task types (depending on the loaded packages),
see mlr_reflections$task_types$type
.
backend
(DataBackend) Abstract interface to the data of the task.
col_info
(data.table::data.table()
)
Table with 3 columns:
"id"
(character()
) stores the name of the column.
"type"
(character()
) holds the storage type of the variable, e.g. integer
, numeric
or character
.
See mlr_reflections$task_feature_types for a complete list of allowed types.
"levels"
stores a vector of distinct values (levels) for ordered and unordered factor variables.
man
(character(1)
)
String in the format [pkg]::[topic]
pointing to a manual page for this object.
Defaults to NA
, but can be set by child classes.
extra_args
(named list()
)
Additional arguments set during construction.
Required for convert_task()
.
hash
(character(1)
)
Hash (unique identifier) for this object.
row_ids
(integer()
)
Returns the row ids of the DataBackend for observations with role "use".
row_names
(data.table::data.table()
)
Returns a table with two columns:
"row_id"
(integer()
), and
"row_name"
(character()
).
feature_names
(character()
)
Returns all column names with role == "feature"
.
Note that this vector determines the default order of columns for task$data(cols = NULL, ...)
.
However, it is recommended to not rely on the order of columns, but instead always
address columns by their name. The default order is not well defined after some
operations, e.g. after task$cbind()
or after processing via mlr3pipelines.
target_names
(character()
)
Returns all column names with role "target".
properties
(character()
)
Set of task properties.
Possible properties are stored in mlr_reflections$task_properties.
The following properties are currently standardized and understood by tasks in mlr3:
"strata"
: The task is resampled using one or more stratification variables (role "stratum"
).
"groups"
: The task comes with grouping/blocking information (role "group"
).
"weights"
: The task comes with observation weights (role "weight"
).
Note that the properties listed above are calculated from $col_roles
and may not be set explicitly.
row_roles
(named list()
)
Each row (observation) can have an arbitrary number of roles in the learning task:
"use"
: Use in train / predict / resampling.
"validation"
: Hold the observations back unless explicitly requested.
Validation sets are not yet completely integrated into the package.
row_roles
is a named list whose elements are named by row role and each element is an integer()
vector of row ids.
To alter the roles, just modify the list, e.g. with R's set functions (intersect()
, setdiff()
, union()
, …).
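A hedged sketch of altering row roles with R's set functions, assuming the predefined iris task; only the "use" role is touched here:

```r
library(mlr3)

task = tsk("iris")

# Remove the first ten row ids from the set of rows with role "use"
task$row_roles$use = setdiff(task$row_roles$use, 1:10)

task$nrow           # number of rows that still have role "use"
head(task$row_ids)  # row ids of the remaining active rows
```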
col_roles
(named list()
)
Each column (feature) can have an arbitrary number of the following roles:
"feature"
: Regular feature used in the model fitting process.
"target"
: Target variable.
"name"
: Row names / observation labels. To be used in plots. Can be queried with $row_names
.
"order"
: Data returned by $data()
is ordered by this column (or these columns).
"group"
: During resampling, observations with the same value of the variable with role "group" are marked as "belonging together".
For each resampling iteration, observations of the same group will be exclusively assigned to be either in the training set or in the test set.
Note that only up to one column may have this role.
"stratum"
: Stratification variables. Multiple discrete columns may have this role.
"weight"
: Observation weights. Only up to one column (assumed to be discrete) may have this role.
col_roles
is a named list whose elements are named by column role and each element is a character()
vector of column names.
To alter the roles, just modify the list, e.g. with R's set functions (intersect()
, setdiff()
, union()
, …).
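The same pattern applies to column roles. The following sketch, again assuming the predefined iris task, removes one column from the "feature" role without touching the backend:

```r
library(mlr3)

task = tsk("iris")

# Drop "Petal.Width" from the columns with role "feature"
task$col_roles$feature = setdiff(task$col_roles$feature, "Petal.Width")

task$feature_names  # remaining features
task$ncol           # target column plus remaining features
```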
nrow
(integer(1)
)
Returns the total number of rows with role "use".
ncol
(integer(1)
)
Returns the total number of columns with role "target" or "feature".
feature_types
(data.table::data.table()
)
Returns a table with columns id
and type
where id
are the column names of "active"
features of the task and type
is the storage type.
data_formats
character()
Vector of supported data output formats.
A specific format can be chosen in the $data()
method.
strata
(data.table::data.table()
)
If the task has columns designated with role "stratum"
, returns a table with one subpopulation per row and two columns:
N
(integer()
) with the number of observations in the subpopulation, and
row_id
(list of integer()
) as list column with the row ids in the respective subpopulation.
Returns NULL if there is no stratification variable.
See Resampling for more information on stratification.
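To make the structure of the returned table concrete, here is a small sketch that assigns the role "stratum" to the target column of the predefined iris task. Using the target as stratification variable is an assumption made for illustration.

```r
library(mlr3)

task = tsk("iris")

# No stratification variable has been set yet
is.null(task$strata)

# Mark the target column as stratification variable
task$col_roles$stratum = "Species"

"strata" %in% task$properties
task$strata  # one row per subpopulation, with columns N and row_id
```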
groups
(data.table::data.table()
)
If the task has a column with designated role "group"
, a table with two columns:
row_id
(integer()
), and
grouping variable group
(vector()
).
Returns NULL if there is no grouping column.
See Resampling for more information on grouping.
order
(data.table::data.table()
)
If the task has at least one column with designated role "order"
, a table with two columns:
row_id
(integer()
), and
ordering vector order
(integer()
).
Returns NULL if there is no order column.
weights
(data.table::data.table()
)
If the task has a column with designated role "weight"
, a table with two columns:
row_id
(integer()
), and
observation weights weight
(numeric()
).
Returns NULL if there is no weight column.
new()
Creates a new instance of this R6 class.
Note that this object is typically constructed via a derived class, e.g. TaskClassif or TaskRegr.
Task$new(id, task_type, backend, extra_args = list())
id
(character(1)
)
Identifier for the new instance.
task_type
(character(1)
)
Type of task, e.g. "regr"
or "classif"
.
Must be an element of mlr_reflections$task_types$type.
backend
(DataBackend)
Either a DataBackend, or any object which is convertible to a DataBackend with as_data_backend()
.
E.g., a data.frame()
will be converted to a DataBackendDataTable.
extra_args
(named list()
)
Named list of constructor arguments, required for converting task types
via convert_task()
.
help()
Opens the corresponding help page referenced by field $man
.
Task$help()
format()
Helper for print outputs.
Task$format()
print()
Printer.
Task$print(...)
...
(ignored).
data()
Returns a slice of the data from the DataBackend in the data format specified by data_format
.
Rows are additionally subsetted to only contain observations with role "use"
, and
columns are filtered to only contain features with roles "target"
and "feature"
.
If invalid rows
or cols
are specified, an exception is raised.
Rows and columns are returned in the order specified via the arguments rows
and cols
.
If rows
is NULL
, rows are returned in the order of task$row_ids
.
If cols
is NULL
, the column order defaults to
c(task$target_names, task$feature_names)
.
Note that it is recommended to not rely on the order of columns, and instead always
address columns with their respective column name.
Task$data(rows = NULL, cols = NULL, data_format = "data.table", ordered = TRUE)
rows
integer()
Row indices.
cols
character()
Column names.
data_format
(character(1)
)
Desired data format, e.g. "data.table"
or "Matrix"
.
ordered
(logical(1)
)
If TRUE
(default), data is ordered according to the columns with column role "order"
.
Depending on the DataBackend, but usually a data.table::data.table()
.
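A brief sketch of slicing with $data(), assuming the predefined iris task. It requests two rows and two columns in a non-default order and leaves data_format at its default:

```r
library(mlr3)

task = tsk("iris")

# Rows and columns come back in the order in which they are requested
d = task$data(rows = c(3L, 1L), cols = c("Sepal.Length", "Species"))

names(d)
nrow(d)
```

As recommended above, downstream code should address the columns of d by name rather than by position.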
formula()
Constructs a formula()
, e.g. [target] ~ [feature_1] + [feature_2] + ... + [feature_k]
,
using the features provided in argument rhs
(defaults to all columns with role "feature"
, symbolized by "."
).
Task$formula(rhs = ".")
rhs
(character(1)
)
Right hand side of the formula. Defaults to "."
(all features of the task).
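The two ways of calling $formula() could look as follows. This is a sketch on the predefined iris task; the single feature on the right hand side is picked arbitrarily.

```r
library(mlr3)

task = tsk("iris")

# Default: all features, symbolized by "."
f_all = task$formula()

# Explicit right hand side with a single feature
f_one = task$formula(rhs = "Sepal.Length")

all.vars(f_one)
```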
head()
Get the first n
observations with role "use"
of all columns with role "target"
or "feature"
.
Task$head(n = 6L)
n
(integer(1)
).
data.table::data.table()
with n
rows.
levels()
Returns the distinct values for columns referenced in cols
with storage type "factor" or "ordered".
Argument cols
defaults to all such columns with role "target"
or "feature"
.
Note that this function ignores the row roles, it returns all levels available in the DataBackend.
To update the stored level information, e.g. after subsetting a task with $filter()
, call $droplevels()
.
Task$levels(cols = NULL)
cols
character()
Column names.
named list()
.
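The interplay of $filter(), $levels() and $droplevels() is easiest to see on a tiny example. The data frame and the task id "demo" below are made up for illustration; the task is built with TaskRegr$new(), one of the derived classes named on this page.

```r
library(mlr3)

# Hypothetical toy data with one factor feature
data = data.frame(
  y = c(1, 2, 3, 4),
  g = factor(c("a", "a", "b", "c"))
)
task = TaskRegr$new("demo", backend = data, target = "y")

# Keep only the first two rows, which both have level "a"
task$filter(1:2)

levels_before = task$levels("g")  # still lists all levels stored for the backend

task$droplevels()
levels_after = task$levels("g")   # restricted to levels present in the active rows
```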
missings()
Returns the number of missing observations for columns referenced in cols
.
Considers only active rows with row role "use"
.
Argument cols
defaults to all columns with role "target" or "feature".
Task$missings(cols = NULL)
cols
character()
Column names.
Named integer()
.
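A minimal sketch of $missings() on made-up data with some NA values; the data frame and the task id "demo_na" are assumptions for illustration. It also shows that only rows with role "use" are counted.

```r
library(mlr3)

# Hypothetical toy data with missing values in two features
data = data.frame(
  y = c(1, 2, 3),
  x = c(NA, 5, 6),
  z = c(NA, NA, 1)
)
task = TaskRegr$new("demo_na", backend = data, target = "y")

m_all = task$missings()   # counts for the target and all features

# After filtering, only the active rows are considered
task$filter(2:3)
m_z = task$missings("z")
```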
filter()
Subsets the task, keeping only the rows specified via row ids rows
.
This operation mutates the task in-place. See the section on task mutators for more information.
Task$filter(rows)
rows
integer()
Row indices.
Returns the object itself, but modified by reference.
You need to explicitly $clone() the object beforehand if you want to keep the object in its previous state.
select()
Subsets the task, keeping only the features specified via column names cols
.
Note that you cannot deselect the target column, for obvious reasons.
This operation mutates the task in-place. See the section on task mutators for more information.
Task$select(cols)
cols
character()
Column names.
Returns the object itself, but modified by reference.
You need to explicitly $clone() the object beforehand if you want to keep the object in its previous state.
rbind()
Adds additional rows to the DataBackend stored in $backend
.
New row ids are automatically created, unless data
has a column whose name matches
the primary key of the DataBackend (task$backend$primary_key
).
In case of name clashes of row ids, rows in data
have higher precedence
and virtually overwrite the rows in the DataBackend.
All columns with the roles "target"
, "feature"
, "weight"
, "group"
, "stratum"
, and "order"
must be present in data
.
Columns only present in data
but not in the DataBackend of task
will be discarded.
This operation mutates the task in-place. See the section on task mutators for more information.
Task$rbind(data)
data
(data.frame()
).
Returns the object itself, but modified by reference.
You need to explicitly $clone() the object beforehand if you want to keep the object in its previous state.
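As a sketch of $rbind(), the following appends five rows to the predefined iris task. The new rows are simply copies of existing observations and carry no primary key column, so new row ids are expected to be created automatically.

```r
library(mlr3)

task = tsk("iris")

# New rows must contain all target and feature columns of the task
new_rows = iris[1:5, ]

task$rbind(new_rows)

task$nrow           # original rows plus the appended ones
tail(task$row_ids)  # row ids of the last active rows
```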
cbind()
Adds additional columns to the DataBackend stored in $backend
.
The row ids must be provided as column in data
(with column name matching the primary key name of the DataBackend).
If this column is missing, it is assumed that the rows are exactly in the order of $row_ids
.
In case of name clashes of column names in data
and DataBackend, columns in data
have higher precedence
and virtually overwrite the columns in the DataBackend.
This operation mutates the task in-place. See the section on task mutators for more information.
Task$cbind(data)
data
(data.frame()
).
rename()
Renames columns by mapping column names in old
to new column names in new
(element-wise).
This operation mutates the task in-place. See the section on task mutators for more information.
Task$rename(old, new)
old
(character()
)
Old names.
new
(character()
)
New names.
Returns the object itself, but modified by reference.
You need to explicitly $clone() the object beforehand if you want to keep the object in its previous state.
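Renaming a single feature might look like this. The sketch assumes the predefined iris task; the new name sepal_length is chosen arbitrarily.

```r
library(mlr3)

task = tsk("iris")

# Map the old column name to a new one (element-wise for vectors)
task$rename(old = "Sepal.Length", new = "sepal_length")

task$feature_names
```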
set_row_roles()
Modifies the roles in $row_roles
in-place.
Task$set_row_roles(rows, roles = NULL, add_to = NULL, remove_from = NULL)
rows
(integer()
)
Row ids for which to change the roles.
roles
(character()
)
Exclusively set rows to the specified roles
(remove from other roles).
add_to
(character()
)
Add rows with row ids rows
to roles specified in add_to
.
Rows keep their previous roles.
remove_from
(character()
)
Remove rows with row ids rows
from roles specified in remove_from
.
Other row roles are preserved.
Roles are first set exclusively (argument roles
), then added (argument add_to
) and finally
removed (argument remove_from
) from different roles.
Returns the object itself, but modified by reference.
You need to explicitly $clone() the object beforehand if you want to keep the object in its previous state.
set_col_roles()
Modifies the roles in $col_roles
in-place.
Task$set_col_roles(cols, roles = NULL, add_to = NULL, remove_from = NULL)
cols
(character()
)
Column names for which to change the roles.
roles
(character()
)
Exclusively set columns to the specified roles
(remove from other roles).
add_to
(character()
)
Add columns with column names cols
to roles specified in add_to
.
Columns keep their previous roles.
remove_from
(character()
)
Remove columns with column names cols
from roles specified in remove_from
.
Other column roles are preserved.
Roles are first set exclusively (argument roles
), then added (argument add_to
) and finally
removed (argument remove_from
) from different roles.
Returns the object itself, but modified by reference.
You need to explicitly $clone() the object beforehand if you want to keep the object in its previous state.
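The add_to and remove_from arguments can be sketched as follows, assuming the predefined iris task and the signature documented above. The choice of columns is for illustration only.

```r
library(mlr3)

task = tsk("iris")

# Remove a column from the "feature" role; its other roles are preserved
task$set_col_roles("Petal.Width", remove_from = "feature")

# Additionally mark the target as stratification variable;
# the column keeps its role "target"
task$set_col_roles("Species", add_to = "stratum")

task$feature_names
task$col_roles$stratum
task$target_names
```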
droplevels()
Updates the cache of stored factor levels, removing all levels not present in the current set of active rows.
cols
defaults to all columns with storage type "factor" or "ordered".
Task$droplevels(cols = NULL)
cols
character()
Column names.
Modified self
.
clone()
The objects of this class are cloneable with this method.
Task$clone(deep = FALSE)
deep
Whether to make a deep clone.
Other Task: TaskClassif, TaskRegr, TaskSupervised, TaskUnsupervised, mlr_tasks_boston_housing, mlr_tasks_breast_cancer, mlr_tasks_german_credit, mlr_tasks_iris, mlr_tasks_mtcars, mlr_tasks_pima, mlr_tasks_sonar, mlr_tasks_spam, mlr_tasks_wine, mlr_tasks_zoo, mlr_tasks
# NOT RUN {
library(mlr3)

# We use the derived class TaskClassif here;
# class Task is not intended for direct use.
task = TaskClassif$new("iris", iris, target = "Species")
task$nrow
task$ncol
task$feature_names
task$formula()
# de-select "Petal.Width"
task$select(setdiff(task$feature_names, "Petal.Width"))
task$feature_names
# Add new column "foo"
task$cbind(data.frame(foo = 1:150))
task$head()
# }