
OpenML (version 1.12)

listOMLDataSets: List the first 5000 OpenML data sets.


The returned data.frame contains the data set ID “data.id”, the “status” (“active”, “deactivated” or “in_preparation”) and columns describing the data qualities.

Note that by default only active data sets are returned, because of the default status = "active". Furthermore, the default limit = 5000 caps the number of results at 5000.
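To make these defaults concrete, here is a minimal local sketch. It is not the OpenML implementation: the table toy and the helper list_like_openml() are hypothetical. It assumes that the server applies the status filter before limit and offset, and that offset counts 0-based positions in the filtered result rather than data set IDs.

```r
# Hypothetical local stand-in for the OpenML data set listing; this is NOT the
# OpenML package implementation. Assumptions: the status filter is applied
# before limit/offset, and offset is a 0-based position in the filtered result.
toy = data.frame(
  data.id = c(2L, 3L, 5L, 7L, 11L, 13L),
  status = c("active", "active", "deactivated", "active", "in_preparation", "active"),
  stringsAsFactors = FALSE
)

# Hypothetical helper mirroring the documented defaults of listOMLDataSets()
list_like_openml = function(df, status = "active", limit = 5000, offset = NULL) {
  # status = "all" keeps every row; otherwise keep only the requested status
  if (!identical(status, "all")) {
    df = df[df$status %in% status, , drop = FALSE]
  }
  # limit = NULL returns everything; offset is ignored in that case
  if (is.null(limit)) {
    return(df)
  }
  start = if (is.null(offset)) 0 else offset  # a position, not a data.id
  rows = seq_len(nrow(df))
  df[rows > start & rows <= start + limit, , drop = FALSE]
}

list_like_openml(toy)$data.id                                # 2 3 7 13 (active only)
list_like_openml(toy, limit = 2, offset = 2)$data.id         # 7 13
list_like_openml(toy, status = "all", limit = NULL)$data.id  # 2 3 5 7 11 13
```

Under these assumptions, deactivated and in-preparation data sets never count towards the limit unless status is changed.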


Usage

listOMLDataSets(
  number.of.instances = NULL,
  number.of.features = NULL,
  number.of.classes = NULL,
  number.of.missing.values = NULL,
  tag = NULL,
  data.name = NULL,
  limit = 5000,
  offset = NULL,
  status = "active",
  verbosity = NULL
)





Arguments

number.of.instances
[numeric(1) | numeric(2)]
If not NULL, keeps only entries whose number of instances equals the given value or, if a vector of length 2 is passed, lies within the given range.


number.of.features
[numeric(1) | numeric(2)]
If not NULL, keeps only entries whose number of features equals the given value or, if a vector of length 2 is passed, lies within the given range.


number.of.classes
[numeric(1) | numeric(2)]
If not NULL, keeps only entries whose number of classes equals the given value or, if a vector of length 2 is passed, lies within the given range.


number.of.missing.values
[numeric(1) | numeric(2)]
If not NULL, keeps only entries whose number of missing values equals the given value or, if a vector of length 2 is passed, lies within the given range.


tag
If not NULL, only entries with the corresponding tag(s) are listed.


data.name
Name of the data set to filter by.


limit
Optional. The maximum number of entries to return. If offset is not specified, the first limit entries are returned. Setting limit = NULL returns all available entries.


offset
Optional. The offset to start from, given as a 0-based index into the list of results (it does not refer to data set IDs). It is ignored when no limit is given.


status
Subsets the results according to the status. Possible values are {"active", "deactivated", "in_preparation", "all"}. Default is "active".


verbosity
Print verbose output on the console? Possible values are:
0: normal output,
1: info output,
2: debug output.
The default is set via setOMLConfig.
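The four numeric filters share the same "single value or length-2 range" rule, so a call such as listOMLDataSets(number.of.instances = c(100, 1000), number.of.classes = 2) asks for data sets with exactly two classes and between 100 and 1000 instances. The sketch below reproduces that rule locally on a hypothetical toy table; filter_value_or_range() is not an OpenML function, and treating the range as inclusive at both ends is an assumption.

```r
# Hypothetical sketch of the "single value or length-2 range" filter rule used by
# number.of.instances, number.of.features, number.of.classes and
# number.of.missing.values. NOT the OpenML implementation; the range is ASSUMED
# to be inclusive at both ends.
filter_value_or_range = function(x, spec) {
  if (is.null(spec)) {
    return(rep(TRUE, length(x)))      # NULL: no filtering
  }
  stopifnot(is.numeric(spec), length(spec) %in% c(1L, 2L))
  if (length(spec) == 1L) {
    x == spec                         # single value: exact match
  } else {
    x >= min(spec) & x <= max(spec)   # two values: assumed inclusive range
  }
}

# Hypothetical toy table; the column names are illustrative only
toy = data.frame(
  data.id = c(2L, 3L, 5L, 7L),
  number.of.instances = c(150, 898, 3196, 20000),
  number.of.classes = c(3, 5, 2, 26)
)

keep = filter_value_or_range(toy$number.of.instances, c(100, 1000)) &
  filter_value_or_range(toy$number.of.classes, NULL)
toy$data.id[keep]                                             # 2 3
toy$data.id[filter_value_or_range(toy$number.of.classes, 2)]  # 5
```

Combining several filters with & mirrors passing several of these arguments in one call: an entry is kept only if it satisfies all of them.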

See Also

Other listing functions: chunkOMLlist(), listOMLDataSetQualities(), listOMLEstimationProcedures(), listOMLEvaluationMeasures(), listOMLFlows(), listOMLRuns(), listOMLSetup(), listOMLStudies(), listOMLTaskTypes(), listOMLTasks()

Other data set-related functions: OMLDataSetDescription, OMLDataSet, convertMlrTaskToOMLDataSet(), convertOMLDataSetToMlr(), deleteOMLObject(), getOMLDataSet(), tagOMLObject(), uploadOMLDataSet()


Examples
# \dontrun{
# 	datasets = listOMLDataSets()
# 	tail(datasets)
# }
