OpenML (version 1.12)

OMLDataSetDescription: Construct OMLDataSetDescription.

Description

Creates a description for an OMLDataSet. To see a full list of all elements, please see the documentation.

Usage

makeOMLDataSetDescription(
  id = 0L,
  name,
  version = "0",
  description,
  format = "ARFF",
  creator = NA_character_,
  contributor = NA_character_,
  collection.date = NA_character_,
  upload.date = as.POSIXct(Sys.time()),
  language = NA_character_,
  licence = NA_character_,
  url = NA_character_,
  default.target.attribute = NA_character_,
  row.id.attribute = NA_character_,
  ignore.attribute = NA_character_,
  version.label = NA_character_,
  citation = NA_character_,
  visibility = NA_character_,
  original.data.url = NA_character_,
  paper.url = NA_character_,
  update.comment = NA_character_,
  md5.checksum = NA_character_,
  status = NA_character_,
  tags = NA_character_
)

Arguments

id

[integer(1)]
Data set ID, autogenerated by the server. Ignored when set manually.

name

[character(1)]
The name of the data set.

version

[character(1)]
Version of the data set, autogenerated by the server. Ignored when set manually.

description

[character(1)]
Description of the data set, given by the uploader.

format

[character(1)]
Format of the data set. At the moment this is always "ARFF".

creator

[character]
The person(s) who created this data set. Optional.

contributor

[character]
People who contributed to this version of the data set (e.g., by reformatting). Optional.

collection.date

[character(1)]
The date the data was originally collected. Given by the uploader. Optional.

upload.date

[POSIXt]
The date the data was uploaded. Added by the server. Ignored when set manually.

language

[character(1)]
Language in which the data is represented. Starts with one upper-case letter followed by lower-case letters, e.g. 'English'.

licence

[character(1)]
Licence of the data. NA means: Public Domain or "don't know/care".

url

[character(1)]
Valid URL that points to the data file.

default.target.attribute

[character]
The default target attribute, if it exists. Of course, tasks can be defined that use another attribute as target.

row.id.attribute

[character(1)]
The attribute that represents the row-id column, if present in the data set. Else NA.

ignore.attribute

[character]
Attributes that should be excluded in modelling, such as identifiers and indexes. Optional.

version.label

[character(1)]
Version label provided by user, something relevant to the user. Can also be a date, hash, or some other type of id.

citation

[character(1)]
Reference(s) that should be cited when building on this data.

visibility

[character(1)]
Who can see the data set. Typical values: 'Everyone', 'All my friends', 'Only me'. Can also be any of the user's circles.

original.data.url

[character(1)]
For derived data, the url to the original data set. This can be an OpenML data set, e.g. 'http://openml.org/d/1'.

paper.url

[character(1)]
Link to a paper describing the data set.

update.comment

[character(1)]
When the data set is updated, add an explanation here.

md5.checksum

[character(1)]
MD5 checksum to check if the data set is downloaded without corruption. Can be ignored by user.

status

[character(1)]
The status of the data set, autogenerated by the server. Ignored when set manually.

tags

[character]
Optional tags for the data set.
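
The capitalisation convention described for the language argument can be checked before building a description. The following is a minimal base-R sketch; is_valid_language is a hypothetical helper, not part of the OpenML package, and it assumes a single-word language name.

```r
# Hypothetical helper (not part of OpenML): checks that a language name
# starts with one upper-case letter followed only by lower-case letters.
is_valid_language = function(x) {
  # NA is accepted because the argument defaults to NA_character_
  is.na(x) | grepl("^[[:upper:]][[:lower:]]+$", x)
}

is_valid_language(c("English", "english", "ENGLISH", NA))
```

Only the first and last elements pass: "English" follows the convention, and NA is the documented default.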

See Also

Other data set-related functions: OMLDataSet, convertMlrTaskToOMLDataSet(), convertOMLDataSetToMlr(), deleteOMLObject(), getOMLDataSet(), listOMLDataSets(), tagOMLObject(), uploadOMLDataSet()

Examples

library(OpenML)
data("airquality")
dsc = "Daily air quality measurements in New York, May to September 1973.
This data is taken from R."
cit = "Chambers, J. M., Cleveland, W. S., Kleiner, B. and Tukey, P. A. (1983) Graphical
Methods for Data Analysis. Belmont, CA: Wadsworth."
desc_airquality = makeOMLDataSetDescription(name = "airquality",
  description = dsc,
  creator = "New York State Department of Conservation (ozone data) and the National
    Weather Service (meteorological data)",
  collection.date = "May 1, 1973 to September 30, 1973",
  language = "English",
  licence = "GPL-2",
  url = "https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/00Index.html",
  default.target.attribute = "Ozone",
  citation = cit,
  tags = "R")

airquality_oml = makeOMLDataSet(desc = desc_airquality,
  data = airquality,
  colnames.old = colnames(airquality),
  colnames.new = colnames(airquality),
  target.features = "Ozone")
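
Because the example names "Ozone" both as default.target.attribute and in target.features, it can be worth confirming that the column exists in the data first. The snippet below is a base-R sketch of such a pre-flight check; it calls no OpenML functions, and the variable names target and target_ok are illustrative only.

```r
data("airquality")

# Name of the intended target column, as used in the example above
target = "Ozone"

# Base-R check only: is the target one of the data frame's columns?
target_ok = target %in% colnames(airquality)

# Stop early with an error if the column is missing
stopifnot(target_ok)
```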
