Creates a description for an OMLDataSet. To see a full list of all elements, please see the XSD.

makeOMLDataSetDescription(id = 0L, name, version = "0", description,
  format = "ARFF", creator = NA_character_,
  contributor = NA_character_, collection.date = NA_character_,
  upload.date = as.POSIXct(Sys.time()), language = NA_character_,
  licence = NA_character_, url = NA_character_,
  default.target.attribute = NA_character_,
  row.id.attribute = NA_character_, ignore.attribute = NA_character_,
  version.label = NA_character_, citation = NA_character_,
  visibility = NA_character_, original.data.url = NA_character_,
  paper.url = NA_character_, update.comment = NA_character_,
  md5.checksum = NA_character_, status = NA_character_,
  tags = NA_character_)



id
Data set ID, autogenerated by the server. Ignored when set manually.

name
The name of the data set.

version
Version of the data set, autogenerated by the server. Ignored when set manually.

description
Description of the data set, given by the uploader.

format
Format of the data set. At the moment this is always "ARFF".

creator
The person(s) who created this data set. Optional.

contributor
People who contributed to this version of the data set (e.g., by reformatting). Optional.

collection.date
The date the data was originally collected. Given by the uploader. Optional.

upload.date
The date the data was uploaded. Added by the server. Ignored when set manually.

language
Language in which the data is represented. Starts with one uppercase letter, with the rest lowercase, e.g. 'English'.

licence
Licence of the data. NA means: Public Domain or "don't know/care".

url
Valid URL that points to the data file.

default.target.attribute
The default target attribute, if it exists. Tasks can still be defined that use another attribute as target.

row.id.attribute
The attribute that represents the row-id column, if present in the data set. Otherwise NA.

ignore.attribute
Attributes that should be excluded in modelling, such as identifiers and indexes. Optional.

version.label
Version label provided by the user, something relevant to the user. Can also be a date, hash, or some other type of ID.

citation
Reference(s) that should be cited when building on this data.

visibility
Who can see the data set. Typical values: 'Everyone', 'All my friends', 'Only me'. Can also be any of the user's circles.

original.data.url
For derived data, the URL of the original data set. This can be an OpenML data set, e.g. ''.

paper.url
Link to a paper describing the data set.

update.comment
When the data set is updated, add an explanation here.

md5.checksum
MD5 checksum to check if the data set is downloaded without corruption. Can be ignored by user.

status
The status of the data set, autogenerated by the server. Ignored when set manually.

tags
Optional tags for the data set.
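
Two of the fields described above follow a fixed shape that can be illustrated in base R: the language value (first letter uppercase, rest lowercase) and the MD5 checksum used to detect a corrupted download. The following is a minimal sketch and not part of the package: the helper name normalizeLanguage and the temporary CSV file are assumptions made for illustration, and tools::md5sum is used here as one way to compute an MD5 digest of a local file.

```r
# Hypothetical helper (not part of OpenML):
# first letter uppercase, remaining letters lowercase.
normalizeLanguage = function(x) {
  paste0(toupper(substring(x, 1, 1)), tolower(substring(x, 2)))
}

lang = normalizeLanguage("eNGLISH")

# Write a small data file (illustrative only) and compute its MD5 checksum
# with the tools package that ships with R.
data.file = tempfile(fileext = ".csv")
write.csv(airquality, data.file, row.names = FALSE)
checksum = unname(tools::md5sum(data.file))

# Recomputing the checksum on the unchanged file yields the same value;
# a mismatch against a stored checksum would indicate a corrupted file.
same = identical(checksum, unname(tools::md5sum(data.file)))
```

Since the checksum is described as something the user can ignore, a comparison like this would typically serve only as a sanity check on a downloaded file.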



# Load the example data shipped with R
data("airquality")

dsc = "Daily air quality measurements in New York, May to September 1973. This data is taken from R."
cit = "Chambers, J. M., Cleveland, W. S., Kleiner, B. and Tukey, P. A. (1983) Graphical Methods for Data Analysis. Belmont, CA: Wadsworth."

# Describe the data set; fields that are not given keep their NA defaults
desc_airquality = makeOMLDataSetDescription(name = "airquality",
  description = dsc,
  creator = "New York State Department of Conservation (ozone data) and the National Weather Service (meteorological data)",
  collection.date = "May 1, 1973 to September 30, 1973",
  language = "English",
  licence = "GPL-2",
  url = "",
  default.target.attribute = "Ozone",
  citation = cit,
  tags = "R")

# Bundle the description with the data itself
airquality_oml = makeOMLDataSet(desc = desc_airquality,
  data = airquality,
  colnames.old = colnames(airquality),
  colnames.new = colnames(airquality),
  target.features = "Ozone")
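
To make the roles of the target and of ignored attributes concrete, here is a small base R sketch that does not depend on the package. It uses "Ozone" as the target, as in target.features = "Ozone" above. Treating "Day" as an ignored attribute is purely an assumption for illustration, since the example does not set ignore.attribute.

```r
data("airquality")

target = "Ozone"  # the target feature used in the example
ignore = "Day"    # assumed for illustration only

# Features are all columns except the target and the ignored attributes.
features = setdiff(colnames(airquality), c(target, ignore))

X = airquality[, features, drop = FALSE]  # modelling inputs
y = airquality[[target]]                  # target values
```

With these choices, the feature columns that remain are Solar.R, Wind, Temp and Month.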