An OMLDataSet consists of an OMLDataSetDescription, a data.frame containing the data set, the old and new column names and, finally, the target features.

The OMLDataSetDescription provides information on the data set, such as its ID, name and version. For a full list of all elements, see the XSD.

The slot colnames.old contains the original names, i.e., the column names that were uploaded to the server, while colnames.new contains the names that you will see when working with the data in R. Most of the time, old and new column names are identical. The new names differ only when the original names are not valid.
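As a rough illustration of how old and new column names can diverge, the sketch below uses base R's make.names() to turn invalid names into valid ones. The column names are hypothetical, and the use of make.names() is an assumption made for illustration; the package may derive colnames.new differently.

```r
# Hypothetical original column names, as they might have been uploaded.
colnames.old = c("sepal length", "2nd.measure", "class")

# Assumption: make.names() stands in for whatever sanitizing is applied;
# it converts arbitrary strings into syntactically valid R names.
colnames.new = make.names(colnames.old, unique = TRUE)

# Flag the columns whose name had to change.
changed = colnames.old != colnames.new

print(data.frame(colnames.old, colnames.new, changed))
```

Names that are already valid, like "class" here, pass through unchanged, which matches the common case where both slots hold identical values.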

The slot target.features contains the column name(s) from the data.frame of the OMLDataSet that refer to the target feature(s).
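Because target.features holds column names of the data.frame, it can be used directly to separate targets from the remaining features. The following is a minimal base R sketch using the airquality data; the variable names target and features are chosen here for illustration and are not part of the package.

```r
data("airquality")

# The target feature(s) are given by column name.
target.features = "Ozone"

# Every target name should refer to an existing column.
stopifnot(all(target.features %in% colnames(airquality)))

# Split the data into target column(s) and the remaining features.
target = airquality[, target.features, drop = FALSE]
features = airquality[, setdiff(colnames(airquality), target.features), drop = FALSE]

print(colnames(target))
print(colnames(features))
```

Using drop = FALSE keeps both results as data.frames even when only a single column is selected.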

makeOMLDataSet(desc, data, colnames.old = colnames(data),
  colnames.new = colnames(data), target.features = NULL)

Arguments

desc

[OMLDataSetDescription]
Data set description.

data

[data.frame]
The data set.

colnames.old

[character]
Names of the features that were uploaded to the server.

colnames.new

[character]
Names of the features that are displayed.

target.features

[character]
Name(s) of the target feature(s). If set, this will replace the default target in desc.

Value

[OMLDataSet]

Examples

data("airquality")
dsc = "Daily air quality measurements in New York, May to September 1973. This data is taken from R."
cit = "Chambers, J. M., Cleveland, W. S., Kleiner, B. and Tukey, P. A. (1983) Graphical Methods for Data Analysis. Belmont, CA: Wadsworth."
desc_airquality = makeOMLDataSetDescription(name = "airquality",
  description = dsc,
  creator = "New York State Department of Conservation (ozone data) and the National Weather Service (meteorological data)",
  collection.date = "May 1, 1973 to September 30, 1973",
  language = "English",
  licence = "GPL-2",
  url = "https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/00Index.html",
  default.target.attribute = "Ozone",
  citation = cit,
  tags = "R")
airquality_oml = makeOMLDataSet(desc = desc_airquality,
  data = airquality,
  colnames.old = colnames(airquality),
  colnames.new = colnames(airquality),
  target.features = "Ozone")