openml.datasets.edit_dataset

openml.datasets.edit_dataset(data_id: int, description: str | None = None, creator: str | None = None, contributor: str | None = None, collection_date: str | None = None, language: str | None = None, default_target_attribute: str | None = None, ignore_attribute: str | list[str] | None = None, citation: str | None = None, row_id_attribute: str | None = None, original_data_url: str | None = None, paper_url: str | None = None) -> int

Edits an OpenMLDataset.

In addition to providing the dataset id of the dataset to edit (through data_id), you must specify a value for at least one of the optional function arguments, i.e. one value for a field to edit.
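The "at least one field" requirement can be checked on the client before any request is sent. The following is a minimal sketch, not part of the openml API: build_edit_kwargs and edit_fields are hypothetical helper names, and the edit_fn parameter is an assumption added so the helper can be exercised with a stand-in instead of a live server.

```python
from typing import Any, Callable, Optional


def build_edit_kwargs(**fields: Any) -> dict:
    """Drop fields left as None and require that at least one remains."""
    kwargs = {name: value for name, value in fields.items() if value is not None}
    if not kwargs:
        raise ValueError("Specify at least one field to edit.")
    return kwargs


def edit_fields(
    data_id: int,
    edit_fn: Optional[Callable[..., int]] = None,
    **fields: Any,
) -> int:
    """Validate the fields locally, then call edit_dataset (or a stand-in)."""
    kwargs = build_edit_kwargs(**fields)
    if edit_fn is None:
        # Imported lazily so the helper can be tested without openml installed.
        import openml

        edit_fn = openml.datasets.edit_dataset
    return edit_fn(data_id, **kwargs)
```

With an API key configured, a call such as edit_fields(128, description="Updated description") would forward only the description field; the ID 128 is a placeholder, not a reference to a specific dataset.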

This function allows editing of both non-critical and critical fields. The critical fields are default_target_attribute, ignore_attribute, and row_id_attribute.

  • Editing non-critical data fields is allowed for all authenticated users.

  • Editing critical fields is allowed only for the owner, provided there are no tasks associated with this dataset.

If the dataset has tasks or the user is not the owner, the only way to edit critical fields is to call fork_dataset and then call edit_dataset on the forked copy.
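The fork-then-edit route for critical fields can be sketched as follows. This is an illustrative sketch only: has_critical_fields and fork_then_edit are hypothetical helpers, the fork_fn and edit_fn parameters are assumptions added so the flow can be tried with stand-ins, and the sketch relies on openml.datasets.fork_dataset returning the ID of the forked dataset.

```python
from typing import Any, Callable, Optional

# Critical fields as listed in this documentation.
CRITICAL_FIELDS = frozenset(
    {"default_target_attribute", "ignore_attribute", "row_id_attribute"}
)


def has_critical_fields(fields: dict) -> bool:
    """Return True if any of the fields to edit is a critical field."""
    return any(name in CRITICAL_FIELDS for name in fields)


def fork_then_edit(
    data_id: int,
    fork_fn: Optional[Callable[[int], int]] = None,
    edit_fn: Optional[Callable[..., int]] = None,
    **fields: Any,
) -> int:
    """Fork the dataset, then apply the edits to the forked copy."""
    if fork_fn is None or edit_fn is None:
        # Imported lazily so the sketch can be tested without openml installed.
        import openml

        fork_fn = fork_fn or openml.datasets.fork_dataset
        edit_fn = edit_fn or openml.datasets.edit_dataset
    forked_id = fork_fn(data_id)  # the caller becomes owner of the fork
    return edit_fn(forked_id, **fields)
```

A caller might use has_critical_fields to decide between a direct edit_dataset call and fork_then_edit. Note that the original dataset is left unchanged; the edits land on the new dataset ID.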

Parameters:
data_id : int

ID of the dataset.

description : str

Description of the dataset.

creator : str

The person who created the dataset.

contributor : str

People who contributed to the current version of the dataset.

collection_date : str

The date the data was originally collected, given by the uploader.

language : str

Language in which the data is represented. Starts with 1 upper case letter, rest lower case, e.g. ‘English’.

default_target_attribute : str

The default target attribute, if it exists. Can have multiple values, comma separated.

ignore_attribute : str | list

Attributes that should be excluded in modelling, such as identifiers and indexes.

citation : str

Reference(s) that should be cited when building on this data.

row_id_attribute : str, optional

The attribute that represents the row-id column, if present in the dataset. If data is a dataframe and row_id_attribute is not specified, the index of the dataframe will be used as the row_id_attribute. If the name of the index is None, it will be discarded.

original_data_url : str, optional

For derived data, the url to the original dataset.

paper_url : str, optional

Link to a paper describing the dataset.

Returns:
The ID of the edited dataset.