openml.datasets.get_dataset(dataset_id: Union[int, str], download_data: bool = True, version: Optional[int] = None, error_if_multiple: bool = False, cache_format: str = 'pickle', download_qualities: bool = True) -> openml.datasets.dataset.OpenMLDataset

Download the OpenML dataset representation and, optionally, the actual data file.

This function is thread-safe and multiprocessing-safe, and it uses caching: it first checks whether the information has already been downloaded and, if so, loads it from disk instead of retrieving it from the server.
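The check-then-load pattern described above can be pictured with a small, library-independent sketch. It is only an illustration of the idea, not the library's implementation: `load_with_cache`, `fake_fetch`, and the JSON file layout are hypothetical names and choices made for this example, and the locking needed for thread and process safety is deliberately left out.

```python
import json
import tempfile
from pathlib import Path


def load_with_cache(cache_dir, key, fetch):
    """Return the cached value for `key`, calling `fetch` only on a cache miss."""
    cache_file = Path(cache_dir) / f"{key}.json"
    if cache_file.exists():
        # Cache hit: read from disk, no server round trip.
        return json.loads(cache_file.read_text())
    # Cache miss: retrieve the value, then store it for next time.
    value = fetch(key)
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(value))
    return value


calls = []  # records every simulated server request


def fake_fetch(key):
    """Hypothetical stand-in for a server request."""
    calls.append(key)
    return {"id": key, "name": "example"}


with tempfile.TemporaryDirectory() as tmp:
    first = load_with_cache(tmp, 61, fake_fetch)   # fetched and written to disk
    second = load_with_cache(tmp, 61, fake_fetch)  # served from disk
```

After both calls, `calls` holds a single entry, because the second call never reaches `fake_fetch`.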

If the dataset is retrieved by name, a version may be specified. If no version is specified and multiple versions of the dataset exist, the earliest version that is still active is returned. If no version is specified, multiple versions of the dataset exist, and error_if_multiple is set to True, this function raises an exception.
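The name-based lookup rules above can be modelled in a few lines of plain Python. This is a sketch under stated assumptions rather than the library's code: `resolve_version` and the `(version, status)` tuples are hypothetical, and the sketch assumes that only active versions count toward "multiple versions".

```python
def resolve_version(candidates, version=None, error_if_multiple=False):
    """Pick a version from `candidates`, a list of (version, status) tuples.

    Hypothetical model of the documented rules for lookup by name.
    """
    if version is not None:
        # An explicit version wins, provided it exists.
        if version not in [v for v, _ in candidates]:
            raise ValueError(f"version {version} not found")
        return version

    # Assumption: only active versions are considered.
    active = sorted(v for v, status in candidates if status == "active")
    if not active:
        raise ValueError("no active version found")
    if len(active) > 1 and error_if_multiple:
        raise ValueError("multiple active versions found")
    # Default: the earliest version that is still active.
    return active[0]


versions = [(1, "deactivated"), (2, "active"), (3, "active")]
default_choice = resolve_version(versions)
explicit_choice = resolve_version(versions, version=3)
```

With this input, the default lookup skips the deactivated version 1 and selects version 2, while passing `error_if_multiple=True` without a version raises instead.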

dataset_id : int or str

ID of the dataset to download, or its name.

download_data : bool (default=True)

If True, also download the data file. Beware that some datasets are large, which can make the operation noticeably slower. The metadata is retrieved either way. If False, create the OpenMLDataset and populate it with the metadata only. The data may later be retrieved through the OpenMLDataset.get_data method.

version : int, optional (default=None)

Specifies the version to retrieve when dataset_id is given as a name. If no version is specified, the earliest version that is still active is retrieved.

error_if_multiple : bool (default=False)

If True, raise an error if multiple datasets match the criteria.

cache_format : str (default='pickle')

Format for caching the dataset: either 'feather' or 'pickle'. Note that the default 'pickle' option may load more slowly than 'feather' when the number of rows is very large.

download_qualities : bool (default=True)

If True, also download the 'qualities' metadata in addition to the minimal dataset description.


Returns the downloaded dataset as an OpenMLDataset.
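As a usage sketch, the parameters above might be combined as follows to fetch only the metadata first and load the data later. The wrapper `fetch_lazily` is a hypothetical helper written for this example. The four-value return of `get_data` and the `default_target_attribute` attribute are assumptions about the OpenMLDataset class that may differ between library versions, so check them against the installed release.

```python
def fetch_lazily(dataset_id, cache_format="pickle"):
    """Hypothetical helper: fetch metadata first, then load the data on demand."""
    # Imported lazily so the helper can be defined without openml installed.
    import openml

    # Metadata only: no data file and no 'qualities' are downloaded here.
    dataset = openml.datasets.get_dataset(
        dataset_id,
        download_data=False,
        cache_format=cache_format,
        download_qualities=False,
    )

    # The data file is retrieved at this point, through get_data.
    # Assumption: get_data returns four values in this order.
    X, y, categorical_indicator, attribute_names = dataset.get_data(
        target=dataset.default_target_attribute
    )
    return dataset, X, y
```

A call such as `dataset, X, y = fetch_lazily(61, cache_format="feather")` would then contact the server or the local cache; the ID 61 is chosen only for illustration. Passing `cache_format="feather"` may be worth trying for datasets with very many rows, as noted in the parameter description.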