openml.datasets.get_dataset

openml.datasets.get_dataset(dataset_id: Union[int, str], download_data: bool = True, version: int = None, error_if_multiple: bool = False) → openml.datasets.dataset.OpenMLDataset

Download the OpenML dataset representation and, optionally, the actual data file.

This function is thread- and multiprocessing-safe, and it uses caching: it first checks whether the information has already been downloaded and, if so, loads it from disk instead of retrieving it from the server.
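The cache check described above follows a common load-or-fetch pattern. The sketch below is a generic illustration, not the library's actual implementation: `load_or_fetch`, the `fetch` callable, and the JSON file layout are hypothetical names chosen for the example, and the locking that thread or multiprocessing safety would require is left out.

```python
import json
import tempfile
from pathlib import Path


def load_or_fetch(cache_dir, dataset_id, fetch):
    """Return cached metadata for dataset_id, calling fetch() only on a cache miss.

    Hypothetical helper: the file naming and JSON format are assumptions,
    not the layout the OpenML client actually uses.
    """
    cache_file = Path(cache_dir) / f"{dataset_id}.json"
    if cache_file.exists():
        # Cache hit: read from disk, no server round trip.
        return json.loads(cache_file.read_text())
    # Cache miss: retrieve once, then persist for later calls.
    description = fetch(dataset_id)
    cache_file.write_text(json.dumps(description))
    return description


calls = []


def fake_fetch(dataset_id):
    # Stand-in for a server request; records each invocation.
    calls.append(dataset_id)
    return {"id": dataset_id, "name": "example"}


with tempfile.TemporaryDirectory() as tmp:
    first = load_or_fetch(tmp, 61, fake_fetch)   # miss: calls fake_fetch
    second = load_or_fetch(tmp, 61, fake_fetch)  # hit: read from disk
```

After both calls, `calls` contains a single entry, because the second lookup is served from the cache file.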

If the dataset is retrieved by name, a version may be specified. If no version is specified and multiple versions of the dataset exist, the earliest version that is still active is returned. If error_if_multiple is True, this scenario raises an error instead.
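The selection rule above can be illustrated with a small sketch. It is an assumption-laden model of the described behaviour, not the library's code: `select_version` and the `"version"` / `"status"` keys are hypothetical.

```python
def select_version(candidates, version=None, error_if_multiple=False):
    """Pick one record from datasets that share a name.

    candidates is a list of dicts with hypothetical keys
    "version" (int) and "status" (str).
    """
    if version is not None:
        # An explicit version narrows the candidates directly.
        matches = [c for c in candidates if c["version"] == version]
    else:
        # Without a version, only still-active datasets are considered.
        matches = [c for c in candidates if c["status"] == "active"]
    if not matches:
        raise ValueError("no matching dataset")
    if len(matches) > 1 and error_if_multiple:
        raise ValueError("multiple datasets match; specify a version")
    # Default: the earliest (lowest-numbered) remaining version.
    return min(matches, key=lambda c: c["version"])


versions = [
    {"version": 1, "status": "deactivated"},
    {"version": 2, "status": "active"},
    {"version": 3, "status": "active"},
]
chosen = select_version(versions)
pinned = select_version(versions, version=3)
```

Here `chosen` is the version 2 record, since version 1 is no longer active, while `pinned` is the version 3 record.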

Parameters
dataset_id : int or str

ID (int) or name (str) of the dataset to download.

download_data : bool, optional (default=True)

If True, also download the data file. Beware that some datasets are large, which can make the operation noticeably slower. The metadata is retrieved either way. If False, create the OpenMLDataset and populate it with the metadata only. The data may later be retrieved through the OpenMLDataset.get_data method.

version : int, optional (default=None)

Specifies the version if dataset_id is given as a name. If no version is specified, the earliest version that is still active is retrieved.

error_if_multiple : bool, optional (default=False)

If True, raise an error if multiple datasets match the given criteria.

Returns
dataset : openml.OpenMLDataset

The downloaded dataset.
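The download_data=False workflow described under Parameters might look like the sketch below. The wrapper `fetch_metadata_then_data` and its `get_dataset` injection argument are hypothetical conveniences for the example; only `get_dataset(..., download_data=False)` and `OpenMLDataset.get_data()` come from this page, and the exact shape of what `get_data()` returns depends on the installed openml version.

```python
def fetch_metadata_then_data(dataset_id, get_dataset=None):
    """Fetch metadata first, then retrieve the data as a separate step.

    get_dataset is injectable (an assumption of this sketch) so the flow
    can be exercised without network access; by default the real
    openml.datasets.get_dataset is used.
    """
    if get_dataset is None:
        # Imported lazily so the openml package is only needed for real calls.
        from openml.datasets import get_dataset

    # Step 1: metadata only; the data file is not downloaded yet.
    dataset = get_dataset(dataset_id, download_data=False)

    # Step 2: retrieve the data through the dataset object when it is needed.
    data = dataset.get_data()
    return dataset, data
```

With the openml package installed and network access available, `fetch_metadata_then_data(61)` would go through the real function; 61 is only an arbitrary example ID.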