openml.datasets.list_datasets

openml.datasets.list_datasets(data_id: List[int] | None = None, offset: int | None = None, size: int | None = None, status: str | None = None, tag: str | None = None, output_format: str = 'dict', **kwargs) → Dict | DataFrame

Return a list of all datasets on OpenML. Supports large numbers of results.

Parameters:
data_id: list, optional

A list of dataset IDs specifying which datasets should be listed.

offset: int, optional

The number of datasets to skip, starting from the first.

size: int, optional

The maximum number of datasets to show.

status: str, optional

One of {active, in_preparation, deactivated}. By default, active datasets are returned; datasets with a different status can be requested instead.

tag: str, optional

output_format: str, optional (default='dict')

Determines the format of the output.

  • If 'dict', the output is a dict of dicts.
  • If 'dataframe', the output is a pandas DataFrame.

kwargs: dict, optional

Legal filter operators (keys in the dict): data_name, data_version, number_instances, number_features, number_classes, number_missing_values.
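The parameters above can be combined. The function already supports large result sets on its own, so the following is only a minimal sketch of how offset and size interact when results are processed in fixed-size chunks. The helper name iter_dataset_pages and its list_fn argument are hypothetical and not part of OpenML; list_fn stands in for openml.datasets.list_datasets. The sketch assumes that an empty mapping is returned once the offset passes the last dataset.

```python
from typing import Callable, Dict, Iterator


def iter_dataset_pages(
    list_fn: Callable[..., Dict[int, dict]],
    page_size: int = 1000,
    **filters,
) -> Iterator[Dict[int, dict]]:
    """Yield successive pages of a dataset listing.

    list_fn is expected to accept the documented keyword arguments
    offset, size and output_format, plus any filter keywords
    (status, tag, data_name, number_instances, ...).
    """
    offset = 0
    while True:
        page = list_fn(
            offset=offset,
            size=page_size,
            output_format="dict",
            **filters,
        )
        if not page:
            # Assumption: no results are reported as an empty mapping.
            break
        yield page
        if len(page) < page_size:
            # A short page means the listing is exhausted.
            break
        offset += page_size
```

With the openml package installed, a call might look like iter_dataset_pages(openml.datasets.list_datasets, page_size=500, status="active"). Filter keywords are forwarded unchanged, so any of the keys listed for kwargs can be passed the same way.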

Returns:
datasets: dict of dicts, or dataframe
  • If output_format='dict'

    A mapping from dataset ID to dict.

    Every dataset is represented by a dictionary containing the following information:

      • dataset id
      • name
      • format
      • status

    If qualities are calculated for the dataset, some of these are also returned.

  • If output_format='dataframe'

    Each row maps to a dataset. The columns contain the following information:

      • dataset id
      • name
      • format
      • status

    If qualities are calculated for the dataset, some of these are also included as columns.
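
As an illustration of the 'dict' return format, here is a minimal sketch that groups dataset names by one of the documented fields. The helper group_names_by is hypothetical, and the sample mapping is made-up data, not real OpenML output. The key names "did", "name", "format" and "status" are assumptions about how the four documented fields are spelled in the returned dictionaries.

```python
from collections import defaultdict
from typing import Dict, List


def group_names_by(datasets: Dict[int, dict], field: str) -> Dict[str, List[str]]:
    """Group dataset names by a per-dataset field such as 'format' or 'status'."""
    grouped = defaultdict(list)
    for info in datasets.values():
        # Use .get() because optional entries (e.g. qualities) may be absent.
        grouped[info.get(field, "unknown")].append(info["name"])
    return dict(grouped)


# Made-up sample shaped like the documented mapping from dataset ID to dict.
sample = {
    1: {"did": 1, "name": "example_a", "format": "ARFF", "status": "active"},
    2: {"did": 2, "name": "example_b", "format": "Sparse_ARFF", "status": "active"},
    3: {"did": 3, "name": "example_c", "format": "ARFF", "status": "deactivated"},
}

by_format = group_names_by(sample, "format")
by_status = group_names_by(sample, "status")
```

The same grouping on the 'dataframe' output would normally be done with pandas directly; the dict form is used here only to keep the sketch free of extra dependencies.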