openml.datasets.list_datasets

openml.datasets.list_datasets(offset: Union[int, NoneType] = None, size: Union[int, NoneType] = None, status: Union[str, NoneType] = None, tag: Union[str, NoneType] = None, output_format: str = 'dict', **kwargs) → Union[Dict, pandas.core.frame.DataFrame]

Return a listing of all datasets that are on OpenML. Large numbers of results are supported.

Parameters
offset: int, optional

The number of datasets to skip, starting from the first.

size: int, optional

The maximum number of datasets to return.

status: str, optional

One of {active, in_preparation, deactivated}. By default, only active datasets are returned; pass another status to request datasets in that state.

tag: str, optional

output_format: str, optional (default='dict')

The parameter decides the format of the output.

- If 'dict', the output is a dict of dicts.
- If 'dataframe', the output is a pandas DataFrame.

kwargs: dict, optional

Legal filter operators (keys in the dict): data_name, data_version, number_instances, number_features, number_classes, number_missing_values.
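
As a rough illustration of how offset and size combine, the sketch below walks through results one page at a time. It is an assumption-laden example, not part of the library: iter_dataset_pages and fake_list_datasets are hypothetical names, and the stand-in function only mimics the offset/size behaviour described above on made-up IDs so the snippet runs without network access.

```python
from typing import Callable, Dict, Iterator


def iter_dataset_pages(
    fetch: Callable[..., Dict[int, dict]],
    page_size: int = 1000,
    **filters,
) -> Iterator[Dict[int, dict]]:
    """Yield successive pages from a list_datasets-style callable.

    `fetch` is assumed to accept `offset` and `size` keyword arguments
    and to return a mapping from dataset ID to a description dict.
    """
    offset = 0
    while True:
        page = fetch(offset=offset, size=page_size, **filters)
        if not page:
            break
        yield page
        if len(page) < page_size:
            # A short page means there is nothing left to fetch.
            break
        offset += page_size


def fake_list_datasets(offset=None, size=None, **kwargs):
    """Hypothetical stand-in: applies offset/size to five made-up IDs."""
    ids = [1, 2, 3, 4, 5]
    start = offset or 0
    stop = None if size is None else start + size
    return {i: {"did": i} for i in ids[start:stop]}


pages = list(iter_dataset_pages(fake_list_datasets, page_size=2))
for number, page in enumerate(pages, start=1):
    print(number, sorted(page))
```

With the real library, one would presumably pass openml.datasets.list_datasets as fetch, along with any filters such as status="active". Paging by hand like this is only useful when results should be processed in chunks, since the function itself already supports large result sets.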

Returns
datasets: dict of dicts, or dataframe
  • If output_format='dict'

    A mapping from dataset ID to dict.

    Every dataset is represented by a dictionary containing the following information:

    - dataset id
    - name
    - format
    - status

    If qualities are calculated for the dataset, some of these are also returned.

  • If output_format='dataframe'

    Each row maps to a dataset. The columns contain the following information:

    - dataset id
    - name
    - format
    - status

    If qualities are calculated for the dataset, some of these are also included as columns.
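
To make the 'dict' return shape concrete, here is a minimal sketch of filtering such a mapping. The sample data is entirely made up, and the key names ("did", "name", "format", "status") are assumptions based on the fields listed above; check the keys of an actual result before relying on them. names_by_format is a hypothetical helper, not a library function.

```python
from typing import Dict, List

# Hypothetical sample shaped like the 'dict' output: dataset ID -> description.
# Key names and values are illustrative assumptions, not real OpenML records.
sample: Dict[int, dict] = {
    2: {"did": 2, "name": "example_a", "format": "ARFF", "status": "active"},
    5: {"did": 5, "name": "example_b", "format": "Sparse_ARFF", "status": "active"},
    9: {"did": 9, "name": "example_c", "format": "ARFF", "status": "active"},
}


def names_by_format(datasets: Dict[int, dict], fmt: str) -> List[str]:
    """Return the sorted names of all datasets stored in the given format."""
    return sorted(
        info["name"]
        for info in datasets.values()
        if info.get("format") == fmt
    )


arff_names = names_by_format(sample, "ARFF")
print(arff_names)
```

The same selection on the 'dataframe' output would typically be a boolean mask on the format column, assuming that column exists under that name.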