openml.datasets.list_datasets
- openml.datasets.list_datasets(data_id: List[int] | None = None, offset: int | None = None, size: int | None = None, status: str | None = None, tag: str | None = None, output_format: str = 'dict', **kwargs) → Dict | DataFrame
Return a list of all datasets on OpenML. Supports large numbers of results.
- Parameters:
- data_id: list, optional
A list of dataset IDs specifying which datasets to list.
- offset: int, optional
The number of datasets to skip, starting from the first.
- size: int, optional
The maximum number of datasets to show.
- status: str, optional
Must be one of {active, in_preparation, deactivated}. By default, only active datasets are returned; datasets with another status can be requested explicitly.
- tag: str, optional
Only list datasets that carry this tag.
- output_format: str, optional (default=’dict’)
Determines the format of the output:
  - If ‘dict’, the output is a dict of dicts.
  - If ‘dataframe’, the output is a pandas DataFrame.
- kwargs: dict, optional
Legal filter operators (keys in the dict): data_name, data_version, number_instances, number_features, number_classes, number_missing_values.
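The offset and size parameters allow the listing to be walked page by page. The sketch below is a hypothetical helper, `iter_dataset_pages` (not part of openml), that requests one page at a time and forwards any other filters, such as `status` or the kwargs filters, unchanged. It assumes that an empty result marks the end of the listing; the `list_fn` argument lets a stub replace the real function for offline testing.

```python
from typing import Any, Callable, Dict, Iterator, Optional

# Shape documented for output_format='dict': dataset ID -> per-dataset info dict.
DatasetListing = Dict[int, Dict[str, Any]]


def iter_dataset_pages(
    list_fn: Optional[Callable[..., DatasetListing]] = None,
    page_size: int = 1000,
    **filters: Any,
) -> Iterator[DatasetListing]:
    """Yield successive pages of datasets using the offset/size parameters.

    Hypothetical helper, not part of openml. Extra keyword arguments
    (e.g. status="active") are passed straight through to list_fn.
    """
    if list_fn is None:
        # Deferred import: only needed when talking to the real OpenML server.
        import openml
        list_fn = openml.datasets.list_datasets

    offset = 0
    while True:
        page = list_fn(offset=offset, size=page_size, output_format="dict", **filters)
        if not page:
            # Assumption: an empty result means we ran past the last dataset.
            break
        yield page
        if len(page) < page_size:
            # A short page is the final one, so skip the extra request.
            break
        offset += page_size
```

For example, `for page in iter_dataset_pages(status="active"): ...` would process active datasets 1000 at a time. Because list_datasets already handles large result sets on its own, manual paging is mainly useful when results should be processed incrementally.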
- Returns:
- datasets: dict of dicts, or DataFrame
- If output_format=’dict’
A mapping from dataset ID to dict.
Every dataset is represented by a dictionary containing the following information:
  - dataset ID
  - name
  - format
  - status
If qualities are calculated for the dataset, some of these are also returned.
- If output_format=’dataframe’
Each row maps to a dataset. The columns contain the following information:
  - dataset ID
  - name
  - format
  - status
If qualities are calculated for the dataset, some of these are also included as columns.
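Because qualities are only present when they have been calculated, code that reads them from the ‘dict’ output should tolerate missing keys. The sketch below is a hypothetical helper, `datasets_with_quality` (not part of openml), that selects datasets whose quality value meets a threshold and silently skips entries without that quality. The quality key name `NumberOfInstances` used in the usage note is an assumption; check the keys of a real result before relying on it.

```python
from typing import Any, Dict, List, Tuple

# Shape documented for output_format='dict': dataset ID -> per-dataset info dict.
DatasetListing = Dict[int, Dict[str, Any]]


def datasets_with_quality(
    datasets: DatasetListing,
    quality: str,
    minimum: float,
) -> List[Tuple[int, str]]:
    """Return sorted (dataset ID, name) pairs whose quality is >= minimum.

    Hypothetical helper, not part of openml. `quality` is a key such as
    'NumberOfInstances' (assumed name; verify against a real result).
    Datasets without that key are skipped instead of raising KeyError.
    """
    selected = []
    for did, info in datasets.items():
        value = info.get(quality)
        if value is None:
            # Quality not computed for this dataset.
            continue
        # float() also accepts numeric strings, in case values arrive as text.
        if float(value) >= minimum:
            selected.append((did, info["name"]))
    return sorted(selected)
```

For example, `datasets_with_quality(openml.datasets.list_datasets(status="active"), "NumberOfInstances", 10000)` would return the IDs and names of active datasets reporting at least 10000 instances, provided that quality key exists.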