# openml
The OpenML module implements a Python interface to [OpenML](https://www.openml.org), a collaborative platform for machine learning. OpenML can be used to

- store, download, and analyze datasets
- make experiments and their results (e.g. models, predictions) accessible and reproducible for everybody
- analyze experiments (uploaded by you and other collaborators) and conduct meta-studies

In particular, this module implements a Python interface to the [OpenML REST API](https://www.openml.org/guide#!rest_services) ([REST on Wikipedia](https://en.wikipedia.org/wiki/Representational_state_transfer)).
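A minimal sketch of typical usage, shown here for orientation only; the dataset and task ids are purely illustrative:

```python
import openml

# Most read operations work anonymously; an API key (found on your
# OpenML account page) is only required for uploads.
# openml.config.apikey = "YOUR_API_KEY"

dataset = openml.datasets.get_dataset(61)   # download dataset 61 ("iris")
task = openml.tasks.get_task(31)            # download task 31
print(dataset.name, task.task_type)
```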
## OpenMLBenchmarkSuite

`OpenMLBenchmarkSuite(suite_id: int | None, alias: str | None, name: str, description: str, status: str | None, creation_date: str | None, creator: int | None, tags: list[dict] | None, data: list[int] | None, tasks: list[int] | None)`

Bases: `BaseStudy`

An OpenMLBenchmarkSuite represents the OpenML concept of a suite (a collection of tasks). It contains the following information: name, id, description, creation date, creator id, and the task ids. From this list of task ids, the suite object also derives the list of associated OpenML dataset ids.

#### Parameters

- `suite_id` (int): the study id
- `alias` (str, optional): a string ID, unique on the server (url-friendly)
- `main_entity_type` (str): the entity type (e.g., task, run) that is core in this study; only entities of this type can be added explicitly
- `name` (str): the name of the study (meta-info)
- `description` (str): brief description (meta-info)
- `status` (str): whether the study is in preparation, active, or deactivated
- `creation_date` (str): date of creation (meta-info)
- `creator` (int): OpenML user id of the owner / creator
- `tags` (list of dict): the tags associated with the study; each tag is a dict with the keys name, window_start, and write_access
- `data` (list): a list of data ids associated with this study
- `tasks` (list): a list of task ids associated with this study

Source code in openml/study/study.py
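A hedged usage sketch; suite id 99 (the OpenML-CC18 suite) is used purely as an illustration:

```python
import openml

# Fetch a benchmark suite and walk over the tasks it collects.
suite = openml.study.get_suite(99)
print(suite.name, len(suite.tasks))
for task_id in suite.tasks:
    task = openml.tasks.get_task(task_id)
    break  # fetch only the first task in this sketch
```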
### openml_url (property)

The URL of the object on the server, if it was uploaded, else None.

### open_in_browser

Opens the OpenML web page corresponding to this object in your default browser.

Source code in openml/base.py

### publish

`publish() -> OpenMLBase`

Publish the object on the OpenML server.

Source code in openml/base.py

### push_tag

Annotates the entity with a tag on the server.

### remove_tag

Removes a tag from the entity on the server.

### url_for_id (classmethod)

Return the OpenML URL for the object of the class entity with the given id.
## OpenMLClassificationTask

`OpenMLClassificationTask(task_type_id: TaskType, task_type: str, data_set_id: int, target_name: str, estimation_procedure_id: int = 1, estimation_procedure_type: str | None = None, estimation_parameters: dict[str, str] | None = None, evaluation_measure: str | None = None, data_splits_url: str | None = None, task_id: int | None = None, class_labels: list[str] | None = None, cost_matrix: ndarray | None = None)`

Bases: `OpenMLSupervisedTask`

OpenML Classification object.

#### Parameters

- `task_type_id` (TaskType): ID of the Classification task type.
- `task_type` (str): Name of the Classification task type.
- `data_set_id` (int): ID of the OpenML dataset associated with the Classification task.
- `target_name` (str): Name of the target variable.
- `estimation_procedure_id` (int, default=1): ID of the estimation procedure for the Classification task.
- `estimation_procedure_type` (str, default=None): Type of the estimation procedure.
- `estimation_parameters` (dict, default=None): Estimation parameters for the Classification task.
- `evaluation_measure` (str, default=None): Name of the evaluation measure.
- `data_splits_url` (str, default=None): URL of the data splits for the Classification task.
- `task_id` (int or None): ID of the Classification task (if it already exists on OpenML).
- `class_labels` (list of str, default=None): A list of class labels (for classification tasks).
- `cost_matrix` (array, default=None): A cost matrix (for classification tasks).

Source code in openml/tasks/task.py
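A short sketch of fetching a classification task and its data; task id 31 (a task on the "credit-g" dataset) is illustrative only:

```python
import openml

task = openml.tasks.get_task(31)   # an OpenMLClassificationTask
print(task.class_labels)           # class labels of the target variable
X, y = task.get_X_and_y()          # features and target for the task
```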
### estimation_parameters (writable property)

Return the estimation parameters for the task.

### openml_url (property)

The URL of the object on the server, if it was uploaded, else None.

### download_split

`download_split() -> OpenMLSplit`

Download the OpenML split for a given task.

Source code in openml/tasks/task.py

### get_X_and_y

Get the data associated with the current task.

#### Returns

tuple of X and y

Source code in openml/tasks/task.py

### get_dataset

`get_dataset(**kwargs: Any) -> OpenMLDataset`

Download the dataset associated with the task. Accepts the same keyword arguments as openml.datasets.get_dataset.

### get_split_dimensions

Get the (repeats, folds, samples) of the split for a given task.

Source code in openml/tasks/task.py

### get_train_test_split_indices

`get_train_test_split_indices(fold: int = 0, repeat: int = 0, sample: int = 0) -> tuple[ndarray, ndarray]`

Get the indices of the train and test splits for a given task.

Source code in openml/tasks/task.py
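A sketch of iterating over every train/test split defined by the task's estimation procedure (e.g. 10 repeats of 10-fold cross-validation); the task id is illustrative:

```python
import openml

task = openml.tasks.get_task(31)
n_repeats, n_folds, n_samples = task.get_split_dimensions()
for repeat in range(n_repeats):
    for fold in range(n_folds):
        train_idx, test_idx = task.get_train_test_split_indices(
            fold=fold, repeat=repeat
        )
```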
### open_in_browser

Opens the OpenML web page corresponding to this object in your default browser.

Source code in openml/base.py

### publish

`publish() -> OpenMLBase`

Publish the object on the OpenML server.

Source code in openml/base.py

### push_tag

Annotates the entity with a tag on the server.

### remove_tag

Removes a tag from the entity on the server.

### url_for_id (classmethod)

Return the OpenML URL for the object of the class entity with the given id.
## OpenMLClusteringTask

`OpenMLClusteringTask(task_type_id: TaskType, task_type: str, data_set_id: int, estimation_procedure_id: int = 17, task_id: int | None = None, estimation_procedure_type: str | None = None, estimation_parameters: dict[str, str] | None = None, data_splits_url: str | None = None, evaluation_measure: str | None = None, target_name: str | None = None)`

Bases: `OpenMLTask`

OpenML Clustering object.

#### Parameters

- `task_type_id` (TaskType): Task type ID of the OpenML clustering task.
- `task_type` (str): Task type of the OpenML clustering task.
- `data_set_id` (int): ID of the OpenML dataset used in the clustering task.
- `estimation_procedure_id` (int, default=17): ID of the OpenML estimation procedure.
- `task_id` (int or None): ID of the OpenML clustering task.
- `estimation_procedure_type` (str, default=None): Type of the OpenML estimation procedure used in the clustering task.
- `estimation_parameters` (dict, default=None): Parameters used by the OpenML estimation procedure.
- `data_splits_url` (str, default=None): URL of the OpenML data splits for the clustering task.
- `evaluation_measure` (str, default=None): Evaluation measure used in the clustering task.
- `target_name` (str, default=None): Name of the target feature (class) that is not part of the feature set for the clustering task.

Source code in openml/tasks/task.py
### openml_url (property)

The URL of the object on the server, if it was uploaded, else None.

### download_split

`download_split() -> OpenMLSplit`

Download the OpenML split for a given task.

Source code in openml/tasks/task.py

### get_dataset

`get_dataset(**kwargs: Any) -> OpenMLDataset`

Download the dataset associated with the task. Accepts the same keyword arguments as openml.datasets.get_dataset.

### get_split_dimensions

Get the (repeats, folds, samples) of the split for a given task.

Source code in openml/tasks/task.py

### get_train_test_split_indices

`get_train_test_split_indices(fold: int = 0, repeat: int = 0, sample: int = 0) -> tuple[ndarray, ndarray]`

Get the indices of the train and test splits for a given task.

Source code in openml/tasks/task.py

### open_in_browser

Opens the OpenML web page corresponding to this object in your default browser.

Source code in openml/base.py

### publish

`publish() -> OpenMLBase`

Publish the object on the OpenML server.

Source code in openml/base.py

### push_tag

Annotates the entity with a tag on the server.

### remove_tag

Removes a tag from the entity on the server.

### url_for_id (classmethod)

Return the OpenML URL for the object of the class entity with the given id.
## OpenMLDataFeature

`OpenMLDataFeature(index: int, name: str, data_type: str, nominal_values: list[str], number_missing_values: int, ontologies: list[str] | None = None)`

Data Feature (a.k.a. Attribute) object.

#### Parameters

- `index` (int): The index of this feature.
- `name` (str): Name of the feature.
- `data_type` (str): Can be nominal, numeric, string, or date (corresponds to arff).
- `nominal_values` (list of str): List of the possible values, in case of a nominal attribute.
- `number_missing_values` (int): Number of rows that have a missing value for this feature.
- `ontologies` (list of str): List of ontologies attached to this feature. An ontology describes the concept represented by a feature and is identified by a URL where the information is provided.

Source code in openml/datasets/data_feature.py
## OpenMLDataset

`OpenMLDataset(name: str, description: str | None, data_format: Literal['arff', 'sparse_arff'] = 'arff', cache_format: Literal['feather', 'pickle'] = 'pickle', dataset_id: int | None = None, version: int | None = None, creator: str | None = None, contributor: str | None = None, collection_date: str | None = None, upload_date: str | None = None, language: str | None = None, licence: str | None = None, url: str | None = None, default_target_attribute: str | None = None, row_id_attribute: str | None = None, ignore_attribute: str | list[str] | None = None, version_label: str | None = None, citation: str | None = None, tag: str | None = None, visibility: str | None = None, original_data_url: str | None = None, paper_url: str | None = None, update_comment: str | None = None, md5_checksum: str | None = None, data_file: str | None = None, features_file: str | None = None, qualities_file: str | None = None, dataset: str | None = None, parquet_url: str | None = None, parquet_file: str | None = None)`

Bases: `OpenMLBase`

Dataset object. Allows fetching and uploading datasets to OpenML.

#### Parameters

- `name` (str): Name of the dataset.
- `description` (str): Description of the dataset.
- `data_format` (str): Format of the dataset, either 'arff' or 'sparse_arff'.
- `cache_format` (str): Format for caching the dataset, either 'feather' or 'pickle'.
- `dataset_id` (int, optional): Id autogenerated by the server.
- `version` (int, optional): Version of this dataset. '1' for the original version. Auto-incremented by the server.
- `creator` (str, optional): The person who created the dataset.
- `contributor` (str, optional): People who contributed to the current version of the dataset.
- `collection_date` (str, optional): The date the data was originally collected, given by the uploader.
- `upload_date` (str, optional): The date-time when the dataset was uploaded, generated by the server.
- `language` (str, optional): Language in which the data is represented. Starts with one upper-case letter, rest lower case, e.g. 'English'.
- `licence` (str, optional): License of the data.
- `url` (str, optional): Valid URL that points to the actual data file. The file can be on the OpenML server or another dataset repository.
- `default_target_attribute` (str, optional): The default target attribute, if it exists. Can have multiple values, comma separated.
- `row_id_attribute` (str, optional): The attribute that represents the row-id column, if present in the dataset.
- `ignore_attribute` (str or list, optional): Attributes that should be excluded in modelling, such as identifiers and indexes.
- `version_label` (str, optional): Version label provided by the user. Can be a date, hash, or some other type of id.
- `citation` (str, optional): Reference(s) that should be cited when building on this data.
- `tag` (str, optional): Tags, describing the algorithms.
- `visibility` (str, optional): Who can see the dataset. Typical values: 'Everyone', 'All my friends', 'Only me'. Can also be any of the user's circles.
- `original_data_url` (str, optional): For derived data, the url to the original dataset.
- `paper_url` (str, optional): Link to a paper describing the dataset.
- `update_comment` (str, optional): An explanation for when the dataset is uploaded.
- `md5_checksum` (str, optional): MD5 checksum to check if the dataset is downloaded without corruption.
- `data_file` (str, optional): Path to where the dataset is located.
- `features_file` (dict, optional): A dictionary of dataset features, which maps a feature index to an OpenMLDataFeature.
- `qualities_file` (dict, optional): A dictionary of dataset qualities, which maps a quality name to a quality value.
- `dataset` (string, optional): Serialized arff dataset string.
- `parquet_url` (string, optional): The URL to the storage location where the dataset files are hosted. This can be a MinIO bucket URL. If specified, the data will be accessed from this URL when reading the files.
- `parquet_file` (string, optional): Path to the local file.

Source code in openml/datasets/dataset.py
### openml_url (property)

The URL of the object on the server, if it was uploaded, else None.

### get_data

`get_data(target: list[str] | str | None = None, include_row_id: bool = False, include_ignore_attribute: bool = False) -> tuple[DataFrame, Series | None, list[bool], list[str]]`

Returns dataset content as dataframes.

#### Parameters

- `target` (string, list of str, or None, default=None): Name of the target column to separate from the data. Splitting multiple columns is currently not supported.
- `include_row_id` (boolean, default=False): Whether to include row ids in the returned dataset.
- `include_ignore_attribute` (boolean, default=False): Whether to include columns that are marked as "ignore" on the server in the dataset.

#### Returns

- `X` (dataframe, shape (n_samples, n_columns)): Dataset; may have sparse dtypes in the columns if required.
- `y` (pd.Series, shape (n_samples,) or None): Target column.
- `categorical_indicator` (list of bool): Mask that indicates categorical features.
- `attribute_names` (list of str): List of attribute names.

Source code in openml/datasets/dataset.py
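A hedged sketch of typical get_data usage; dataset id 61 is illustrative, and the target keyword assumes the dataset defines a default target attribute:

```python
import openml

dataset = openml.datasets.get_dataset(61)
X, y, categorical_indicator, attribute_names = dataset.get_data(
    target=dataset.default_target_attribute
)
print(X.shape, y.name)
```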
### get_features_by_type

`get_features_by_type(data_type: str, exclude: list[str] | None = None, exclude_ignore_attribute: bool = True, exclude_row_id_attribute: bool = True) -> list[int]`

Return indices of features of a given type, e.g. all nominal features. Optional parameters allow excluding various features by index or ontology.

#### Parameters

- `data_type` (str): The data type to return (e.g., nominal, numeric, date, string).
- `exclude` (list of int): List of columns to exclude from the return value.
- `exclude_ignore_attribute` (bool): Whether to exclude the defined ignore attributes (and adapt the return values as if these indices are not present).
- `exclude_row_id_attribute` (bool): Whether to exclude the defined row id attributes (and adapt the return values as if these indices are not present).

#### Returns

- `result` (list): A list of indices that have the specified data type.

Source code in openml/datasets/dataset.py
### open_in_browser

Opens the OpenML web page corresponding to this object in your default browser.

Source code in openml/base.py

### publish

`publish() -> OpenMLBase`

Publish the object on the OpenML server.

Source code in openml/base.py

### push_tag

Annotates the entity with a tag on the server.

### remove_tag

Removes a tag from the entity on the server.

### retrieve_class_labels

Reads the dataset's arff to determine the class labels.

If the task has no class labels (for example a regression problem), it returns None. Necessary because the data returned by get_data only contains the indices of the classes, while OpenML needs the real class name when uploading the results of a run.

#### Parameters

- `target_name` (str): Name of the target attribute.

#### Returns

list

Source code in openml/datasets/dataset.py

### url_for_id (classmethod)

Return the OpenML URL for the object of the class entity with the given id.
## OpenMLEvaluation

`OpenMLEvaluation(run_id: int, task_id: int, setup_id: int, flow_id: int, flow_name: str, data_id: int, data_name: str, function: str, upload_time: str, uploader: int, uploader_name: str, value: float | None, values: list[float] | None, array_data: str | None = None)`

Contains all meta-information about a run / evaluation combination, according to the evaluation/list function.

#### Parameters

- `run_id` (int): Refers to the run.
- `task_id` (int): Refers to the task.
- `setup_id` (int): Refers to the setup.
- `flow_id` (int): Refers to the flow.
- `flow_name` (str): Name of the referred flow.
- `data_id` (int): Refers to the dataset.
- `data_name` (str): The name of the dataset.
- `function` (str): The evaluation metric of this item (e.g., accuracy).
- `upload_time` (str): The time of evaluation.
- `uploader` (int): Uploader ID (user ID).
- `uploader_name` (str): Name of the uploader of this evaluation.
- `value` (float): The value (score) of this evaluation.
- `values` (list of float): The values (scores) per repeat and fold (if requested).
- `array_data` (str): List of information per class (e.g., in case of precision, auroc, recall).

Source code in openml/evaluations/evaluation.py
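A sketch of how such evaluation objects are typically obtained, via the listing endpoint; the metric name and task id are illustrative:

```python
import openml

# List stored evaluations for one metric, restricted to one task.
evals = openml.evaluations.list_evaluations(
    function="predictive_accuracy", tasks=[31], size=10
)
for evaluation in evals.values():
    print(evaluation.flow_name, evaluation.value)
```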
## OpenMLFlow

`OpenMLFlow(name: str, description: str, model: object, components: dict, parameters: dict, parameters_meta_info: dict, external_version: str, tags: list, language: str, dependencies: str, class_name: str | None = None, custom_name: str | None = None, binary_url: str | None = None, binary_format: str | None = None, binary_md5: str | None = None, uploader: str | None = None, upload_date: str | None = None, flow_id: int | None = None, extension: Extension | None = None, version: str | None = None)`

Bases: `OpenMLBase`

OpenML Flow. Stores machine learning models.

Flows should not be generated manually, but by the function openml.flows.create_flow_from_model. Using this helper function ensures that all relevant fields are filled in.

Implements [openml.implementation.upload.xsd](https://github.com/openml/openml/blob/master/openml_OS/views/pages/api_new/v1/xsd/openml.implementation.upload.xsd).
#### Parameters

- `name` (str): Name of the flow. Used together with the attribute external_version as a unique identifier of the flow.
- `description` (str): Human-readable description of the flow (free text).
- `model` (object): ML model which is described by this flow.
- `components` (OrderedDict): Mapping from component identifier to an OpenMLFlow object. Components are usually subfunctions of an algorithm (e.g. kernels), base learners in ensemble algorithms (decision tree in adaboost), or building blocks of a machine learning pipeline. Components are modeled as independent flows and can be shared between flows (different pipelines can use the same components).
- `parameters` (OrderedDict): Mapping from parameter name to the parameter default value. The parameter default value must be of type str, so that the respective toolbox plugin can take care of casting the parameter default value to the correct type.
- `parameters_meta_info` (OrderedDict): Mapping from parameter name to dict. Stores additional information for each parameter. Required keys are data_type and description.
- `external_version` (str): Version number of the software the flow is implemented in. Used together with the attribute name as a unique identifier of the flow.
- `tags` (list): List of tags. Created on the server by other API calls.
- `language` (str): Natural language the flow is described in (not the programming language).
- `dependencies` (str): A list of dependencies necessary to run the flow. This field should contain all libraries the flow depends on. To allow reproducibility, it should also specify the exact version numbers.
- `class_name` (str, optional): The development language name of the class which is described by this flow.
- `custom_name` (str, optional): Custom name of the flow given by the owner.
- `binary_url` (str, optional): Url from which the binary can be downloaded. Added by the server. Ignored when uploaded manually. Will not be used by the python API because binaries aren't compatible across machines.
- `binary_format` (str, optional): Format in which the binary code was uploaded. Will not be used by the python API because binaries aren't compatible across machines.
- `binary_md5` (str, optional): MD5 checksum to check if the binary code was correctly downloaded. Will not be used by the python API because binaries aren't compatible across machines.
- `uploader` (str, optional): OpenML user ID of the uploader. Filled in by the server.
- `upload_date` (str, optional): Date the flow was uploaded. Filled in by the server.
- `flow_id` (int, optional): Flow ID. Assigned by the server.
- `extension` (Extension, optional): The extension for a flow (e.g., sklearn).
- `version` (str, optional): OpenML version of the flow. Assigned by the server.
Source code in openml/flows/flow.py
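A hedged sketch of downloading an existing flow and inspecting it; flow id 8353 is illustrative only:

```python
import openml

flow = openml.flows.get_flow(8353)
print(flow.name, flow.external_version)
print(flow.parameters)   # parameter name -> default value (stored as str)
```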
### openml_url (property)

The URL of the object on the server, if it was uploaded, else None.

### from_filesystem (classmethod)

`from_filesystem(input_directory: str | Path) -> OpenMLFlow`

Read a flow from an XML in input_directory on the filesystem.

Source code in openml/flows/flow.py

### get_structure

Returns, for each sub-component of the flow, the path of identifiers that must be traversed to reach this component. The resulting dict maps a key (identifying a flow by either its id, name, or fullname) to the parameter prefix.

#### Parameters

- `key_item` (str): The flow attribute that will be used to identify flows in the structure. Allowed values: {flow_id, name}.

#### Returns

- dict[str, List[str]]: The flow structure.

Source code in openml/flows/flow.py

### get_subflow

`get_subflow(structure: list[str]) -> OpenMLFlow`

Returns a subflow from the tree of dependencies.

#### Parameters

- `structure` (list of str): A list of strings indicating the location of the subflow.

#### Returns

- OpenMLFlow: The OpenMLFlow that corresponds to the structure.

Source code in openml/flows/flow.py

### open_in_browser

Opens the OpenML web page corresponding to this object in your default browser.

Source code in openml/base.py
### publish

`publish(raise_error_if_exists: bool = False) -> OpenMLFlow`

Publish this flow to the OpenML server.

Raises a PyOpenMLError if the flow exists on the server but self.flow_id does not match the flow id known to the server.

#### Parameters

- `raise_error_if_exists` (bool, optional, default=False): If True, raise PyOpenMLError if the flow exists on the server. If False, update the local flow to match the server flow.

#### Returns

- `self` (OpenMLFlow)

Source code in openml/flows/flow.py
### push_tag

Annotates the entity with a tag on the server.

### remove_tag

Removes a tag from the entity on the server.

### to_filesystem

Write a flow to the filesystem as XML to output_directory.

Source code in openml/flows/flow.py

### url_for_id (classmethod)

Return the OpenML URL for the object of the class entity with the given id.
## OpenMLLearningCurveTask

`OpenMLLearningCurveTask(task_type_id: TaskType, task_type: str, data_set_id: int, target_name: str, estimation_procedure_id: int = 13, estimation_procedure_type: str | None = None, estimation_parameters: dict[str, str] | None = None, data_splits_url: str | None = None, task_id: int | None = None, evaluation_measure: str | None = None, class_labels: list[str] | None = None, cost_matrix: ndarray | None = None)`

Bases: `OpenMLClassificationTask`

OpenML Learning Curve object.

#### Parameters

- `task_type_id` (TaskType): ID of the Learning Curve task.
- `task_type` (str): Name of the Learning Curve task.
- `data_set_id` (int): ID of the dataset that this task is associated with.
- `target_name` (str): Name of the target feature in the dataset.
- `estimation_procedure_id` (int, default=13): ID of the estimation procedure to use for evaluating models.
- `estimation_procedure_type` (str, default=None): Type of the estimation procedure.
- `estimation_parameters` (dict, default=None): Additional parameters for the estimation procedure.
- `data_splits_url` (str, default=None): URL of the file containing the data splits for the Learning Curve task.
- `task_id` (int or None): ID of the Learning Curve task.
- `evaluation_measure` (str, default=None): Name of the evaluation measure to use for evaluating models.
- `class_labels` (list of str, default=None): Class labels for Learning Curve tasks.
- `cost_matrix` (numpy array, default=None): Cost matrix for Learning Curve tasks.

Source code in openml/tasks/task.py
### estimation_parameters (writable property)

Return the estimation parameters for the task.

### openml_url (property)

The URL of the object on the server, if it was uploaded, else None.

### download_split

`download_split() -> OpenMLSplit`

Download the OpenML split for a given task.

Source code in openml/tasks/task.py

### get_X_and_y

Get the data associated with the current task.

#### Returns

tuple of X and y

Source code in openml/tasks/task.py

### get_dataset

`get_dataset(**kwargs: Any) -> OpenMLDataset`

Download the dataset associated with the task. Accepts the same keyword arguments as openml.datasets.get_dataset.

### get_split_dimensions

Get the (repeats, folds, samples) of the split for a given task.

Source code in openml/tasks/task.py

### get_train_test_split_indices

`get_train_test_split_indices(fold: int = 0, repeat: int = 0, sample: int = 0) -> tuple[ndarray, ndarray]`

Get the indices of the train and test splits for a given task.

Source code in openml/tasks/task.py

### open_in_browser

Opens the OpenML web page corresponding to this object in your default browser.

Source code in openml/base.py

### publish

`publish() -> OpenMLBase`

Publish the object on the OpenML server.

Source code in openml/base.py

### push_tag

Annotates the entity with a tag on the server.

### remove_tag

Removes a tag from the entity on the server.

### url_for_id (classmethod)

Return the OpenML URL for the object of the class entity with the given id.
## OpenMLParameter

`OpenMLParameter(input_id: int, flow_id: int, flow_name: str, full_name: str, parameter_name: str, data_type: str, default_value: str, value: str)`

Parameter object (used in setup).

#### Parameters

- `input_id` (int): The input id from the openml database.
- `flow_id` (int): The flow to which this parameter is associated.
- `flow_name` (str): The name of the flow (no version number) to which this parameter is associated.
- `full_name` (str): The name of the flow and parameter combined.
- `parameter_name` (str): The name of the parameter.
- `data_type` (str): The datatype of the parameter; generally unused for sklearn flows.
- `default_value` (str): The default value. For sklearn parameters, this is unknown and a default value is selected arbitrarily.
- `value` (str): If the parameter was set, the value that it was set to.

Source code in openml/setups/setup.py
## OpenMLRegressionTask

`OpenMLRegressionTask(task_type_id: TaskType, task_type: str, data_set_id: int, target_name: str, estimation_procedure_id: int = 7, estimation_procedure_type: str | None = None, estimation_parameters: dict[str, str] | None = None, data_splits_url: str | None = None, task_id: int | None = None, evaluation_measure: str | None = None)`

Bases: `OpenMLSupervisedTask`

OpenML Regression object.

#### Parameters

- `task_type_id` (TaskType): Task type ID of the OpenML Regression task.
- `task_type` (str): Task type of the OpenML Regression task.
- `data_set_id` (int): ID of the OpenML dataset.
- `target_name` (str): Name of the target feature used in the Regression task.
- `estimation_procedure_id` (int, default=7): ID of the OpenML estimation procedure.
- `estimation_procedure_type` (str, default=None): Type of the OpenML estimation procedure.
- `estimation_parameters` (dict, default=None): Parameters used by the OpenML estimation procedure.
- `data_splits_url` (str, default=None): URL of the OpenML data splits for the Regression task.
- `task_id` (int or None): ID of the OpenML Regression task.
- `evaluation_measure` (str, default=None): Evaluation measure used in the Regression task.

Source code in openml/tasks/task.py
### estimation_parameters (writable property)

Return the estimation parameters for the task.

### openml_url (property)

The URL of the object on the server, if it was uploaded, else None.

### download_split

`download_split() -> OpenMLSplit`

Download the OpenML split for a given task.

Source code in openml/tasks/task.py

### get_X_and_y

Get the data associated with the current task.

#### Returns

tuple of X and y

Source code in openml/tasks/task.py

### get_dataset

`get_dataset(**kwargs: Any) -> OpenMLDataset`

Download the dataset associated with the task. Accepts the same keyword arguments as openml.datasets.get_dataset.

### get_split_dimensions

Get the (repeats, folds, samples) of the split for a given task.

Source code in openml/tasks/task.py

### get_train_test_split_indices

`get_train_test_split_indices(fold: int = 0, repeat: int = 0, sample: int = 0) -> tuple[ndarray, ndarray]`

Get the indices of the train and test splits for a given task.

Source code in openml/tasks/task.py

### open_in_browser

Opens the OpenML web page corresponding to this object in your default browser.

Source code in openml/base.py

### publish

`publish() -> OpenMLBase`

Publish the object on the OpenML server.

Source code in openml/base.py

### push_tag

Annotates the entity with a tag on the server.

### remove_tag

Removes a tag from the entity on the server.

### url_for_id (classmethod)

Return the OpenML URL for the object of the class entity with the given id.
## OpenMLRun

`OpenMLRun(task_id: int, flow_id: int | None, dataset_id: int | None, setup_string: str | None = None, output_files: dict[str, int] | None = None, setup_id: int | None = None, tags: list[str] | None = None, uploader: int | None = None, uploader_name: str | None = None, evaluations: dict | None = None, fold_evaluations: dict | None = None, sample_evaluations: dict | None = None, data_content: list[list] | None = None, trace: OpenMLRunTrace | None = None, model: object | None = None, task_type: str | None = None, task_evaluation_measure: str | None = None, flow_name: str | None = None, parameter_settings: list[dict[str, Any]] | None = None, predictions_url: str | None = None, task: OpenMLTask | None = None, flow: OpenMLFlow | None = None, run_id: int | None = None, description_text: str | None = None, run_details: str | None = None)`

Bases: `OpenMLBase`

OpenML Run: result of running a model on an OpenML dataset.

#### Parameters

- `task_id` (int): The ID of the OpenML task associated with the run.
- `flow_id` (int): The ID of the OpenML flow associated with the run.
- `dataset_id` (int): The ID of the OpenML dataset used for the run.
- `setup_string` (str): The setup string of the run.
- `output_files` (dict[str, int]): Specifies where each related file can be found.
- `setup_id` (int): An integer representing the ID of the setup used for the run.
- `tags` (list of str): The tags associated with the run.
- `uploader` (int): User ID of the uploader.
- `uploader_name` (str): The name of the person who uploaded the run.
- `evaluations` (dict): The evaluations of the run.
- `fold_evaluations` (dict): The evaluations of the run for each fold.
- `sample_evaluations` (dict): The evaluations of the run for each sample.
- `data_content` (list of list): The predictions generated from executing this run.
- `trace` (OpenMLRunTrace): The trace containing information on internal model evaluations of this run.
- `model` (object): The untrained model that was evaluated in the run.
- `task_type` (str): The type of the OpenML task associated with the run.
- `task_evaluation_measure` (str): The evaluation measure used for the task.
- `flow_name` (str): The name of the OpenML flow associated with the run.
- `parameter_settings` (list of OrderedDict): The parameter settings used for the run.
- `predictions_url` (str): The URL of the predictions file.
- `task` (OpenMLTask): An instance of the OpenMLTask class, representing the OpenML task associated with the run.
- `flow` (OpenMLFlow): An instance of the OpenMLFlow class, representing the OpenML flow associated with the run.
- `run_id` (int): The ID of the run.
- `description_text` (str, optional): Description text to add to the predictions file. If left None, it is set to the time the arff file is generated.
- `run_details` (str, optional, default=None): Description of the run stored in the run meta-data.

Source code in openml/runs/run.py
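A hedged sketch of how a run object is usually produced, assuming the scikit-learn extension for openml-python is installed; task id 31 is illustrative:

```python
import openml
from sklearn.tree import DecisionTreeClassifier

task = openml.tasks.get_task(31)
clf = DecisionTreeClassifier()
run = openml.runs.run_model_on_task(clf, task)   # executes locally
# run = run.publish()   # uploading requires openml.config.apikey
```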
### openml_url (property)

The URL of the object on the server, if it was uploaded, else None.

### from_filesystem (classmethod)

`from_filesystem(directory: str | Path, expect_model: bool = True) -> OpenMLRun`

The inverse of the to_filesystem method. Instantiates an OpenMLRun object based on files stored on the file system.

#### Parameters

- `directory` (str): A path leading to the folder where the results are stored.
- `expect_model` (bool): If True, it requires the model pickle to be present, and an error will be thrown if it is not. Otherwise, the model might or might not be present.

#### Returns

- `run` (OpenMLRun): The re-instantiated run object.

Source code in openml/runs/run.py

### get_metric_fn

Calculates metric scores based on predicted values. Assumes the run has been executed locally (and contains run_data). Furthermore, it assumes that the 'correct' or 'truth' attribute is specified in the arff (which is an optional field, but always the case for openml-python runs).

#### Parameters

- `sklearn_fn` (function): A function pointer to a sklearn function that accepts y_true, y_pred, and **kwargs.
- `kwargs` (dict): kwargs for the function.

#### Returns

- `scores` (ndarray of length num_folds * num_repeats): Metric results.

Source code in openml/runs/run.py
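A sketch of scoring a locally executed run with get_metric_fn, continuing the run_model_on_task example above; the task id is illustrative:

```python
import openml
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier

task = openml.tasks.get_task(31)
run = openml.runs.run_model_on_task(DecisionTreeClassifier(), task)
scores = run.get_metric_fn(accuracy_score)   # one score per fold/repeat
print(scores.mean(), scores.std())
```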
### open_in_browser

Opens the OpenML web page corresponding to this object in your default browser.

Source code in openml/base.py

### publish

`publish() -> OpenMLBase`

Publish the object on the OpenML server.

Source code in openml/base.py

### push_tag

Annotates the entity with a tag on the server.

### remove_tag

Removes a tag from the entity on the server.

### to_filesystem

The inverse of the from_filesystem method. Serializes a run on the filesystem, to be uploaded later.

#### Parameters

- `directory` (str): A path leading to the folder where the results will be stored. Should be empty.
- `store_model` (bool, optional, default=True): If True, the model will be pickled as well. As this is the most storage-expensive part, it is often desirable to not store the model.

Source code in openml/runs/run.py

### url_for_id (classmethod)

Return the OpenML URL for the object of the class entity with the given id.
## OpenMLSetup

Setup object (a.k.a. Configuration).

#### Parameters

- `setup_id` (int): The OpenML setup id.
- `flow_id` (int): The flow that it is built upon.
- `parameters` (dict): The setting of the parameters.

Source code in openml/setups/setup.py
## OpenMLSplit

`OpenMLSplit(name: int | str, description: str, split: dict[int, dict[int, dict[int, tuple[ndarray, ndarray]]]])`

OpenML Split object.

This class manages train-test splits for a dataset across multiple repetitions, folds, and samples.

#### Parameters

- `name` (int or str): The name or ID of the split.
- `description` (str): A description of the split.
- `split` (dict): A dictionary containing the splits organized by repetition, fold, and sample.

Source code in openml/tasks/split.py

### get

Returns the specified data split from the CrossValidationSplit object.

#### Parameters

- `repeat` (int): Index of the repeat to retrieve.
- `fold` (int): Index of the fold to retrieve.
- `sample` (int): Index of the sample to retrieve.

#### Returns

- numpy.ndarray: The data split for the specified repeat, fold, and sample.

#### Raises

- ValueError: If the specified repeat, fold, or sample is not known.

Source code in openml/tasks/split.py
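A sketch of obtaining a split object and reading one train/test split; task id 31 is illustrative:

```python
import openml

task = openml.tasks.get_task(31)
split = task.download_split()               # an OpenMLSplit instance
train_idx, test_idx = split.get(repeat=0, fold=0, sample=0)
```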
## OpenMLStudy

`OpenMLStudy(study_id: int | None, alias: str | None, benchmark_suite: int | None, name: str, description: str, status: str | None, creation_date: str | None, creator: int | None, tags: list[dict] | None, data: list[int] | None, tasks: list[int] | None, flows: list[int] | None, runs: list[int] | None, setups: list[int] | None)`

Bases: `BaseStudy`

An OpenMLStudy represents the OpenML concept of a study (a collection of runs). It contains the following information: name, id, description, creation date, creator id, and a list of run ids. From this list of run ids, the study object also derives the lists of associated OpenML object ids (datasets, flows, tasks, and setups).

#### Parameters

- `study_id` (int): the study id
- `alias` (str, optional): a string ID, unique on the server (url-friendly)
- `benchmark_suite` (int, optional): the benchmark suite (another study) upon which this study is run; can only be active if the main entity type is runs
- `name` (str): the name of the study (meta-info)
- `description` (str): brief description (meta-info)
- `status` (str): whether the study is in preparation, active, or deactivated
- `creation_date` (str): date of creation (meta-info)
- `creator` (int): OpenML user id of the owner / creator
- `tags` (list of dict): the tags associated with the study; each tag is a dict with the keys name, window_start, and write_access
- `data` (list): a list of data ids associated with this study
- `tasks` (list): a list of task ids associated with this study
- `flows` (list): a list of flow ids associated with this study
- `runs` (list): a list of run ids associated with this study
- `setups` (list): a list of setup ids associated with this study

Source code in openml/study/study.py
### openml_url (property)

The URL of the object on the server, if it was uploaded, else None.

### open_in_browser

Opens the OpenML web page corresponding to this object in your default browser.

Source code in openml/base.py

### publish

`publish() -> OpenMLBase`

Publish the object on the OpenML server.

Source code in openml/base.py

### push_tag

Annotates the entity with a tag on the server.

### remove_tag

Removes a tag from the entity on the server.

### url_for_id (classmethod)

Return the OpenML URL for the object of the class entity with the given id.
## OpenMLSupervisedTask

`OpenMLSupervisedTask(task_type_id: TaskType, task_type: str, data_set_id: int, target_name: str, estimation_procedure_id: int = 1, estimation_procedure_type: str | None = None, estimation_parameters: dict[str, str] | None = None, evaluation_measure: str | None = None, data_splits_url: str | None = None, task_id: int | None = None)`

Bases: `OpenMLTask`, `ABC`

OpenML Supervised Task object.

#### Parameters

- `task_type_id` (TaskType): ID of the task type.
- `task_type` (str): Name of the task type.
- `data_set_id` (int): ID of the OpenML dataset associated with the task.
- `target_name` (str): Name of the target feature (the class variable).
- `estimation_procedure_id` (int, default=1): ID of the estimation procedure for the task.
- `estimation_procedure_type` (str, default=None): Type of the estimation procedure for the task.
- `estimation_parameters` (dict, default=None): Estimation parameters for the task.
- `evaluation_measure` (str, default=None): Name of the evaluation measure for the task.
- `data_splits_url` (str, default=None): URL of the data splits for the task.
- `task_id` (int or None): Refers to the unique identifier of the task.

Source code in openml/tasks/task.py
### estimation_parameters (writable property)

Return the estimation parameters for the task.

### openml_url (property)

The URL of the object on the server, if it was uploaded, else None.

### download_split

`download_split() -> OpenMLSplit`

Download the OpenML split for a given task.

Source code in openml/tasks/task.py

### get_X_and_y

Get the data associated with the current task.

#### Returns

tuple of X and y

Source code in openml/tasks/task.py

### get_dataset

`get_dataset(**kwargs: Any) -> OpenMLDataset`

Download the dataset associated with the task. Accepts the same keyword arguments as openml.datasets.get_dataset.

### get_split_dimensions

Get the (repeats, folds, samples) of the split for a given task.

Source code in openml/tasks/task.py

### get_train_test_split_indices

`get_train_test_split_indices(fold: int = 0, repeat: int = 0, sample: int = 0) -> tuple[ndarray, ndarray]`

Get the indices of the train and test splits for a given task.

Source code in openml/tasks/task.py

### open_in_browser

Opens the OpenML web page corresponding to this object in your default browser.

Source code in openml/base.py

### publish

`publish() -> OpenMLBase`

Publish the object on the OpenML server.

Source code in openml/base.py

### push_tag

Annotates the entity with a tag on the server.

### remove_tag

Removes a tag from the entity on the server.

### url_for_id (classmethod)

Return the OpenML URL for the object of the class entity with the given id.
## OpenMLTask

`OpenMLTask(task_id: int | None, task_type_id: TaskType, task_type: str, data_set_id: int, estimation_procedure_id: int = 1, estimation_procedure_type: str | None = None, estimation_parameters: dict[str, str] | None = None, evaluation_measure: str | None = None, data_splits_url: str | None = None)`

Bases: `OpenMLBase`

OpenML Task object.

#### Parameters

- `task_id` (int or None): Refers to the unique identifier of the OpenML task.
- `task_type_id` (TaskType): Refers to the type of the OpenML task.
- `task_type` (str): Refers to the name of the OpenML task type.
- `data_set_id` (int): Refers to the dataset.
- `estimation_procedure_id` (int): Refers to the type of estimates used.
- `estimation_procedure_type` (str, default=None): Refers to the type of estimation procedure used for the OpenML task.
- `estimation_parameters` (dict[str, str], default=None): Estimation parameters used for the OpenML task.
- `evaluation_measure` (str, default=None): Refers to the evaluation measure.
- `data_splits_url` (str, default=None): Refers to the URL of the data splits used for the OpenML task.

Source code in openml/tasks/task.py
### openml_url (property)

The URL of the object on the server, if it was uploaded, else None.

### download_split

`download_split() -> OpenMLSplit`

Download the OpenML split for a given task.

Source code in openml/tasks/task.py

### get_dataset

`get_dataset(**kwargs: Any) -> OpenMLDataset`

Download the dataset associated with the task. Accepts the same keyword arguments as openml.datasets.get_dataset.

### get_split_dimensions

Get the (repeats, folds, samples) of the split for a given task.

Source code in openml/tasks/task.py

### get_train_test_split_indices

`get_train_test_split_indices(fold: int = 0, repeat: int = 0, sample: int = 0) -> tuple[ndarray, ndarray]`

Get the indices of the train and test splits for a given task.

Source code in openml/tasks/task.py

### open_in_browser

Opens the OpenML web page corresponding to this object in your default browser.

Source code in openml/base.py

### publish

`publish() -> OpenMLBase`

Publish the object on the OpenML server.

Source code in openml/base.py

### push_tag

Annotates the entity with a tag on the server.

### remove_tag

Removes a tag from the entity on the server.

### url_for_id (classmethod)

Return the OpenML URL for the object of the class entity with the given id.
## populate_cache

`populate_cache(task_ids: list[int] | None = None, dataset_ids: list[int | str] | None = None, flow_ids: list[int] | None = None, run_ids: list[int] | None = None) -> None`

Populate a cache for offline and parallel usage of the OpenML connector.

#### Parameters

- `task_ids` (iterable): Task ids to cache.
- `dataset_ids` (iterable): Dataset ids to cache.
- `flow_ids` (iterable): Flow ids to cache.
- `run_ids` (iterable): Run ids to cache.

#### Returns

None
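A sketch of warming the local cache so that later calls (for example from parallel workers or offline sessions) do not need to contact the server; the ids are illustrative:

```python
import openml

openml.populate_cache(dataset_ids=[61], task_ids=[31])
```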