tasks
openml.tasks
#
OpenMLClassificationTask
#
OpenMLClassificationTask(task_type_id: TaskType, task_type: str, data_set_id: int, target_name: str, estimation_procedure_id: int = 1, estimation_procedure_type: str | None = None, estimation_parameters: dict[str, str] | None = None, evaluation_measure: str | None = None, data_splits_url: str | None = None, task_id: int | None = None, class_labels: list[str] | None = None, cost_matrix: ndarray | None = None)
Bases: OpenMLSupervisedTask
OpenML Classification object.
Parameters#
task_type_id : TaskType
ID of the Classification task type.
task_type : str
Name of the Classification task type.
data_set_id : int
ID of the OpenML dataset associated with the Classification task.
target_name : str
Name of the target variable.
estimation_procedure_id : int, default=1
ID of the estimation procedure for the Classification task.
estimation_procedure_type : str, default=None
Type of the estimation procedure.
estimation_parameters : dict, default=None
Estimation parameters for the Classification task.
evaluation_measure : str, default=None
Name of the evaluation measure.
data_splits_url : str, default=None
URL of the data splits for the Classification task.
task_id : Union[int, None]
ID of the Classification task (if it already exists on OpenML).
class_labels : List of str, default=None
A list of class labels (for classification tasks).
cost_matrix : array, default=None
A cost matrix (for classification tasks).
Source code in openml/tasks/task.py
estimation_parameters
property
writable
#
Return the estimation parameters for the task.
openml_url
property
#
The URL of the object on the server, if it was uploaded, else None.
download_split
#
download_split() -> OpenMLSplit
Download the OpenML split for a given task.
Source code in openml/tasks/task.py
get_X_and_y
#
Get data associated with the current task.
Returns#
tuple - X and y
Source code in openml/tasks/task.py
get_dataset
#
get_dataset(**kwargs: Any) -> OpenMLDataset
Download dataset associated with task.
Accepts the same keyword arguments as openml.datasets.get_dataset.
get_split_dimensions
#
Get the (repeats, folds, samples) of the split for a given task.
Source code in openml/tasks/task.py
get_train_test_split_indices
#
get_train_test_split_indices(fold: int = 0, repeat: int = 0, sample: int = 0) -> tuple[ndarray, ndarray]
Get the indices of the train and test splits for a given task.
Source code in openml/tasks/task.py
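The two split-related methods can be combined to walk over every resampling iteration of a task. The following is a minimal sketch, assuming that get_split_dimensions returns a (repeats, folds, samples) tuple as its description states. `FakeTask` and `iter_splits` are hypothetical names introduced for illustration; `FakeTask` only mimics the two documented methods so that the example runs without contacting the OpenML server.

```python
from itertools import product


class FakeTask:
    """Hypothetical stand-in exposing the two documented split methods."""

    def get_split_dimensions(self):
        # (repeats, folds, samples), as described for the real method.
        return (1, 3, 1)

    def get_train_test_split_indices(self, fold=0, repeat=0, sample=0):
        # Toy split over 6 rows: each fold holds out two consecutive rows.
        # The real method returns a pair of numpy arrays; lists are used here.
        rows = list(range(6))
        test = rows[2 * fold : 2 * fold + 2]
        train = [r for r in rows if r not in test]
        return train, test


def iter_splits(task):
    """Yield (repeat, fold, sample, train_idx, test_idx) for every split."""
    n_repeats, n_folds, n_samples = task.get_split_dimensions()
    for repeat, fold, sample in product(
        range(n_repeats), range(n_folds), range(n_samples)
    ):
        train_idx, test_idx = task.get_train_test_split_indices(
            fold=fold, repeat=repeat, sample=sample
        )
        yield repeat, fold, sample, train_idx, test_idx


splits = list(iter_splits(FakeTask()))
for repeat, fold, sample, train_idx, test_idx in splits:
    print(repeat, fold, sample, len(train_idx), len(test_idx))
```

Any object that offers the same two methods, such as a task returned by openml.tasks.get_task, could be passed to `iter_splits` in place of `FakeTask`.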
open_in_browser
#
Opens the OpenML web page corresponding to this object in your default browser.
Source code in openml/base.py
publish
#
publish() -> OpenMLBase
Publish the object on the OpenML server.
Source code in openml/base.py
push_tag
#
remove_tag
#
url_for_id
classmethod
#
Return the OpenML URL for the object of the class entity with the given id.
OpenMLClusteringTask
#
OpenMLClusteringTask(task_type_id: TaskType, task_type: str, data_set_id: int, estimation_procedure_id: int = 17, task_id: int | None = None, estimation_procedure_type: str | None = None, estimation_parameters: dict[str, str] | None = None, data_splits_url: str | None = None, evaluation_measure: str | None = None, target_name: str | None = None)
Bases: OpenMLTask
OpenML Clustering object.
Parameters#
task_type_id : TaskType
Task type ID of the OpenML clustering task.
task_type : str
Task type of the OpenML clustering task.
data_set_id : int
ID of the OpenML dataset used in the clustering task.
estimation_procedure_id : int, default=17
ID of the OpenML estimation procedure.
task_id : Union[int, None]
ID of the OpenML clustering task.
estimation_procedure_type : str, default=None
Type of the OpenML estimation procedure used in the clustering task.
estimation_parameters : dict, default=None
Parameters used by the OpenML estimation procedure.
data_splits_url : str, default=None
URL of the OpenML data splits for the clustering task.
evaluation_measure : str, default=None
Evaluation measure used in the clustering task.
target_name : str, default=None
Name of the target feature (class) that is not part of the feature set for the clustering task.
Source code in openml/tasks/task.py
openml_url
property
#
The URL of the object on the server, if it was uploaded, else None.
download_split
#
download_split() -> OpenMLSplit
Download the OpenML split for a given task.
Source code in openml/tasks/task.py
get_dataset
#
get_dataset(**kwargs: Any) -> OpenMLDataset
Download dataset associated with task.
Accepts the same keyword arguments as openml.datasets.get_dataset.
get_split_dimensions
#
Get the (repeats, folds, samples) of the split for a given task.
Source code in openml/tasks/task.py
get_train_test_split_indices
#
get_train_test_split_indices(fold: int = 0, repeat: int = 0, sample: int = 0) -> tuple[ndarray, ndarray]
Get the indices of the train and test splits for a given task.
Source code in openml/tasks/task.py
open_in_browser
#
Opens the OpenML web page corresponding to this object in your default browser.
Source code in openml/base.py
publish
#
publish() -> OpenMLBase
Publish the object on the OpenML server.
Source code in openml/base.py
push_tag
#
remove_tag
#
url_for_id
classmethod
#
Return the OpenML URL for the object of the class entity with the given id.
OpenMLLearningCurveTask
#
OpenMLLearningCurveTask(task_type_id: TaskType, task_type: str, data_set_id: int, target_name: str, estimation_procedure_id: int = 13, estimation_procedure_type: str | None = None, estimation_parameters: dict[str, str] | None = None, data_splits_url: str | None = None, task_id: int | None = None, evaluation_measure: str | None = None, class_labels: list[str] | None = None, cost_matrix: ndarray | None = None)
Bases: OpenMLClassificationTask
OpenML Learning Curve object.
Parameters#
task_type_id : TaskType
ID of the Learning Curve task.
task_type : str
Name of the Learning Curve task.
data_set_id : int
ID of the dataset that this task is associated with.
target_name : str
Name of the target feature in the dataset.
estimation_procedure_id : int, default=13
ID of the estimation procedure to use for evaluating models.
estimation_procedure_type : str, default=None
Type of the estimation procedure.
estimation_parameters : dict, default=None
Additional parameters for the estimation procedure.
data_splits_url : str, default=None
URL of the file containing the data splits for the Learning Curve task.
task_id : Union[int, None]
ID of the Learning Curve task.
evaluation_measure : str, default=None
Name of the evaluation measure to use for evaluating models.
class_labels : list of str, default=None
Class labels for Learning Curve tasks.
cost_matrix : numpy array, default=None
Cost matrix for Learning Curve tasks.
Source code in openml/tasks/task.py
estimation_parameters
property
writable
#
Return the estimation parameters for the task.
openml_url
property
#
The URL of the object on the server, if it was uploaded, else None.
download_split
#
download_split() -> OpenMLSplit
Download the OpenML split for a given task.
Source code in openml/tasks/task.py
get_X_and_y
#
Get data associated with the current task.
Returns#
tuple - X and y
Source code in openml/tasks/task.py
get_dataset
#
get_dataset(**kwargs: Any) -> OpenMLDataset
Download dataset associated with task.
Accepts the same keyword arguments as openml.datasets.get_dataset.
get_split_dimensions
#
Get the (repeats, folds, samples) of the split for a given task.
Source code in openml/tasks/task.py
get_train_test_split_indices
#
get_train_test_split_indices(fold: int = 0, repeat: int = 0, sample: int = 0) -> tuple[ndarray, ndarray]
Get the indices of the train and test splits for a given task.
Source code in openml/tasks/task.py
open_in_browser
#
Opens the OpenML web page corresponding to this object in your default browser.
Source code in openml/base.py
publish
#
publish() -> OpenMLBase
Publish the object on the OpenML server.
Source code in openml/base.py
push_tag
#
remove_tag
#
url_for_id
classmethod
#
Return the OpenML URL for the object of the class entity with the given id.
OpenMLRegressionTask
#
OpenMLRegressionTask(task_type_id: TaskType, task_type: str, data_set_id: int, target_name: str, estimation_procedure_id: int = 7, estimation_procedure_type: str | None = None, estimation_parameters: dict[str, str] | None = None, data_splits_url: str | None = None, task_id: int | None = None, evaluation_measure: str | None = None)
Bases: OpenMLSupervisedTask
OpenML Regression object.
Parameters#
task_type_id : TaskType
Task type ID of the OpenML Regression task.
task_type : str
Task type of the OpenML Regression task.
data_set_id : int
ID of the OpenML dataset.
target_name : str
Name of the target feature used in the Regression task.
estimation_procedure_id : int, default=7
ID of the OpenML estimation procedure.
estimation_procedure_type : str, default=None
Type of the OpenML estimation procedure.
estimation_parameters : dict, default=None
Parameters used by the OpenML estimation procedure.
data_splits_url : str, default=None
URL of the OpenML data splits for the Regression task.
task_id : Union[int, None]
ID of the OpenML Regression task.
evaluation_measure : str, default=None
Evaluation measure used in the Regression task.
Source code in openml/tasks/task.py
estimation_parameters
property
writable
#
Return the estimation parameters for the task.
openml_url
property
#
The URL of the object on the server, if it was uploaded, else None.
download_split
#
download_split() -> OpenMLSplit
Download the OpenML split for a given task.
Source code in openml/tasks/task.py
get_X_and_y
#
Get data associated with the current task.
Returns#
tuple - X and y
Source code in openml/tasks/task.py
get_dataset
#
get_dataset(**kwargs: Any) -> OpenMLDataset
Download dataset associated with task.
Accepts the same keyword arguments as openml.datasets.get_dataset.
get_split_dimensions
#
Get the (repeats, folds, samples) of the split for a given task.
Source code in openml/tasks/task.py
get_train_test_split_indices
#
get_train_test_split_indices(fold: int = 0, repeat: int = 0, sample: int = 0) -> tuple[ndarray, ndarray]
Get the indices of the train and test splits for a given task.
Source code in openml/tasks/task.py
open_in_browser
#
Opens the OpenML web page corresponding to this object in your default browser.
Source code in openml/base.py
publish
#
publish() -> OpenMLBase
Publish the object on the OpenML server.
Source code in openml/base.py
push_tag
#
remove_tag
#
url_for_id
classmethod
#
Return the OpenML URL for the object of the class entity with the given id.
OpenMLSplit
#
OpenMLSplit(name: int | str, description: str, split: dict[int, dict[int, dict[int, tuple[ndarray, ndarray]]]])
OpenML Split object.
This class manages train-test splits for a dataset across multiple repetitions, folds, and samples.
Parameters#
name : int or str
The name or ID of the split.
description : str
A description of the split.
split : dict
A dictionary containing the splits organized by repetition, fold, and sample.
Source code in openml/tasks/split.py
get
#
Return the data split stored in this object for the specified repeat, fold, and sample.
Parameters#
repeat : int
Index of the repeat to retrieve.
fold : int
Index of the fold to retrieve.
sample : int
Index of the sample to retrieve.
Returns#
tuple[ndarray, ndarray]
The data split (train and test indices) for the specified repeat, fold, and sample.
Raises#
ValueError
If the specified repeat, fold, or sample is not known.
Source code in openml/tasks/split.py
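The lookup described above can be pictured as three nested dictionary accesses, one per level of the split mapping. The sketch below illustrates the documented behaviour and is not the library's implementation: `get_split` and `toy_split` are hypothetical names, and plain lists stand in for the index arrays.

```python
def get_split(split, repeat=0, fold=0, sample=0):
    """Look up one (train, test) pair in a repeat -> fold -> sample mapping."""
    if repeat not in split:
        raise ValueError(f"Repeat {repeat} not known")
    if fold not in split[repeat]:
        raise ValueError(f"Fold {fold} not known")
    if sample not in split[repeat][fold]:
        raise ValueError(f"Sample {sample} not known")
    return split[repeat][fold][sample]


# One repeat, two folds, one sample per fold; lists stand in for index arrays.
toy_split = {
    0: {
        0: {0: ([2, 3], [0, 1])},
        1: {0: ([0, 1], [2, 3])},
    }
}

train_idx, test_idx = get_split(toy_split, repeat=0, fold=1, sample=0)
print(train_idx, test_idx)
```

Checking each level separately lets the error message name the index that is missing, which matches the documented ValueError for an unknown repeat, fold, or sample.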
OpenMLSupervisedTask
#
OpenMLSupervisedTask(task_type_id: TaskType, task_type: str, data_set_id: int, target_name: str, estimation_procedure_id: int = 1, estimation_procedure_type: str | None = None, estimation_parameters: dict[str, str] | None = None, evaluation_measure: str | None = None, data_splits_url: str | None = None, task_id: int | None = None)
Bases: OpenMLTask, ABC
OpenML supervised task object (abstract base for the classification and regression tasks).
Parameters#
task_type_id : TaskType
ID of the task type.
task_type : str
Name of the task type.
data_set_id : int
ID of the OpenML dataset associated with the task.
target_name : str
Name of the target feature (the class variable).
estimation_procedure_id : int, default=1
ID of the estimation procedure for the task.
estimation_procedure_type : str, default=None
Type of the estimation procedure for the task.
estimation_parameters : dict, default=None
Estimation parameters for the task.
evaluation_measure : str, default=None
Name of the evaluation measure for the task.
data_splits_url : str, default=None
URL of the data splits for the task.
task_id : Union[int, None]
Unique identifier of the task.
Source code in openml/tasks/task.py
estimation_parameters
property
writable
#
Return the estimation parameters for the task.
openml_url
property
#
The URL of the object on the server, if it was uploaded, else None.
download_split
#
download_split() -> OpenMLSplit
Download the OpenML split for a given task.
Source code in openml/tasks/task.py
get_X_and_y
#
Get data associated with the current task.
Returns#
tuple - X and y
Source code in openml/tasks/task.py
get_dataset
#
get_dataset(**kwargs: Any) -> OpenMLDataset
Download dataset associated with task.
Accepts the same keyword arguments as openml.datasets.get_dataset.
get_split_dimensions
#
Get the (repeats, folds, samples) of the split for a given task.
Source code in openml/tasks/task.py
get_train_test_split_indices
#
get_train_test_split_indices(fold: int = 0, repeat: int = 0, sample: int = 0) -> tuple[ndarray, ndarray]
Get the indices of the train and test splits for a given task.
Source code in openml/tasks/task.py
open_in_browser
#
Opens the OpenML web page corresponding to this object in your default browser.
Source code in openml/base.py
publish
#
publish() -> OpenMLBase
Publish the object on the OpenML server.
Source code in openml/base.py
push_tag
#
remove_tag
#
url_for_id
classmethod
#
Return the OpenML URL for the object of the class entity with the given id.
OpenMLTask
#
OpenMLTask(task_id: int | None, task_type_id: TaskType, task_type: str, data_set_id: int, estimation_procedure_id: int = 1, estimation_procedure_type: str | None = None, estimation_parameters: dict[str, str] | None = None, evaluation_measure: str | None = None, data_splits_url: str | None = None)
Bases: OpenMLBase
OpenML Task object.
Parameters#
task_id : Union[int, None]
Unique identifier of the OpenML task.
task_type_id : TaskType
Type of the OpenML task.
task_type : str
Name of the OpenML task type.
data_set_id : int
ID of the dataset the task is defined on.
estimation_procedure_id : int, default=1
ID of the estimation procedure used.
estimation_procedure_type : str, default=None
Type of estimation procedure used for the OpenML task.
estimation_parameters : Dict[str, str], default=None
Estimation parameters used for the OpenML task.
evaluation_measure : str, default=None
Name of the evaluation measure.
data_splits_url : str, default=None
URL of the data splits used for the OpenML task.
Source code in openml/tasks/task.py
openml_url
property
#
The URL of the object on the server, if it was uploaded, else None.
download_split
#
download_split() -> OpenMLSplit
Download the OpenML split for a given task.
Source code in openml/tasks/task.py
get_dataset
#
get_dataset(**kwargs: Any) -> OpenMLDataset
Download dataset associated with task.
Accepts the same keyword arguments as openml.datasets.get_dataset.
get_split_dimensions
#
Get the (repeats, folds, samples) of the split for a given task.
Source code in openml/tasks/task.py
get_train_test_split_indices
#
get_train_test_split_indices(fold: int = 0, repeat: int = 0, sample: int = 0) -> tuple[ndarray, ndarray]
Get the indices of the train and test splits for a given task.
Source code in openml/tasks/task.py
open_in_browser
#
Opens the OpenML web page corresponding to this object in your default browser.
Source code in openml/base.py
publish
#
publish() -> OpenMLBase
Publish the object on the OpenML server.
Source code in openml/base.py
push_tag
#
remove_tag
#
url_for_id
classmethod
#
Return the OpenML URL for the object of the class entity with the given id.
TaskType
#
Bases: Enum
Possible task types as defined in OpenML.
create_task
#
create_task(task_type: TaskType, dataset_id: int, estimation_procedure_id: int, target_name: str | None = None, evaluation_measure: str | None = None, **kwargs: Any) -> OpenMLClassificationTask | OpenMLRegressionTask | OpenMLLearningCurveTask | OpenMLClusteringTask
Create a task based on different given attributes.
Builds a task object with the function arguments as attributes. The type of the task object built is determined from the task type id. More information on how the arguments (task attributes) relate to the different possible tasks can be found in the individual task objects in the openml.tasks.task module.
Parameters#
task_type : TaskType
Id of the task type.
dataset_id : int
The id of the dataset for the task.
target_name : str, optional
The name of the feature used as a target. At the moment, only optional for the clustering tasks.
estimation_procedure_id : int
The id of the estimation procedure.
evaluation_measure : str, optional
The name of the evaluation measure.
kwargs : dict, optional
Other task attributes that are not mandatory for task upload.
Returns#
OpenMLClassificationTask, OpenMLRegressionTask, OpenMLLearningCurveTask, OpenMLClusteringTask
Source code in openml/tasks/functions.py
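The statement that the task type determines which class is built can be sketched as a lookup from an enum member to a class. Everything in this example is a hypothetical stand-in: `DemoTaskType`, the `Demo*Task` classes and `create_demo_task` are not part of openml, the enum values are placeholders, and the target_name check is an assumption drawn from the parameter description rather than confirmed library behaviour.

```python
from enum import Enum


class DemoTaskType(Enum):
    """Placeholder task types; names and values are illustrative only."""

    CLASSIFICATION = "classification"
    REGRESSION = "regression"
    CLUSTERING = "clustering"


class DemoTask:
    """Hypothetical task object that stores its arguments as attributes."""

    def __init__(self, dataset_id, estimation_procedure_id,
                 target_name=None, evaluation_measure=None, **kwargs):
        self.dataset_id = dataset_id
        self.estimation_procedure_id = estimation_procedure_id
        self.target_name = target_name
        self.evaluation_measure = evaluation_measure
        self.extra = kwargs  # attributes that are not mandatory for upload


class DemoClassificationTask(DemoTask):
    """Stand-in for a classification task."""


class DemoRegressionTask(DemoTask):
    """Stand-in for a regression task."""


class DemoClusteringTask(DemoTask):
    """Stand-in for a clustering task."""


_TASK_CLASSES = {
    DemoTaskType.CLASSIFICATION: DemoClassificationTask,
    DemoTaskType.REGRESSION: DemoRegressionTask,
    DemoTaskType.CLUSTERING: DemoClusteringTask,
}


def create_demo_task(task_type, dataset_id, estimation_procedure_id,
                     target_name=None, evaluation_measure=None, **kwargs):
    """Build the task object whose class matches ``task_type``."""
    task_cls = _TASK_CLASSES.get(task_type)
    if task_cls is None:
        raise NotImplementedError(f"Task type {task_type!r} is not supported.")
    # Assumption: only clustering tasks may omit the target feature.
    if target_name is None and task_type is not DemoTaskType.CLUSTERING:
        raise ValueError("target_name is required for this task type.")
    return task_cls(
        dataset_id,
        estimation_procedure_id,
        target_name=target_name,
        evaluation_measure=evaluation_measure,
        **kwargs,
    )


# The ids below are arbitrary illustration values.
task = create_demo_task(
    DemoTaskType.CLUSTERING, dataset_id=1, estimation_procedure_id=17
)
print(type(task).__name__, task.target_name)
```

A dictionary lookup keeps the dispatch in one place, so supporting another task type in this sketch only requires a new enum member and a new mapping entry.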
delete_task
#
Delete the task with id task_id from the OpenML server.
You can only delete tasks that you created and that have no runs associated with them.
Parameters#
task_id : int
OpenML id of the task.
Returns#
bool
True if the deletion was successful, False otherwise.
Source code in openml/tasks/functions.py
get_task
#
get_task(task_id: int, download_splits: bool = False, **get_dataset_kwargs: Any) -> OpenMLTask
Download the OpenML task for a given task ID.
Downloads the task representation.
Use the download_splits parameter to control whether the splits are downloaded.
Additional keyword arguments are passed on to openml.datasets.get_dataset.
Parameters#
task_id : int
The OpenML task id of the task to download.
download_splits: bool (default=False)
Whether to download the splits as well.
get_dataset_kwargs :
Optional keyword arguments that are passed on to openml.datasets.get_dataset.
Returns#
task: OpenMLTask
Source code in openml/tasks/functions.py
get_tasks
#
get_tasks(task_ids: list[int], download_data: bool | None = None, download_qualities: bool | None = None) -> list[OpenMLTask]
Download tasks.
This function calls openml.tasks.get_task for each task ID.
Parameters#
task_ids : List[int]
A list of task ids to download.
download_data : bool (default=True)
Option to trigger download of data along with the meta data.
download_qualities : bool (default=True)
Option to download 'qualities' meta-data in addition to the minimal dataset description.
Returns#
list
Source code in openml/tasks/functions.py
list_tasks
#
list_tasks(task_type: TaskType | None = None, offset: int | None = None, size: int | None = None, tag: str | None = None, data_tag: str | None = None, status: str | None = None, data_name: str | None = None, data_id: int | None = None, number_instances: int | None = None, number_features: int | None = None, number_classes: int | None = None, number_missing_values: int | None = None) -> DataFrame
Return the tasks that have the given tag and task_type.
Parameters#
The task_type filter is separated from the other filters because it is called task_type in the task description but type when used as a filter in the list tasks call.
offset : int, optional
The number of tasks to skip, starting from the first.
task_type : TaskType, optional
The type of task.
size : int, optional
The maximum number of tasks to show.
tag : str, optional
The tag to include.
data_tag : str, optional
The tag of the dataset.
data_id : int, optional
status : str, optional
data_name : str, optional
number_instances : int, optional
number_features : int, optional
number_classes : int, optional
number_missing_values : int, optional
Returns#
dataframe
All tasks having the given task_type and the given tag. Every task is represented by a row in the data frame containing the following information as columns: task id, dataset id, task_type and status. If qualities are calculated for the associated dataset, some of these are also returned.
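The offset and size parameters make it possible to read a long task listing in pages. The sketch below is one way to do that under stated assumptions: `iter_task_pages` and `fake_list_tasks` are hypothetical helpers, and the stand-in returns plain lists of ids rather than a DataFrame so that the example runs without a server connection.

```python
def iter_task_pages(list_fn, page_size, **filters):
    """Yield successive pages from a list_tasks-like callable."""
    offset = 0
    while True:
        page = list_fn(offset=offset, size=page_size, **filters)
        if len(page) == 0:
            break
        yield page
        if len(page) < page_size:
            # A short page means there is nothing left to fetch.
            break
        offset += len(page)


def fake_list_tasks(offset=None, size=None, tag=None):
    # Hypothetical stand-in: five task ids, returned as a list slice.
    ids = [11, 12, 13, 14, 15]
    start = offset or 0
    return ids[start : start + size]


pages = list(iter_task_pages(fake_list_tasks, page_size=2))
print(pages)
```

Because `len()` also gives the row count of a pandas DataFrame, the same loop should work if `list_fn` returns data frames, with extra filters such as tag passed as keyword arguments.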