tasks
openml.tasks
OpenMLClassificationTask
OpenMLClassificationTask(task_type_id: TaskType, task_type: str, data_set_id: int, target_name: str, estimation_procedure_id: int = 1, estimation_procedure_type: str | None = None, estimation_parameters: dict[str, str] | None = None, evaluation_measure: str | None = None, data_splits_url: str | None = None, task_id: int | None = None, class_labels: list[str] | None = None, cost_matrix: ndarray | None = None)
Bases: OpenMLSupervisedTask
OpenML Classification object.

| PARAMETER | DESCRIPTION |
|---|---|
| task_type_id | ID of the Classification task type. TYPE: TaskType |
| task_type | Name of the Classification task type. TYPE: str |
| data_set_id | ID of the OpenML dataset associated with the Classification task. TYPE: int |
| target_name | Name of the target variable. TYPE: str |
| estimation_procedure_id | ID of the estimation procedure for the Classification task. TYPE: int |
| estimation_procedure_type | Type of the estimation procedure. TYPE: str \| None |
| estimation_parameters | Estimation parameters for the Classification task. TYPE: dict[str, str] \| None |
| evaluation_measure | Name of the evaluation measure. TYPE: str \| None |
| data_splits_url | URL of the data splits for the Classification task. TYPE: str \| None |
| task_id | ID of the Classification task (if it already exists on OpenML). TYPE: int \| None |
| class_labels | A list of class labels (for classification tasks). TYPE: list[str] \| None |
| cost_matrix | A cost matrix (for classification tasks). TYPE: ndarray \| None |

Source code in openml/tasks/task.py
estimation_parameters
property
writable
Return the estimation parameters for the task.
openml_url
property
The URL of the object on the server, if it was uploaded, else None.
download_split
download_split() -> OpenMLSplit
Download the OpenML split for a given task.
Source code in openml/tasks/task.py
get_X_and_y
Get data associated with the current task.

| RETURNS | DESCRIPTION |
|---|---|
| tuple | X and y |

Source code in openml/tasks/task.py
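A minimal sketch of how get_X_and_y and get_train_test_split_indices combine to give the data for one split. The helper name fold_data is hypothetical, and it assumes that X and y are pandas objects and that the split indices are row positions, so .iloc can select them; check both assumptions against your openml version.

```python
from typing import Any, Tuple


def fold_data(
    task: Any, fold: int = 0, repeat: int = 0, sample: int = 0
) -> Tuple[Any, Any, Any, Any]:
    """Return (X_train, X_test, y_train, y_test) for one split of a supervised task.

    Hypothetical helper. Assumes task.get_X_and_y() returns pandas objects and
    that the split indices are row positions, so .iloc can select them.
    """
    X, y = task.get_X_and_y()
    train_idx, test_idx = task.get_train_test_split_indices(
        fold=fold, repeat=repeat, sample=sample
    )
    return X.iloc[train_idx], X.iloc[test_idx], y.iloc[train_idx], y.iloc[test_idx]


# Usage (requires the openml package and network access; the task ID is a placeholder):
# import openml
# task = openml.tasks.get_task(31)
# X_train, X_test, y_train, y_test = fold_data(task, fold=0)
```

Only documented task methods are called, so the same helper should work for any supervised task class on this page.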
get_dataset
get_dataset(**kwargs: Any) -> OpenMLDataset
Download the dataset associated with the task.
Accepts the same keyword arguments as openml.datasets.get_dataset.
get_split_dimensions
Get the (repeats, folds, samples) dimensions of the task's split.
Source code in openml/tasks/task.py
get_train_test_split_indices
get_train_test_split_indices(fold: int = 0, repeat: int = 0, sample: int = 0) -> tuple[ndarray, ndarray]
Get the train and test indices for the given fold, repeat, and sample of the task's split.
Source code in openml/tasks/task.py
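The split dimensions and the per-split indices can be combined to visit every train/test split of a task. A minimal sketch: iter_train_test_splits is a hypothetical helper that relies only on get_split_dimensions and get_train_test_split_indices as documented above; the commented usage assumes network access and uses a placeholder task ID.

```python
from typing import Any, Iterator, Tuple


def iter_train_test_splits(task: Any) -> Iterator[Tuple[int, int, int, Any, Any]]:
    """Yield (repeat, fold, sample, train_idx, test_idx) for every split of a task.

    Assumes `task` exposes get_split_dimensions() -> (repeats, folds, samples)
    and get_train_test_split_indices(fold=..., repeat=..., sample=...).
    """
    n_repeats, n_folds, n_samples = task.get_split_dimensions()
    for repeat in range(n_repeats):
        for fold in range(n_folds):
            for sample in range(n_samples):
                train_idx, test_idx = task.get_train_test_split_indices(
                    fold=fold, repeat=repeat, sample=sample
                )
                yield repeat, fold, sample, train_idx, test_idx


# Usage (requires the openml package and network access):
# import openml
# task = openml.tasks.get_task(31)  # placeholder task ID
# for repeat, fold, sample, train_idx, test_idx in iter_train_test_splits(task):
#     print(repeat, fold, sample, len(train_idx), len(test_idx))
```

Passing arguments by keyword avoids mixing up the order, since the method takes fold before repeat.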
open_in_browser
Opens the OpenML web page corresponding to this object in your default browser.
Source code in openml/base.py
publish
publish() -> OpenMLBase
Publish the object on the OpenML server.
Source code in openml/base.py
push_tag
Annotates this entity with a tag on the server.

| PARAMETER | DESCRIPTION |
|---|---|
| tag | Tag to attach to this entity. TYPE: str |

remove_tag
Removes a tag from this entity on the server.

| PARAMETER | DESCRIPTION |
|---|---|
| tag | Tag to remove from this entity. TYPE: str |

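push_tag and remove_tag pair naturally when a tag is only needed temporarily, for example to mark entities during a batch job. A minimal sketch using a hypothetical context manager, temporary_tag, that only calls the two documented methods; whether removal succeeds still depends on the server.

```python
from contextlib import contextmanager
from typing import Any, Iterator


@contextmanager
def temporary_tag(entity: Any, tag: str) -> Iterator[Any]:
    """Attach `tag` to an OpenML entity for the duration of a with-block.

    Hypothetical helper: it calls push_tag on entry and always attempts
    remove_tag on exit, even if the body of the with-block raises.
    """
    entity.push_tag(tag)
    try:
        yield entity
    finally:
        entity.remove_tag(tag)


# Usage (requires the openml package, network access, and an API key):
# import openml
# task = openml.tasks.get_task(31)  # placeholder task ID
# with temporary_tag(task, "my-temporary-tag"):
#     ...  # work that relies on the tag being present
```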
url_for_id
classmethod
Return the OpenML URL for the object of this class with the given ID.
OpenMLClusteringTask
OpenMLClusteringTask(task_type_id: TaskType, task_type: str, data_set_id: int, estimation_procedure_id: int = 17, task_id: int | None = None, estimation_procedure_type: str | None = None, estimation_parameters: dict[str, str] | None = None, data_splits_url: str | None = None, evaluation_measure: str | None = None, target_name: str | None = None)
Bases: OpenMLTask
OpenML Clustering object.

| PARAMETER | DESCRIPTION |
|---|---|
| task_type_id | Task type ID of the OpenML clustering task. TYPE: TaskType |
| task_type | Task type of the OpenML clustering task. TYPE: str |
| data_set_id | ID of the OpenML dataset used in the clustering task. TYPE: int |
| estimation_procedure_id | ID of the OpenML estimation procedure. TYPE: int |
| task_id | ID of the OpenML clustering task. TYPE: int \| None |
| estimation_procedure_type | Type of the OpenML estimation procedure used in the clustering task. TYPE: str \| None |
| estimation_parameters | Parameters used by the OpenML estimation procedure. TYPE: dict[str, str] \| None |
| data_splits_url | URL of the OpenML data splits for the clustering task. TYPE: str \| None |
| evaluation_measure | Evaluation measure used in the clustering task. TYPE: str \| None |
| target_name | Name of the target feature (class), which is not part of the feature set for the clustering task. TYPE: str \| None |

Source code in openml/tasks/task.py
openml_url
property
The URL of the object on the server, if it was uploaded, else None.
download_split
download_split() -> OpenMLSplit
Download the OpenML split for a given task.
Source code in openml/tasks/task.py
get_X
Get data associated with the current task.

| RETURNS | DESCRIPTION |
|---|---|
| DataFrame | The X data as a dataframe. |

get_dataset
get_dataset(**kwargs: Any) -> OpenMLDataset
Download the dataset associated with the task.
Accepts the same keyword arguments as openml.datasets.get_dataset.
get_split_dimensions
Get the (repeats, folds, samples) dimensions of the task's split.
Source code in openml/tasks/task.py
get_train_test_split_indices
get_train_test_split_indices(fold: int = 0, repeat: int = 0, sample: int = 0) -> tuple[ndarray, ndarray]
Get the train and test indices for the given fold, repeat, and sample of the task's split.
Source code in openml/tasks/task.py
open_in_browser
Opens the OpenML web page corresponding to this object in your default browser.
Source code in openml/base.py
publish
publish() -> OpenMLBase
Publish the object on the OpenML server.
Source code in openml/base.py
push_tag
Annotates this entity with a tag on the server.

| PARAMETER | DESCRIPTION |
|---|---|
| tag | Tag to attach to this entity. TYPE: str |

remove_tag
Removes a tag from this entity on the server.

| PARAMETER | DESCRIPTION |
|---|---|
| tag | Tag to remove from this entity. TYPE: str |

url_for_id
classmethod
Return the OpenML URL for the object of this class with the given ID.
OpenMLLearningCurveTask
OpenMLLearningCurveTask(task_type_id: TaskType, task_type: str, data_set_id: int, target_name: str, estimation_procedure_id: int = 13, estimation_procedure_type: str | None = None, estimation_parameters: dict[str, str] | None = None, data_splits_url: str | None = None, task_id: int | None = None, evaluation_measure: str | None = None, class_labels: list[str] | None = None, cost_matrix: ndarray | None = None)
Bases: OpenMLClassificationTask
OpenML Learning Curve object.

| PARAMETER | DESCRIPTION |
|---|---|
| task_type_id | ID of the Learning Curve task type. TYPE: TaskType |
| task_type | Name of the Learning Curve task type. TYPE: str |
| data_set_id | ID of the dataset that this task is associated with. TYPE: int |
| target_name | Name of the target feature in the dataset. TYPE: str |
| estimation_procedure_id | ID of the estimation procedure to use for evaluating models. TYPE: int |
| estimation_procedure_type | Type of the estimation procedure. TYPE: str \| None |
| estimation_parameters | Additional parameters for the estimation procedure. TYPE: dict[str, str] \| None |
| data_splits_url | URL of the file containing the data splits for the Learning Curve task. TYPE: str \| None |
| task_id | ID of the Learning Curve task. TYPE: int \| None |
| evaluation_measure | Name of the evaluation measure to use for evaluating models. TYPE: str \| None |
| class_labels | Class labels for the Learning Curve task. TYPE: list[str] \| None |
| cost_matrix | Cost matrix for the Learning Curve task. TYPE: ndarray \| None |

Source code in openml/tasks/task.py
estimation_parameters
property
writable
Return the estimation parameters for the task.
openml_url
property
The URL of the object on the server, if it was uploaded, else None.
download_split
download_split() -> OpenMLSplit
Download the OpenML split for a given task.
Source code in openml/tasks/task.py
get_X_and_y
Get data associated with the current task.

| RETURNS | DESCRIPTION |
|---|---|
| tuple | X and y |

Source code in openml/tasks/task.py
get_dataset
get_dataset(**kwargs: Any) -> OpenMLDataset
Download the dataset associated with the task.
Accepts the same keyword arguments as openml.datasets.get_dataset.
get_split_dimensions
Get the (repeats, folds, samples) dimensions of the task's split.
Source code in openml/tasks/task.py
get_train_test_split_indices
get_train_test_split_indices(fold: int = 0, repeat: int = 0, sample: int = 0) -> tuple[ndarray, ndarray]
Get the train and test indices for the given fold, repeat, and sample of the task's split.
Source code in openml/tasks/task.py
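For learning-curve tasks, the samples dimension of the split is, as we understand it, what indexes the training subsets of increasing size within each fold (other task types usually have a single sample). A minimal sketch, using a hypothetical helper train_sizes_per_sample, that reads the training-set size for every sample of one fold through the documented methods:

```python
from typing import Any, List


def train_sizes_per_sample(task: Any, fold: int = 0, repeat: int = 0) -> List[int]:
    """Return the number of training rows for each sample of one fold.

    Hypothetical helper. Assumes the third split dimension (samples) indexes
    the training subsets of a learning-curve task.
    """
    _, _, n_samples = task.get_split_dimensions()
    sizes = []
    for sample in range(n_samples):
        train_idx, _ = task.get_train_test_split_indices(
            fold=fold, repeat=repeat, sample=sample
        )
        sizes.append(len(train_idx))
    return sizes


# Usage (requires the openml package and network access):
# import openml
# task = openml.tasks.get_task(TASK_ID)  # TASK_ID: a learning-curve task of your choice
# print(train_sizes_per_sample(task, fold=0))
```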
open_in_browser
Opens the OpenML web page corresponding to this object in your default browser.
Source code in openml/base.py
publish
publish() -> OpenMLBase
Publish the object on the OpenML server.
Source code in openml/base.py
push_tag
Annotates this entity with a tag on the server.

| PARAMETER | DESCRIPTION |
|---|---|
| tag | Tag to attach to this entity. TYPE: str |

remove_tag
Removes a tag from this entity on the server.

| PARAMETER | DESCRIPTION |
|---|---|
| tag | Tag to remove from this entity. TYPE: str |

url_for_id
classmethod
Return the OpenML URL for the object of this class with the given ID.
OpenMLRegressionTask
OpenMLRegressionTask(task_type_id: TaskType, task_type: str, data_set_id: int, target_name: str, estimation_procedure_id: int = 7, estimation_procedure_type: str | None = None, estimation_parameters: dict[str, str] | None = None, data_splits_url: str | None = None, task_id: int | None = None, evaluation_measure: str | None = None)
Bases: OpenMLSupervisedTask
OpenML Regression object.

| PARAMETER | DESCRIPTION |
|---|---|
| task_type_id | Task type ID of the OpenML Regression task. TYPE: TaskType |
| task_type | Task type of the OpenML Regression task. TYPE: str |
| data_set_id | ID of the OpenML dataset. TYPE: int |
| target_name | Name of the target feature used in the Regression task. TYPE: str |
| estimation_procedure_id | ID of the OpenML estimation procedure. TYPE: int |
| estimation_procedure_type | Type of the OpenML estimation procedure. TYPE: str \| None |
| estimation_parameters | Parameters used by the OpenML estimation procedure. TYPE: dict[str, str] \| None |
| data_splits_url | URL of the OpenML data splits for the Regression task. TYPE: str \| None |
| task_id | ID of the OpenML Regression task. TYPE: int \| None |
| evaluation_measure | Evaluation measure used in the Regression task. TYPE: str \| None |

Source code in openml/tasks/task.py
estimation_parameters
property
writable
Return the estimation parameters for the task.
openml_url
property
The URL of the object on the server, if it was uploaded, else None.
download_split
download_split() -> OpenMLSplit
Download the OpenML split for a given task.
Source code in openml/tasks/task.py
get_X_and_y
Get data associated with the current task.

| RETURNS | DESCRIPTION |
|---|---|
| tuple | X and y |

Source code in openml/tasks/task.py
get_dataset
get_dataset(**kwargs: Any) -> OpenMLDataset
Download the dataset associated with the task.
Accepts the same keyword arguments as openml.datasets.get_dataset.
get_split_dimensions
Get the (repeats, folds, samples) dimensions of the task's split.
Source code in openml/tasks/task.py
get_train_test_split_indices
get_train_test_split_indices(fold: int = 0, repeat: int = 0, sample: int = 0) -> tuple[ndarray, ndarray]
Get the train and test indices for the given fold, repeat, and sample of the task's split.
Source code in openml/tasks/task.py
open_in_browser
Opens the OpenML web page corresponding to this object in your default browser.
Source code in openml/base.py
publish
publish() -> OpenMLBase
Publish the object on the OpenML server.
Source code in openml/base.py
push_tag
Annotates this entity with a tag on the server.

| PARAMETER | DESCRIPTION |
|---|---|
| tag | Tag to attach to this entity. TYPE: str |

remove_tag
Removes a tag from this entity on the server.

| PARAMETER | DESCRIPTION |
|---|---|
| tag | Tag to remove from this entity. TYPE: str |

url_for_id
classmethod
Return the OpenML URL for the object of this class with the given ID.
OpenMLSplit
OpenMLSplit(name: int | str, description: str, split: dict[int, dict[int, dict[int, tuple[ndarray, ndarray]]]])
OpenML Split object.
This class manages train-test splits for a dataset across multiple repetitions, folds, and samples.

| PARAMETER | DESCRIPTION |
|---|---|
| name | The name or ID of the split. TYPE: int \| str |
| description | A description of the split. TYPE: str |
| split | A dictionary containing the splits, organized by repetition, fold, and sample. TYPE: dict[int, dict[int, dict[int, tuple[ndarray, ndarray]]]] |

Source code in openml/tasks/split.py
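The split argument nests three dictionaries, keyed by repeat, then fold, then sample, with a (train, test) pair of index arrays at the bottom. A minimal sketch of that layout, using plain lists instead of ndarrays and made-up indices, together with a hypothetical helper that counts the (repeats, folds, samples) dimensions; it is an illustration of the documented structure, not the library's code.

```python
from typing import Dict, Sequence, Tuple

# Same nesting as the documented `split` argument:
# repeat -> fold -> sample -> (train_indices, test_indices)
SplitDict = Dict[int, Dict[int, Dict[int, Tuple[Sequence[int], Sequence[int]]]]]


def split_dimensions(split: SplitDict) -> Tuple[int, int, int]:
    """Count (repeats, folds, samples) of a non-empty split.

    Assumes every repeat has the same number of folds and every fold the
    same number of samples, so inspecting the first of each is enough.
    """
    n_repeats = len(split)
    first_repeat = split[min(split)]
    n_folds = len(first_repeat)
    n_samples = len(first_repeat[min(first_repeat)])
    return n_repeats, n_folds, n_samples


# A toy 1-repeat, 2-fold, 1-sample split over 4 rows (made-up indices).
toy_split: SplitDict = {
    0: {
        0: {0: ([2, 3], [0, 1])},
        1: {0: ([0, 1], [2, 3])},
    }
}
```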
get
Return the specified data split from this OpenMLSplit object.

| PARAMETER | DESCRIPTION |
|---|---|
| repeat | Index of the repeat to retrieve. TYPE: int |
| fold | Index of the fold to retrieve. TYPE: int |
| sample | Index of the sample to retrieve. TYPE: int |

| RETURNS | DESCRIPTION |
|---|---|
| tuple[ndarray, ndarray] | The train and test indices for the specified repeat, fold, and sample. |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If the specified repeat, fold, or sample is not known. |

Source code in openml/tasks/split.py
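The lookup described for get walks the nested dictionaries and raises ValueError at the first unknown key. A minimal sketch of that behaviour on a plain nested dict; lookup_split is a hypothetical re-creation for illustration, and the exact error messages of the real method may differ.

```python
from typing import Dict, Sequence, Tuple

SplitDict = Dict[int, Dict[int, Dict[int, Tuple[Sequence[int], Sequence[int]]]]]


def lookup_split(
    split: SplitDict, repeat: int = 0, fold: int = 0, sample: int = 0
) -> Tuple[Sequence[int], Sequence[int]]:
    """Return (train_indices, test_indices), raising ValueError for unknown keys.

    Hypothetical sketch of the documented behaviour of OpenMLSplit.get,
    not the library's own implementation.
    """
    if repeat not in split:
        raise ValueError(f"Repeat {repeat} not known")
    if fold not in split[repeat]:
        raise ValueError(f"Fold {fold} not known")
    if sample not in split[repeat][fold]:
        raise ValueError(f"Sample {sample} not known")
    return split[repeat][fold][sample]
```

Checking each level separately lets the error name the index that is out of range instead of surfacing a bare KeyError.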
OpenMLSupervisedTask
OpenMLSupervisedTask(task_type_id: TaskType, task_type: str, data_set_id: int, target_name: str, estimation_procedure_id: int = 1, estimation_procedure_type: str | None = None, estimation_parameters: dict[str, str] | None = None, evaluation_measure: str | None = None, data_splits_url: str | None = None, task_id: int | None = None)
Bases: OpenMLTask, ABC
OpenML supervised task object; the base class of the classification and regression tasks.

| PARAMETER | DESCRIPTION |
|---|---|
| task_type_id | ID of the task type. TYPE: TaskType |
| task_type | Name of the task type. TYPE: str |
| data_set_id | ID of the OpenML dataset associated with the task. TYPE: int |
| target_name | Name of the target feature (the class variable). TYPE: str |
| estimation_procedure_id | ID of the estimation procedure for the task. TYPE: int |
| estimation_procedure_type | Type of the estimation procedure for the task. TYPE: str \| None |
| estimation_parameters | Estimation parameters for the task. TYPE: dict[str, str] \| None |
| evaluation_measure | Name of the evaluation measure for the task. TYPE: str \| None |
| data_splits_url | URL of the data splits for the task. TYPE: str \| None |
| task_id | Unique identifier of the task. TYPE: int \| None |

Source code in openml/tasks/task.py
estimation_parameters
property
writable
Return the estimation parameters for the task.
openml_url
property
The URL of the object on the server, if it was uploaded, else None.
download_split
download_split() -> OpenMLSplit
Download the OpenML split for a given task.
Source code in openml/tasks/task.py
get_X_and_y
Get data associated with the current task.

| RETURNS | DESCRIPTION |
|---|---|
| tuple | X and y |

Source code in openml/tasks/task.py
get_dataset
get_dataset(**kwargs: Any) -> OpenMLDataset
Download the dataset associated with the task.
Accepts the same keyword arguments as openml.datasets.get_dataset.
get_split_dimensions
Get the (repeats, folds, samples) dimensions of the task's split.
Source code in openml/tasks/task.py
get_train_test_split_indices
get_train_test_split_indices(fold: int = 0, repeat: int = 0, sample: int = 0) -> tuple[ndarray, ndarray]
Get the train and test indices for the given fold, repeat, and sample of the task's split.
Source code in openml/tasks/task.py
open_in_browser
Opens the OpenML web page corresponding to this object in your default browser.
Source code in openml/base.py
publish
publish() -> OpenMLBase
Publish the object on the OpenML server.
Source code in openml/base.py
push_tag
Annotates this entity with a tag on the server.

| PARAMETER | DESCRIPTION |
|---|---|
| tag | Tag to attach to this entity. TYPE: str |

remove_tag
Removes a tag from this entity on the server.

| PARAMETER | DESCRIPTION |
|---|---|
| tag | Tag to remove from this entity. TYPE: str |

url_for_id
classmethod
Return the OpenML URL for the object of this class with the given ID.
OpenMLTask
OpenMLTask(task_id: int | None, task_type_id: TaskType, task_type: str, data_set_id: int, estimation_procedure_id: int = 1, estimation_procedure_type: str | None = None, estimation_parameters: dict[str, str] | None = None, evaluation_measure: str | None = None, data_splits_url: str | None = None)
Bases: OpenMLBase
OpenML Task object.

| PARAMETER | DESCRIPTION |
|---|---|
| task_id | Unique identifier of the OpenML task. TYPE: int \| None |
| task_type_id | ID of the OpenML task type. TYPE: TaskType |
| task_type | Name of the OpenML task type. TYPE: str |
| data_set_id | ID of the OpenML dataset used by the task. TYPE: int |
| estimation_procedure_id | ID of the estimation procedure used for the task. TYPE: int |
| estimation_procedure_type | Type of the estimation procedure used for the task. TYPE: str \| None |
| estimation_parameters | Estimation parameters used for the task. TYPE: dict[str, str] \| None |
| evaluation_measure | Name of the evaluation measure. TYPE: str \| None |
| data_splits_url | URL of the data splits used for the task. TYPE: str \| None |

Source code in openml/tasks/task.py
openml_url
property
The URL of the object on the server, if it was uploaded, else None.
download_split
download_split() -> OpenMLSplit
Download the OpenML split for a given task.
Source code in openml/tasks/task.py
get_dataset
get_dataset(**kwargs: Any) -> OpenMLDataset
Download the dataset associated with the task.
Accepts the same keyword arguments as openml.datasets.get_dataset.
get_split_dimensions
Get the (repeats, folds, samples) dimensions of the task's split.
Source code in openml/tasks/task.py
get_train_test_split_indices
get_train_test_split_indices(fold: int = 0, repeat: int = 0, sample: int = 0) -> tuple[ndarray, ndarray]
Get the train and test indices for the given fold, repeat, and sample of the task's split.
Source code in openml/tasks/task.py
open_in_browser
Opens the OpenML web page corresponding to this object in your default browser.
Source code in openml/base.py
publish
publish() -> OpenMLBase
Publish the object on the OpenML server.
Source code in openml/base.py
push_tag
Annotates this entity with a tag on the server.

| PARAMETER | DESCRIPTION |
|---|---|
| tag | Tag to attach to this entity. TYPE: str |

remove_tag
Removes a tag from this entity on the server.

| PARAMETER | DESCRIPTION |
|---|---|
| tag | Tag to remove from this entity. TYPE: str |

url_for_id
classmethod
Return the OpenML URL for the object of this class with the given ID.
TaskType
Bases: Enum
Possible task types as defined in OpenML.
create_task
create_task(task_type: TaskType, dataset_id: int, estimation_procedure_id: int, target_name: str | None = None, evaluation_measure: str | None = None, **kwargs: Any) -> OpenMLClassificationTask | OpenMLRegressionTask | OpenMLLearningCurveTask | OpenMLClusteringTask
Create a task from the given attributes.
Builds a task object that uses the function arguments as its attributes. The class of the returned object is determined by task_type. See the individual task classes in the openml.tasks.task module for how each argument (task attribute) applies to each kind of task.

| PARAMETER | DESCRIPTION |
|---|---|
| task_type | ID of the task type. TYPE: TaskType |
| dataset_id | ID of the dataset for the task. TYPE: int |
| target_name | Name of the feature used as the target. Currently optional only for clustering tasks. TYPE: str \| None |
| estimation_procedure_id | ID of the estimation procedure. TYPE: int |
| evaluation_measure | Name of the evaluation measure. TYPE: str \| None |
| kwargs | Other task attributes that are not mandatory for task upload. TYPE: Any |

| RETURNS | DESCRIPTION |
|---|---|
| OpenMLClassificationTask, OpenMLRegressionTask, OpenMLLearningCurveTask or OpenMLClusteringTask | The created task. |

Source code in openml/tasks/functions.py
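Since create_task picks the returned class from task_type and accepts a missing target_name only for clustering, the arguments can be sanity-checked before any server call. A minimal sketch: check_create_task_args and TASK_CLASS_BY_TYPE are hypothetical names, the keys assume the member names of openml's TaskType enum, and the commented call uses placeholder IDs.

```python
from typing import Optional

# Task type (TaskType member name) -> class that create_task returns for it,
# following the return annotation of create_task.
TASK_CLASS_BY_TYPE = {
    "SUPERVISED_CLASSIFICATION": "OpenMLClassificationTask",
    "SUPERVISED_REGRESSION": "OpenMLRegressionTask",
    "LEARNING_CURVE": "OpenMLLearningCurveTask",
    "CLUSTERING": "OpenMLClusteringTask",
}


def check_create_task_args(task_type_name: str, target_name: Optional[str]) -> str:
    """Return the expected task class name, rejecting combinations the
    documentation rules out (target_name is optional only for clustering)."""
    if task_type_name not in TASK_CLASS_BY_TYPE:
        raise ValueError(f"create_task does not build tasks of type {task_type_name}")
    if target_name is None and task_type_name != "CLUSTERING":
        raise ValueError("target_name is required for this task type")
    return TASK_CLASS_BY_TYPE[task_type_name]


# Building and uploading a task (requires the openml package, network access,
# and an API key; dataset and procedure IDs are placeholders):
# import openml
# from openml.tasks import TaskType
# task = openml.tasks.create_task(
#     task_type=TaskType.SUPERVISED_CLASSIFICATION,
#     dataset_id=128,
#     target_name="class",
#     estimation_procedure_id=1,
# )
# task.publish()
```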
delete_task
Delete the task with ID task_id from the OpenML server.
You can only delete tasks that you created and that have no runs associated with them.

| PARAMETER | DESCRIPTION |
|---|---|
| task_id | OpenML ID of the task. TYPE: int |

| RETURNS | DESCRIPTION |
|---|---|
| bool | True if the deletion was successful, False otherwise. |

Source code in openml/tasks/functions.py
get_task
get_task(task_id: int, download_splits: bool = False, **get_dataset_kwargs: Any) -> OpenMLTask
Download the OpenML task with the given task ID.
Downloads the task representation. Use the download_splits parameter to control whether the splits are downloaded as well.
Any additional keyword arguments are passed on to openml.datasets.get_dataset.

| PARAMETER | DESCRIPTION |
|---|---|
| task_id | The OpenML ID of the task to download. TYPE: int |
| download_splits | Whether to download the splits as well. TYPE: bool |
| get_dataset_kwargs | Keyword arguments passed on to openml.datasets.get_dataset. TYPE: Any |

| RETURNS | DESCRIPTION |
|---|---|
| task | The downloaded task. TYPE: OpenMLTask |

Source code in openml/tasks/functions.py
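A minimal sketch of downloading a task together with its splits and reading the split dimensions. task_split_dimensions is a hypothetical helper; the get_task parameter defaults to openml.tasks.get_task and exists only so the function can be exercised offline with a stand-in.

```python
from typing import Any, Callable, Optional, Tuple


def task_split_dimensions(
    task_id: int, get_task: Optional[Callable[..., Any]] = None
) -> Tuple[int, int, int]:
    """Download a task with its splits and return (repeats, folds, samples).

    Hypothetical helper; `get_task` defaults to openml.tasks.get_task.
    """
    if get_task is None:
        import openml  # requires the openml package and network access

        get_task = openml.tasks.get_task
    # download_splits=True asks get_task to fetch the split file as well.
    task = get_task(task_id, download_splits=True)
    return task.get_split_dimensions()


# Usage (requires network access; the task ID is a placeholder):
# print(task_split_dimensions(31))
```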
get_tasks
get_tasks(task_ids: list[int], download_data: bool | None = None, download_qualities: bool | None = None) -> list[OpenMLTask]
Download multiple tasks.
This function calls openml.tasks.get_task for each ID in task_ids.

| PARAMETER | DESCRIPTION |
|---|---|
| task_ids | A list of task IDs to download. TYPE: list[int] |
| download_data | Whether to download the data along with the metadata. TYPE: bool \| None |
| download_qualities | Whether to download the 'qualities' metadata in addition to the minimal dataset description. TYPE: bool \| None |

| RETURNS | DESCRIPTION |
|---|---|
| list[OpenMLTask] | The downloaded tasks. |

Source code in openml/tasks/functions.py
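Because get_tasks simply iterates get_task, the same loop is easy to write by hand when you would rather collect failures than stop at the first one. A minimal sketch; get_tasks_tolerant is a hypothetical helper, not part of openml, and its get_task parameter defaults to openml.tasks.get_task.

```python
from typing import Any, Callable, Dict, Iterable, Optional, Tuple


def get_tasks_tolerant(
    task_ids: Iterable[int], get_task: Optional[Callable[[int], Any]] = None
) -> Tuple[Dict[int, Any], Dict[int, Exception]]:
    """Fetch tasks one by one, keeping going when a single download fails.

    Hypothetical helper. Returns (tasks by ID, exceptions by ID).
    """
    if get_task is None:
        import openml  # requires the openml package and network access

        get_task = openml.tasks.get_task
    tasks: Dict[int, Any] = {}
    failures: Dict[int, Exception] = {}
    for task_id in task_ids:
        try:
            tasks[task_id] = get_task(task_id)
        except Exception as exc:  # record the error and move on to the next ID
            failures[task_id] = exc
    return tasks, failures
```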
list_tasks
list_tasks(task_type: TaskType | None = None, offset: int | None = None, size: int | None = None, tag: str | None = None, data_tag: str | None = None, status: str | None = None, data_name: str | None = None, data_id: int | None = None, number_instances: int | None = None, number_features: int | None = None, number_classes: int | None = None, number_missing_values: int | None = None) -> DataFrame
Return a DataFrame of the tasks that match the given filters, such as tag and task_type.
The task_type filter is listed separately from the other filters because it is called task_type in the task description but type when used as a filter in the list call.

| PARAMETER | DESCRIPTION |
|---|---|
| task_type | Type of task to return. TYPE: TaskType \| None |
| offset | Number of tasks to skip, starting from the first. TYPE: int \| None |
| size | Maximum number of tasks to return. TYPE: int \| None |
| tag | Only return tasks with this tag. TYPE: str \| None |
| data_tag | Only return tasks whose dataset has this tag. TYPE: str \| None |
| status | Filter on task status. TYPE: str \| None |
| data_name | Filter on dataset name. TYPE: str \| None |
| data_id | Filter on dataset ID. TYPE: int \| None |
| number_instances | Filter on the dataset's number of instances. TYPE: int \| None |
| number_features | Filter on the dataset's number of features. TYPE: int \| None |
| number_classes | Filter on the dataset's number of classes. TYPE: int \| None |
| number_missing_values | Filter on the dataset's number of missing values. TYPE: int \| None |

| RETURNS | DESCRIPTION |
|---|---|
| DataFrame | All tasks that have the given task_type and tag. Each task is a row whose columns include the task ID, dataset ID, task_type, and status. If qualities have been calculated for the associated dataset, some of them are also returned. |
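The offset and size parameters allow the listing to be fetched in pages. A minimal sketch of a paging loop; iter_task_pages is a hypothetical helper that accepts any list_tasks-style callable. It stops after a page shorter than page_size and treats an empty page as the end, which is an assumption: how the server reports an offset past the last task is not covered by the documentation above.

```python
from typing import Any, Callable, Iterator


def iter_task_pages(
    list_tasks: Callable[..., Any], page_size: int = 1000, **filters: Any
) -> Iterator[Any]:
    """Yield successive pages from a list_tasks-style function via offset/size.

    Hypothetical helper. Extra keyword arguments are forwarded as filters.
    """
    offset = 0
    while True:
        page = list_tasks(offset=offset, size=page_size, **filters)
        if len(page) == 0:  # assumed to mean there is nothing left
            return
        yield page
        if len(page) < page_size:  # a short page is the last one
            return
        offset += page_size


# Usage (requires the openml package and network access):
# import openml
# from openml.tasks import TaskType
# for page in iter_task_pages(openml.tasks.list_tasks, page_size=500,
#                             task_type=TaskType.SUPERVISED_CLASSIFICATION):
#     print(len(page))
```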