Obtaining Flow IDs

This tutorial discusses different ways to obtain the ID of a flow in order to perform further analysis.

# License: BSD 3-Clause

import sklearn.tree

import openml

Warning

This example uploads data. For that reason, it connects to the test server at test.openml.org. This prevents the main server from being crowded with example datasets, tasks, runs, and so on. Using the test server can affect the behaviour and performance of the OpenML-Python API.

openml.config.start_using_configuration_for_example()

/home/runner/work/openml-python/openml-python/examples/30_extended/flow_id_tutorial.py:22: UserWarning: Switching to the test server https://test.openml.org/api/v1/xml to not upload results to the live server. Using the test server may result in reduced performance of the API!
  openml.config.start_using_configuration_for_example()

Defining a classifier

clf = sklearn.tree.DecisionTreeClassifier()

1. Obtaining a flow given a classifier

flow = openml.extensions.get_extension_by_model(clf).model_to_flow(clf).publish()
flow_id = flow.flow_id
print(flow_id)

This piece of code is rather involved. First, it retrieves an extension that is registered and can handle the given model; in our case this is openml.extensions.sklearn.SklearnExtension. Second, the extension converts the classifier into an instance of openml.OpenMLFlow. Third, the publish method checks whether the flow is already present on OpenML. If it is not, publish uploads the flow; otherwise, it updates the current instance with the information computed by the server (which also happens when a new flow is uploaded).
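As a rough mental model of the publish step described above, the following sketch uses a hypothetical in-memory ToyServer and ToyFlow. Neither is part of openml, and the assumption that a flow is identified by its name and external version is made for illustration only. The sketch shows just the "upload if missing, otherwise reuse the existing ID" behaviour; the real server does considerably more.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class ToyFlow:
    """Hypothetical stand-in for a flow; this is not openml.OpenMLFlow."""

    name: str
    external_version: str
    flow_id: Optional[int] = None


class ToyServer:
    """Hypothetical in-memory registry keyed by (name, external_version)."""

    def __init__(self) -> None:
        self._ids: Dict[Tuple[str, str], int] = {}
        self._next_id = 1

    def publish(self, flow: ToyFlow) -> ToyFlow:
        key = (flow.name, flow.external_version)
        if key not in self._ids:
            # Unknown flow: "upload" it by assigning a fresh ID.
            self._ids[key] = self._next_id
            self._next_id += 1
        # In both cases, copy the server-side ID back onto the instance.
        flow.flow_id = self._ids[key]
        return flow


server = ToyServer()
first = server.publish(ToyFlow("DecisionTreeClassifier", "sklearn==1.3.2"))
again = server.publish(ToyFlow("DecisionTreeClassifier", "sklearn==1.3.2"))
print(first.flow_id, again.flow_id)
```

Publishing the same toy flow twice yields the same ID, which mirrors why running this tutorial repeatedly does not create a new flow each time.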

To simplify the usage we have created a helper function which automates all these steps:

flow_id = openml.flows.get_flow_id(model=clf)
print(flow_id)

2. Obtaining a flow given its name

The schema of a flow is given in XSD (here). Only two fields are required: a unique name and an external version. While the need for a name is fairly obvious, the need for the external version might not be immediately clear. It matters because it allows multiple flows with the same name to exist for different versions of a software package. This can be necessary if an algorithm or implementation introduces, renames, or drops hyperparameters over time.

print(flow.name, flow.external_version)

sklearn.tree._classes.DecisionTreeClassifier openml==0.15.0,sklearn==1.3.2
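If the individual package versions are needed, the external version string can be split apart. The helper below is a hypothetical sketch, not an openml function, and it assumes the comma-separated package==version format seen in the output above; other extensions may format the string differently.

```python
from typing import Dict


def parse_external_version(external_version: str) -> Dict[str, str]:
    """Split 'pkg==ver,pkg==ver' into {'pkg': 'ver'} (assumed format)."""
    versions: Dict[str, str] = {}
    for part in external_version.split(","):
        # partition keeps working even if a part has no '==' (version is '').
        package, _, version = part.strip().partition("==")
        versions[package] = version
    return versions


parsed = parse_external_version("openml==0.15.0,sklearn==1.3.2")
print(parsed)
```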

The name and external version are automatically added to a flow when it is constructed from a model. We can then use them to retrieve the flow ID as follows:

flow_id = openml.flows.flow_exists(name=flow.name, external_version=flow.external_version)
print(flow_id)

We can also retrieve all flows for a given name:

flow_ids = openml.flows.get_flow_id(name=flow.name)
print(flow_ids)

[20, 28, 33, 37, 41, 45, 50, 77, 811]

This also works with the actual model (generalizing the first part of this example):

flow_ids = openml.flows.get_flow_id(model=clf, exact_version=False)
print(flow_ids)

# Deactivating test server
openml.config.stop_using_configuration_for_example()

[20, 28, 33, 37, 41, 45, 50, 77, 811]
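The difference between the exact lookup (name plus external version) and the name-only lookup used in this section can be pictured with a small toy registry. Everything below is hypothetical: the registry contents, the IDs, and the helper names find_exact and find_by_name are made up for illustration and do not reflect the server's data or the openml API.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical registry mapping (name, external_version) to a made-up flow ID.
TOY_REGISTRY: Dict[Tuple[str, str], int] = {
    ("tree.DecisionTreeClassifier", "sklearn==1.2.0"): 1,
    ("tree.DecisionTreeClassifier", "sklearn==1.3.2"): 2,
    ("ensemble.RandomForestClassifier", "sklearn==1.3.2"): 3,
}


def find_exact(name: str, external_version: str) -> Optional[int]:
    """Return the single ID matching both name and version, or None."""
    return TOY_REGISTRY.get((name, external_version))


def find_by_name(name: str) -> List[int]:
    """Return the IDs of every registered version of the named flow."""
    return sorted(
        flow_id
        for (flow_name, _), flow_id in TOY_REGISTRY.items()
        if flow_name == name
    )


exact_id = find_exact("tree.DecisionTreeClassifier", "sklearn==1.3.2")
all_ids = find_by_name("tree.DecisionTreeClassifier")
print(exact_id, all_ids)
```

The exact lookup returns at most one ID, while the name-only lookup returns one ID per registered version, which is why the tutorial prints a list in the second case.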
