Obtaining Flow IDs

This tutorial discusses different ways to obtain the ID of a flow in order to perform further analysis.

# License: BSD 3-Clause

import sklearn.tree

import openml

clf = sklearn.tree.DecisionTreeClassifier()

1. Obtaining a flow given a classifier

flow = openml.extensions.get_extension_by_model(clf).model_to_flow(clf).publish()
flow_id = flow.flow_id
print(flow_id)

Out:

17347

This piece of code is rather involved. First, it retrieves an extension that is registered and can handle the given model; in our case, this is openml.extensions.sklearn.SklearnExtension. Second, the extension converts the classifier into an instance of openml.flows.OpenMLFlow. Third, the publish method checks whether the flow is already present on OpenML. If it is not, publish uploads the flow; otherwise, it updates the current instance with the information computed by the server. The same update also happens when a new flow is uploaded.
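The three steps above can also be written out one at a time, which makes each intermediate object easier to inspect. The following is a minimal sketch rather than the library's own helper: the function name publish_model_as_flow is hypothetical, and it assumes that openml is installed, that an API key with write access is configured, and that get_extension_by_model returns None when no registered extension can handle the model.

```python
def publish_model_as_flow(model):
    """Convert `model` to an OpenML flow, publish it, and return its flow ID."""
    # Imported here so that defining this sketch does not require openml;
    # calling it does, and also needs network access and an API key.
    import openml

    # Step 1: find a registered extension that can handle the model.
    extension = openml.extensions.get_extension_by_model(model)
    if extension is None:  # assumed behavior when nothing matches
        raise ValueError(
            f"No registered extension can handle {type(model).__name__}"
        )

    # Step 2: convert the model into a flow object.
    flow = extension.model_to_flow(model)

    # Step 3: upload the flow, or sync it with the copy already on the server.
    flow = flow.publish()
    return flow.flow_id
```

Calling publish_model_as_flow(clf) with the classifier defined at the top of this tutorial should behave like the one-line version shown above.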

To simplify usage, we have created a helper function that automates all of these steps:

flow_id = openml.flows.get_flow_id(model=clf)
print(flow_id)

Out:

17347

2. Obtaining a flow given its name

The schema of a flow is defined in XSD. Only two fields are required: a unique name and an external version. The need for a name is obvious, but the need for the additional external version may not be immediately clear. It matters because it allows multiple flows with the same name to exist for different versions of a piece of software. This can be necessary if an algorithm or implementation introduces, renames, or drops hyperparameters over time.

print(flow.name, flow.external_version)

Out:

sklearn.tree.tree.DecisionTreeClassifier openml==0.10.2,sklearn==0.21.2
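The external version string printed above appears to be a comma-separated list of package==version pairs. Assuming that format holds in general (it is inferred from this output, not from a documented guarantee), the string can be split into a dictionary to check which library versions a flow was built with. The helper name parse_external_version is hypothetical and not part of openml.

```python
def parse_external_version(external_version):
    """Split 'pkg==ver,pkg==ver' into a {package: version} dictionary."""
    versions = {}
    for item in external_version.split(","):
        # partition splits on the first '==' only, so the rest stays intact
        package, _, version = item.strip().partition("==")
        versions[package] = version
    return versions


# The string below is copied from the output shown above.
versions = parse_external_version("openml==0.10.2,sklearn==0.21.2")
print(versions)
```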

The name and external version are automatically added to a flow when it is constructed from a model. We can then use them to retrieve the flow ID as follows:

flow_id = openml.flows.flow_exists(name=flow.name, external_version=flow.external_version)
print(flow_id)

Out:

17347

We can also retrieve all flows for a given name:

flow_ids = openml.flows.get_flow_id(name=flow.name)
print(flow_ids)

Out:

[3404, 4074, 4834, 4853, 5498, 5502, 5863, 6841, 6941, 6996, 7115, 7124, 7164, 7186, 7278, 7669, 7672, 7693, 7718, 7760, 8426, 8565, 8635, 8697, 8699, 8707, 8754, 8783, 8805, 8833, 8847, 8906, 9062, 9353, 9543, 9614, 9675, 9691, 9703, 12125, 12482, 12739, 12789, 13297, 14716, 15222, 16165, 17279, 17298, 17313, 17328, 17332, 17333, 17334, 17347, 17349, 17350, 17454, 17468]

This also works with the actual model (generalizing the first part of this example):

flow_ids = openml.flows.get_flow_id(model=clf, exact_version=False)
print(flow_ids)

Out:

[3404, 4074, 4834, 4853, 5498, 5502, 5863, 6841, 6941, 6996, 7115, 7124, 7164, 7186, 7278, 7669, 7672, 7693, 7718, 7760, 8426, 8565, 8635, 8697, 8699, 8707, 8754, 8783, 8805, 8833, 8847, 8906, 9062, 9353, 9543, 9614, 9675, 9691, 9703, 12125, 12482, 12739, 12789, 13297, 14716, 15222, 16165, 17279, 17298, 17313, 17328, 17332, 17333, 17334, 17347, 17349, 17350, 17454, 17468]
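The difference between the exact lookup and the name-only lookup can be illustrated without contacting the server. The sketch below is a toy in-memory model, not how OpenML is implemented: the class ToyFlowRegistry and its methods are hypothetical, the IDs and version strings are made up, and it assumes that a flow is identified by the pair (name, external version), as the two required schema fields suggest.

```python
class ToyFlowRegistry:
    """In-memory stand-in for a flow server, keyed by (name, external_version)."""

    def __init__(self):
        self._ids = {}
        self._next_id = 1

    def publish(self, name, external_version):
        """Return the existing ID for this flow, or register it under a new ID."""
        key = (name, external_version)
        if key not in self._ids:
            self._ids[key] = self._next_id
            self._next_id += 1
        return self._ids[key]

    def flow_exists(self, name, external_version):
        """Exact lookup: return the ID, or False if there is no such flow."""
        return self._ids.get((name, external_version), False)

    def ids_for_name(self, name):
        """Name-only lookup: return the IDs of every version, in ascending order."""
        return sorted(i for (n, _), i in self._ids.items() if n == name)


registry = ToyFlowRegistry()
old = registry.publish("DecisionTreeClassifier", "sklearn==0.20.0")
new = registry.publish("DecisionTreeClassifier", "sklearn==0.21.2")
again = registry.publish("DecisionTreeClassifier", "sklearn==0.21.2")

print(new == again)  # publishing the same flow twice reuses its ID
print(registry.flow_exists("DecisionTreeClassifier", "sklearn==0.21.2"))
print(registry.ids_for_name("DecisionTreeClassifier"))
```

In this model, the exact lookup corresponds to passing both a name and an external version, while the name-only lookup returns one ID per published version, which is why the lists above contain many entries for a single classifier name.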
