Flows and Runs

A simple tutorial on how to train a model, run it on an OpenML task, and upload the results.

# License: BSD 3-Clause

import openml
from sklearn import ensemble, neighbors

Train a machine learning model

Warning

This example uploads data. For that reason, it connects to the test server at test.openml.org, which keeps the main server from being crowded with example datasets, tasks, runs, and so on.
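The example switches to the test server with start_using_configuration_for_example() and switches back at the very end with stop_using_configuration_for_example(). If anything in between raises an error, the second call never runs. The sketch below wraps the two calls in a context manager so the switch back always happens. The helper name `example_configuration` is hypothetical, not part of openml; it takes the start/stop functions as arguments so it can be tried without a network connection.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def example_configuration(start: Callable[[], None],
                          stop: Callable[[], None]) -> Iterator[None]:
    """Hypothetical helper: call start() on entry and stop() on exit."""
    start()
    try:
        yield
    finally:
        # Runs even if the body raises, so the previous configuration is restored.
        stop()


# Possible usage with openml (assumes openml is installed and online):
# import openml
# with example_configuration(openml.config.start_using_configuration_for_example,
#                            openml.config.stop_using_configuration_for_example):
#     run = openml.runs.run_model_on_task(clf, task)
#     run.publish()
```

Using try/finally inside the generator is what guarantees the stop call; a plain sequence of two calls gives no such guarantee.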

openml.config.start_using_configuration_for_example()

# NOTE: We are using dataset 20 from the test server: https://test.openml.org/d/20
dataset = openml.datasets.get_dataset(20)
X, y, categorical_indicator, attribute_names = dataset.get_data(
    dataset_format='array',
    target=dataset.default_target_attribute
)
clf = neighbors.KNeighborsClassifier(n_neighbors=3)
clf.fit(X, y)

Out:

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
                     metric_params=None, n_jobs=None, n_neighbors=3, p=2,
                     weights='uniform')
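The output above lists the fitted estimator's settings: n_neighbors=3, metric='minkowski' with p=2 (the Euclidean distance), and weights='uniform'. As a rough illustration of what these settings mean, here is a plain-Python sketch of k-nearest-neighbour prediction. It is not scikit-learn's implementation (which can use tree-based search and may break ties differently); `minkowski` and `knn_predict` are hypothetical names and the toy data is made up.

```python
from collections import Counter
from typing import Hashable, Sequence


def minkowski(a: Sequence[float], b: Sequence[float], p: float = 2) -> float:
    """Minkowski distance; p=2 is the Euclidean distance, p=1 the Manhattan distance."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


def knn_predict(X_train: Sequence[Sequence[float]], y_train: Sequence[Hashable],
                x: Sequence[float], n_neighbors: int = 3, p: float = 2) -> Hashable:
    """Predict x's label by a uniform-weight majority vote of its nearest neighbours."""
    # Order training indices by distance to x and keep the n_neighbors closest.
    nearest = sorted(range(len(X_train)),
                     key=lambda i: minkowski(X_train[i], x, p))[:n_neighbors]
    # Uniform weights: every neighbour contributes exactly one vote.
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]


X_toy = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y_toy = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(X_toy, y_toy, [0.5, 0.5]))  # the three closest points are all "a"
```

Unlike most models, k-NN does almost no work in fit(); the distance computations happen at prediction time.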

Running a model on a task

task = openml.tasks.get_task(119)
clf = ensemble.RandomForestClassifier()
run = openml.runs.run_model_on_task(clf, task)
print(run)

Out:

/home/travis/miniconda/envs/testenv/lib/python3.7/site-packages/sklearn/ensemble/forest.py:245: FutureWarning: The default value of n_estimators will change from 10 in version 0.20 to 100 in 0.22.
  "10 in version 0.20 to 100 in 0.22.", FutureWarning)
OpenML Run
==========
Uploader Name: None
Metric.......: None
Run ID.......: None
Task ID......: 119
Task Type....: None
Task URL.....: https://www.openml.org/t/119
Flow ID......: 23457
Flow Name....: sklearn.ensemble.forest.RandomForestClassifier
Flow URL.....: https://www.openml.org/f/23457
Setup ID.....: None
Setup String.: Python_3.7.6. Sklearn_0.21.2. NumPy_1.18.1. SciPy_1.4.1. RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
                       max_depth=None, max_features='auto', max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, n_estimators='warn',
                       n_jobs=None, oob_score=False, random_state=15881,
                       verbose=0, warm_start=False)
Dataset ID...: 20
Dataset URL..: https://www.openml.org/d/20
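An OpenML task pairs a dataset with an estimation procedure, and run_model_on_task evaluates the model on the train/test splits that the task defines. The sketch below shows the general shape of k-fold cross-validation in plain Python, assuming a simple round-robin fold assignment. The real splits come from the task itself; `kfold_indices`, `cross_validate` and `majority_baseline` are hypothetical helpers, not OpenML functions.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple


def kfold_indices(n_samples: int, n_folds: int) -> List[Tuple[List[int], List[int]]]:
    """Put sample i in fold i % n_folds; return one (train, test) index pair per fold."""
    splits = []
    for fold in range(n_folds):
        test = [i for i in range(n_samples) if i % n_folds == fold]
        train = [i for i in range(n_samples) if i % n_folds != fold]
        splits.append((train, test))
    return splits


def cross_validate(fit_predict: Callable, X: Sequence, y: Sequence,
                   n_folds: int = 10) -> List[float]:
    """Per-fold accuracy; fit_predict(X_train, y_train, X_test) returns predictions."""
    scores = []
    for train, test in kfold_indices(len(X), n_folds):
        preds = fit_predict([X[i] for i in train], [y[i] for i in train],
                            [X[i] for i in test])
        correct = sum(p == y[i] for p, i in zip(preds, test))
        scores.append(correct / len(test))
    return scores


def majority_baseline(X_train, y_train, X_test):
    """Toy model: always predict the most frequent training label."""
    label = Counter(y_train).most_common(1)[0][0]
    return [label] * len(X_test)
```

Each sample lands in exactly one test fold, so every prediction is made by a model that never saw that sample during training.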

Publishing the run

myrun = run.publish()
print("Run was uploaded to http://test.openml.org/r/" + str(myrun.run_id))
print("The flow can be found at http://test.openml.org/f/" + str(myrun.flow_id))

Out:

Run was uploaded to http://test.openml.org/r/37815
The flow can be found at http://test.openml.org/f/23457
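The print statements above build each URL by concatenating a base URL with str(id). A small helper using an f-string does the same without the explicit str() calls. The name `openml_url` is hypothetical; the one-letter prefixes are the ones that appear in the URLs in this example (d for datasets, t for tasks, f for flows, r for runs), and the default base URL is the test server.

```python
def openml_url(kind: str, entity_id: int, server: str = "https://test.openml.org") -> str:
    """Hypothetical helper: build e.g. https://test.openml.org/r/37815.

    kind is the path prefix: "d" (dataset), "t" (task), "f" (flow) or "r" (run).
    """
    return f"{server}/{kind}/{entity_id}"


# Possible usage after publishing:
# print("Run was uploaded to " + openml_url("r", myrun.run_id))
# print("The flow can be found at " + openml_url("f", myrun.flow_id))
```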

openml.config.stop_using_configuration_for_example()
