Note
Go to the end to download the full example code.
Introduction tutorial & Setup¶
An example how to set up OpenML-Python followed up by a simple example.
OpenML is an online collaboration platform for machine learning which allows you to:
Find or share interesting, well-documented datasets
Define research / modelling goals (tasks)
Explore large amounts of machine learning algorithms, with APIs in Java, R, Python
Log and share reproducible experiments, models, results
Works seamlessly with scikit-learn and other libraries
Large scale benchmarking, compare to state of the art
Installation¶
Installation is done via pip
:
pip install openml
For further information, please check out the installation guide at Installation & Set up.
Authentication¶
The OpenML server can only be accessed by users who have signed up on the OpenML platform. If you don’t have an account yet, sign up now. You will receive an API key, which will authenticate you to the server and allow you to download and upload datasets, tasks, runs and flows.
Create an OpenML account (free) on https://www.openml.org.
After logging in, open your account page (avatar on the top right)
Open ‘Account Settings’, then ‘API authentication’ to find your API key.
There are two ways to permanently authenticate:
Use the
openml
CLI tool withopenml configure apikey MYKEY
, replacing MYKEY with your API key.Create a plain text file ~/.openml/config with the line ‘apikey=MYKEY’, replacing MYKEY with your API key. The config file must be in the directory ~/.openml/config and exist prior to importing the openml module.
Alternatively, by running the code below and replacing ‘YOURKEY’ with your API key, you authenticate for the duration of the python process.
# License: BSD 3-Clause
import openml
from sklearn import neighbors
Warning
This example uploads data. For that reason, this example connects to the test server at test.openml.org. This prevents the main server from crowding with example datasets, tasks, runs, and so on. The use of this test server can affect behaviour and performance of the OpenML-Python API.
openml.config.start_using_configuration_for_example()
/home/runner/work/openml-python/openml-python/examples/20_basic/introduction_tutorial.py:68: UserWarning: Switching to the test server https://test.openml.org/api/v1/xml to not upload results to the live server. Using the test server may result in reduced performance of the API!
openml.config.start_using_configuration_for_example()
When using the main server instead, make sure your apikey is configured. This can be done with the following line of code (uncomment it!). Never share your apikey with others.
# openml.config.apikey = 'YOURKEY'
Caching¶
When downloading datasets, tasks, runs and flows, they will be cached to retrieve them without calling the server later. As with the API key, the cache directory can be either specified through the config file or through the API:
Add the line cachedir = ‘MYDIR’ to the config file, replacing ‘MYDIR’ with the path to the cache directory. By default, OpenML will use ~/.openml/cache as the cache directory.
Run the code below, replacing ‘YOURDIR’ with the path to the cache directory.
# Uncomment and set your OpenML cache directory
# import os
# openml.config.cache_directory = os.path.expanduser('YOURDIR')
Simple Example¶
Download the OpenML task for the eeg-eye-state.
task = openml.tasks.get_task(403)
data = openml.datasets.get_dataset(task.dataset_id)
clf = neighbors.KNeighborsClassifier(n_neighbors=5)
run = openml.runs.run_model_on_task(clf, task, avoid_duplicate_runs=False)
# Publish the experiment on OpenML (optional, requires an API key).
# For this tutorial, our configuration publishes to the test server
# as to not crowd the main server with runs created by examples.
myrun = run.publish()
print(f"kNN on {data.name}: {myrun.openml_url}")
kNN on eeg-eye-state: https://test.openml.org/r/3815
openml.config.stop_using_configuration_for_example()
Total running time of the script: (0 minutes 12.540 seconds)