Client API Standards
This page defines a minimal standard to adhere in programming APIs.
Configuration file¶
The configuration file resides in a directory .openml
in the home directory of the user and is called config. It consists of key = value
pairs which are seperated by newlines. The following keys are defined:
- apikey:
- required to access the server
- server:
- default:
http://www.openml.org
- default:
- verbosity:
- 0: normal output
- 1: info output
- 2: debug output
- cachedir:
- if not given, will default to
file.path(tempdir(), "cache")
.
- if not given, will default to
- arff.reader:
RWeka
: This is the standard Java parser used in Weka.farff
: The farff package lives below the mlr-org and is a newer, faster parser without Java.
Caching¶
Cache invalidation¶
All parts of the entities which affect experiments are immutable. The entities dataset and task have a flag status
which tells the user whether they can be used safely.
File structure¶
Caching should be implemented for
- datasets
- tasks
- splits
- predictions
and further entities might follow in the future. The cache directory $cache
should be specified by the user when invoking the API. The structure in the cache directory should be as following:
- One directory for the following entities:
$cache/datasets
$cache/tasks
$cache/runs
- For every dataset there is an extra directory for which the name is the dataset ID, e.g.
$cache/datasets/2
for the dataset anneal.ORIG- The dataset should be called
dataset.arff
- Every other file should be named by the API call which was used to obtain it. The XML returned by invoking
openml.data.qualities
should therefore be called qualities.xml.
- The dataset should be called
- For every task there is an extra directory for which the name is the task ID, e.g.
$cache/tasks/1
- The task file should be called
task.xml
. - The splits accompanying a task are stored in a file
datasplits.arff
.
- The task file should be called
- For every run there is an extra directory for which the name is the run ID, e.g.
$cache/run/1
- The predictions should be called
predictions.arff
.
- The predictions should be called