# Setting up the development environment
First, follow the installation instructions for contributors to install a local fork with optional development dependencies. Stop when you reach the section "Setting up a Database Server".
## Pre-commit
We use pre-commit to ensure certain tools and checks are run before each commit. These tools perform code formatting, linting, and more. This makes the code more consistent across the board, which makes it easier to work with each other's code, and it can also catch common errors. After installation, pre-commit will run automatically whenever you make a commit. Install it now and verify that all checks pass out of the box:
```bash
pre-commit install
pre-commit run --all-files
```
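The checks that run are defined in `.pre-commit-config.yaml` in the repository root. A hypothetical excerpt, only to illustrate the shape of that file (the actual hooks and versions configured for this project may differ):

```yaml
# Illustrative excerpt -- see .pre-commit-config.yaml for the real configuration.
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.5.0
    hooks:
      - id: end-of-file-fixer   # formatting: ensure files end with a newline
      - id: check-yaml          # linting: validate YAML syntax
```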
## Docker
With the project forked, cloned, and installed, the easiest way to set up all required services for development is through docker compose.
### Starting containers
```bash
docker compose --profile all up -d
```

If you also want to expose the services' ports to the host machine, additionally load the `compose.ports.yaml` file:

```bash
docker compose -f compose.yaml -f compose.ports.yaml --profile all up -d
```
This will spin up 5 services, as defined in the `compose.yaml` file:

- `database`: a MySQL database prepopulated with test data. By default, it is configured with a `root` user with password `ok`.
- `docs`: this container serves the project documentation at `localhost:8000`. These pages are built from the documents in the `docs/` directory of this repository; whenever you edit and save a file there, the page will immediately be updated.
- `elasticsearch`: Elasticsearch, required for the PHP REST API to function.
- `php-api`: this container serves the old PHP REST API at `localhost:8002`. For example, visit http://localhost:8002/api/v1/json/data/1 to fetch a JSON description of dataset 1.
- `python-api`: this container serves the new Python-based REST API at `localhost:8001`. For example, visit http://localhost:8001/docs to see the REST API documentation. Changes to the code in `src/` will be reflected in this container.
Exposing ports to the host network isn't needed for development, but may be useful to inspect responses directly from the host machine.
!!! note

    On ARM-based Macs, you need to enable Rosetta emulation for Docker for the Elasticsearch container to work.
We can now run the full test suite, which takes about 4 minutes:
```bash
docker compose exec python-api python -m pytest tests
```
Tests are grouped with pytest markers:

- `php_api`: all tests that require the PHP API container. These are tests which validate behavior against the old PHP API.
- `python_api`: all tests that require the Python API container. That's almost all of them.
- `slow`: for long-running tests. Currently only one test.
In many cases during development, it is sufficient to run with `not php_api and not slow` when initially adding an endpoint and implementing its response, and later with `php_api and not slow` when working on the 'migration' tests that validate against the old PHP API.

The `not slow` is only needed if a slow test would otherwise be included in your test selection. In many cases, you might prefer to only run the specific tests (or test modules) that you are working on, in which case excluding tests through markers may be unnecessary.
Examples:
- `docker compose exec python-api python -m pytest tests -m "not php_api and not slow"`: here, the test selection is made primarily through markers. This command takes a few seconds.
- `docker compose exec python-api python -m pytest tests/routers/openml/dataset_tag_test.py`: here, the test selection is made by specifying the file with tests. Since this test file naturally includes neither migration tests (in `tests/routers/openml/migration`) nor the slow test (at `tests/routers/openml/datasets_list_datasets_test.py`), excluding tests through markers is unnecessary. This command takes a few seconds.
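If you are unfamiliar with how marker-based selection behaves, the mechanics can be sketched with a small self-contained script (the file and test names below are invented; only the `slow` marker name comes from this project):

```python
import pathlib
import subprocess
import sys
import tempfile
import textwrap

# A throwaway test file with one 'slow'-marked test and one unmarked test.
test_code = textwrap.dedent("""
    import pytest

    @pytest.mark.slow
    def test_long_running():
        assert True

    def test_fast():
        assert True
""")

with tempfile.TemporaryDirectory() as tmp:
    test_file = pathlib.Path(tmp) / "example_test.py"
    test_file.write_text(test_code)
    # -m "not slow" deselects the marked test, mirroring the commands above.
    result = subprocess.run(
        [sys.executable, "-m", "pytest", str(test_file), "-q", "-m", "not slow"],
        capture_output=True,
        text=True,
    )
    print(result.stdout)
```

The summary line reports that one test passed and one was deselected by the marker expression.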
You don't always need every container; often, just having a database and the Python-based REST API may be enough. In that case, only specify those services:
```bash
docker compose up python-api -d
```
Refer to the docker compose documentation for more uses.
!!! note

    We are working on making it easy to run tests from your local shell instead of the container (#232). This will likely be limited to the tests that do not need the PHP API. Our CI pipeline runs all tests.
### Connecting to containers
To connect to a container of a service, run:
```bash
docker compose exec SERVICE_NAME /bin/bash
```
where `SERVICE_NAME` is the name of the service. If you are unsure of the service name, then `docker compose ps` may help you find it. Assuming the default service names are used, you may connect to the Python-based REST API container using:
```bash
docker compose exec python-api /bin/bash
```
This is useful, for example, to run unit tests in the container:
```bash
python -m pytest -x -v -m "not php_api"
```
## Running unit tests
Our unit tests are written with the pytest framework.
An invocation could look like this:
```bash
python -m pytest -v -x --lf -m "not php_api"
```
Where `-v` shows the name of each test run, `-x` ensures testing stops on the first failure, `--lf` will first run the test(s) which failed last, and `-m "not php_api"` specifies which tests (not) to run (in this case, the tests that check against the PHP API are skipped).
The directory structure of our tests follows the structure of the `src/` directory. For test files, we follow the convention of appending `_test` to the file name.
Try to keep tests as small as possible, and only rely on database and/or web connections
when absolutely necessary.
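To make this concrete, here is a sketch of what a small, dependency-free unit test could look like; the helper function and file path are invented for illustration and do not exist in the codebase:

```python
# Hypothetical file: tests/example/slugify_test.py, mirroring src/example/slugify.py.
import re


def slugify(name: str) -> str:
    # In a real test this helper would be imported from src/; it is inlined
    # here so the example stays self-contained.
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def test_slugify_replaces_spaces_and_punctuation() -> None:
    assert slugify("Iris Dataset (v2)") == "iris-dataset-v2"


def test_slugify_strips_surrounding_whitespace() -> None:
    assert slugify("  anneal  ") == "anneal"
```

Because no database or web connection is involved, tests like this stay fast and reliable.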
Instructions are incomplete. Please have patience while we are adding more documentation.
## YAML validation
The project contains various YAML files, for example to configure mkdocs or to describe GitHub workflows. For these YAML files, we can configure automatic schema validation to ensure that the files are valid without having to run the server. This also helps with other IDE features, like autocomplete. Setting this up is not required, but it is incredibly useful if you do need to edit these files.
The following YAML files have schemas:
| File(s) | Schema URL |
|---|---|
| mkdocs.yml | https://squidfunk.github.io/mkdocs-material/schema.json |
| .pre-commit-config.yaml | https://json.schemastore.org/pre-commit-config.json |
| .github/workflows/*.yaml | https://json.schemastore.org/github-workflow |
In PyCharm, these can be configured from Settings > Languages & Frameworks > Schemas and DTDs > JSON Schema Mappings. There, add mappings per file or file pattern.
In VSCode, these can be configured from Settings > Extensions > JSON > Edit in settings.json. There, add mappings per file or file pattern. For example:
```json
"json.schemas": [
    {
        "fileMatch": [
            "/myfile"
        ],
        "url": "schemaURL"
    }
]
```
## Connecting to another database
In addition to the database setup described in the installation guide, we also host a database on our server that is available to OpenML core contributors. If you are a core contributor and need access, please reach out to one of the engineers in Eindhoven.
Instructions are incomplete. Please have patience while we are adding more documentation.