Writing Tests
tl;dr:
- Setting up the py_api
fixture to test directly against a REST API endpoint is really slow, only use it for migration/integration tests.
- Getting a database fixture and doing a database call is slow, consider mocking if appropriate.
Overhead from Fixtures
Sometimes, you want to interact with the REST API through the py_api
fixture,
or want access to a database with user_test
or expdb_test
fixtures.
Be warned that these come with considerable relative overhead, which adds up when running thousands of tests.
@pytest.mark.parametrize('execution_number', range(5000))
def test_private_dataset_owner_access(
execution_number,
expdb_test: Connection,
user_test: Connection,
py_api: TestClient,
) -> None:
fetch_user(ApiKey.REGULAR_USER, user_test) # accesses only the user db
get_estimation_procedures(expdb_test) # accesses only the experiment db
py_api.get("/does/not/exist") # only queries the api
pass
When individually adding/removing components, we measure (for 5000 repeats, n=1):
expdb | user | api | exp call | user call | api get | time (s) |
---|---|---|---|---|---|---|
❌ | ❌ | ❌ | ❌ | ❌ | ❌ | 1.78 |
✅ | ❌ | ❌ | ❌ | ❌ | ❌ | 3.45 |
❌ | ✅ | ❌ | ❌ | ❌ | ❌ | 3.22 |
❌ | ❌ | ✅ | ❌ | ❌ | ❌ | 298.48 |
✅ | ✅ | ❌ | ❌ | ❌ | ❌ | 4.44 |
✅ | ✅ | ✅ | ❌ | ❌ | ❌ | 285.69 |
✅ | ❌ | ❌ | ✅ | ❌ | ❌ | 4.91 |
❌ | ✅ | ❌ | ❌ | ✅ | ❌ | 5.81 |
✅ | ✅ | ✅ | ✅ | ✅ | ✅ | 307.91 |
Adding a fixture that just returns some value adds only minimal overhead (1.91s), so the burden comes from establishing the database connection itself.
We make the following observations:
- Adding a database fixture adds the same overhead as instantiating an entirely new test.
- Overhead of adding multiple database fixtures is not free
- The
py_api
fixture adds two orders of magnitude more overhead
We want our tests to be fast, so we want to avoid using these fixtures when we reasonably can.
We restrict usage of py_api
fixtures to integration/migration tests, since it is very slow.
These only run on CI before merges.
For database fixtures
We will write some fixtures that can be used to e.g., get a User
without accessing the database.
The validity of these users will be tested against the database in only a single test.
Mocking
Mocking can help us reduce the reliance on database connections in tests. A mocked function can prevent accessing the database, and instead return a predefined value instead.
It has a few upsides: - It's faster than using a database fixture (see below). - The test is not dependent on the database: you can run the test without a database.
But it also has downsides: - Behavior changes in the database, such as schema changes, are not automatically reflected in the tests. - The database layer (e.g., queries) are not actually tested.
Basically, the mocked behavior may not match real behavior when executed on a database. For this reason, for each mocked entity, we should add a test that verifies that if the database layer is invoked with the database, it returns the expected output that matches the mock. This is additional overhead in development, but hopefully it pays back in more granular test feedback and faster tests.
On the speed of mocks, consider these two tests:
@pytest.mark.parametrize('execution_number', range(5000))
def test_private_dataset_owner_access(
execution_number,
admin,
+ mocker,
- expdb_test: Connection,
) -> None:
+ mock = mocker.patch('database.datasets.get')
+ class Dataset(NamedTuple):
+ uploader: int
+ visibility: Visibility
+ mock.return_value = Dataset(uploader=1, visibility=Visibility.PRIVATE)
_get_dataset_raise_otherwise(
dataset_id=1,
user=admin,
- expdb=expdb_test,
+ expdb=None,
)