openml Database
The openml database contains core platform tables for user management, file storage, access control, and community features.
users
Stores registered user accounts and their authentication details.
For a period of time, a user's username and email address were identical.
| Column | Type | Optional | Default | References | Description | Example |
|---|---|---|---|---|---|---|
| id | mediumint unsigned | No | auto_increment | | Primary key. | 2 |
| ip_address | varchar(45) | No | | | IP address at registration. | 127.0.0.1 |
| username | varchar(100) | No | | | Unique login name. | foo@bar.com |
| password | varchar(255) | No | | | Hashed password. | - |
| email | varchar(254) | No | | | Email address. | foo@bar.com |
| activation_selector | varchar(255) | Yes | NULL | | Selector token for account activation. | |
| activation_code | varchar(255) | Yes | NULL | | Code for account activation. | |
| forgotten_password_selector | varchar(255) | Yes | NULL | | Selector token for password reset. | |
| forgotten_password_code | varchar(255) | Yes | NULL | | Code for password reset. | |
| forgotten_password_time | int unsigned | Yes | NULL | | Timestamp of password reset request. | |
| remember_selector | varchar(255) | Yes | NULL | | Selector token for "remember me" sessions. | |
| remember_code | varchar(255) | Yes | NULL | | Code for "remember me" sessions. | |
| created_on | int unsigned | No | | | Unix timestamp of account creation. | 1363880450 |
| last_login | int unsigned | Yes | NULL | | Unix timestamp of last login. | 1763344931 |
| active | tinyint unsigned | Yes | NULL | | Whether the account has been activated through the confirmation email (1 or 0). | 1 |
| first_name | varchar(50) | Yes | NULL | | User's first name. | Joaquin |
| last_name | varchar(50) | Yes | NULL | | User's last name. | van Rijn |
| company | varchar(100) | No | | | Organization or affiliation. | OpenML |
| phone | varchar(20) | Yes | NULL | | Phone number. Not in use. | 0000 |
| country | varchar(50) | No | | | Country of residence. No input validation was done. | rfr |
| image | varchar(128) | Yes | NULL | | Path to profile image. | https://www.openml.org/data//view/21794253/joa.jpeg |
| bio | text | No | | | User biography. | "My wonderful bio" |
| core | enum('true','false') | No | 'false' | | Whether the user is a core team member. | false |
| external_source | varchar(50) | Yes | NULL | | External authentication provider (e.g., OAuth). Not in use. | 0000 |
| external_id | varchar(50) | Yes | NULL | | User ID from external authentication provider. Not in use. | 0000 |
| session_hash | varchar(40) | Yes | NULL | | Hash for API session authentication. A 32-digit hexadecimal string. | - |
| session_hash_date | timestamp | Yes | CURRENT_TIMESTAMP | | When the session hash was last generated. | 2024-10-20 20:18:54 |
| gamification_visibility | varchar(32) | No | 'show' | | Visibility setting for gamification badges. One of 'show' or 'hidden'. | hidden |
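
The users table mixes two value formats that are easy to mishandle: Unix timestamps (created_on, last_login) and a 32-digit hexadecimal session_hash. The following is a minimal Python sketch of how a client might interpret them. The helper names `unix_to_utc` and `is_valid_session_hash` are hypothetical, and the assumptions that the timestamps are seconds since the epoch in UTC and that the hash is exactly 32 hex characters are inferred from the column descriptions, not confirmed against the platform code.

```python
import re
from datetime import datetime, timezone

# Assumption: a session hash is exactly 32 hexadecimal characters,
# as the column description suggests (the column itself is varchar(40)).
SESSION_HASH_RE = re.compile(r"[0-9a-fA-F]{32}")


def is_valid_session_hash(value):
    """Return True if value looks like a 32-digit hexadecimal session hash."""
    return value is not None and SESSION_HASH_RE.fullmatch(value) is not None


def unix_to_utc(seconds):
    """Convert a Unix timestamp (created_on, last_login) to an aware UTC datetime."""
    if seconds is None:  # last_login is nullable
        return None
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# Example value taken from the created_on row of the table.
print(unix_to_utc(1363880450).isoformat())
```

Returning `None` for a missing last_login keeps the nullable column distinguishable from a real timestamp of 0.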
groups
Defines user groups for role-based access control. Currently the database recognizes three groups: admins, normal users, and read-only users.
| Column | Type | Optional | Default | References | Description | Example |
|---|---|---|---|---|---|---|
| id | mediumint unsigned | No | auto_increment | | Primary key. | 2 |
| name | varchar(20) | No | | | Group name. | members |
| description | varchar(100) | No | | | Description of the group's purpose. | normal read-write permissions |
users_groups
Associates users with groups (many-to-many relationship).
| Column | Type | Optional | Default | References | Description | Example |
|---|---|---|---|---|---|---|
| id | mediumint unsigned | No | auto_increment | | Primary key. | 2 |
| user_id | mediumint unsigned | No | | users.id | The user. | 2 |
| group_id | mediumint unsigned | No | | groups.id | The group the user belongs to. | 2 |
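
To illustrate how the many-to-many relationship is resolved, here is a minimal Python sketch that looks up a user's groups by joining users_groups to groups. It uses an in-memory SQLite database as a stand-in for the real database, with the tables reduced to the columns needed for the join. The function name `group_names_for_user` and the 'admin' row are hypothetical; only the 'members' group name appears in the example data above.

```python
import sqlite3

# Simplified stand-in schema: only the columns needed for the join.
# "groups" is quoted because it can clash with an SQL keyword.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT NOT NULL);
CREATE TABLE "groups" (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE users_groups (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    group_id INTEGER NOT NULL REFERENCES "groups"(id)
);
INSERT INTO users VALUES (2, 'foo@bar.com');
INSERT INTO "groups" VALUES (1, 'admin'), (2, 'members');  -- 'admin' is hypothetical
INSERT INTO users_groups VALUES (2, 2, 2);
""")


def group_names_for_user(connection, user_id):
    """Return the names of all groups the given user belongs to, sorted by name."""
    rows = connection.execute(
        'SELECT g.name FROM users_groups AS ug '
        'JOIN "groups" AS g ON g.id = ug.group_id '
        'WHERE ug.user_id = ? ORDER BY g.name',
        (user_id,),
    ).fetchall()
    return [name for (name,) in rows]


print(group_names_for_user(conn, 2))
```

A user with no rows in users_groups simply yields an empty list, so callers do not need a separate existence check.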
file
Stores metadata about uploaded files (datasets, flows, predictions, etc.).
| Column | Type | Optional | Default | References | Description | Example |
|---|---|---|---|---|---|---|
| id | int | No | auto_increment | | Primary key. | 1 |
| creator | int | No | | | User ID of the uploader. | 2 |
| creation_date | datetime | No | | | When the file was uploaded. | 2015-11-30 06:48:32 |
| filepath | varchar(256) | No | | | Storage path on the server. | dataset/api/dataset_1_anneal.arff |
| filesize | int | No | | | File size in bytes. | 143338 |
| filename_original | varchar(256) | No | | | Original filename as uploaded. | dataset_1_anneal.arff |
| extension | varchar(16) | No | | | File extension (e.g., arff, csv). | arff |
| mime_type | varchar(32) | No | | | MIME type of the file. | application/octet-stream |
| md5_hash | varchar(64) | No | | | MD5 checksum for integrity verification. | 43b29a3eb09e8fac9a8525c3c83abec8 |
| type | enum('dataset','implementation','predictions','userimage','run_trace','run_uploaded_file','url','misc') | No | | | Category of the file. | dataset |
| access_policy | enum('public','private','none','deleted') | No | 'public' | | Access control policy for the file. | public |
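
The md5_hash column can be used to check that a downloaded file matches what was uploaded. The sketch below shows one way to do that in Python with the standard `hashlib` module. It assumes that md5_hash holds the hexadecimal MD5 digest of the raw file bytes, which the table does not state explicitly; the function names `md5_of_stream` and `matches_stored_hash` are hypothetical.

```python
import hashlib
import io


def md5_of_stream(stream, chunk_size=1 << 20):
    """Compute the hex MD5 digest of a binary stream, reading it in chunks."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def matches_stored_hash(stream, stored_md5):
    """Compare a stream's MD5 digest with a stored md5_hash value.

    Assumption: the stored value is a hex digest; case and surrounding
    whitespace are ignored for the comparison.
    """
    return md5_of_stream(stream) == stored_md5.strip().lower()


# Usage with in-memory bytes; a real file would be opened with open(path, "rb").
payload = io.BytesIO(b"example file contents")
print(md5_of_stream(payload))
```

Reading in chunks keeps memory use flat for large dataset files. MD5 is suitable here only as an integrity check against accidental corruption, not as protection against deliberate tampering.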
deprecated tables
There are also category and thread tables, which were designed for a forum feature but are not used.
The meta_dataset table is for requesting automated metadata set building, a feature which is not enabled.
The access table is not used; access constraints are currently handled by columns in the respective tables (e.g., dataset.visibility for datasets).