Custom Datasets
This module contains custom PyTorch dataset classes for working with OpenML datasets.
GenericDataset
Bases: Dataset
Generic dataset that takes X and y as input and returns them as tensors.
Source code in temp_dir/pytorch/openml_pytorch/custom_datasets/generic_dataset.py
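As a rough illustration of the behaviour described above, the following minimal sketch wraps X and y and returns them as tensors. The class name GenericDatasetSketch and the chosen dtypes (float32 features, integer labels) are assumptions made for this example, not the library's actual implementation.

```python
import torch
from torch.utils.data import Dataset


class GenericDatasetSketch(Dataset):
    """Hypothetical stand-in: wraps X and y and returns tensor pairs."""

    def __init__(self, X, y):
        # Assumption: features are floats and labels are integer class ids.
        self.X = torch.as_tensor(X, dtype=torch.float32)
        self.y = torch.as_tensor(y, dtype=torch.long)

    def __len__(self):
        # Number of samples is the size of the first dimension of X.
        return len(self.X)

    def __getitem__(self, idx):
        # Return one (features, label) pair as tensors.
        return self.X[idx], self.y[idx]


X = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
y = [0, 1, 0]
dataset = GenericDatasetSketch(X, y)
features, label = dataset[1]
```

An object like this can be passed to torch.utils.data.DataLoader for batching and shuffling.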
OpenMLImageDataset
Bases: Dataset
Class representing an image dataset from OpenML for use in PyTorch.
Methods:
__init__(self, X, y, image_size, image_dir, transform_x=None, transform_y=None)
Initializes the dataset with given data, image size, directory, and optional transformations.
__getitem__(self, idx)
Retrieves an image and its corresponding label (if available) from the dataset at the specified index. Applies transformations if provided.
__len__(self)
Returns the total number of images in the dataset.
Source code in temp_dir/pytorch/openml_pytorch/custom_datasets/image_dataset.py
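The three methods listed above could fit together roughly as in the sketch below. It follows the documented constructor signature, but everything else is an assumption for illustration: that X holds file names relative to image_dir, that images are resized to a square of side image_size, that pixels are scaled to [0, 1] in channel-first layout, and that y may be None. The class name ImageDatasetSketch is hypothetical.

```python
import os
import tempfile

import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset


class ImageDatasetSketch(Dataset):
    """Hypothetical stand-in, not the library's implementation."""

    def __init__(self, X, y, image_size, image_dir, transform_x=None, transform_y=None):
        self.X = list(X)  # assumption: file names relative to image_dir
        self.y = None if y is None else list(y)
        self.image_size = image_size
        self.image_dir = image_dir
        self.transform_x = transform_x
        self.transform_y = transform_y

    def __len__(self):
        return len(self.X)

    def __getitem__(self, idx):
        path = os.path.join(self.image_dir, self.X[idx])
        # Load as RGB and resize to a square of side image_size (assumed).
        with Image.open(path) as img:
            resized = img.convert("RGB").resize((self.image_size, self.image_size))
            array = np.asarray(resized, dtype=np.float32) / 255.0
        image = torch.from_numpy(array).permute(2, 0, 1)  # HWC -> CHW
        if self.transform_x is not None:
            image = self.transform_x(image)
        if self.y is None:
            # No label available: return the image only.
            return image
        label = self.y[idx]
        if self.transform_y is not None:
            label = self.transform_y(label)
        return image, label


# Usage with a throwaway image written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    Image.new("RGB", (8, 6), color=(255, 0, 0)).save(os.path.join(tmp_dir, "a.png"))
    dataset = ImageDatasetSketch(["a.png"], [3], image_size=4, image_dir=tmp_dir)
    image, label = dataset[0]
```

Reading the file lazily inside __getitem__ keeps memory use low, since only the images requested by the data loader are decoded.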
OpenMLTabularDataset
Bases: Dataset
A custom dataset class to handle tabular data from OpenML (or any similar tabular dataset). It encodes categorical features and the target column using LabelEncoder from sklearn.
Methods:
__init__(X, y)
Initializes the dataset with the data and the target column. Encodes the categorical features and target if provided.
__getitem__(idx)
Retrieves the input data and target value at the specified index, converts them to tensors, and returns them.
__len__()
Returns the length of the dataset.
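The encoding and indexing behaviour described above might look like the following sketch. It assumes X is a pandas DataFrame, treats every non-numeric column as categorical, and allows y to be None. These choices and the class name TabularDatasetSketch are assumptions for illustration rather than the library's actual code.

```python
import pandas as pd
import torch
from sklearn.preprocessing import LabelEncoder
from torch.utils.data import Dataset


class TabularDatasetSketch(Dataset):
    """Hypothetical stand-in, not the library's implementation."""

    def __init__(self, X, y=None):
        self.X = X.copy()
        # Assumption: any non-numeric column is categorical and gets
        # replaced by integer codes from its own LabelEncoder.
        for col in self.X.columns:
            if not pd.api.types.is_numeric_dtype(self.X[col]):
                self.X[col] = LabelEncoder().fit_transform(self.X[col].astype(str))
        # Encode the target only if one was provided.
        self.y = None if y is None else LabelEncoder().fit_transform(y)

    def __len__(self):
        return len(self.X)

    def __getitem__(self, idx):
        # Convert a single row to a float tensor at access time.
        x = torch.tensor(self.X.iloc[idx].to_numpy(dtype="float32"))
        if self.y is None:
            return x
        return x, torch.tensor(self.y[idx], dtype=torch.long)


X = pd.DataFrame({"color": ["red", "blue", "red"], "size": [1.0, 2.0, 3.0]})
y = pd.Series(["yes", "no", "yes"])
dataset = TabularDatasetSketch(X, y)
features, label = dataset[1]
```

LabelEncoder assigns codes in sorted order of the class values, so here "blue" maps to 0 and "red" to 1, and the target "no" maps to 0 and "yes" to 1.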