cvtk.ml

This module provides high-level APIs for training, evaluating, and deploying models for classification, detection, and segmentation tasks. It supports PyTorch and MMDetection backends.

Core API

cvtk.ml.split_dataset(data: str | list[str, str] | tuple[str, str], output: str | None = None, ratios: list[float] | tuple[float] = [0.8, 0.1, 0.1], shuffle: bool = True, stratify: bool = True, random_seed: int | None = None) → list[list][source]

Split a dataset into multiple subsets with specified ratios.

Splits a dataset (from list or text file) into train/validation/test subsets with optional shuffling and stratified sampling for balanced class distribution.

Parameters:

data (str|list|tuple) – Dataset to split. Can be:
- str: Path to a text file (one sample per line, with an optional tab-separated label).
- list/tuple: List of samples, where each element is either a sample or a (sample, label) tuple.
output (str|None) – Base path to save split subsets as text files. Files saved as output.0, output.1, etc. Default None (no output file).
ratios (list[float]|tuple[float]) – Split ratios. Must sum to 1.0. Default [0.8, 0.1, 0.1] (train/val/test split).
shuffle (bool) – Randomly shuffle dataset before splitting. Default True.
stratify (bool) – Maintain class distribution across splits if labels present. Default True.
random_seed (int|None) – Seed for reproducible shuffling. Default None (random).

Returns:

List of subsets. Each subset is a list of samples/records.

Return type:

list[list]

Raises:

ValueError – If ratios don’t sum to 1.0 or data format invalid.

Examples

>>> subsets = split_dataset('data.txt', ratios=[0.8, 0.1, 0.1])
>>> len(subsets)
3
>>> subsets = split_dataset(['img1.jpg', 'img2.jpg'], output='split',
...                          ratios=[0.7, 0.3], shuffle=True, random_seed=42)
>>> with open('split.0', 'r') as f:
...     print(f.readlines())
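The interaction between ratios, shuffle, and stratify can be easier to follow in code. The following is a minimal sketch of one way to do a stratified ratio split in plain Python; it is not cvtk's implementation, and the helper name split_by_ratios and the rounding rule are assumptions made for illustration.

```python
import random
from collections import defaultdict


def split_by_ratios(records, ratios=(0.8, 0.1, 0.1), shuffle=True,
                    stratify=True, random_seed=None):
    """Split (sample, label) records into len(ratios) subsets (hypothetical helper)."""
    if abs(sum(ratios) - 1.0) > 1e-6:
        raise ValueError("ratios must sum to 1.0")
    rng = random.Random(random_seed)

    # Group by label when stratifying; otherwise treat everything as one group.
    groups = defaultdict(list)
    for sample, label in records:
        groups[label if stratify else None].append((sample, label))

    subsets = [[] for _ in ratios]
    for group in groups.values():
        if shuffle:
            rng.shuffle(group)
        start = 0
        cumulative = 0.0
        for i, ratio in enumerate(ratios):
            cumulative += ratio
            # The last subset takes the remainder so no record is dropped.
            if i == len(ratios) - 1:
                end = len(group)
            else:
                end = round(cumulative * len(group))
            subsets[i].extend(group[start:end])
            start = end
    return subsets


records = [(f"img{i}.jpg", "leaf" if i % 2 else "root") for i in range(20)]
train, valid, test = split_by_ratios(records, random_seed=42)
print(len(train), len(valid), len(test))
```

Splitting each class separately and then concatenating the pieces is what keeps the class proportions roughly equal across subsets.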

Data Utilities

class cvtk.ml.data.DataLabel(labels: list | tuple | str)[source]

Manage class labels for machine learning tasks.

Loads and manages class (category) labels from various sources (list, tuple, text file, or COCO JSON). Provides bidirectional lookup: get label name by index or get index by label name.

Parameters:: labels (list|tuple|str) – Class labels. Can be:
- list or tuple: Direct list of label names.
- str: File path to a text file (one label per line) or a COCO JSON file.

labels

List of all class labels.

Type:: list

Raises:

TypeError – If labels is not list, tuple, or str.
FileNotFoundError – If file path provided doesn’t exist.

Examples

>>> from cvtk.ml.data import DataLabel
>>> datalabel = DataLabel(['leaf', 'flower', 'root'])
>>> datalabel[0]
'leaf'
>>> datalabel['flower']
1
>>> len(datalabel)
3
>>> datalabel.labels
['leaf', 'flower', 'root']
>>> datalabel = DataLabel('labels.txt')
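The bidirectional lookup shown above can be sketched with a list plus a reverse dictionary. This is an illustrative stand-in only: LabelIndex is a hypothetical class, it handles just list and tuple input, and it does not reflect how DataLabel is actually implemented.

```python
class LabelIndex:
    """Hypothetical stand-in illustrating index <-> name lookup."""

    def __init__(self, labels):
        if not isinstance(labels, (list, tuple)):
            raise TypeError("labels must be a list or tuple")
        self.labels = list(labels)
        # Reverse map for name -> index lookup.
        self._index = {name: i for i, name in enumerate(self.labels)}

    def __getitem__(self, key):
        # Integer keys return the label name; string keys return the index.
        if isinstance(key, int):
            return self.labels[key]
        return self._index[key]

    def __len__(self):
        return len(self.labels)


label_index = LabelIndex(["leaf", "flower", "root"])
print(label_index[0], label_index["flower"], len(label_index))
```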

property labels

List of all class labels.

Returns:: All label names in order.
Return type:: list

save(output)[source]

Save class labels to a text file.

Saves one label per line in plain text format.

Parameters:: output (str) – File path for output text file.
Returns:: None. File saved to disk.

Examples

>>> datalabel = DataLabel(['cat', 'dog', 'bird'])
>>> datalabel.save('labels.txt')

class cvtk.ml.data.SquareResize(shape: int = 600, bg_color: tuple[int, int, int] | None = None, resample: object = PIL.Image.BILINEAR)[source]

Resize image to square with padding and optional color background.

Resizes an image to a square by:
1. Scaling the longest side to the target shape size.
2. Padding the shorter side with either blurred edge pixels or a solid color.

Useful as a preprocessing transform for image classification tasks.

Parameters:

shape (int) – Target square size (width and height in pixels). Default is 600.
bg_color (tuple[int,int,int]|None) – RGB color for padding area. If None, uses blurred edge pixels. Default is None.
resample (int) – PIL resampling filter for scaling. Default is PIL.Image.BILINEAR.

Returns:

Square image of shape (shape, shape) in RGB mode.

Return type:

PIL.Image.Image

Examples

>>> from cvtk.ml.data import SquareResize
>>> squareresize = SquareResize(shape=600)
>>> img = squareresize('image.jpg')
>>> img.save('image_square.jpg')
>>> squareresize = SquareResize(shape=600, bg_color=(0, 0, 0))
>>> img = squareresize('image.jpg')
>>> img.save('image_square.jpg')
>>> import torchvision.transforms
>>> transform = torchvision.transforms.Compose([
...     SquareResize(256),
...     torchvision.transforms.RandomHorizontalFlip(0.5),
...     torchvision.transforms.RandomAffine(45),
...     torchvision.transforms.ToTensor(),
...     torchvision.transforms.Normalize([0.485, 0.456, 0.406],
...                                      [0.229, 0.224, 0.225])
... ])
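The scale-then-pad geometry can be worked through without any imaging library. The sketch below only computes the scaled size and where the scaled image would sit on the square canvas; square_layout is a hypothetical helper, and centering the image is an assumption rather than documented SquareResize behavior.

```python
def square_layout(width, height, shape=600):
    """Return the scaled size and paste offset inside a shape x shape canvas."""
    # Scale so that the longest side equals the target size.
    scale = shape / max(width, height)
    new_w = max(1, round(width * scale))
    new_h = max(1, round(height * scale))
    # Center the scaled image; the remaining border is the padding area.
    offset_x = (shape - new_w) // 2
    offset_y = (shape - new_h) // 2
    return (new_w, new_h), (offset_x, offset_y)


size, offset = square_layout(1200, 800, shape=600)
print(size, offset)
```

For a 1200 x 800 input and shape=600, the image is scaled by 0.5 and the padding is split above and below the shorter side.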

PyTorch Utilities for Classification

class cvtk.ml.torchapi.DataTransform(shape: int | tuple[int, int], is_train=False)[source]

Image preprocessing pipeline for classification tasks.

Composes common image preprocessing transforms for classification using torchvision. Provides separate pipelines for training (with augmentation) and inference (without augmentation). Intended for users who want a quick preprocessing setup. For advanced customization, use torchvision.transforms.Compose directly.

Parameters:

shape (int|tuple[int,int]) – Target image resolution. If int, creates square images.
is_train (bool) – If True, creates training pipeline with augmentation (random crop, flip, rotation). If False, creates inference pipeline (resize only). Default is False.

pipeline

The composed transform pipeline.

Type:: torchvision.transforms.Compose

Examples

>>> from cvtk.ml.torchapi import DataTransform
>>> transform_train = DataTransform(224, is_train=True)
>>> print(transform_train.pipeline)
>>> transform_inference = DataTransform(224)
>>> print(transform_inference.pipeline)

cvtk.ml.torchapi.Dataset(datalabel, dataset, transform, stream_data=False, oversample=False, image_root=None)[source]

Create a dataset for image classification.

Factory function that creates either a standard or iterable dataset based on arguments. Automatically extracts pipeline from DataTransform objects and handles image loading from directories, lists, tuples, or tab-separated files.

Parameters:

datalabel (cvtk.ml.data.DataLabel|str|list|tuple) – Class labels. Can be a DataLabel instance, file path, or list/tuple of label names.
dataset (str|list|tuple) – Image data source:
- File path: TSV file, image directory, or single image file.
- List/tuple: Image paths with optional labels as nested lists/tuples.
transform (cvtk.ml.torchapi.DataTransform|torchvision.transforms.Compose|None) – Image preprocessing pipeline.
stream_data (bool) – If True, returns iterable dataset for memory-efficient streaming. If False, returns standard dataset that loads all at once. Default is False.
oversample (bool) – If True, oversample minority classes to balance dataset. Only works with labeled data. Default is False.
image_root (str|None) – Base directory for relative image paths in dataset. If None, uses directory of dataset file (for TSV) or current directory. Default is None.

Returns:

A PyTorch dataset object ready for DataLoader.

Return type:

Dataset_|DatasetIterable_

Examples

>>> from cvtk.ml import DataLabel
>>> from cvtk.ml.torchapi import Dataset, DataTransform
>>> datalabel = DataLabel(['cat', 'dog'])
>>> transform = DataTransform(224, is_train=True)
>>> dataset = Dataset(datalabel, 'train.txt', transform)
>>> print(len(dataset))
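To make the tab-separated input and the image_root parameter more concrete, here is a minimal sketch of parsing such lines. The function parse_tsv_lines is hypothetical and only approximates the described behavior (first column is the image path, second column is an optional label).

```python
import os


def parse_tsv_lines(lines, image_root=None):
    """Parse 'path<TAB>label' lines into (path, label-or-None) pairs."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        path = fields[0]
        label = fields[1] if len(fields) > 1 else None
        # Relative paths are resolved against image_root when it is given.
        if image_root is not None and not os.path.isabs(path):
            path = os.path.join(image_root, path)
        records.append((path, label))
    return records


lines = ["cats/001.jpg\tcat\n", "dogs/002.jpg\tdog\n", "unlabeled/003.jpg\n"]
records = parse_tsv_lines(lines, image_root="images")
print(records)
```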

class cvtk.ml.torchapi.Dataset_(*args: Any, **kwargs: Any)[source]

A class to manipulate image data for training and inference.

Dataset is a class that generates a dataset for training or testing with PyTorch. It loads images from a directory (subdirectories are loaded recursively), a list, a tuple, or a tab-separated (TSV) file. For a TSV file, the first column is read as the path to the image and the second column, if present, as the correct label. For training, validation, and testing, data should be supplied as TSV files containing two columns.

Imbalanced data makes a model less sensitive to minority classes with small sample sizes than a model trained on balanced data. If models are built without properly addressing the imbalance, problems can arise in accuracy, computational cost, and so on. It is best to collect balanced data during the data collection phase. When that is difficult, oversampling can be used so that each minority class has as many samples as the majority class. In this class, oversampling is enabled by specifying oversample=True.

Parameters:

datalabel (cvtk.ml.data.DataLabel|str|list|tuple) – A DataLabel instance. This datalabel is used to convert class labels to integers.
dataset (str|list|tuple) – A path to a directory, a list, a tuple, or a TSV file.
transform (torchvision.transforms.Compose|DataTransform|None) – A transform pipeline of image processing.
oversample (bool) – If True, the number of images in each class is balanced.

Examples

>>> from cvtk.ml import DataLabel
>>> from cvtk.ml.torchapi import Dataset, DataTransform
>>>
>>> datalabel = DataLabel(['leaf', 'flower', 'root'])
>>>
>>> transform = DataTransform(224, is_train=True)
>>>
>>> dataset = Dataset(datalabel, 'train.txt', transform)
>>> print(len(dataset))
100
>>> img, label = dataset[0]
>>> print(img.shape)
>>> print(label)
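The oversampling idea described for this class (bringing every minority class up to the size of the largest class) can be sketched as follows. This is an assumption-laden illustration: the oversample function here is hypothetical and draws duplicates at random with replacement, which may differ from what cvtk does internally.

```python
import random
from collections import defaultdict


def oversample(records, random_seed=None):
    """Resample minority classes so every class matches the largest one."""
    rng = random.Random(random_seed)
    by_label = defaultdict(list)
    for path, label in records:
        by_label[label].append((path, label))

    target = max(len(items) for items in by_label.values())
    balanced = []
    for items in by_label.values():
        balanced.extend(items)
        # Draw extra samples with replacement until the class reaches target.
        balanced.extend(rng.choices(items, k=target - len(items)))
    return balanced


records = [("a.jpg", "leaf")] * 6 + [("b.jpg", "flower")] * 2
balanced = oversample(records, random_seed=0)
print(len(balanced))
```

Because duplicated images are identical before augmentation, oversampling is usually paired with a training transform that applies random augmentation.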

class cvtk.ml.torchapi.DatasetIterable_(*args: Any, **kwargs: Any)[source]

class cvtk.ml.torchapi.DataLoader(*args: Any, **kwargs: Any)[source]

DataLoader for managing image classification datasets.

Wrapper around torch.utils.data.DataLoader with sensible defaults for image classification. Supports batching, shuffling, and parallel data loading via worker processes.

Parameters:

dataset (torch.utils.data.Dataset) – A PyTorch dataset object (typically from Dataset factory function).
batch_size (int) – Number of samples per batch. Default is 1.
shuffle (bool) – If True, shuffle data at every epoch. Default is False.
num_workers (int) – Number of worker processes for loading. Default is 0.
**kwargs – Additional arguments passed to torch.utils.data.DataLoader.

Examples

>>> from cvtk.ml import DataLabel
>>> from cvtk.ml.torchapi import DataTransform, Dataset, DataLoader
>>> datalabel = DataLabel(['cat', 'dog', 'bird'])
>>> transform = DataTransform(224, is_train=True)
>>> dataset = Dataset(datalabel, 'train.txt', transform)
>>> dataloader = DataLoader(dataset, batch_size=32, num_workers=4, shuffle=True)

class cvtk.ml.torchapi.BaseRunner(datalabel, workspace=None, device='auto')[source]

Base class for PyTorch model runners.

Abstract base class providing common utilities for training and inference runners. Handles device initialization, data label management, workspace setup, and checkpoint management. Task-specific subclasses (e.g., ClsRunner) implement training, evaluation, and inference logic.

class cvtk.ml.torchapi.ClsRunner(datalabel, model, weights=None, workspace=None, device='auto')[source]

Classification model runner for training and inference with PyTorch.

High-level interface for image classification training, evaluation, and inference. Supports model selection from torchvision, weight loading, mixed precision training, and various output formats. Automatically handles checkpoint management and logging.

Parameters:

datalabel (str|list|tuple|DataLabel) – Class labels. If string (file path), list, or tuple, converted to DataLabel instance.
model (str|torch.nn.Module) – Torchvision model name (e.g., ‘efficientnet_b7’) or torch.nn.Module instance.
weights (str|None) – Path to pretrained weights or torchvision weights enum (e.g., ‘EfficientNet_B7_Weights.DEFAULT’). Default is None.
workspace (str|None) – Directory for saving checkpoints and logs. If None, intermediate results not persisted. Default is None.
device (str) – Device to run on (‘auto’, ‘cuda’, ‘cpu’). Default is ‘auto’.

device

Actual device used (‘cuda’ or ‘cpu’).

Type:: str

datalabel

Class label manager.

Type:: DataLabel

model

The classification model.

Type:: torch.nn.Module

workspace

Checkpoint/log directory.

Type:: str|None

train_stats

Training statistics (epoch, loss, accuracy).

Type:: dict

test_stats

Test statistics (loss, accuracy, scores).

Type:: dict

Examples

>>> from cvtk.ml.torchapi import ClsRunner
>>> datalabel = ['cat', 'dog', 'bird']
>>> runner = ClsRunner(datalabel, 'efficientnet_b7', 'EfficientNet_B7_Weights.DEFAULT')

train(train, valid=None, test=None, epoch=20, optimizer='auto', criterion='auto', scaler='auto', resume=False)[source]

Train the model with provided dataloaders.

Trains the model for specified epochs with optional validation and testing. Training statistics are logged and saved to workspace if provided. Supports mixed precision training (AMP) when available and resumable checkpointing.

Parameters:

train (torch.utils.data.DataLoader) – DataLoader for training data.
valid (torch.utils.data.DataLoader|None) – DataLoader for validation. Default is None.
test (torch.utils.data.DataLoader|None) – DataLoader for testing at end of training. Default is None.
epoch (int) – Number of epochs to train. Default is 20.
optimizer (torch.optim.Optimizer|str) – Optimizer instance or ‘auto’ for SGD. Default is ‘auto’.
criterion (torch.nn.Module|str) – Loss function or ‘auto’ for CrossEntropyLoss. Default is ‘auto’.
scaler (torch.amp.GradScaler|str|None) – Gradient scaler for AMP or ‘auto’ for automatic selection. Default is ‘auto’.
resume (bool) – If True, resume from last checkpoint in workspace. Default is False.

Returns:

None. Training stats saved to train_stats attribute and workspace if provided.

Examples

>>> from cvtk.ml import DataLabel
>>> from cvtk.ml.torchapi import DataTransform, Dataset, DataLoader, ClsRunner
>>> datalabel = DataLabel(['leaf', 'flower', 'root'])
>>> model = ClsRunner(datalabel, 'efficientnet_b7', 'EfficientNet_B7_Weights.DEFAULT')
>>> transform_train = DataTransform(600, is_train=True)
>>> dataset_train = Dataset(datalabel, 'train.txt', transform_train)
>>> dataloader_train = DataLoader(dataset_train, batch_size=32, num_workers=4)
>>> model.train(dataloader_train, epoch=20)

save(output)[source]

Save model weights, data labels, and training logs.

Saves the trained model weights and associated metadata. Creates output directory if needed. Also saves training statistics and test outputs as separate files with same base name. The model and datalabel are saved as a pair for later loading.

Parameters:

output (str) – File path for model weights. Auto-appends ‘.pth’ if missing.

Returns:

None. The following files are saved to disk:

{output}.pth: Model weights
{output}.dl.txt: DataLabel (class names)
{output}.train_stats.txt: Training statistics
{output}.test_outputs.txt: Test outputs (if available)

Examples

>>> from cvtk.ml import DataLabel
>>> from cvtk.ml.torchapi import ClsRunner
>>> datalabel = DataLabel(['leaf', 'flower', 'root'])
>>> model = ClsRunner(datalabel, 'efficientnet_b7', 'EfficientNet_B7_Weights.DEFAULT')
>>> # ... training ...
>>> model.save('output/plant_classifier.pth')
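The naming scheme for the saved files can be illustrated with a small helper. The sketch below assumes that {output} refers to the path with any trailing '.pth' removed; that interpretation, and the function name output_paths, are assumptions and not confirmed behavior of ClsRunner.save.

```python
def output_paths(output):
    """Derive the sibling file names from a single output path (hypothetical)."""
    # Append '.pth' when missing, then use the stem as the shared base name.
    if not output.endswith(".pth"):
        output += ".pth"
    base = output[: -len(".pth")]
    return {
        "weights": output,
        "datalabel": base + ".dl.txt",
        "train_stats": base + ".train_stats.txt",
        "test_outputs": base + ".test_outputs.txt",
    }


paths = output_paths("output/plant_classifier")
for name, path in paths.items():
    print(name, path)
```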

test(dataloader, criterion=None, score_type='softmax')[source]

Evaluate model on test dataset.

Runs model in evaluation mode on test data and computes loss, accuracy, and class scores. Results are stored in test_stats for later access or saving.

Parameters:

dataloader (torch.utils.data.DataLoader) – DataLoader for test data.
criterion (torch.nn.Module|None) – Loss function. Default is None (uses CrossEntropyLoss).
score_type (str) – Score format: ‘logits’, ‘softmax’, or ‘sigmoid’. Default is ‘softmax’.

Returns:

Test statistics containing:

‘dataset’: The dataset object
‘loss’: Average loss on test data
‘acc’: Accuracy on test data
‘scores’: Predicted scores/probabilities for each sample

Return type:

dict

Examples

>>> from cvtk.ml import DataLabel
>>> from cvtk.ml.torchapi import DataTransform, Dataset, DataLoader, ClsRunner
>>> datalabel = DataLabel(['cat', 'dog', 'bird'])
>>> model = ClsRunner(datalabel, 'efficientnet_b7', 'model.pth')
>>> transform = DataTransform(224, is_train=False)
>>> dataset = Dataset(datalabel, 'test.txt', transform)
>>> dataloader = DataLoader(dataset, batch_size=32)
>>> stats = model.test(dataloader)
>>> print(f"Test accuracy: {stats['acc']:.4f}")
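The three score_type options correspond to standard transformations of the model's raw outputs. The following sketch shows them for a single sample in plain Python; to_scores is a hypothetical helper written for illustration and is not part of cvtk.

```python
import math


def to_scores(logits, score_type="softmax"):
    """Convert one sample's raw logits into the requested score format."""
    if score_type == "logits":
        return list(logits)
    if score_type == "softmax":
        # Subtract the max for numerical stability; results sum to 1.
        peak = max(logits)
        exps = [math.exp(x - peak) for x in logits]
        total = sum(exps)
        return [e / total for e in exps]
    if score_type == "sigmoid":
        # Each class is scored independently in the range (0, 1).
        return [1.0 / (1.0 + math.exp(-x)) for x in logits]
    raise ValueError(f"unknown score_type: {score_type}")


logits = [2.0, 0.5, -1.0]
softmax_scores = to_scores(logits, "softmax")
sigmoid_scores = to_scores(logits, "sigmoid")
print(softmax_scores)
print(sigmoid_scores)
```

Softmax scores compete across classes and sum to 1, which suits single-label classification; sigmoid scores are independent per class.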

inference(data, format='pandas', batch_size=32, num_workers=8, score_type='softmax')[source]

Perform inference on images with the trained model.

Runs the model in evaluation mode on input images and returns predictions in requested format. Handles various input types: DataLoader, image list, single image, or directory. Automatically uses appropriate image size from training.

Parameters:

data (torch.utils.data.DataLoader|str|list) – Input data:
- DataLoader: PyTorch DataLoader with test data.
- str: File path to a single image, a TSV file, or a directory of images.
- list: List of image paths with optional labels.
format (str) – Output format: ‘pandas’ (DataFrame), ‘list’, ‘dict’, ‘numpy’, or ‘np’. Default is ‘pandas’.
batch_size (int) – Batch size for inference. Default is 32.
num_workers (int) – Number of workers for data loading. Default is 8.
score_type (str) – Score format: ‘logits’, ‘softmax’, or ‘sigmoid’. Default is ‘softmax’.

Returns:

Predictions in the requested format:

‘pandas’: DataFrame with class names as columns and images as rows
‘list’: List of score lists, one per image
‘dict’: List of dicts with ‘label’ and ‘score’ keys per class
‘numpy’/‘np’: NumPy array of shape (n_samples, n_classes)

Examples

>>> from cvtk.ml import DataLabel
>>> from cvtk.ml.torchapi import DataTransform, Dataset, DataLoader, ClsRunner
>>> datalabel = DataLabel(['cat', 'dog', 'bird'])
>>> model = ClsRunner(datalabel, 'efficientnet_b7', 'model.pth')
>>> transform = DataTransform(224, is_train=False)
>>> dataset = Dataset(datalabel, 'test_images/', transform)
>>> dataloader = DataLoader(dataset, batch_size=32)
>>> probs = model.inference(dataloader, format='pandas')
>>> probs.to_csv('predictions.txt', sep='\t')
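Per-class scores are often reduced to a single predicted label per image. The sketch below does this for scores in the ‘list’ layout (one list of class scores per image); top_predictions is a hypothetical helper and the score values are made up for illustration.

```python
def top_predictions(scores, labels):
    """Return (label, score) of the highest-scoring class for each image."""
    results = []
    for row in scores:
        # Index of the largest score in this row.
        best = max(range(len(row)), key=row.__getitem__)
        results.append((labels[best], row[best]))
    return results


labels = ["cat", "dog", "bird"]
scores = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
predictions = top_predictions(scores, labels)
print(predictions)
```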

PyTorch Utilities for Detection and Segmentation

class cvtk.ml.torchdetapi.DataTransform(is_train: bool = False)[source]

Image preprocessing pipeline for torchvision detection models.

This initial implementation keeps transforms simple and geometry-safe for detection targets. It converts images into float tensors in [0, 1].

class cvtk.ml.torchdetapi.Dataset(*args: Any, **kwargs: Any)[source]

Detection dataset supporting COCO annotations and raw image inputs.

Parameters:

datalabel – DataLabel-like object, label file, or list of class names.
dataset – COCO json path, image directory, image file path, or list of image paths.
transform – DataTransform or torchvision compose transform.
image_root – Base directory for COCO image paths.

class cvtk.ml.torchdetapi.DataLoader(*args: Any, **kwargs: Any)[source]

DataLoader wrapper for detection tasks with default collate function.
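Detection batches usually need a custom collate function because each image can have a different number of boxes, so targets cannot be stacked into one tensor. A common pattern in torchvision detection code is to zip the batch instead. The sketch below shows that pattern with plain Python stand-ins for images; whether cvtk's default collate function matches it exactly is an assumption.

```python
def detection_collate(batch):
    """Group a batch of (image, target) pairs without stacking them."""
    # zip(*batch) turns [(img1, tgt1), (img2, tgt2)] into
    # ((img1, img2), (tgt1, tgt2)); per-image box counts may differ.
    images, targets = zip(*batch)
    return list(images), list(targets)


batch = [
    ("image_a", {"boxes": [[0, 0, 10, 10]], "labels": [1]}),
    ("image_b", {"boxes": [[5, 5, 20, 20], [1, 1, 4, 4]], "labels": [2, 1]}),
]
images, targets = detection_collate(batch)
print(images, [len(t["boxes"]) for t in targets])
```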

Torchvision detection runner.

Initial implementation supports Faster R-CNN and is intentionally structured to be extendable to other torchvision detection architectures later.

train(train: torch.utils.data.DataLoader, valid: torch.utils.data.DataLoader | None = None, test: torch.utils.data.DataLoader | None = None, epoch: int = 20, optimizer: torch.optim.Optimizer | str = 'auto', scheduler: Any | None = None) → None[source]

Train the detection model.

Parameters:

train (torch.utils.data.DataLoader) – Training dataloader built from labeled COCO targets.
valid (torch.utils.data.DataLoader|None) – Optional validation dataloader.
test (torch.utils.data.DataLoader|None) – Optional test dataloader evaluated after training completes.
epoch (int) – Number of epochs to train.
optimizer (torch.optim.Optimizer|str) – Optimizer instance or 'auto' to use SGD.
scheduler (torch.optim.lr_scheduler._LRScheduler|None) – Optional learning-rate scheduler stepped once per epoch.

Raises:

ValueError – If a training or validation batch has no targets.

test(dataloader: torch.utils.data.DataLoader, cutoff: float = 0.5) → dict[str, Any][source]

Evaluate predictions against COCO ground truth.

Parameters:

dataloader (torch.utils.data.DataLoader) – Detection dataloader built from a COCO dataset.
cutoff (float) – Minimum score threshold for predicted annotations.

Returns:

The computed COCO metrics dictionary.

Return type:

dict

Raises:

TypeError – If dataloader is not a torch dataloader.
ValueError – If the dataloader dataset does not expose a COCO annotation file.
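The cutoff parameter acts as a score threshold on predicted annotations. A minimal sketch of that filtering step is shown below; filter_by_cutoff is a hypothetical helper, and treating the threshold as inclusive (score >= cutoff) is an assumption.

```python
def filter_by_cutoff(predictions, cutoff=0.5):
    """Keep predictions whose score is at least the cutoff."""
    return [p for p in predictions if p["score"] >= cutoff]


predictions = [
    {"label": "leaf", "score": 0.92},
    {"label": "flower", "score": 0.48},
    {"label": "leaf", "score": 0.50},
]
kept = filter_by_cutoff(predictions, cutoff=0.5)
print([p["score"] for p in kept])
```

Raising the cutoff trades recall for precision, so metrics computed after filtering depend on the chosen value.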

save(output: str) → None[source]

Save the model, label map, and training statistics.

Parameters:: output (str) – Output path for the model checkpoint. .pth is appended automatically when missing.

inference(data: torch.utils.data.DataLoader | str | list[str] | tuple[str], cutoff: float = 0.5, batch_size: int = 4, num_workers: int = 0) → cvtk_data.ImageDataset[source]

Run inference and return a cvtk.data.ImageDataset.

Parameters:

data (torch.utils.data.DataLoader|str|list|tuple) – Existing dataloader or an image path, directory, list of image paths, or text file accepted by Dataset.
cutoff (float) – Minimum score threshold for returned predictions.
batch_size (int) – Batch size used when data is not already a dataloader.
num_workers (int) – Number of worker processes used when creating the dataloader.

Returns:

Predicted images and annotations as an image dataset.

Return type:

cvtk.data.ImageDataset

Segmentation runner built on top of DetRunner.

Currently supports only the Mask R-CNN family of models. All methods and attributes are inherited from DetRunner. To use SegmRunner, simply replace DetRunner with SegmRunner in your code and refer to the documentation of DetRunner for usage.

MMDetection Utilities

class cvtk.ml.mmdetapi.DataPipeline(is_train: bool = False, with_bbox: bool = True, with_mask: bool = False)[source]

Generate image preprocessing pipeline

This class provides the basic image preprocessing pipeline used in MMDetection.

Parameters:

is_train (bool) – Whether the pipeline is for training. Default is False.
with_bbox (bool) – Whether the dataset contains bounding boxes. Default is True for object detection with bounding boxes only.
with_mask (bool) – Whether the dataset contains masks. Default is False.

Generate dataset configuration

This function generates the dataset configuration for MMDetection.

Parameters:

datalabel (cvtk.ml.data.DataLabel) – A DataLabel class object.
dataset (str|list[str]|dict|None) – A path to a COCO format file with extension ‘.json’, a path to a directory containing images, a path to an image file, or a list of paths to image files. Note that, for training, validation, and test, the COCO format file is required.
pipeline (DataPipeline|None) – A DataPipeline class object.
repeat_dataset (bool) – Whether to repeat the dataset. Default is False. Using a repeated dataset can make training faster for some architectures.
image_root (str|None) – Base directory for resolving COCO image file_name paths. If None, image paths are resolved relative to the COCO annotation file directory.

class cvtk.ml.mmdetapi.DataLoader(dataset: Dataset | None = None, phase: str = 'inference', batch_size: int = 4, num_workers: int = 4)[source]

Generate dataloader configuration

This function generates the dataloader configuration for MMDetection.

Parameters:

dataset (Dataset|None) – A Dataset class object.
phase (str) – The purpose of the DataLoader. It should be one of ‘train’, ‘valid’, ‘test’, or ‘inference’.
batch_size (int) – Batch size.
num_workers (int) – Number of threads for data preprocessing and loading.
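For readers unfamiliar with MMDetection, a dataloader configuration is a nested dictionary. The sketch below builds one in the general style of MMDetection 3.x configs. The function dataloader_config is hypothetical, the empty pipeline and data_prefix values are placeholders, and the exact dictionary that cvtk generates may differ.

```python
def dataloader_config(ann_file, classes, phase="inference",
                      batch_size=4, num_workers=4):
    """Build an MMDetection-style dataloader config dict (illustrative only)."""
    if phase not in ("train", "valid", "test", "inference"):
        raise ValueError(f"unknown phase: {phase}")
    return dict(
        batch_size=batch_size,
        num_workers=num_workers,
        persistent_workers=num_workers > 0,
        # Assumption: only the training phase shuffles samples.
        sampler=dict(type="DefaultSampler", shuffle=(phase == "train")),
        dataset=dict(
            type="CocoDataset",
            metainfo=dict(classes=tuple(classes)),
            ann_file=ann_file,
            data_prefix=dict(img=""),  # placeholder image directory prefix
            pipeline=[],  # placeholder for the preprocessing steps
        ),
    )


train_cfg = dataloader_config("/path/to/train/coco.json",
                              ["leaf", "flower", "stem"], phase="train")
print(train_cfg["sampler"])
```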

A class for object detection and instance segmentation

This class provides user-friendly APIs for object detection and instance segmentation using MMDetection. It implements four main methods: train, test, save, and inference. The train method trains the model and performs validation and testing if validation and test data are provided. The test method evaluates the model on test data. In general, the performance test runs automatically after training, but users can also run it independently of the training process with the test method. The save method saves the model weights, configuration (the design of the model architecture), training log (e.g., mAP and loss per epoch), and test results. The inference method runs inference with the trained model. Detailed usage of each method is described in the method documentation.

Run mim search mmdet --model "faster r-cnn" to look up the pre-defined configurations available for cfg.

Parameters:

datalabel (cvtk.ml.data.DataLabel|str|list[str]|tuple[str]) – A DataLabel class object, a path to a file containing class labels, or a list of class labels.
cfg (str|dict) – A path to a file containing a model configuration (usually with extension ‘.py’), a dictionary of a model configuration, or the keyword of a configuration pre-defined in MMDetection. The pre-defined configurations can be found in the MMDetection GitHub repository or listed with the mim command (e.g., mim search mmdet --model "faster r-cnn").
weights (str|None) – A path to a file containing model weights (usually with extension ‘.pth’). If None, the pre-trained model weights are downloaded from the MMDetection repository, or random weights are used if the download is not available.
workspace (str|None) – A path to a directory for storing the intermediate files. If not specified, this function creates a temporary directory in the OS temporary directory and removes it after the process is completed.
seed (int|None) – A seed for model training.

Examples

>>> from cvtk.ml.data import DataLabel
>>> from cvtk.ml.mmdetapi import DataPipeline, Dataset, DataLoader, DetRunner
>>>
>>> datalabel = DataLabel(['leaf', 'flower', 'stem'])
>>> cfg = 'faster_rcnn_r50_fpn_1x_coco'
>>> weights = None # download from MMDetection repository
>>> workspace = '/path/to/workspace'
>>>
>>> model = DetRunner(datalabel, cfg, weights, workspace)
>>>
>>> train = DataLoader(Dataset(datalabel, '/path/to/train/coco.json'), 'train')
>>> model.train(train, epoch=10)
>>> model.save('/path/to/model.pth')
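The workspace behavior described in the parameters (use the given directory, otherwise create a temporary one and remove it afterwards) can be sketched with the standard library. The context manager resolve_workspace is hypothetical and only illustrates the lifecycle; it is not how DetRunner is implemented.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def resolve_workspace(workspace=None):
    """Yield a usable directory; clean it up only if it was temporary."""
    if workspace is not None:
        # A user-supplied workspace is created if needed and left in place.
        os.makedirs(workspace, exist_ok=True)
        yield workspace
        return
    # No workspace given: use a temporary directory that is removed on exit.
    with tempfile.TemporaryDirectory() as tmpdir:
        yield tmpdir


with resolve_workspace() as path:
    created = os.path.isdir(path)
removed = not os.path.isdir(path)
print(created, removed)
```

Passing an explicit workspace is the safer choice when checkpoints or logs need to survive after the run.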

Perform model training

The model can be trained with just the training data, but it is highly recommended to also provide validation and test data to thoroughly evaluate the model’s performance. If validation data is provided, the model’s performance is evaluated after each epoch, and the metrics are saved in the workspace. This allows the user to monitor the model’s progress throughout the training process. Additionally, if test data is provided, the model undergoes a final evaluation at the end of training, and the test results are also saved in the workspace. The test can also be performed independently of the training process; see the test method for more details.

Parameters:

train (DataLoader) – A DataLoader class object.
valid (DataLoader|None) – A DataLoader class object or None.
test (DataLoader|None) – A DataLoader class object or None.
epoch (int) – The number of epochs.
optimizer (dict|str|None) – A dictionary or string indicating the optimizer for training.
scheduler (dict|str|None) – A dictionary or string indicating the scheduler for training.

Examples

>>> from cvtk.ml.data import DataLabel
>>> from cvtk.ml.mmdetapi import DataPipeline, Dataset, DataLoader, DetRunner
>>>
>>> datalabel = DataLabel(['leaf', 'flower', 'stem'])
>>> cfg = 'faster_rcnn_r50_fpn_1x_coco'
>>> weights = None # download from MMDetection repository
>>> workspace = '/path/to/workspace'
>>>
>>> model = DetRunner(datalabel, cfg, weights, workspace)
>>>
>>> train = DataLoader(Dataset(datalabel, '/path/to/train/coco.json'), 'train')
>>> model.train(train, epoch=10)
>>> model.save('/path/to/model.pth')
>>>
>>>
>>> train = DataLoader(Dataset(datalabel, '/path/to/train/coco.json'), 'train')
>>> valid = DataLoader(Dataset(datalabel, '/path/to/valid/coco.json'), 'valid')
>>> test = DataLoader(Dataset(datalabel, '/path/to/test/coco.json'), 'test')
>>> model.train(train, valid, test, epoch=10)
>>> model.save('/path/to/model.pth')

test(test: DataLoader, cutoff: float = 0.5) → dict[source]

Validate the model with test data

This method validates the model with test data. The test data should be a COCO format file containing the annotations, converted to a dictionary with DataLoader. The predicted annotations for the test data are stored in the workspace as test_outputs.pkl (MMDetection format) and test_outputs.coco.json (COCO format). The performance metrics (e.g., mAP) are returned as a dictionary.

Parameters:

test (DataLoader) – A DataLoader class object configured for test phase.
cutoff (float) – A float value for the cutoff threshold of predicted scores.

Examples

>>> from cvtk.ml.data import DataLabel
>>> from cvtk.ml.mmdetapi import DataPipeline, Dataset, DataLoader, DetRunner
>>>
>>> datalabel = DataLabel(['leaf', 'flower', 'stem'])
>>> cfg = 'faster_rcnn_r50_fpn_1x_coco'
>>> weights = '/path/to/model.pth'
>>> workspace = '/path/to/workspace'
>>>
>>> model = DetRunner(datalabel, cfg, weights, workspace)
>>>
>>> test = DataLoader(Dataset(datalabel, '/path/to/test/coco.json'), 'test')
>>> metrics = model.test(test)
>>> print(metrics)

save(output: str)[source]

Save the model

Save the model. If training metrics and test results (usually generated during training) exist, they are saved under the same name as the weights but with different suffixes.

Parameters:: output (str) – A path to store the model weights and configuration.

inference(data: DataLoader | str | list[str], cutoff: float = 0.5) → ImageDataset[source]

Perform model inference on images.

Run inference on provided images and return results as an ImageDataset. Each image in the dataset contains InstanceAnnotation objects with predictions (bounding boxes, segmentations, scores).

Parameters:

data (DataLoader|str|list[str]) – A DataLoader object, a path to an image file, or a list of image paths.
cutoff (float) – Score threshold for filtering predictions. Default 0.5.

Returns:

Collection of ImageRecord objects with predictions.

Return type:

ImageDataset

Examples

>>> import os
>>> test_images = ['sample1.jpg', 'sample2.jpg', 'sample3.jpg']
>>> dataset = model.inference(test_images)
>>> for record in dataset:
...     bbox_img_fpath = os.path.splitext(str(record.source))[0] + '.bbox.png'
...     record.draw(layers=['bbox', 'segm'], output=bbox_img_fpath)
>>> dataset.to_coco('predictions.json')  # Export as COCO format
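When exporting predictions to COCO format, one detail that often causes confusion is the bounding-box layout: COCO stores boxes as [x, y, width, height], while many detectors output corner coordinates. The sketch below converts one corner-format prediction into a COCO-style result entry. The helper to_coco_annotation is hypothetical, and the exact structure written by to_coco is not specified here.

```python
def to_coco_annotation(image_id, category_id, box_xyxy, score):
    """Convert a corner-format box into a COCO-style result entry."""
    x_min, y_min, x_max, y_max = box_xyxy
    return {
        "image_id": image_id,
        "category_id": category_id,
        # COCO boxes are [x, y, width, height] with (x, y) the top-left corner.
        "bbox": [x_min, y_min, x_max - x_min, y_max - y_min],
        "score": score,
    }


annotation = to_coco_annotation(1, 2, (10, 20, 40, 60), 0.87)
print(annotation)
```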

Minimal instance-segmentation runner wrapper around DetRunner.

This class keeps all training/testing/inference logic from DetRunner and only provides a task-specific alias for clearer API usage in segmentation workflows.