cvtk.ml
- cvtk.ml.split_dataset(data: str | list[str, str] | tuple[str, str], output: str | None = None, ratios: list[float] | tuple[float] = [0.8, 0.1, 0.1], shuffle: bool = True, stratify: bool = True, random_seed: int | None = None) → list[list]
Split a dataset into multiple subsets with the given ratios.
- Parameters:
data – The dataset to split. The input can be a list of data (e.g., images) or a path to a text file. If a list is given, each element of the list is treated as a sample.
output – The output file name. The index of each split subset is appended to this name.
ratios – The ratios to split the dataset. The sum of the ratios should be 1.
shuffle – Shuffle the dataset before splitting.
stratify – Split the dataset so that each subset keeps a balanced class distribution, if labels are given.
random_seed – Random seed for shuffling the dataset.
- Returns:
A list of the split datasets. The length of the list is the same as the length of ratios.
Examples
>>> from cvtk.ml import split_dataset
>>>
>>> subsets = split_dataset('data.txt', ratios=[0.8, 0.1, 0.1])
>>> len(subsets)
3
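To make the interaction of ratios, shuffle, stratify, and random_seed concrete, the following is a minimal pure-Python sketch of a ratio-based stratified split. It is not cvtk's implementation: the helper name split_by_ratios, its labels argument, and the rounding rule used for the cut points are assumptions made for illustration only.

```python
import random
from collections import defaultdict


def split_by_ratios(samples, labels=None, ratios=(0.8, 0.1, 0.1),
                    shuffle=True, random_seed=None):
    """Hypothetical helper (not part of cvtk): split samples into len(ratios) subsets."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios should sum to 1")
    rng = random.Random(random_seed)

    # Group samples by label so that each class is split separately
    # (stratification). Without labels, all samples form a single group.
    groups = defaultdict(list)
    for i, sample in enumerate(samples):
        key = labels[i] if labels is not None else None
        groups[key].append(sample)

    subsets = [[] for _ in ratios]
    for group in groups.values():
        if shuffle:
            rng.shuffle(group)
        # Cut each group at cumulative ratio boundaries; the last subset
        # takes whatever remains so that no sample is dropped.
        start = 0
        cumulative = 0.0
        for k, ratio in enumerate(ratios):
            cumulative += ratio
            if k == len(ratios) - 1:
                end = len(group)
            else:
                end = round(cumulative * len(group))
            subsets[k].extend(group[start:end])
            start = end
    return subsets


samples = list(range(20))
labels = ["a"] * 10 + ["b"] * 10
train, valid, test = split_by_ratios(samples, labels, random_seed=0)
print(len(train), len(valid), len(test))
```

With 10 samples per class and ratios of 0.8, 0.1, and 0.1, each class contributes 8, 1, and 1 samples to the three subsets, so every subset keeps the original class balance. Fixing random_seed makes the shuffle, and therefore the split, reproducible.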
- cvtk.ml.generate_source(project: str, task: str = 'cls', vanilla: bool = False) → None
Generate source code for classification or detection tasks
This function generates a Python script for training a model and running inference with PyTorch (for classification tasks) or MMDetection (for object detection and instance segmentation tasks). Two types of scripts can be generated, depending on the vanilla argument: one that imports cvtk and one that does not. The script that imports cvtk is simple and easy to understand, since most of the complex functions are implemented in cvtk. It is designed for users who are beginning to learn deep learning for image tasks with PyTorch or MMDetection. The script without the cvtk import is longer and more complex, but it can be customized and extended more flexibly, since all functions are implemented directly in torch and torchvision.
- Parameters:
project – A file path to save the script.
task – The task type of the project. Three types of tasks can be specified (‘cls’, ‘det’, ‘segm’). The default is ‘cls’.
vanilla – Generate a script that does not import cvtk. The default is False.
- cvtk.ml.generate_demoapp(project: str, source: str, label: str, model: str, weights: str, vanilla: bool = False) → None
This function generates a FastAPI application for inference of a classification or detection model.
- Parameters:
project – A file path to save the FastAPI application.
source – The source code of the model.
label – The label file of the dataset.
model – The configuration file of the model.
weights – The weights file of the model.
vanilla – Generate an application that does not import cvtk. The default is False.
Examples
>>> from cvtk.ml import generate_demoapp
>>> generate_demoapp('./project', 'model.py', 'label.txt', 'model.cfg', 'model.pth')