Object Classification ##################### This tutorial describes how to use the **cvtk** package to build a model of an object classification task, from training the to inference. .. note:: The **cvtk** package internally calls functions implemented in the **torch** (`PyTorch `_) and **torchvision** packages for object classification tasks. Ensure that **PyTorch** is installed correctly without any errors before using the **cvtk** package. .. code-block:: python import torch import torchvision print(torch.__version__) print(torchvision.__version__) Source Code Preparation *********************** User can import **cvtk** package to build and train a model for object classification tasks and use the model for inference by refering to the **cvtk** package documentation. Alternatively, to get the quick start, user can generate an example Python source code using the ``cvtk`` command. The source code for object classification tasks can be generated by ``cvtk create`` command with the ``--task cls`` argument. For those new to programming or deep learning, it is recommended to run the command with default options. It generates simple source code that contains only the essential processes, with all complex processes imported from the **cvtk** package. This makes the source code easy to read and helps in understanding the flow of deep learning for beginners. .. code-block:: sh cvtk create --script cls.py --task cls After running the command, the source code will be generated in the file :file:`cls.py`. By default, the network architecture ResNet18 (``torchvision.models.resnet18``) is used. User can change the ``resnet18`` to other keywords to use different network architectures by editing :file:`cls.py`. The available network architectures can be found on the PyTroch website (e.g., `Models and pre-trained weights `_). Additionally, for those who are already familiar with deep learning, it is recommended to run the command with the additional argument ``--vanilla``. It generates source code that uses only the **PyTorch** package functions. Users can then customize the source code to suit their needs, for example, adding data augmentation processes and changing optimization algorithms. .. code-block:: sh cvtk create --script cls.py --task cls --vanilla Model Training and Validation ***************************** To train the model, open the source code generated above and execute it by providing training, validation, and test data to the input of the ``train`` function. Alternatively, the source code can be executed directly from the command line as follows: .. code-block:: sh python cls.py train \ --label ./data/fruits/label.txt \ --train ./data/fruits/train.txt \ --valid ./data/fruits/valid.txt \ --test ./data/fruits/test.txt \ --output_weights ./outputs/fruits.pth The weights of the trained model will be saved in :file:`fruits.pth`, and the loss and accuracy data during the training process will be saved in :file:`fruits.train_stats.txt` and showed in figure :file:`fruits.train_stats.png`. The file :file:`fruits.train_stats.txt` is a tab-separated file consiting of five columns: epoch, train_loss, train_acc, valid_loss, and vlaid_acc, as follows: :: epoch train_loss train_acc valid_loss valid_acc 1 1.40679 0.22368 1.24780 0.41667 2 1.21213 0.48684 1.09401 0.83334 3 1.00425 0.81578 0.88967 0.83334 4 0.78659 0.82894 0.64055 0.91666 5 0.46396 0.96052 0.39010 0.91666 .. image:: ../_static/fruits.train_stats.png :width: 70% :align: center Additionally, if the test data is provided, the model will be evaluated using the test data. The test results will be saved in :file:`fruits.test_outputs.txt` and confusion matrix will be saved in :file:`fruits.test_outputs.cm.txt` and :file:`fruits.test_outputs.cm.png`. The file :file:`fruits.test_outputs.txt` is a tab-separated file, where the first column is the path to the image, the second column is the true label, and the following columns are the predicted probabilities for each class. :: # loss: 0.021113455295562744 # acc: 0.944932234 image label cucumber eggplant strawberry tomato 44a0ceae.jpg cucumber 0.97071 0.00400 0.01282 0.01248 4b0249f4.jpg cucumber 0.81493 0.09675 0.04698 0.04134 14c6e557.jpg strawberry 0.00000 0.00028 0.99940 0.00032 18174d63.jpg strawberry 0.00000 0.00045 0.99904 0.00051 2a43e151.jpg tomato 0.00004 0.00119 0.00404 0.99473 35235e30.jpg eggplant 0.00000 1.00000 0.00000 0.00000 667a045f.jpg cucumber 0.96733 0.00430 0.01193 0.01644 ... The file :file:`fruits.test_outputs.cm.txt` is a tab-separated file, representing a confusion matrix of test data. The class labels shown in the header are the predicted labels while the class labels shown in the first column are the ground truth. :: # Confusion Matrix # prediction cucumber eggplant strawberry tomato cucumber 8 0 0 0 eggplant 0 8 0 0 strawberry 0 0 8 0 tomato 0 0 0 8 The file :file:`fruits.test_outputs.cm.png` is a figure showing the confusion matrix. .. image:: ../_static/fruits.test_outputs.cm.png :width: 70% :align: center Inference ********* To perform inference using the constructed model, refer to the ``inference`` function in the source code. Alternatively, it can also be executed directly from the command line as follows: .. code-block:: sh python cls.py inference \ --label ./data/fruits/label.txt \ --data ./data/fruits/test.txt \ --model_weights ./outputs/fruits.pth \ --output ./outputs/fruits.inference_results.txt The inference results will will be saved in :file:`fruits.inference_results.txt`. The file is a tab-separated file, where the first column is the path to the image, the second column is the predicted label, and the following columns are the predicted probabilities for each class. :: image prediction cucumber eggplant strawberry tomato 44a0ceae.jpg cucumber 0.99384 0.00226 0.00081 0.00308 14c6e557.jpg strawberry 0.00000 0.00003 0.99965 0.00032 c937b2d9.jpg eggplant 0.00177 0.99704 0.00031 0.00088 1fd32b2f.jpg eggplant 0.00001 0.99994 0.00003 0.00000 cad59952.jpg tomato 0.00000 0.00000 0.00001 0.99999