In this blog post, we will cover the necessary steps to train a custom image classification model and test it on images.
The Ikomia API simplifies the development of Computer Vision workflows and provides an easy way to experiment with different parameters to achieve optimal results.
You can train a custom classification model with just a few lines of code. To begin, you will need to install the API within a virtual environment.
How to install a virtual environment
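Below is a typical setup; the exact commands may vary with your operating system and Python version.

# Create and activate a virtual environment (Linux/macOS)
python -m venv venv
source venv/bin/activate
# On Windows, activate with: venv\Scripts\activate

# Install the Ikomia API
pip install ikomia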
In this tutorial, we will use the Rock, Paper, Scissors dataset from Roboflow.
Ensure that the dataset is organized in the correct format, as shown below:
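The layout follows the standard image-folder convention: one sub-folder per split and one sub-folder per class (the class folder names shown here are indicative and should match your download):

dataset/
├── train/
│   ├── paper/
│   ├── rock/
│   └── scissors/
├── val/
│   ├── paper/
│   ├── rock/
│   └── scissors/
└── test/
    ├── paper/
    ├── rock/
    └── scissors/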
(Note: The "validation" folder should be renamed to "val".)
You can also directly load the open-source notebook we have prepared.
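If you prefer to run the code locally, the whole training fits in a short script. The sketch below uses a placeholder dataset path and assumed parameter names (model_name, batch_size, epochs); we break it down step by step later in this post.

from ikomia.dataprocess.workflow import Workflow

# Create the workflow
wf = Workflow()

# Add the ResNet training algorithm and connect it to the workflow
train = wf.add_task(name="train_torchvision_resnet", auto_connect=True)

# Assumed parameter names; check the algorithm page on Ikomia HUB for the exact list
train.set_parameters({
    "model_name": "resnet34",
    "batch_size": "8",
    "epochs": "5",
})

# Launch training on the dataset folder (placeholder path)
wf.run_on(folder="path/to/rock-paper-scissors")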
After 5 epochs of training, you will be able to review the resulting training metrics for your model.
Before experimenting with TorchVision ResNet, let's dive deeper into image classification and the characteristics of this particular algorithm.
Image classification is a fundamental task in Computer Vision that involves categorizing images into predefined classes based on their visual content. It enables computers to recognize objects, scenes, and patterns within images. The importance of image classification lies in its various applications:
- Object recognition: it allows computers to identify and categorize objects in images, which is essential for applications like autonomous vehicles and surveillance systems.
- Image understanding: classification helps machines interpret image content and extract meaningful information, enabling advanced analysis and decision-making based on visual data.
- Search and retrieval: by assigning tags or labels to images, classification models facilitate efficient searching and retrieval of specific images from large databases.
- Content moderation: image classification aids in automatically detecting and flagging inappropriate or offensive content, ensuring safer online environments.
- Medical imaging: classification assists in diagnosing diseases and analyzing medical images, enabling faster and more accurate diagnoses.
- Quality control: by classifying images, defects or anomalies in manufactured products can be identified, ensuring quality control in various industries.
- Recommendation systems: image classification enhances recommendation systems by analyzing visual content and suggesting related items or content.
- Security and surveillance: classification enables the identification of objects or individuals of interest in security and surveillance applications, enhancing threat detection and public safety.
In summary, image classification is essential for object recognition, image understanding, search and retrieval, content moderation, medical imaging, quality control, recommendation systems, and security applications in computer vision.
TorchVision is a popular Computer Vision library in PyTorch that provides pre-trained models and tools for working with image data. One of the widely used models in TorchVision is ResNet. ResNet, short for Residual Network, is a deep convolutional neural network architecture introduced by Kaiming He et al. in 2015. It was designed to address the challenge of training deep neural networks by introducing a residual learning framework.
ResNet uses residual blocks with skip connections to facilitate information flow between layers, mitigating the vanishing gradient problem and enabling the training of deeper networks.
The key idea behind ResNet is the use of residual blocks, which allow the network to learn residual mappings. Each block contains a skip connection that bypasses one or more layers, so information from earlier layers flows directly to later layers. This alleviates the vanishing gradient problem, lets information pass through many layers without degradation, and makes deeper networks easier to train and optimize.
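To make the idea concrete, here is a minimal PyTorch sketch of a basic residual block, a simplified version of the blocks used in ResNet-18 and ResNet-34 (no stride or channel changes):

import torch
import torch.nn as nn

class BasicResidualBlock(nn.Module):
    # Simplified residual block: output = ReLU(F(x) + x)
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x                              # skip connection: keep the original input
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = out + identity                      # add the residual to the input
        return self.relu(out)

x = torch.randn(1, 64, 56, 56)
print(BasicResidualBlock(64)(x).shape)            # torch.Size([1, 64, 56, 56])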
The Microsoft Research team won the ImageNet 2015 competition using these deep residual layers with skip connections. Their winning ResNet-152 architecture is a convolutional neural network comprising a total of 152 layers.
ResNet models are available in torchvision with different depths, including ResNet-18, ResNet-34, ResNet-50, ResNet-101, and ResNet-152. These pre-trained models have been trained on large-scale image classification tasks, such as the ImageNet dataset, and achieved state-of-the-art performance.
By using pre-trained ResNet models from torchvision, researchers and developers can leverage the learned representations for various Computer Vision tasks, including image classification, object detection, and feature extraction.
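For example, loading an ImageNet pre-trained ResNet takes only a couple of lines with the torchvision weights API (available since torchvision 0.13):

import torchvision.models as models

# Load ResNet-50 with ImageNet pre-trained weights
weights = models.ResNet50_Weights.IMAGENET1K_V1
model = models.resnet50(weights=weights)
model.eval()  # switch to inference mode

# The matching preprocessing transforms used during training
preprocess = weights.transforms()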
With the Rock, Paper, Scissors dataset you have downloaded, you can easily train a custom ResNet model using the Ikomia API. Let's go through the process together:
Initialize a workflow instance by creating a ‘wf’ object. This object will be used to add tasks to the workflow, configure their parameters, and run them on input data.
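In code, this is the standard entry point of the API (assuming the Ikomia API is installed as shown above):

from ikomia.dataprocess.workflow import Workflow

# Create an empty workflow to which tasks will be added
wf = Workflow()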
Now, let's add the train_torchvision_resnet task to train our custom image classifier. We also need to specify a few parameters for the task:
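Here we set the architecture, batch size, and number of epochs; these parameter names are assumptions for this sketch, so check the train_torchvision_resnet page on Ikomia HUB for the authoritative list.

# Add the training task and connect it to the workflow
train = wf.add_task(name="train_torchvision_resnet", auto_connect=True)

# Parameters are passed as strings (assumed names and example values)
train.set_parameters({
    "model_name": "resnet34",  # backbone to fine-tune
    "batch_size": "8",
    "epochs": "5",
})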
Next, provide the path to the dataset folder for the task input.
Finally, it's time to run the workflow and start the training process.
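Putting these last two steps together (the dataset path below is a placeholder, and we assume run_on accepts a folder argument for folder-based input; check the Ikomia documentation if your version differs):

# Path to the Rock, Paper, Scissors dataset folder (placeholder)
dataset_folder = "path/to/rock-paper-scissors"

# Launch the training
wf.run_on(folder=dataset_folder)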
First, we can run the pre-trained ResNet-34 model on a rock/paper/scissors image:
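The sketch below assumes the infer_torchvision_resnet algorithm from Ikomia HUB and a placeholder image path:

from ikomia.dataprocess.workflow import Workflow
from ikomia.utils.displayIO import display

wf = Workflow()

# Add the classification inference task with the stock ResNet-34 weights
resnet = wf.add_task(name="infer_torchvision_resnet", auto_connect=True)
resnet.set_parameters({"model_name": "resnet34"})

# Run on a rock/paper/scissors image (placeholder path)
wf.run_on(path="path/to/rock_image.jpg")

# Show the image with the predicted label (helper assumed available in your API version)
display(resnet.get_image_with_graphics())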
We can observe that the pre-trained ResNet-34 model does not recognize the rock sign. This is because the model was trained on the ImageNet dataset, which does not contain images of rock/paper/scissors hand signs.
To test the model we just trained, we specify the path to our custom model and class names using the 'model_weight_file' and 'class_file' parameters. We then run the workflow on the same image we used previously.
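Continuing from the previous snippet, a minimal sketch (the weight and class file paths are placeholders for the files produced by your training run):

# Add a new inference task configured with our fine-tuned weights
resnet_custom = wf.add_task(name="infer_torchvision_resnet", auto_connect=True)
resnet_custom.set_parameters({
    "model_weight_file": "path/to/output_folder/best_model.pth",  # placeholder path
    "class_file": "path/to/output_folder/classes.txt",            # placeholder path
})

# Run on the same image as before (placeholder path)
wf.run_on(path="path/to/rock_image.jpg")
display(resnet_custom.get_image_with_graphics())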
Here are some more examples of image classification using the pre-trained (left) and our custom model (right):
To learn more about the API, refer to the documentation. You may also check out the list of state-of-the-art algorithms on Ikomia HUB and try out Ikomia STUDIO, which offers a friendly UI with the same features as the API.