Using dataset loaders to train a custom model with the Ikomia API

Allan Kouidri
-
8/29/2023
Ikomia workflow using a dataset loader and a training algorithm

With the Ikomia API, we can train a custom model using various dataset formats with just a few lines of code.

To begin, you'll need to install the API within a virtual environment.

How to install a virtual environment
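For example, you can create and activate one with Python's built-in venv module before installing the API (a minimal sketch; adjust the environment name and activation command to your OS):


python -m venv ikomia_env
source ikomia_env/bin/activate   # on Windows: ikomia_env\Scripts\activate
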


pip install ikomia

API documentation

API repo

A comprehensive guide to the different dataset formats

As artificial intelligence and machine learning continue to advance, good-quality datasets are of the highest importance. A well-annotated dataset serves as the foundation for training robust AI models that can understand real-world data (remember, "garbage in, garbage out").

To assist AI developers, various dataset formats such as COCO and Pascal VOC have emerged.

The challenge of diverse dataset formats

Unfortunately, there is no standard format to represent data across all AI tasks. To address this issue, Ikomia has developed its own annotation format, taking inspiration from popular frameworks like TensorFlow and PyTorch. We then provide specific algorithms in the HUB to load datasets from third-party formats into the Ikomia format.

Ikomia's solution: bridging the format gap

To accommodate datasets in various formats, Ikomia provides specific dataset loader algorithms in the HUB that can load datasets from third-party formats, such as COCO, YOLO Darknet, Pascal VOC, into the Ikomia format. This system ensures seamless connectivity between any valid dataset reader and all training algorithms available on the platform.

Seamless integration: from third-party formats to Ikomia

The Ikomia format offers a flexible and consistent way to store annotations and metadata for various AI applications. As a result, any valid dataset loader can be connected to any training algorithm available on the platform.

Ikomia workflow using a dataset loader and a training algorithm

Benefits of the Ikomia dataset

Key features of Ikomia format:

  • Flexibility: The Ikomia format can adapt to different AI tasks, making it suitable for object detection, image classification, instance segmentation, OCR, and more.
  • Consistency: The format ensures a consistent representation of data, which simplifies the development and integration of AI models into the Ikomia platform.
  • Third-party support: Ikomia enables seamless connectivity between dataset loader and training algorithms on the platform, supporting datasets in various formats.

Table summarizing the Ikomia dataset loaders (dataset_coco, dataset_yolo, dataset_pascal_voc, dataset_wildreceipt, dataset_via, dataset_classification) and their compatibility with the different training tasks: object detection, instance segmentation, semantic segmentation, keypoints, text detection, text recognition, and classification.

Working with COCO (Common Objects in Context) dataset format

The Common Objects in Context (COCO) dataset format is a widely used and standardized format for various Computer Vision tasks, including object detection, segmentation, and keypoint detection. It was introduced by Microsoft in collaboration with several academic institutions.

The COCO dataset is well-known for its large-scale and diverse collection of images, making it valuable for training and evaluating AI models to comprehend real-world scenes and objects.

Key features of COCO format:

Image information

Each image in the dataset is uniquely identified by an image identifier or filename. This identifier is used to link the annotations with their corresponding images during training and evaluation.

Object annotations

The COCO format provides detailed annotations for objects present in each image. The annotations include the following information:

  • Object class: The class label of the object is represented as an integer corresponding to a specific class index. For example, if there are 80 classes in the dataset, they could be encoded as integers from 1 to 80, where each integer represents a specific object category (e.g., "person," "car," "dog," etc.).

  • Object bounding box: The bounding box annotation for each object is specified as a list of four values (x, y, width, height). The (x, y) coordinates represent the top-left corner of the bounding box, while the width and height represent the size of the bounding box in pixels.

  • Segmentation masks (Optional): For instance segmentation tasks, COCO can provide pixel-level segmentation masks for objects, outlining the exact area occupied by each object within the image.

  • Keypoint annotations (Optional): For tasks like human pose estimation, COCO can also include keypoint annotations, specifying the locations of keypoints (e.g., nose, eyes, hands) on each person.

Categories information

Information about the categories/classes present in the dataset. This file contains a list of dictionaries, each representing a unique object class with its associated category ID and name.
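For instance, the categories entry might look like this (a minimal illustrative sample matching the example below):


"categories": [
    {"id": 1, "name": "person", "supercategory": "person"},
    {"id": 2, "name": "car", "supercategory": "vehicle"}
]
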

The COCO dataset format organizes all this information in a structured manner, typically using JSON files for annotations and categories.

Here's an example of how the COCO format annotation might look like for an image with two annotated objects (person and car):


 {
    "image_id": 12345,
    "file_name": "example_image.jpg",
    "width": 640,
    "height": 480,
    "annotations": [
        {
            "id": 1,
            "category_id": 1,
            "bbox": [100, 150, 200, 300],
            "area": 60000,
            "iscrowd": 0
        },
        {
            "id": 2,
            "category_id": 2,
            "bbox": [400, 100, 100, 150],
            "area": 15000,
            "iscrowd": 0
        }
    ]
}


In this example, the "image_id" uniquely identifies the image, and each annotation provides details about an object's class, bounding box, area, and whether it represents a crowd (0 for individual objects). The corresponding category information is stored in a separate file.

Overall, the COCO format's rich and structured annotations make it a valuable resource for training and benchmarking advanced Computer Vision models across a wide range of tasks.

Loading COCO dataset with the ‘dataset_coco’ algorithm

In the guide on How to train YOLOv8 instance segmentation on a custom dataset with the Ikomia API, we saw in detail how to load your dataset using the dataset_coco module. In this example, we downloaded the coral dataset from Roboflow.

Diagram of an Ikomia workflow using dataset_coco and the train_yolo_v8_seg algorithm



from ikomia.dataprocess.workflow import Workflow

# Initialize the workflow
wf = Workflow()

# Add the dataset loader to load your custom data and annotations
dataset = wf.add_task(name='dataset_coco')

# Set the parameters of the dataset loader
dataset.set_parameters({
    'json_file': 'Path/To/Mesophotic Coral/Dataset/train/_annotations.coco.json',
    'image_folder': 'Path/To/Mesophotic Coral/Dataset/train',
    'task': 'instance_segmentation'
}) 

# Add the YOLOv8 segmentation algorithm
train = wf.add_task(name='train_yolo_v8_seg', auto_connect=True)


# Set the parameters of the YOLOv8 segmentation algorithm
train.set_parameters({
    'model_name': 'yolov8m-seg',
    'batch_size': '4',
    'epochs': '50',
    'input_size': '640',
    'dataset_split_ratio': '0.8',
    'output_folder':'Path/To/Folder/Where/Model-weights/Will/Be/Saved'
}) 

# Launch your training on your data
wf.run()

  • ‘json_file’: path to the .JSON annotation file
  • ‘image_folder’: path to the folder containing the images
  • ‘task’: task targeted by the training algorithm: ‘instance_segmentation’, ‘semantic_segmentation’, ‘detection’ or ‘keypoints’.

Working with Pascal VOC 2012 dataset format

The Pascal VOC (Visual Object Classes) 2012 dataset format is a widely used annotation format for object detection, segmentation and image classification tasks. It was introduced by the PASCAL Visual Object Classes challenge, which was a series of annual competitions organized to advance the field of Computer Vision.

The VOC format has been adopted as a standard by the Computer Vision community and has become popular for benchmarking object detection and classification algorithms.

Key features of Pascal VOC format

Image information

Each image in the dataset is identified by a unique image identifier or filename. This identifier is used to link the annotations with their corresponding images during training and evaluation.

Object annotations

The Pascal VOC format provides detailed annotations for objects present in each image. The annotations include the following information:

  • Object class: The class label of the object is represented as a string corresponding to a specific class name (e.g., "person," "car," "dog," etc.).
  • Object bounding box: The bounding box annotation for each object is specified as a list of four values (xmin, ymin, xmax, ymax). The (xmin, ymin) coordinates represent the top-left corner of the bounding box, while the (xmax, ymax) coordinates represent the bottom-right corner of the bounding box.

Segmentation annotations (Optional)

For instance segmentation tasks, Pascal VOC can also provide pixel-level segmentation masks for objects, outlining the exact area occupied by each object within the image.

  • Metadata: The Pascal VOC format may include metadata information, such as the image size, segmentation mask encoding format, and additional attributes specific to the dataset.

The Pascal VOC dataset format is typically organized into separate XML files, one for each image in the dataset. Each XML file contains all the necessary object annotations and related details for the corresponding image.

Here's an example of how the Pascal VOC format annotation might look like for an image with two annotated objects (person and car):

<annotation>
    <folder>example_folder</folder>
    <filename>example_image.jpg</filename>
    <size>
        <width>640</width>
        <height>480</height>
        <depth>3</depth>
    </size>
    <object>
        <name>person</name>
        <bndbox>
            <xmin>100</xmin>
            <ymin>150</ymin>
            <xmax>300</xmax>
            <ymax>450</ymax>
        </bndbox>
    </object>
    <object>
        <name>car</name>
        <bndbox>
            <xmin>400</xmin>
            <ymin>100</ymin>
            <xmax>500</xmax>
            <ymax>250</ymax>
        </bndbox>
    </object>
</annotation>

Load Pascal VOC dataset with the ‘dataset_pascal_voc’ module

In this example we use the Personal Protective Equipment (PPE) dataset. You also need to write a classes.txt file containing the following labels:


Safe Worker
Unsafe Worker
Worker -only hat-
Worker -only vest-
hardhat
vest


Diagram of an Ikomia workflow using dataset_pascal_voc and train_yolor


from ikomia.dataprocess.workflow import Workflow
from ikomia.utils import ik


# Initialize the workflow
wf = Workflow()


# Add the dataset loader to load your custom data and annotations
dataset = wf.add_task(ik.dataset_pascal_voc(
                    annotation_folder= "Path/To/PPE DETECTION.v1i.voc/train",
                    dataset_folder= "Path/To/PPE DETECTION.v1i.voc/train",
                    class_file= "Path/To/classes.txt"
                        )
                    )


# Add the YOLOR training algorithm
yolo = wf.add_task(ik.train_yolor(
                    model_name = "yolor_p6",
                    batch_size="4", 
                    epochs="10"), 
                    auto_connect=True
                    )


# Launch your training on your data
wf.run()


Parameters of the dataset_pascal_voc module: 

  • ‘annotation_folder’ (str): path to annotations folder containing the .xml files
  • ‘dataset_folder’ (str): path to folder containing the images
  • ‘instance_seg_folder’ (str) [Optional]: path to segmentation masks folder
  • ‘class_file’ (str): path to text file containing class names

Working with YOLO Darknet dataset format

The YOLO (You Only Look Once) dataset format is a widely used format for object detection tasks, similar to the COCO format. It is designed to annotate images for training YOLO-based object detection and segmentation models. The YOLO format provides essential information about object locations, class labels, and bounding boxes required for effective model training.

Key features of YOLO format

Image information

Each image in the dataset is identified by a unique image identifier or filename. This identifier is used to associate annotations with the corresponding image during training and evaluation.

Object Information

For each object present in the image, the following information is provided:

  • Object class: The class label of the object. It is represented as an integer corresponding to the class index. For example, if there are three classes (person, car, and dog), they could be encoded as 0, 1, and 2, respectively.
  • Object center coordinates: The (x, y) coordinates of the center of the bounding box, relative to the width and height of the image. These coordinates are normalized to values between 0 and 1. For example, if the center of the object is at (250, 200) in an image with dimensions (500, 400), the normalized coordinates would be (0.5, 0.5).
  • Object width and height: The width and height of the bounding box, also normalized to values between 0 and 1, based on the image dimensions.
  • Segmentation annotations (Optional): For instance segmentation tasks, provide pixel-level segmentation masks for objects, outlining the exact area occupied by each object within the image.

The YOLO format maintains a specific order for object information in each line of the annotation file, typically represented as:

<class_id> <x_center> <y_center> <width> <height>

For example, if an image contains two objects - a person and a car - the corresponding YOLO format annotation file might look like this:


0 0.62 0.75 0.30 0.60
1 0.40 0.30 0.20 0.40

In this example, the first object has a class index of 0 (person), and its center is at approximately (62%, 75%) of the image, with a bounding box width and height of approximately 30% and 60% of the image size, respectively.

Similarly, the second object has a class index of 1 (car), and its center is at approximately (40%, 30%) of the image, with a bounding box width and height of approximately 20% and 40% of the image size, respectively.
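To make the normalization arithmetic concrete, here is a minimal Python sketch (hypothetical values, not part of the Ikomia API) that converts a pixel bounding box into a normalized YOLO annotation line:


# Hypothetical image size and pixel bounding box (xmin, ymin, width, height)
img_w, img_h = 500, 400
xmin, ymin, box_w, box_h = 200, 120, 100, 160
class_id = 0  # e.g. "person"

# YOLO expects the box center and size, normalized by the image dimensions
x_center = (xmin + box_w / 2) / img_w
y_center = (ymin + box_h / 2) / img_h
w_norm = box_w / img_w
h_norm = box_h / img_h

print(f"{class_id} {x_center:.2f} {y_center:.2f} {w_norm:.2f} {h_norm:.2f}")
# -> 0 0.50 0.50 0.20 0.40
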

The YOLO format is an efficient and straightforward way to represent annotated object detection and segmentation data, enabling AI developers to train YOLO models effectively to detect and localize objects in diverse real-world scenarios.

Load YOLO Darknet dataset with the ‘dataset_yolo’ module

Here is an example of training a custom YOLOv7 model with a YOLO darknet dataset format. You can find this workflow described in detail in the guide on How to train a custom YOLOv7 model with the Ikomia API.

Diagram of an Ikomia workflow using dataset_yolo and train_yolo_v7

Using the aerial airport dataset in YOLO format as an example: each image in each folder (train, val, test) has a corresponding .txt file containing all the bounding box and class information for the annotated airplanes.

Additionally, there is a ‘_darknet.labels’ file containing all class names. We will use the dataset_yolo module provided by the Ikomia API to load the custom data and annotations.


from ikomia.dataprocess.workflow import Workflow
from ikomia.utils import ik

# Initialize the workflow
wf = Workflow()

# Add the dataset loader to load your custom data and annotations
dataset = wf.add_task(ik.dataset_yolo(
                    dataset_folder= "path/to/aerial/dataset/train",
                    class_file= "path/to/aerial/dataset/train/_darknet.labels"
                        )
                    )

# Add the Yolov7 training algorithm
yolo = wf.add_task(ik.train_yolo_v7(
                    batch_size="4", 
                    epochs="10", 
                    output_folder="path/to/output/folder",
                        ), 
                    auto_connect=True
                    )

# Launch your training on your data
wf.run()

Working with VGG Image Annotator (VIA) dataset format

What is the VGG Image Annotator (VIA)?

The VGG Image Annotator (VIA) is a versatile and user-friendly annotation tool that facilitates manual annotation of images. While VIA is not a dataset format itself, it enables the creation and export of datasets in various formats, including JSON and CSV.

Key features of VIA:

  • Customizable annotations: VIA allows annotators to define custom annotation attributes, making it suitable for a wide range of tasks.
  • Support for multiple formats: VIA can export annotations in formats compatible with popular deep learning frameworks, making it easy to integrate into existing AI pipelines.

Load VIA (.json) dataset with the ‘dataset_via’ module

Diagram of an Ikomia workflow using dataset_via and train_mmlab_detection

We created a tiny car license plate dataset with the VGG Image Annotator. In the following example, this dataset is used to train a custom YOLOX model through the train_mmlab_detection algorithm.


from ikomia.dataprocess.workflow import Workflow
from ikomia.utils import ik

# Initialize the workflow
wf = Workflow()

# Add the dataset loader to load your custom data and annotations
dataset = wf.add_task(ik.dataset_via(
                    via_json_file="Path/to/via_license_plate/via_project_26Jul2023_14h53m.json"
                        )
                    )

# Add the MMLab object detection training algorithm (YOLOX)
yolo = wf.add_task(ik.train_mmlab_detection(
                    model_name = "yolox",
                    batch_size="4", 
                    epochs="10", 
                        ), 
                    auto_connect=True
                    )

# Launch your training on your data
wf.run()

Working with wildreceipt dataset format

The Wildreceipt format is specifically designed for receipt OCR (Optical Character Recognition) tasks. Receipt data is vital for various applications, including expense tracking, financial analysis, and business expense management.

Key features of Wildreceipt format

  • Text annotations: Wildreceipt datasets focus on accurately annotating the text present in receipts, including store names, dates, item names, prices, and more.
  • Key-Value pairs: The format often adopts a key-value structure, where the key corresponds to the type of text (e.g., "total," "date") and the value contains the corresponding text content (see the illustrative sketch below).
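As a purely conceptual illustration (not the exact Wildreceipt schema), a single annotated text region could be represented as a box, its transcription, and a label index:


{"box": [550, 190, 937, 190, 937, 104, 550, 104], "text": "SAFEWAY", "label": 1}
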

Load wildreceipt dataset with the ‘dataset_wildreceipt’ module

Here, the Wildreceipt dataset, a collection of receipts, will be used to train a custom SATRN text recognition model.

Diagram of an Ikomia workflow using dataset_wildreceipt and train_mmlab_text_recognition


from ikomia.dataprocess.workflow import Workflow
from ikomia.utils import ik

# Initialize the workflow
wf = Workflow()

# Add the dataset loader to load your custom data and annotations
dataset = wf.add_task(ik.dataset_wildreceipt(
                    dataset_folder= 'Path/to/dataset/wildreceipt'
                        )
                    )

# Add the training algorithm
train = wf.add_task(ik.train_mmlab_text_recognition(
                    model_name = "satrn",
                    batch_size="4", 
                    epochs="10", 
                        ), 
                    auto_connect=True
                    )

# Launch your training on your data
wf.run()

Working with a classification dataset format

A classification dataset format is a structured representation of data specifically designed for image classification tasks. In image classification, the goal is to categorize an input image into one of several predefined classes or categories.

The dataset format is essential for organizing and providing the necessary information required to train and evaluate machine learning models for image classification.

Classification datasets should follow the following structure:

Diagram of a classification dataset structure

In this format, images are organized into separate subdirectories based on their class labels. Each subdirectory represents a specific class, and images belonging to that class are stored within that directory.

This hierarchical structure makes it easy to maintain and manage the dataset, especially when dealing with multiple classes and a large number of images.
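For example, a "Rock Paper Scissors" style dataset could be laid out as follows (hypothetical folder and file names):


dataset_folder/
├── train/
│   ├── rock/
│   │   ├── img_001.jpg
│   │   └── img_002.jpg
│   ├── paper/
│   │   └── ...
│   └── scissors/
│       └── ...
└── val/
    ├── rock/
    ├── paper/
    └── scissors/
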

Load a classification dataset 

There are two approaches to load a classification dataset for training your custom classification model.

The first approach involves launching your training task on the dataset by providing the folder path: wf.run_on(folder=dataset_folder). You can find more details in the case study titled "How to Train a Classification Model on a Custom Dataset with the Ikomia API."
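A minimal sketch of this first approach (assuming a classification training algorithm such as train_torchvision_resnet and a placeholder dataset path):


from ikomia.dataprocess.workflow import Workflow
from ikomia.utils import ik

# Init your workflow
wf = Workflow()

# Add a classification training algorithm; auto_connect wires it to the workflow input
resnet = wf.add_task(ik.train_torchvision_resnet(model_name="resnet18"), auto_connect=True)

# Launch the training directly on the dataset folder
wf.run_on(folder="Path/To/Classification/Dataset")
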

The second approach consists of using the dataset_classification module. In this example, we use the "Rock Paper Scissors" dataset from Roboflow. You can download this dataset by following this link: Dataset Download Link. Please note that the "validation" folder should be renamed to "val" for proper use.

Diagram of an Ikomia workflow using dataset_classification and train_torchvision_resnet


from ikomia.dataprocess.workflow import Workflow
from ikomia.utils import ik

# Init your workflow
wf = Workflow()

# Add the dataset loader to load your custom data and annotations
dataset = wf.add_task(ik.dataset_classification(
                            dataset_folder= "Path/To/Rock Paper Scissors.v1-raw-300x300.folder")
)

# Add the training task to the workflow
resnet = wf.add_task(ik.train_torchvision_resnet(
                                model_name="resnet18",
                                batch_size="16",
                                epochs="20",
                                output_folder="Path/To/Output/Folder"
                                ),
                            auto_connect=True
                            )

# Launch your training on your data
wf.run()

Embracing dataset diversity and simplifying workflows with Ikomia

In conclusion, the availability of various dataset formats, such as COCO, Pascal VOC, YOLO Darknet, and VIA, empowers AI researchers and practitioners to work with diverse data and develop models capable of tackling real-world challenges effectively.

The Ikomia API simplifies the development of Computer Vision workflows by providing a standardized and adaptable annotation format, giving the AI community a unified framework to streamline the development and deployment of AI solutions.

To learn more about the API, refer to the documentation. You may also check out the list of state-of-the-art algorithms on Ikomia HUB and try out Ikomia STUDIO, which offers a friendly UI with the same features as the API.
