With the Ikomia API, we can train a custom model using various dataset formats with just a few lines of code.
To begin, you'll need to install the API within a virtual environment (see the Ikomia documentation on how to install a virtual environment).
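On Linux or macOS, the setup boils down to a few commands (the PyPI package is named ikomia; adapt the commands to your platform):

```sh
# Create and activate a virtual environment
python3 -m venv venv
source venv/bin/activate

# Install the Ikomia API
pip install ikomia
```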
As artificial intelligence and machine learning continue to advance, good-quality datasets are of the highest importance. A well-annotated dataset serves as the foundation for training robust AI models that can understand real-world data (remember, "garbage in, garbage out").
To assist AI developers, various dataset formats such as COCO and Pascal VOC have emerged.
Unfortunately, there is no standard format to represent data across all AI tasks. To address this issue, Ikomia has developed its own annotation format, taking inspiration from popular frameworks like TensorFlow and PyTorch.
To accommodate datasets in various formats, Ikomia provides specific dataset loader algorithms in the HUB that can load datasets from third-party formats, such as COCO, YOLO Darknet, and Pascal VOC, into the Ikomia format. This system ensures seamless connectivity between any valid dataset reader and all training algorithms available on the platform.
The Ikomia format offers a flexible and consistent way to store annotations and metadata for various AI applications.
Key features of the Ikomia format include a single, consistent structure covering multiple Computer Vision tasks and out-of-the-box compatibility with every training algorithm on the platform.
[Table: Ikomia dataset loaders and their compatibility with the different training tasks.]
The Common Objects in Context (COCO) dataset format is a widely used and standardized format for various Computer Vision tasks, including object detection, segmentation, and keypoint detection. It was introduced by Microsoft in collaboration with several academic institutions.
The COCO dataset is well-known for its large-scale and diverse collection of images, making it valuable for training and evaluating AI models to comprehend real-world scenes and objects.
Each image in the dataset is uniquely identified by an image identifier or filename. This identifier is used to link the annotations with their corresponding images during training and evaluation.
The COCO format provides detailed annotations for objects present in each image. Each annotation records the object's category ID, its bounding box (given as [x, y, width, height] in pixels), an optional segmentation polygon or mask, the object's area in pixels, and an iscrowd flag marking grouped instances.
Category information is stored as a list of dictionaries, each representing a unique object class with its associated category ID and name.
The COCO dataset format organizes all this information in a structured manner, typically using JSON files for annotations and categories.
Here's an example of what a COCO annotation might look like for an image with two annotated objects (a person and a car):
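Below is an illustrative reconstruction; all IDs, coordinates, and areas are made-up values:

```json
{
  "annotations": [
    {
      "id": 1,
      "image_id": 1,
      "category_id": 1,
      "bbox": [150, 80, 120, 300],
      "area": 36000,
      "iscrowd": 0
    },
    {
      "id": 2,
      "image_id": 1,
      "category_id": 2,
      "bbox": [400, 200, 220, 150],
      "area": 33000,
      "iscrowd": 0
    }
  ]
}
```

The category list stored alongside would map these IDs to names, e.g. [{"id": 1, "name": "person"}, {"id": 2, "name": "car"}].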
In this example, the "image_id" uniquely identifies the image, and each annotation provides details about an object's class, bounding box, area, and whether it represents a crowd (0 for individual objects). The corresponding category information is stored in a separate file.
Overall, the COCO format's rich and structured annotations make it a valuable resource for training and benchmarking advanced Computer Vision models across a wide range of tasks.
In the guide on How to train YOLOv8 instance segmentation on a custom dataset with the Ikomia API we saw in detail how to load your dataset using the dataset_coco module. In this example, we downloaded the coral dataset from Roboflow.
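As a refresher, the core of that workflow fits in a few lines. This is a minimal sketch: the paths are placeholders, and the parameter names should be checked against the dataset_coco and train_yolo_v8_seg documentation on Ikomia HUB:

```python
from ikomia.dataprocess.workflow import Workflow

wf = Workflow()

# Load the COCO-format dataset (parameter names to verify on Ikomia HUB)
dataset = wf.add_task(name="dataset_coco")
dataset.set_parameters({
    "json_file": "path/to/_annotations.coco.json",
    "image_folder": "path/to/images",
    "task": "instance_segmentation",
})

# Connect YOLOv8 segmentation training behind the loader and run
train = wf.add_task(name="train_yolo_v8_seg", auto_connect=True)
train.set_parameters({
    "epochs": "50",
    "batch_size": "4",
})

wf.run()
```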
The Pascal VOC (Visual Object Classes) 2012 dataset format is a widely used annotation format for object detection, segmentation and image classification tasks. It was introduced by the PASCAL Visual Object Classes challenge, which was a series of annual competitions organized to advance the field of Computer Vision.
The VOC format has been adopted as a standard by the Computer Vision community and has become popular for benchmarking object detection and classification algorithms.
Each image in the dataset is identified by a unique image identifier or filename. This identifier is used to link the annotations with their corresponding images during training and evaluation.
The Pascal VOC format provides detailed annotations for objects present in each image. Each annotation records the object's class name, its bounding box (given as xmin, ymin, xmax, ymax pixel coordinates), and optional flags such as truncated, difficult, and pose.
For instance segmentation tasks, Pascal VOC can also provide pixel-level segmentation masks for objects, outlining the exact area occupied by each object within the image.
The Pascal VOC dataset format is typically organized into separate XML files, one for each image in the dataset. Each XML file contains all the necessary object annotations and related details for the corresponding image.
Here's an example of what a Pascal VOC annotation might look like for an image with two annotated objects (a person and a car):
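The XML below is an illustrative reconstruction; the file name and coordinates are made-up values:

```xml
<annotation>
  <filename>image_001.jpg</filename>
  <size>
    <width>640</width>
    <height>480</height>
    <depth>3</depth>
  </size>
  <object>
    <name>person</name>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>150</xmin>
      <ymin>80</ymin>
      <xmax>270</xmax>
      <ymax>380</ymax>
    </bndbox>
  </object>
  <object>
    <name>car</name>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>400</xmin>
      <ymin>200</ymin>
      <xmax>620</xmax>
      <ymax>350</ymax>
    </bndbox>
  </object>
</annotation>
```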
In this example, we use the Personal Protective Equipment (PPE) dataset. You also need to write a classes.txt file listing the dataset's labels, one per line.
The dataset_pascal_voc module exposes parameters for locating the annotation files, the images, and the class list.
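A minimal training sketch with this loader might look as follows; the parameter names are assumptions based on typical Ikomia dataset loaders, so check the module's documentation on Ikomia HUB:

```python
from ikomia.dataprocess.workflow import Workflow

wf = Workflow()

# Load Pascal VOC annotations (parameter names are assumptions)
dataset = wf.add_task(name="dataset_pascal_voc")
dataset.set_parameters({
    "annotation_folder": "path/to/annotations",  # one XML file per image
    "dataset_folder": "path/to/images",
    "class_file": "path/to/classes.txt",
})

# Any compatible detection trainer can be connected behind the loader
train = wf.add_task(name="train_yolo_v7", auto_connect=True)

wf.run()
```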
The YOLO (You Only Look Once) dataset format is a widely used format for object detection tasks, similar to the COCO format. It is designed to annotate images for training YOLO-based object detection and segmentation models. The YOLO format provides essential information about object locations, class labels, and bounding boxes required for effective model training.
Each image in the dataset is identified by a unique image identifier or filename. This identifier is used to associate annotations with the corresponding image during training and evaluation.
For each object present in the image, the annotation provides the object's class index, the normalized center coordinates of its bounding box, and the bounding box's normalized width and height.
The YOLO format maintains a specific order for this information in each line of the annotation file, typically: class_index x_center y_center width height, where all coordinates are normalized to the range [0, 1] by the image dimensions.
For example, if an image contains two objects (a person and a car), the corresponding YOLO annotation file might look like this (values match the description below):
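```
0 0.62 0.75 0.30 0.60
1 0.40 0.30 0.20 0.40
```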
In this example, the first object has a class index of 0 (person), and its center is at approximately (62%, 75%) of the image, with a bounding box width and height of approximately 30% and 60% of the image size, respectively.
Similarly, the second object has a class index of 1 (car), and its center is at approximately (40%, 30%) of the image, with a bounding box width and height of approximately 20% and 40% of the image size, respectively.
The YOLO format is an efficient and straightforward way to represent annotated object detection and segmentation data, enabling AI developers to train YOLO models effectively to detect and localize objects in diverse real-world scenarios.
Here is an example of training a custom YOLOv7 model on a dataset in YOLO Darknet format. You can find this workflow described in detail in the guide on How to train a custom YOLOv7 model with the Ikomia API.
We use the aerial airport dataset in YOLO format as an example: each image in each folder (train, val, test) has a corresponding .txt file containing the bounding box and class information for the airplanes it contains. Additionally, there is a '_darknet.labels' file containing all class names. We will use the dataset_yolo module provided by the Ikomia API to load the custom data and annotations.
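A condensed version of that workflow is sketched below; the folder paths are placeholders, and the dataset_yolo parameter names are assumptions to verify against the module's documentation:

```python
from ikomia.dataprocess.workflow import Workflow

wf = Workflow()

# Load the YOLO Darknet dataset (parameter names are assumptions)
dataset = wf.add_task(name="dataset_yolo")
dataset.set_parameters({
    "dataset_folder": "path/to/aerial_airport/train",
    "class_file": "path/to/aerial_airport/train/_darknet.labels",
})

# Train YOLOv7 on the loaded annotations
train = wf.add_task(name="train_yolo_v7", auto_connect=True)
train.set_parameters({
    "epochs": "40",
    "batch_size": "8",
})

wf.run()
```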
The VGG Image Annotator (VIA) is a versatile and user-friendly annotation tool that facilitates manual annotation of images. While VIA is not a dataset format itself, it enables the creation and export of datasets in various formats, including JSON and CSV.
We created a tiny car license plate dataset with the VGG Image Annotator. In the following example, the dataset is used to train the custom yolox model implemented in the train_mmlab_detection algorithm.
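A sketch of that workflow, assuming the VIA loader is named dataset_via and takes the project's JSON export as input (verify both, and the training parameters, on Ikomia HUB):

```python
from ikomia.dataprocess.workflow import Workflow

wf = Workflow()

# Read the VIA project export (module and parameter names are assumptions)
dataset = wf.add_task(name="dataset_via")
dataset.set_parameters({"via_json_file": "path/to/via_project.json"})

# train_mmlab_detection wraps MMDetection models such as YOLOX
train = wf.add_task(name="train_mmlab_detection", auto_connect=True)
train.set_parameters({"model_name": "yolox"})

wf.run()
```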
The Wildreceipt format is specifically designed for receipt OCR (Optical Character Recognition) tasks. Receipt data is vital for various applications, including expense tracking, financial analysis, and business expense management.
Here, the Wildreceipt dataset, a collection of annotated receipts, will be used to train a custom SATRN text recognition model.
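A sketch of that workflow; the loader name dataset_wildreceipt and the parameter names are assumptions to check on Ikomia HUB:

```python
from ikomia.dataprocess.workflow import Workflow

wf = Workflow()

# Load the Wildreceipt annotations (module and parameter names are assumptions)
dataset = wf.add_task(name="dataset_wildreceipt")
dataset.set_parameters({"dataset_folder": "path/to/wildreceipt"})

# Train an MMLab text recognition model, here SATRN
train = wf.add_task(name="train_mmlab_text_recognition", auto_connect=True)
train.set_parameters({"model_name": "satrn"})

wf.run()
```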
A classification dataset format is a structured representation of data specifically designed for image classification tasks. In image classification, the goal is to categorize an input image into one of several predefined classes or categories.
The dataset format is essential for organizing and providing the necessary information required to train and evaluate machine learning models for image classification.
Classification datasets should follow a simple directory structure, illustrated below:
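A typical layout (folder and file names are illustrative; note the train/val split used later in this section):

```
dataset/
├── train/
│   ├── class_1/
│   │   ├── image_001.jpg
│   │   └── ...
│   └── class_2/
│       ├── image_051.jpg
│       └── ...
└── val/
    ├── class_1/
    │   └── ...
    └── class_2/
        └── ...
```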
In this format, images are organized into separate subdirectories based on their class labels. Each subdirectory represents a specific class, and images belonging to that class are stored within that directory.
This hierarchical structure makes it easy to maintain and manage the dataset, especially when dealing with multiple classes and a large number of images.
There are two approaches to load a classification dataset for training your custom classification model.
The first approach involves launching your training task on the dataset by providing the folder path: wf.run_on(folder=dataset_folder). You can find more details in the case study titled "How to Train a Classification Model on a Custom Dataset with the Ikomia API."
The second approach involves using the dataset_classification module. In this example, we use the "Rock, Paper, Scissors" dataset from Roboflow. You can download this dataset by following this link: Dataset Download Link. Please note that the "validation" folder should be renamed to "val" for proper use.
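A sketch of this second approach; train_torchvision_resnet is just one compatible trainer, and the parameter name is an assumption to verify on Ikomia HUB:

```python
from ikomia.dataprocess.workflow import Workflow

wf = Workflow()

# Load the folder-based classification dataset (parameter name is an assumption)
dataset = wf.add_task(name="dataset_classification")
dataset.set_parameters({"dataset_folder": "path/to/rock_paper_scissors"})

# Train a classifier, e.g. a torchvision ResNet
train = wf.add_task(name="train_torchvision_resnet", auto_connect=True)

wf.run()
```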
In conclusion, the availability of various dataset formats, such as COCO, Pascal VOC, YOLO Darknet, and VIA, empowers AI researchers and practitioners to work with diverse data and develop models capable of tackling real-world challenges effectively.
The Ikomia API simplifies the development of Computer Vision workflows by providing a standardized and adaptable annotation format. The AI community can leverage a unified framework to streamline the development and deployment of AI solutions.
To learn more about the API, refer to the documentation. You may also check out the list of state-of-the-art algorithms on Ikomia HUB and try out Ikomia STUDIO, which offers a friendly UI with the same features as the API.