In this guide, we will walk you through the entire process of training a YOLOv9 model using a custom dataset. This comprehensive tutorial will specifically demonstrate training a vision model to recognize basketball players on a court, but the principles and methods can be applied to any dataset you choose. Whether you are new to YOLO models or looking to upgrade your skills to YOLOv9, this guide will provide you with the necessary steps and insights.
⭐ Follow along with this guide using the train YOLOv9 notebook 📃:
With the continuous evolution of computer vision technologies, YOLOv9 emerges as the latest advancement, developed by Chien-Yao Wang, I-Hau Yeh, and Hong-Yuan Mark Liao.
This trio of researchers has a rich history in the field, having contributed to the development of preceding models such as YOLOv4, YOLOR, and YOLOv7. YOLOv9 not only continues the legacy of its predecessors but also introduces significant innovations that set new benchmarks in object detection capabilities.
YOLOv9 is an advanced object detection model that represents a significant leap forward in computer vision technology. It is the latest iteration in the "You Only Look Once" (YOLO) series, known for its high speed and accuracy in detecting objects in images.
YOLOv9 stands out due to its incorporation of Programmable Gradient Information (PGI) and the introduction of the Generalized Efficient Layer Aggregation Network (GELAN), two groundbreaking innovations designed to enhance model performance and efficiency.
By integrating these advanced features, YOLOv9 ensures more precise control over gradients during the training process, improving learning outcomes and model accuracy. This makes YOLOv9 an excellent choice for tasks requiring high-speed and high-accuracy object detection.
The YOLOv9 model is available in four variants, categorized by parameter count: YOLOv9-S, YOLOv9-M, YOLOv9-C, and YOLOv9-E.
As of the latest update, the weights for the YOLOv9-S and YOLOv9-M models remain unpublished. The differentiation in model sizes caters to a range of application needs, from lightweight models for edge devices to more comprehensive models for high-performance computing environments.
In terms of performance, YOLOv9 sets a new standard in the field of object detection. The smallest model configuration, despite its limited size, achieves an impressive 46.8% AP (Average Precision) on the MS COCO dataset's validation set. Meanwhile, the largest variant, YOLOv9-E, boasts a remarkable 55.6% AP, establishing a new state-of-the-art benchmark for object detection performance.
This leap in accuracy demonstrates the effectiveness of YOLOv9's innovative optimization strategies.
The YOLOv9 architecture introduces a significant advancement in the field of object detection by incorporating Programmable Gradient Information (PGI) and a new network architecture called Generalized Efficient Layer Aggregation Network (GELAN). Here's an explanation of these key components:
PGI is a novel concept aimed at addressing the challenge of data loss within deep neural networks. In traditional architectures, as information passes through multiple layers, some of it gets lost, leading to less efficient learning and model performance. PGI allows for more precise control over the gradients during the training process, ensuring that critical information is preserved and utilized more effectively. This leads to improved learning outcomes and model accuracy.
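To make this concrete, here is a toy PyTorch sketch of the auxiliary-supervision idea behind PGI. This is an illustrative assumption, not the official implementation: the layer sizes, the head shapes, and the `aux_weight` value are all placeholders. The point it demonstrates is that an auxiliary branch gives the shared backbone an extra gradient path during training and is dropped at inference, so it adds no runtime cost.

```python
import torch
import torch.nn as nn

class ToyPGIModel(nn.Module):
    """Toy sketch: an auxiliary head supplies extra gradient signal to the
    shared backbone during training and is discarded at inference time."""

    def __init__(self, num_classes: int = 9):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.SiLU(),
        )
        self.head = nn.Conv2d(64, num_classes, 1)      # main prediction head
        self.aux_head = nn.Conv2d(64, num_classes, 1)  # training-only auxiliary head

    def forward(self, x: torch.Tensor):
        feats = self.backbone(x)
        if self.training:
            # Two gradient paths reach the backbone, helping preserve
            # information that a single deep path might lose.
            return self.head(feats), self.aux_head(feats)
        return self.head(feats)  # the auxiliary branch costs nothing at inference

model = ToyPGIModel()
x, target = torch.randn(1, 3, 64, 64), torch.randn(1, 9, 16, 16)
main_out, aux_out = model(x)
aux_weight = 0.25  # illustrative weighting of the auxiliary loss
loss = (nn.functional.mse_loss(main_out, target)
        + aux_weight * nn.functional.mse_loss(aux_out, target))
loss.backward()
```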
GELAN represents a significant innovation within the YOLOv9 architecture. It is designed to enhance the model's performance and efficiency by optimizing how different layers in the network aggregate and process information. The key focus of GELAN is to maximize parameter utilization, ensuring that the model can achieve higher accuracy without a proportional increase in computational resources or model size.
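As a rough illustration of the aggregation idea, here is a toy PyTorch sketch in the spirit of GELAN; again, an assumption-laden simplification rather than the real YOLOv9 block. The input and the output of every intermediate block are concatenated and fused, so every stage's parameters contribute directly to the final feature:

```python
import torch
import torch.nn as nn

class ToyGELANBlock(nn.Module):
    """Toy sketch of GELAN-style aggregation: the outputs of every stage in a
    chain of blocks are concatenated and fused with a 1x1 convolution."""

    def __init__(self, channels: int, num_blocks: int = 2):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.SiLU())
            for _ in range(num_blocks)
        )
        # Fuse the input plus every intermediate output back down to `channels`.
        self.fuse = nn.Conv2d(channels * (num_blocks + 1), channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        outs = [x]
        for block in self.blocks:
            outs.append(block(outs[-1]))          # each stage feeds the next...
        return self.fuse(torch.cat(outs, dim=1))  # ...and all stages feed the output

print(ToyGELANBlock(64)(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
```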
This architecture allows YOLOv9 to tackle object detection tasks with greater precision and efficiency, setting a new benchmark in the performance of deep learning models for computer vision.
The combination of PGI and GELAN in YOLOv9 represents a holistic approach to improving the learning capabilities of neural networks, focusing not just on the depth or width of the model, but also on how effectively it can learn and retain information throughout the training process.
This leads to a model that is not only highly accurate but also efficient in terms of computational resources, making it suitable for a wide range of applications from edge devices to cloud-based systems.
The Ikomia API lets you train and run inference with the YOLOv9 object detector with minimal coding.
To begin, install the API in a virtual environment [3]. This keeps the API's dependencies isolated from your system Python and ensures a smooth start.
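A typical setup looks like this (a minimal sketch: `ikomia` is the package name on PyPI, and the environment name `venv` is an arbitrary choice):

```bash
python -m venv venv
source venv/bin/activate   # on Windows: venv\Scripts\activate
pip install ikomia
```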
For this tutorial, we're using a Basketball dataset [4] from Roboflow with 539 images to illustrate the training of our custom YOLOv9 object detection model. The dataset contains nine labels:
These labels encompass both tangible objects on the basketball court and digitally overlaid information typically displayed on a TV screen, offering a comprehensive approach to object detection within the context of a basketball game.
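If you prefer to pull the dataset programmatically, one option is the `roboflow` Python package. The sketch below is a hedged example: you need your own API key, the workspace/project/version slugs are read off the dataset URL in [4], and the COCO export format is one of several you could choose.

```python
from roboflow import Roboflow  # pip install roboflow

rf = Roboflow(api_key="YOUR_API_KEY")  # placeholder: use your own key
project = rf.workspace("roboflow-universe-projects").project("basketball-players-fy4c2")
dataset = project.version(12).download("coco")  # export in COCO format

print(dataset.location)  # local folder containing images and annotations
```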
You can also directly load the notebook we have prepared.
Here are the configurable parameters; the model variant can be one of the following (a minimal training sketch follows the list):
- yolov9-s
- yolov9-m
- yolov9-c
- yolov9-e
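As a rough guide, a training run with the Ikomia API can look like the sketch below. Treat it as a minimal example under stated assumptions: the algorithm name 'train_yolo_v9' mirrors the 'infer_yolo_v9' naming used later in this guide, the dataset is assumed to be in COCO format loaded through the 'dataset_coco' task, and every path and hyperparameter value is a placeholder to adapt to your setup.

```python
from ikomia.dataprocess.workflow import Workflow

# Build a workflow: dataset loader -> trainer (auto_connect wires them together).
wf = Workflow()

dataset = wf.add_task(name="dataset_coco")
dataset.set_parameters({
    "json_file": "path/to/_annotations.coco.json",  # placeholder path
    "image_folder": "path/to/images",               # placeholder path
    "task": "detection",
})

train = wf.add_task(name="train_yolo_v9", auto_connect=True)
train.set_parameters({
    "model_name": "yolov9-c",  # one of the variants listed above
    "epochs": "50",            # Ikomia parameters are passed as strings
    "batch_size": "8",
    "train_imgsz": "640",
    "output_folder": "runs/",  # placeholder output location
})

# Launch training on the loaded dataset.
wf.run()
```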
The training process for 50 epochs completed in approximately 50 minutes on an NVIDIA L4 24 GB GPU.
Once your model has completed its training phase, you can assess the performance by analyzing the graphs produced by the YOLOv9 training process. These visualizations represent various metrics that are crucial for understanding the effectiveness of your object detection model.
In summary, these plots suggest that the model has learned and improved its ability to detect and classify objects as training progressed. The high precision along with increasing recall and mAP values are indicative of a well-performing model. However, we can see that the model would have benefited from being trained for longer.
We can test our custom model using the ‘infer_yolo_v9’ algorithm. By default, the algorithm uses the COCO-pretrained yolov9-c model; to apply our trained model instead, we specify the ‘model_weight_file’ and ‘class_file’ parameters accordingly.
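Here is a minimal inference sketch. The weight and class-file paths and the test image path are placeholders, and the ‘conf_thres’ confidence-threshold parameter is an assumption on our part; ‘model_weight_file’ and ‘class_file’ are the parameters mentioned above.

```python
from ikomia.dataprocess.workflow import Workflow
from ikomia.utils.displayIO import display

wf = Workflow()

yolo = wf.add_task(name="infer_yolo_v9", auto_connect=True)
yolo.set_parameters({
    "model_weight_file": "path/to/best.pt",  # placeholder: weights from training
    "class_file": "path/to/classes.yaml",    # placeholder: classes from training
    "conf_thres": "0.5",                     # assumed confidence-threshold parameter
})

# Run on a test image (placeholder path) and display the detections.
wf.run_on(path="path/to/test_image.jpg")
display(yolo.get_image_with_graphics())
```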
Our trained model successfully identified the players, referees, and hoop, as well as the team points, period, and time-remaining overlays.
To train YOLOv9 on a custom dataset, you need to set up the appropriate environment, prepare your dataset, and use the Ikomia API for training. Detailed steps and code examples are provided in the guide above. Your custom dataset can be loaded in different formats, such as the YOLO and COCO dataset formats.
Key features of YOLOv9 include high accuracy, the incorporation of PGI for better gradient management, and GELAN for optimized layer aggregation. These innovations make YOLOv9 highly efficient and effective for various object detection tasks.
Yes, YOLOv9 is designed for real-time object detection. Its high speed and accuracy make it suitable for applications requiring quick and reliable object detection.
Yes, YOLOv9 can be fine-tuned using pre-trained weights. This is useful for adapting the model to specific tasks or improving performance on a new dataset.
Training YOLOv9 is much more efficient on a GPU; we recommend at least 6 GB of GPU VRAM.
After training, you can evaluate your YOLOv9 model by analyzing the graphs produced during the training process. Metrics such as precision, recall, and mAP are crucial for assessing model performance.
[1] https://github.com/WongKinYiu/yolov9
[2] YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information - https://arxiv.org/abs/2402.13616
[3] How to create a virtual environment in Python
[4] https://universe.roboflow.com/roboflow-universe-projects/basketball-players-fy4c2/dataset/12