Master Video Object Detection with YOLOv7

Allan Kouidri
-
7/10/2023
YOLOv7 video object detection on a webcam image of a person sitting at her office desk

In this blog post, we will outline the essential steps for achieving real-time video object detection using the Ikomia API alongside your webcam.

The Ikomia API enables you to utilize a ready-to-use detection model for real-time video object detection in a video stream captured from your camera. To begin, you'll need to install the API within a virtual environment.

How to install a virtual environment


pip install ikomia
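If you have not set up a virtual environment yet, a typical setup looks like this (a sketch assuming Python 3 with the standard-library `venv` module; the environment name `ikomia-env` is just an example):

```shell
# Create and activate a virtual environment, then install the Ikomia API inside it
python3 -m venv ikomia-env
source ikomia-env/bin/activate   # on Windows: ikomia-env\Scripts\activate
pip install ikomia
```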

API documentation

API repo

Running the YOLOv7 algorithm on your webcam using the Ikomia API

Alternatively, you can directly access the open-source notebook that we have prepared.


from ikomia.dataprocess.workflow import Workflow
from ikomia.utils import ik
from ikomia.utils.displayIO import display
import cv2


stream = cv2.VideoCapture(0)

# Init the workflow
wf = Workflow()

# Add color conversion
cvt = wf.add_task(ik.ocv_color_conversion(code=str(cv2.COLOR_BGR2RGB)), auto_connect=True)

# Add YOLOv7 detection
yolo = wf.add_task(ik.infer_yolo_v7(model_name='yolov7', conf_thres="0.7"), auto_connect=True)

while True:
    ret, frame = stream.read()
    
    # Test if streaming is OK
    if not ret:
        continue

    # Run workflow on image
    wf.run_on(frame)

    # Display results from "yolo"
    display(
        yolo.get_image_with_graphics(),
        title="Object Detection - press 'q' to quit",
        viewer="opencv"
    )

    # Press 'q' to quit the streaming process
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# After the loop release the stream object
stream.release()

# Destroy all windows
cv2.destroyAllWindows()


Camera stream processing 

Camera stream processing involves the real-time analysis and manipulation of images and video streams captured from a camera. This technique finds widespread application in diverse fields such as Computer Vision, surveillance, robotics, and entertainment.

In Computer Vision, camera stream processing plays a pivotal role in tasks like object detection and recognition, face detection, motion tracking, and image segmentation.

  • For surveillance purposes, camera stream processing aids in detecting anomalies and events such as intrusion detection and crowd behavior analysis.
  • In the realm of robotics, camera stream processing facilitates autonomous navigation, object detection, and obstacle avoidance.
  • The entertainment industry leverages camera stream processing for exciting applications like augmented reality, virtual reality, and gesture recognition.

Camera stream processing assumes a critical role across various domains, enabling the realization of numerous exciting applications that were once considered unattainable.

To embark on camera stream processing, we will make use of OpenCV and VideoCapture with the YOLOv7 algorithm.

YOLOv7 detection (Original photo by Gustavo Juliette)

Step by step: camera stream processing for video object detection using Ikomia API

Here are the detailed steps followed in the first code snippet with all parameters explained.

Step 1: import dependencies


from ikomia.dataprocess.workflow import Workflow
from ikomia.utils import ik
from ikomia.utils.displayIO import display
import cv2

  • The ‘Workflow’ class is the base object for creating a workflow. It provides methods for setting inputs (image, video, directory), configuring task parameters, obtaining time metrics, and retrieving specific task outputs, such as graphics, segmentation masks, and texts.
  • ‘ik’ is an auto-completion system designed for convenient and easy access to algorithms and settings.
  • The ‘display’ function offers a flexible and customizable way to display images (input/output) and graphics, such as bounding boxes and segmentation masks.
  • ‘cv2’ corresponds to the popular OpenCV library.

Step 2: define the video stream

Initialize a video capture object to retrieve frames from a camera device. Use the following code:


stream = cv2.VideoCapture(0)

The parameter `0` passed to `VideoCapture` selects the default camera device connected to your system. If you have multiple cameras connected, specify a different index to capture video from a specific one (e.g., `1` for the second camera), or pass the path to a video file instead.

Step 3: create workflow

We initialize a workflow instance using the following code:


wf = Workflow()

The ‘wf’ object can then be used to add tasks to the workflow instance, configure their parameters, and run them on input data.

Step 4: add the OpenCV color conversion algorithm


cvt = wf.add_task(ik.ocv_color_conversion(code=str(cv2.COLOR_BGR2RGB)), auto_connect=True)

By default, OpenCV uses the BGR color format, whereas Ikomia works with RGB images. To display the image output with the right colors, we need to flip the blue and red planes.
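Swapping the blue and red planes amounts to reversing the channel axis of the image array. A minimal NumPy illustration of what `cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)` does to the pixel layout:

```python
import numpy as np

# A single "pure blue" pixel in OpenCV's BGR order: B=255, G=0, R=0
bgr = np.array([[[255, 0, 0]]], dtype=np.uint8)

# Reversing the last axis swaps the blue and red planes (BGR -> RGB)
rgb = bgr[..., ::-1]

print(rgb[0, 0])  # the value 255 now sits in the red channel
```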

Step 5: add the YOLOv7 Object Detection Model 

Add the ‘infer_yolo_v7’ task, setting the pre-trained model and the confidence threshold parameter using the following code:


yolo = wf.add_task(ik.infer_yolo_v7(model_name='yolov7', conf_thres="0.7"), auto_connect=True)

Step 6: run the workflow on the stream

We read the frames from a video stream using a continuous loop. If there is an issue reading a frame, it skips to the next iteration. 

It then runs the workflow on the current frame and displays the results using OpenCV. The displayed image includes the graphics (bounding boxes and labels) generated by the YOLOv7 detection task.

The displayed window allows the user to quit the streaming process by pressing the 'q' key. If the 'q' key is pressed, the loop is broken, and the streaming process ends.


while True:
    ret, frame = stream.read()

    # Test if streaming is OK
    if not ret:
        continue

    # Run workflow on image
    wf.run_on(frame)

    # Display results from "yolo"
    display(
        yolo.get_image_with_graphics(),
        title="Object Detection - press 'q' to quit",
        viewer="opencv"
    )

    # Press 'q' to quit the streaming process
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

Step 7: end the video stream

After the loop, release the stream object and destroy all windows created by OpenCV. 


# After the loop release the stream object
stream.release()

# Destroy all windows
cv2.destroyAllWindows()

Perform real-time video object detection from your own video stream

By leveraging Ikomia API, developers can streamline the creation of Computer Vision workflows and explore various parameters to attain the best possible outcomes.

For additional insights into the API, we recommend referring to the comprehensive documentation. Additionally, you can explore the selection of cutting-edge algorithms available on Ikomia HUB and experiment with Ikomia STUDIO, a user-friendly interface that encompasses the same functionality as the API. Take advantage of these resources to further enhance your Computer Vision endeavors.

Source of the illustration image:  Photo by Drazen Zigic.
