Revolutionizing Object Counting with T-Rex

Allan Kouidri
-
12/27/2023
T-Rex segmentation and counting of a group of birds

What is T-Rex?

T-Rex is a pioneering interactive object counting model. Its primary function is to detect and count objects in an image, a task it accomplishes with high precision and flexibility.

The model's distinctive features include:

  • Open-Set Capability: Unlike many of its counterparts, T-Rex is not limited to predefined categories. It can count any object the user points out, which opens up a broad range of applications.
  • Visual Prompting: Users can directly influence the counting process by providing visual examples. This feature enhances the model's accuracy and adaptability to specific tasks.
  • Intuitive Visual Feedback: T-Rex employs a detection-based approach, which includes visual feedback like detected boxes. This allows users to easily verify and assess the accuracy of the results.
  • Interactive Nature: The model's interactive design lets users participate in the counting process, offering opportunities to correct errors and refine results.

T-Rex is an object counting model distinguished by four key features: it is detection-based, visually promptable, interactive, and open-set in nature [1].

How does T-Rex work?

T-Rex incorporates various workflows to facilitate interactive object counting and detection:

  • Positive-only prompt mode: In this mode, T-Rex identifies and counts similar objects from a single click or drawn box. Users can add further prompts for more complex scenarios, such as densely packed or small objects.
  • Positive with negative prompt mode: This mode is particularly useful for correcting false detections. Users can place negative prompts on falsely detected objects, which improves the accuracy of the results.
  • Cross image prompt mode: An innovative feature, this mode enables counting across different images. By prompting on a reference image, T-Rex can detect objects in other target images. This feature is especially useful for automatic annotation, although it is still under development.
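The difference between the three modes comes down to which prompts the user supplies and on which image. The sketch below is only an illustration of that idea: `VisualPrompt` and `workflow_mode` are hypothetical names invented here, not part of any published T-Rex interface.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class VisualPrompt:
    """Hypothetical container for one user prompt (not a T-Rex API)."""
    image_id: str                # image the user clicked or drew on
    coords: Tuple[float, ...]    # (x, y) for a point, (x1, y1, x2, y2) for a box
    positive: bool = True        # False marks a false detection to suppress

    def __post_init__(self) -> None:
        if len(self.coords) not in (2, 4):
            raise ValueError("coords must hold 2 values (point) or 4 values (box)")

    @property
    def kind(self) -> str:
        return "point" if len(self.coords) == 2 else "box"


def workflow_mode(prompts: List[VisualPrompt], target_image_id: str) -> str:
    """Name the workflow implied by a set of prompts and a target image."""
    if not any(p.positive for p in prompts):
        raise ValueError("at least one positive prompt is required")
    # Prompts drawn on an image other than the target imply cross-image prompting.
    if any(p.image_id != target_image_id for p in prompts):
        return "cross-image"
    if any(not p.positive for p in prompts):
        return "positive-with-negative"
    return "positive-only"
```

For example, a single positive box drawn on the target image maps to the positive-only mode, while the same box drawn on a separate reference image maps to the cross-image mode.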

T-Rex provides three major interactive workflows, designed to be versatile and applicable across a wide range of real-world scenarios. [1]

Overview of the T-Rex model

T-Rex functions as a detection-based model with three main components:

  • Image Encoder: This extracts image features from both the target image and optionally a reference image. In cases where there is no separate reference image, the target image itself serves as the reference.
  • Prompt Encoder: Utilizing user-drawn boxes or points as prompts on the reference image, this encoder extracts the encoded visual prompt from the reference image feature.
  • Box Decoder: This component combines the target image feature with the encoded visual prompt, resulting in detected boxes along with their confidence scores. A predetermined score threshold is then applied to filter these boxes, with the remaining ones being counted to determine the final object count.
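The last step of the pipeline, turning scored boxes into a count, can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the `Detection` type, the `count_objects` function, and the default threshold of 0.3 are all assumptions made for the example.

```python
from typing import List, NamedTuple, Tuple


class Detection(NamedTuple):
    """One decoded box with its confidence score (hypothetical type)."""
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2)
    score: float


def count_objects(
    detections: List[Detection],
    score_threshold: float = 0.3,  # assumed value, not taken from the paper
) -> Tuple[int, List[Detection]]:
    """Keep boxes scoring at or above the threshold and count them."""
    kept = [d for d in detections if d.score >= score_threshold]
    return len(kept), kept


detections = [
    Detection((12, 20, 48, 60), 0.91),
    Detection((70, 22, 104, 63), 0.55),
    Detection((5, 5, 9, 9), 0.08),   # low-confidence box, filtered out
]
count, kept = count_objects(detections)
```

Because the count is simply the number of boxes that survive the threshold, every counted object has a box the user can inspect, which is what makes the result easy to verify.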

Overview of the T-Rex model [1]

Performance evaluation of T-Rex

T-Rex's performance rests on two strengths: accurate zero-shot counting and adaptability across domains.

Exceptional zero-shot counting and adaptability

T-Rex demonstrates strong proficiency in zero-shot counting, and its adaptability across various domains adds to its appeal. In the evaluation reported in [1], T-Rex outperforms other leading models such as Grounding DINO and GPT-4V.

Zero-shot here means that T-Rex can accurately count objects from categories it never encountered in its training datasets.

Benchmark setting in diverse domains

The adaptability of T-Rex is particularly noteworthy. Unlike models that are constrained to specific categories or settings, T-Rex can be applied to an extensive range of domains. Whether it's in complex industrial settings, crowded urban landscapes, or intricate biological environments, T-Rex maintains its high accuracy and reliability.

Mean Absolute Error (MAE) on detecting and counting objects in the image. [1]
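For counting tasks, MAE is the average absolute difference between the predicted count and the true count over a set of images, so lower is better. The snippet below is a small sketch of that computation. The function name and the sample counts are invented for illustration and are not results from [1].

```python
from typing import Sequence


def mean_absolute_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Average absolute difference between predicted and true counts."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if len(predicted) == 0:
        raise ValueError("at least one image is required")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


# Made-up counts for three images: the per-image errors are 2, 0 and 2.
predicted_counts = [10, 7, 3]
true_counts = [12, 7, 5]
mae = mean_absolute_error(predicted_counts, true_counts)  # 4 / 3, about 1.33
```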

Applications of T-Rex

T-Rex's versatility allows it to be applied across various domains, including but not limited to:

  • Agriculture, Industry, and Livestock: For counting and monitoring purposes.
  • Biology and Medicine: Useful in research and diagnostics.
  • Retail, Electronics, and Transportation: Helpful in inventory and logistics management.
  • Human-related Applications: Can be used for crowd counting and monitoring.

As an open-set object detector, T-Rex is exceptionally useful for automatic annotation, particularly in dense and overlapping scenes. Its zero-shot detection capability makes it a powerful tool in scenarios where predefined object categories are either unavailable or insufficient.

Conclusion and Future Perspectives

The ability of T-Rex to adapt and perform with high accuracy in zero-shot counting scenarios marks a significant leap forward in the field of object detection and counting. Its success heralds a new era of intelligent, adaptable, and user-friendly machine learning models that can cater to a wide array of industries and applications.

Integration with Ikomia API

An exciting development in this realm is the integration of such models with the Ikomia API. This API serves as a gateway to utilizing advanced models like Grounding DINO and SAM. For those keen on utilizing Grounding DINO or SAM, the Ikomia API provides a seamless and user-friendly platform to do so:

Guide to Segment Anything Model (SAM) →

Explore Grounding DINO zero-shot detection model →

References

[1] T-Rex: Counting by Visual Prompting
