Master Text Extraction with MMOCR: A Comprehensive Guide


MMOCR shines as a top-tier Optical Character Recognition (OCR) toolbox, especially within the Python community. For those unfamiliar with the platform, navigating through its documentation and installation steps can be somewhat daunting.


In this article, we'll guide you step-by-step, highlighting the essentials and addressing potential challenges of using the MMOCR inferencer, a specialized wrapper for OCR tasks.


Then, we'll present a simplified method to harness MMOCR's capabilities via the Ikomia API.

Go to notebook

Go to Colab


Dive in and elevate your OCR projects!


MMOCR: the Optical Character Recognition toolbox

Optical Character Recognition (OCR) technology has come a long way. From the early days of digitizing printed text on paper to the modern applications that can read text from almost any surface, OCR has emerged as a pivotal element in the advancement of information processing.


One of the latest advancements in this domain is MMOCR—a comprehensive toolset for OCR tasks.


What is MMOCR?

MMOCR is an open-source OCR toolbox based on PyTorch and is an integral component of the OpenMMLab initiative.


OpenMMLab is known for its commitment to pioneering Computer Vision tools such as MMDetection, and MMOCR is their answer to the growing demands of the OCR community.


The toolbox offers a variety of algorithms for different OCR tasks, including text detection, recognition, and key information extraction.


Features and advantages of MMOCR

MMOCR, backed by the extensive resources and expertise of OpenMMLab, brings a multitude of features to the table. Let's explore some of the standout attributes:


Wide Range of Algorithms

As of September 2023, MMOCR supports 8 state-of-the-art algorithms for text detection and 9 for text recognition. This ensures users have the flexibility to choose the best approach for their specific use case.


Modularity

MMOCR is designed with modularity in mind. This means users can easily integrate different components from various algorithms to create a custom OCR solution tailored to their needs.


End-to-end Capabilities

For users who need a comprehensive solution, MMOCR offers end-to-end OCR systems that combine both text detection and recognition.


Training and Benchmarking Tools

MMOCR is not just about inference. The toolbox provides utilities for training new models, as well as benchmarking them against standard datasets.


Community and Ecosystem

Being part of the OpenMMLab ecosystem, MMOCR benefits from a robust community of researchers and developers. This ensures regular updates, improvements, and access to the latest advancements in the field.


Applications of MMOCR

The flexibility and power of MMOCR make it suitable for a range of applications:

- Document digitization: Converts printed or handwritten documents into machine-readable format.
- License plate recognition: Automatically reads and recognizes vehicle license plates.
- Retail: Recognizes product labels, barcodes, and price tags.
- Mobile applications: Integrates OCR capabilities into mobile apps for real-time text reading.
- Augmented reality: Overlays digital information on physical text in AR applications.


Getting Started with MMOCR

For this section, we will navigate through the MMOCR documentation for text extraction from documents. Before jumping in, we recommend that you review the entire process as we encountered some steps that were problematic.


Prerequisite

OpenMMLab suggests specific Python and PyTorch versions for optimal results:

- Operating system: Linux, Windows, or macOS
- Python 3.7
- PyTorch 1.6 or higher
- torchvision 0.7.0

For this demonstration, we used a Windows setup.
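
Before creating the environment, it can help to confirm which interpreter is active. The following is a minimal sketch: the helper name `meets_minimum` is our own illustration, and its (3, 7) default is simply the Python version listed in the prerequisites above, not something MMOCR provides.

```python
import sys


def meets_minimum(version_info=None, minimum=(3, 7)):
    """Return True if the interpreter version is at least `minimum`.

    `meets_minimum` is a hypothetical helper written for this article.
    """
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor) so that 3.7.0 and 3.7.9 are treated alike.
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if meets_minimum() else "too old"
    print(f"Python {current}: {status}")
```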


Environment setup

Setting up a working environment begins with creating a Python virtual environment and then installing the Torch dependencies.


Creating the virtual environment

Though the prerequisites suggest Python 3.7, the code example provided in MMOCR documentation employs Python 3.8. We chose to follow the recommended Python version:


python -m virtualenv openmmlab --python=python3.7

If you're unfamiliar with virtual environments, here's a guide on setting one up.


Installing Torch and Torchvision

After activating the 'openmmlab' virtual environment, we proceed to install the PyTorch dependencies:


pip install torch==1.6.0 torchvision==0.7.0

However, this failed: pip could not locate a compatible distribution. We then tried a newer torch release:


pip install torch==1.7.0 torchvision==0.7.0


This attempt was met with the 'invalid wheel' error.


To resolve this, we consulted the official PyTorch documentation, which pairs torch 1.7.1 with torchvision 0.8.2. Note that torchvision 0.7.0, the version listed in the MMOCR prerequisites, is not compatible with torch 1.7.1.
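
The pairing matters because each torchvision release is built against one specific torch release. As a rough illustration, the check below encodes only the pairings relevant to this article, as we understand them from the torch/torchvision compatibility table. `KNOWN_PAIRS` and `is_compatible` are hypothetical names for this sketch, and the table is deliberately incomplete.

```python
# Pairings for the versions discussed in this article only.
# Assumption: values follow the published torch/torchvision compatibility table.
KNOWN_PAIRS = {
    "1.6.0": {"0.7.0"},
    "1.7.0": {"0.8.0", "0.8.1"},
    "1.7.1": {"0.8.2"},
}


def is_compatible(torch_version, torchvision_version):
    """Return True or False for a known torch version, None if it is not listed."""
    expected = KNOWN_PAIRS.get(torch_version)
    if expected is None:
        return None  # unknown torch version: this sketch cannot tell
    return torchvision_version in expected


if __name__ == "__main__":
    for pair in [("1.7.0", "0.7.0"), ("1.7.1", "0.7.0"), ("1.7.1", "0.8.2")]:
        print(pair, is_compatible(*pair))
```

With this table, torchvision 0.7.0 is rejected for both torch 1.7.0 and torch 1.7.1, while torch 1.7.1 with torchvision 0.8.2 is accepted.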


Installing OpenMMLab dependencies

Then we install the following dependencies using MIM: MMEngine, MMCV, and MMDetection.


pip install -U openmim
mim install mmengine
mim install mmcv
mim install mmdet

While 'openmim' and 'mmengine' installed smoothly, the 'mmcv' installation was prolonged, taking about 35 minutes.

Subsequently, we installed 'mmocr' from the source, as per the recommendation:


git clone https://github.com/open-mmlab/mmocr.git
cd mmocr
pip install -v -e .
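
Once the installs finish, a quick way to confirm that everything landed in the active environment is to ask Python whether each package can be located. This is a sketch, not an official MMOCR check: `find_missing` is a hypothetical helper, and the import names (`mmengine`, `mmcv`, `mmdet`, `mmocr`) are assumed to match the packages installed above.

```python
from importlib.util import find_spec

# Assumed top-level import names for the packages installed in this section.
REQUIRED = ("torch", "torchvision", "mmengine", "mmcv", "mmdet", "mmocr")


def find_missing(module_names=REQUIRED):
    """Return the module names that cannot be located, without importing them."""
    return [name for name in module_names if find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

Because `find_spec` only locates a package, this check is fast, but it does not prove that a compiled package such as mmcv will import cleanly.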


Inference

We evaluated our setup using a sample image, applying DBNet for text detection and CRNN for text recognition:


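from mmocr.apis import MMOCRInferencer

# Load DBNet for text detection and CRNN for text recognition
ocr = MMOCRInferencer(det='DBNet', rec='CRNN')
# Run OCR on the demo image, show the visualization, and print the predictions
ocr('demo/demo_text_ocr.jpg', show=True, print_result=True)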
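
The call also returns its result as a dictionary. Assuming the layout described in the MMOCR documentation (a `predictions` list holding, per image, parallel `rec_texts` and `rec_scores` lists), a small helper can keep only the confidently recognized words. The helper name `confident_texts`, the 0.5 threshold, and the sample dictionary are illustrative; the sample is hand-written, not real model output.

```python
def confident_texts(result, min_score=0.5, image_index=0):
    """Return recognized strings whose score is at least `min_score`.

    Assumes `result` follows the inferencer layout:
    {"predictions": [{"rec_texts": [...], "rec_scores": [...]}, ...]}
    """
    prediction = result["predictions"][image_index]
    pairs = zip(prediction["rec_texts"], prediction["rec_scores"])
    return [text for text, score in pairs if score >= min_score]


# Hand-written stand-in for an inferencer result (not real model output).
sample_result = {
    "predictions": [
        {
            "rec_texts": ["OPEN", "24", "h0urs"],
            "rec_scores": [0.98, 0.91, 0.32],
        }
    ]
}

print(confident_texts(sample_result))  # ['OPEN', '24']
```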


To evaluate MMOCR's performance further, we executed OCR on an invoice image using the DBNetpp model for text detection and ABINet_Vision for text recognition:


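from mmocr.apis import MMOCRInferencer

# Load DBNetpp for text detection and ABINet_Vision for text recognition
ocr = MMOCRInferencer(det='DBNetpp', rec='ABINet_Vision')
# Run OCR on the invoice image, show the visualization, and print the predictions
ocr('demo/invoice.png', show=True, print_result=True)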
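
On documents such as invoices, the detected words do not necessarily come back in reading order. The sketch below shows one simple heuristic for regrouping them into lines. It assumes each prediction carries `rec_texts` alongside `det_polygons` given as flat `[x1, y1, x2, y2, ...]` lists, as described in the MMOCR documentation. The function name `reading_order`, the 10-pixel tolerance, and the sample data are our own illustration, and this is plain sorting, not MMOCR's key information extraction.

```python
def reading_order(prediction, line_tolerance=10):
    """Group words into text lines, top to bottom, then left to right."""
    words = []
    for text, polygon in zip(prediction["rec_texts"], prediction["det_polygons"]):
        xs, ys = polygon[0::2], polygon[1::2]  # split the flat x/y coordinate list
        words.append((min(ys), min(xs), text))
    words.sort()  # sort by top edge first

    lines, current, current_top = [], [], None
    for top, left, text in words:
        # Start a new line when the word sits clearly below the current one.
        if current and top - current_top > line_tolerance:
            lines.append(current)
            current = []
        if not current:
            current_top = top
        current.append((left, text))
    if current:
        lines.append(current)
    # Within each line, order words by their left edge.
    return [" ".join(text for _, text in sorted(line)) for line in lines]


# Hand-written sample prediction (not real model output).
sample_prediction = {
    "rec_texts": ["42.00", "Invoice", "Total", "#1001"],
    "det_polygons": [
        [200, 102, 260, 102, 260, 120, 200, 120],
        [10, 10, 90, 10, 90, 30, 10, 30],
        [10, 100, 60, 100, 60, 120, 10, 120],
        [100, 12, 160, 12, 160, 30, 100, 30],
    ],
}

print(reading_order(sample_prediction))  # ['Invoice #1001', 'Total 42.00']
```

The tolerance would need tuning to the image resolution, and rotated or multi-column layouts call for something more robust.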


The MMOCR Inferencer experience: OCR integration in under an hour

MMOCRInferencer serves as a user-friendly interface for OCR, integrating text detection and text recognition. From the initial setup to obtaining inference results, the entire process took approximately 50 minutes. Notably, the 'mmcv' compilation and installation took up a significant chunk of this time.


Given the numerous dependencies required to run MMOCR, we encountered outdated dependencies and subsequent conflicts—issues all too familiar to Python developers.


As library versions rapidly evolve, the challenges we encountered here may change, potentially improving or worsening in the coming weeks or months.


In the following section, we'll demonstrate how to simplify the installation and usage of MMOCR via the Ikomia API, significantly reducing both the steps and time needed to execute your OCR tasks.


Easier MMOCR text extraction with a Python API

Together with the Ikomia team, we've been working on a prototyping tool that shortens or removes the tedious installation and testing phases.


We wrapped it in an open source Python API. Now we're going to explain how to use it to extract text with MMOCR in less than 10 minutes instead of 50.


If you have any questions, please join our Discord.


Environment setup

As usual, we will use a virtual environment.

Then the only thing you need to install is Ikomia API:


pip install ikomia


OCR inference

You can also directly load the open-source notebook we have prepared.



from ikomia.dataprocess.workflow import Workflow
from ikomia.utils.displayIO import display


# Init your workflow
wf = Workflow()

# Add text detection algorithm
text_det = wf.add_task(name="infer_mmlab_text_detection", auto_connect=True)

# Add text recognition algorithm
text_rec = wf.add_task(name="infer_mmlab_text_recognition", auto_connect=True)

# Run the workflow on image
wf.run_on(url="https://github.com/open-mmlab/mmocr/blob/main/demo/demo_text_ocr.jpg?raw=true")

# Display results
img_output = text_rec.get_output(0)
recognition_output = text_rec.get_output(1)
display(img_output.get_image_with_mask_and_graphics(recognition_output), title="MMLAB text recognition")

By default this workflow uses DBNet for text detection and SATRN for text recognition.


To adjust the algorithm parameters, consult the HUB pages, a multi-framework library documenting all available algorithms in Ikomia API:

- infer_mmlab_text_detection

- infer_mmlab_text_recognition


Fast MMOCR execution: from setup to results in just 8 minutes

To carry out OCR, we simply installed Ikomia and ran the workflow code snippets. All dependencies were handled in the background, using a pre-compiled mmcv. We progressed from setting up a virtual environment to obtaining results in approximately 8 minutes.


Easily run OCR using Ikomia STUDIO

For those of you who prefer working with a user interface, the Ikomia API is also available as a desktop app called STUDIO. The functioning and results are the same as with the API, with a drag and drop interface.


Crafting production-ready Computer Vision applications with ease

Real-world text extraction applications often require a broader set of algorithms than text detection and recognition alone. This includes tasks like document detection/segmentation, rotation, and key information extraction.


One of the standout benefits of the API, aside from simplifying dependency installations, is its ability to interlink algorithms from diverse frameworks, including OpenMMLab, YOLO, Hugging Face, and Detectron2.


Deployment is often a big hurdle for Python developers. To bridge this gap, we introduce SCALE, a user-centric SaaS platform. SCALE is designed to deploy all your Computer Vision endeavors, eliminating the need for specialized MLOps knowledge.
