Thursday, June 21, 2012

Tesseract OCR: Interactive Debugging Continued. Baseline Viewer

Here I'll describe a method of viewing baselines in Tesseract's interactive debug environment.

If you use Tesseract 3.02, first read my earlier post, Tesseract OCR: Setting Up Interactive Debug Environment On Windows, and complete all the steps in it. However, instead of the installation suite mentioned there, you will need another one containing updated Tess config files, because the Tesseract developers have renamed or removed a number of internal debug parameters since version 3.01, which that tutorial used. Download the updated suite at [link]. Version 3.01 users can still use the old installation suite.

So now that you've completed step 5 of the earlier tutorial and the debug window has appeared, do the following:
  1. In the main menu choose Modes->Show BL Norm Word. No apparent reaction from the UI should follow. This is normal.
  2. Now click on any word you're interested in. A new window titled BlnWords should appear.
  3. At first sight the BlnWords window looks empty, but it isn't: nothing is visible only because of the quirky scaling logic used by ScrollView. To find the contents, use the window scrollbars to pan and the mouse scroll wheel to zoom in and out. I suggest the following sequence for setting up the view initially:
    • slowly drag the vertical scrollbar thumb down until you see baselines and/or outlines,
    • move the horizontal scrollbar thumb approximately to the center,
    • use the mouse wheel to scale the window contents properly,
    • you may also resize the window to your taste.
  4. As you click other words in the main window, the contents of the BlnWords window update. You can adjust the view as needed using the methods described above.
What the BlnWords window displays are so-called baseline normalized words. In this type of view, words are shown as if their baselines (which can be curved and/or inclined in the source image) were straightened and positioned strictly horizontally. In addition to the baseline, the window also shows the x-height, ascender and descender lines. See more at Wikipedia: x-height. Using this view you can clearly see whether a baseline found by Tesseract is right or wrong: incorrect baselines cause characters to "jump" or "fall."
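
To make the idea concrete, here is a minimal sketch of baseline normalization for a straight, possibly inclined baseline. It is not Tesseract's code: the function name and the linear baseline model are assumptions made for illustration, and the target values (x-height scaled to 128 units, baseline placed at y = 64) are assumed here, modeled on the kBlnXHeight and kBlnBaselineOffset constants in Tesseract's sources as far as I know them.

```python
BLN_X_HEIGHT = 128        # assumed target x-height in normalized units
BLN_BASELINE_OFFSET = 64  # assumed target y position of the baseline


def baseline_normalize(points, slope, intercept, x_height, x_origin=0.0):
    """Map image points into a baseline-normalized space.

    points:    iterable of (x, y) pairs, with y growing upward
    slope, intercept: the baseline modeled as y = slope * x + intercept
    x_height:  x-height of the text in image units
    x_origin:  x coordinate that should map to 0 (e.g. left edge of the word)
    """
    scale = BLN_X_HEIGHT / x_height
    normalized = []
    for x, y in points:
        baseline_y = slope * x + intercept
        # The distance above the local baseline is what survives normalization.
        normalized.append(((x - x_origin) * scale,
                           (y - baseline_y) * scale + BLN_BASELINE_OFFSET))
    return normalized


# Two points lying on an inclined baseline and one point at x-height above it.
result = baseline_normalize([(0, 10), (8, 12), (8, 28)],
                            slope=0.25, intercept=10, x_height=16)
```

In this sketch both baseline points end up at the same height, so the inclined baseline becomes horizontal, and the third point lands exactly one normalized x-height above it. A wrong slope or intercept would shift whole characters up or down in the same way the "jumping" and "falling" characters appear in BlnWords.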

Baseline finding greatly influences character classification. Various baseline-relative positions of the same character can lead to completely different recognition results. That's why incorrect baselines often serve as sources of errors in Tesseract recognition.
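
As a toy illustration of why this matters (not a description of how Tesseract's classifier works), consider a small tick-shaped mark: sitting on the baseline it reads as a comma, while hanging near the top of the line it reads as an apostrophe. The function name, the decision rule and the coordinate values below are all made up for the example; they assume a normalized space with the baseline at y = 64 and the x-height line at y = 192.

```python
BASELINE_Y = 64    # assumed normalized baseline position
X_HEIGHT_Y = 192   # assumed normalized x-height line position


def classify_tick(bottom_y, top_y):
    """Guess what a small tick-shaped blob is from its vertical position only."""
    center = (bottom_y + top_y) / 2
    midline = (BASELINE_Y + X_HEIGHT_Y) / 2
    # Above the middle of the x-height band: apostrophe; below it: comma.
    return "'" if center > midline else ","


# The same 40-unit-tall blob, seen relative to a correct and a wrong baseline.
with_correct_baseline = classify_tick(bottom_y=40, top_y=80)    # straddles the baseline
with_baseline_too_low = classify_tick(bottom_y=170, top_y=210)  # appears lifted up
```

The shape of the blob is identical in both calls; only its position relative to the baseline differs, and that alone flips the answer.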

A few examples. Let's take the "conventional" phototest.tif file:
The main debug window should look like this:
All baselines seem to be found perfectly:
For more complex images things get worse. Here I've taken a photographic image of a restaurant receipt. In the image the receipt is inclined and distorted by perspective. The paper is a bit curved, as usually happens with receipts. The image has been precooked by my image processor (only binarization and noise cleanup) so that Tesseract is able to process it with some degree of success.
The main debug window already shows several segmentation failures. Some characters are grayed out and some are missing completely:
In BlnWords one can see that many baselines are good but some are determined incorrectly, for instance:
There are also some epic failures, like these, where characters from adjacent rows get segmented into a single word:
So why would you want to use this debugging method? It is useful when you're investigating the reasons for a Tesseract failure. The baseline viewer can help you see that additional preprocessing is required to cope with an image or a set of images, done either programmatically or with third-party software such as ImageMagick. Passing the image block by block (i.e. full or partial pre-segmentation) might also help. Another approach is tweaking Tesseract's internal segmentation and baseline finding parameters via config files. Yet another is changing the source code.
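
As one possible form of pre-segmentation, here is a minimal sketch that splits a binarized image into horizontal bands of text using a row projection profile. This is a generic technique, not something taken from Tesseract; the function name and the list-of-rows image representation are assumptions for illustration. It only works when the text lines are already roughly horizontal and separated by blank rows, so a curved or inclined receipt like the one above would need straightening first.

```python
def find_text_bands(binary_rows, min_ink=1):
    """Return (top, bottom) row ranges that contain ink; bottom is exclusive.

    binary_rows: list of rows, each a list of 0 (background) or 1 (ink).
    min_ink:     minimum number of ink pixels for a row to count as text.
    """
    bands = []
    start = None
    for i, row in enumerate(binary_rows):
        has_ink = sum(row) >= min_ink
        if has_ink and start is None:
            start = i                      # a new band begins
        elif not has_ink and start is not None:
            bands.append((start, i))       # the band ended on the previous row
            start = None
    if start is not None:
        bands.append((start, len(binary_rows)))  # band touching the bottom edge
    return bands


image = [
    [0, 0, 0],
    [1, 1, 0],
    [0, 1, 0],
    [0, 0, 0],
    [0, 0, 1],
    [0, 0, 0],
]
bands = find_text_bands(image)
```

Each band can then be cropped out and passed to Tesseract as a separate image, so that a baseline error in one row cannot pull characters from an adjacent row into the same word.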


Unknown said...

Hi Dmitri,
Very cool article! Thanks!
I'm curious to know whether Tesseract can somehow automatically compute the angle by which to rotate the image to get more accurate results. Can ImageMagick be used to do this (in addition to its image enhancement capabilities like grayscale)? By the way, what are the ideal image enhancements needed to help Tesseract's accuracy besides grayscale? Contrast?

Perhaps another approach would be to let Tesseract recognize at least one letter, figure out the angle at which the letter is tilted (using some image analysis techniques), and then rotate the entire image by that same angle. What do you think?

Dmitri Silaev said...

Tess can only automatically rotate an image by 90, 180 or 270 degrees.

Finding arbitrary angles from a single character (or few characters) seems obvious, but in fact leads to tons of complications in practice, thus rendering it highly inaccurate.

William Xue said...

I am a beginner in Java, and I have been struggling to find the baseline using Tess4J.
Can you tell me how to deal with it?
I use eclipse.

He Young said...

Hello Dmitri,
I'm afraid the link you gave is not available for download. Would you send me a copy? Thanks a lot :)