Thursday, June 21, 2012

Tesseract OCR: Interactive Debugging Continued. Baseline Viewer

Here I'll describe a method of viewing baselines in Tesseract's interactive debug environment.

Those who use Tesseract 3.02 should first read my earlier post, Tesseract OCR: Setting Up Interactive Debug Environment On Windows, and complete all the steps in it. However, instead of the installation suite mentioned there you will need a different one containing updated Tesseract config files, because the Tesseract developers have renamed or removed a number of internal debug parameters since version 3.01, which that tutorial used. Download the updated suite at http://www.4shared.com/zip/FnP8RSu0/tess_debug_3_02.html. Version 3.01 users can still use the old installation suite.

So now that you've completed step 5 of the earlier tutorial and the debug window has appeared, do the following:
  1. In the main menu choose Modes->Show BL Norm Word. No apparent reaction from the UI should follow. This is normal.
  2. Now click on any word you're interested in. A new window titled BlnWords should appear.
  3. At first sight the BlnWords window looks empty, but it is not. Nothing is visible only because of the quirky scaling logic used by ScrollView. To find the contents, use the window scrollbars to pan and the mouse scroll wheel to zoom in and out. I suggest the following sequence for the initial setup of the view:
    • slowly drag the vertical scrollbar thumb down until you see baselines and/or outlines,
    • move the horizontal scrollbar thumb approximately to the center,
    • use the mouse wheel to scale the window contents properly,
    • you may also resize the window to your taste.
  4. As you click other words in the main window, the contents of the BlnWords window update. You can adjust the view as needed using the methods described above.
What the BlnWords window displays are so-called baseline-normalized words. In this type of view, words are shown as if their baselines (which can be curved and/or inclined in the source image) were straightened and placed strictly horizontally. In addition to the baseline, the window also shows the x-height, ascender and descender lines. See more at Wikipedia: x-height. Using this view you can clearly see whether a baseline found by Tesseract is right or wrong: incorrect baselines cause characters to "jump" or "fall."
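To make the idea of baseline normalization more tangible, here is a minimal Python sketch. It is not Tesseract's code: it assumes a straight (merely inclined) baseline of the form y = slope * x + intercept, a y axis that grows upward, and target values for the normalized x-height and baseline position that I picked purely for illustration. The function name and parameters are hypothetical.

```python
def baseline_normalize(points, slope, intercept, x_height,
                       target_x_height=128.0, target_baseline=64.0):
    """Map image points into a frame where the baseline is horizontal.

    points          -- (x, y) pairs in image coordinates, y growing upward
    slope/intercept -- assumed straight baseline y = slope * x + intercept
    x_height        -- measured x-height of the word, in image pixels
    The two target_* values are illustrative assumptions.
    """
    scale = target_x_height / x_height
    normalized = []
    for x, y in points:
        # Height of the baseline directly under (or over) this point.
        baseline_y = slope * x + intercept
        # Measure the point relative to the baseline, then rescale.
        normalized.append((x * scale, (y - baseline_y) * scale + target_baseline))
    return normalized


# Two points lying exactly on an inclined baseline y = 0.25 * x + 20 ...
on_baseline = baseline_normalize([(0, 20.0), (100, 45.0)], 0.25, 20.0, x_height=32)
# ... and one point exactly one x-height above that baseline.
x_top = baseline_normalize([(50, 64.5)], 0.25, 20.0, x_height=32)
print(on_baseline)  # both points end up at the same height
print(x_top)
```

If the baseline parameters are wrong, points that should share one height end up at different heights, which is exactly the "jumping" and "falling" of characters visible in BlnWords.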

Baseline finding greatly influences character classification. Various baseline-relative positions of the same character can lead to completely different recognition results. That's why incorrect baselines often serve as sources of errors in Tesseract recognition.

A few examples. Let's take the "conventional" phototest.tif file:
The main debug window should look like this:
All baselines seem to be found perfectly:
For more complex images things get worse. Here I've taken a photographic image of a restaurant receipt. In the image the receipt is inclined and distorted by perspective. The paper is a bit curved, as usually happens with receipts. The image has been preprocessed by my image processor (binarization and noise cleanup only) so that Tesseract is able to process it with some degree of success.
The main debug window already shows several segmentation failures. Some characters are grayed out and some are missing completely:
In BlnWords one can see that many baselines are good but some are determined incorrectly, for instance:
Also there are some epic failures, like these (meaning that characters from adjacent rows get segmented into a single word):
So why would you want to use this debugging method? It can be of use when you're investigating the reasons for a Tesseract failure. The baseline viewer can help you see that an image or a set of images needs additional preprocessing, either programmatic or by means of third-party software such as ImageMagick. Passing the image block by block (i.e. full or partial pre-segmentation) might also help. Another approach is tweaking Tesseract's internal segmentation and baseline finding parameters via config files. Yet another is changing the source code.

Monday, March 7, 2011

Visual C++ 2010: Detecting Memory Leaks For Global Variables

This article is a hint for those who are desperate to find the origin of a memory leak in their programs. I suppose you've already read Microsoft's documentation on this topic - Finding Memory Leaks Using the CRT Library - and set up your project to enable leak detection.

The documentation states that if you did everything properly, the Output window should display all the leaked memory blocks along with source file names and line numbers. However, sometimes this is not the case and nothing is shown except a bare memory address and an allocation size. And you end up staring at these numbers, puzzling over what you could have done wrong with the leak detection setup, placing exit()s all over the code and trying to work out logically where those damned leaks could originate from.

Actually it might not be your fault that you don't see line numbers. To be precise, it's not a mistake in your setup but a consequence of your project's design. Probably your project uses many global variables which get initialized long before source line number tracking is in effect, so in the end the debug CRT detects the leaks but cannot report line numbers. Global variable usage is hard to avoid in large and complex projects, so we need some method to find such leaks.

The good news is that such a method exists. The bad news is that it involves much manual work. But at least it works.
  • First you'll need to run the whole program to get the entire memory leak report in the Output window. Copy the memory leak report somewhere, e.g. to a Notepad window, as later you'll need the numbers in curly braces, which are called memory allocation numbers.
  • Open the crt0dat.c file in Visual Studio. I assume that during the Visual C++ installation you had chosen the default folder, so that file should be located in "C:\Program Files\Microsoft Visual Studio 10.0\VC\crt\src".
  • Within the crt0dat.c file, search for the following string (no quotes): "__cdecl _initterm_e". Place a breakpoint at the first statement of the _initterm_e() function.
  • Run your program again. The execution stops at your breakpoint. Now go to the Watch window and type "_crtBreakAlloc" (no quotes) in the Name column. In the Value column most probably you'll see -1.
  • Disable the breakpoint in crt0dat.c. You won't need it to be hit again during this program run.
  • Now get back to your Notepad window and copy to the clipboard the first of the memory allocation numbers in curly braces. Go to the Watch window in Visual Studio and in the Value column replace the value shown with the value in the clipboard. Press Enter.
  • Resume the execution by hitting F5 or choosing Continue from the menu. After a while Visual Studio should display a message that reads "<YourProgram> has triggered a breakpoint" and stop at some location within the CRT debug code, most likely in the dbgheap.c file.
  • Now go to the Call Stack window and scan it from top to bottom until you find a function that you wrote yourself. From there you can work out the reason for the memory leak. The top or the middle of the call stack may contain gray lines showing only addresses. This means that symbol information is missing for some libraries used in your project. Hit Shift-F11 until you get past the gray lines and reach the known functions. Ignore "No source code available" messages if they appear and keep hitting Shift-F11.
  • Once you have finished investigating one leak, you may continue with the next without restarting the program. Just take the next memory allocation number from the saved memory leak report, paste it into the Value column for _crtBreakAlloc in the Watch window and hit F5. Investigate the cause of the leak. Repeat these steps until you have examined all the leaks. This works only if you take the numbers in ascending order: allocation numbers grow as the program runs, so a number that has already been passed will not trigger the break again during the same run.
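Picking the allocation numbers out of a long report by hand is tedious, so here is a small Python sketch that does it. It assumes the usual shape of the debug CRT report lines ("{N} normal block at ..."); the sample report below is made up for illustration, and the function name is my own.

```python
import re

# "{152} normal block at 0x..." -> captures 152. The same pattern also matches
# when a "file(line) : " prefix is present on the line.
ALLOC_NUMBER = re.compile(r"\{(\d+)\}\s+\w+ block at")


def allocation_numbers(report_text):
    """Return the memory allocation numbers of a leak report, ascending."""
    return sorted({int(number) for number in ALLOC_NUMBER.findall(report_text)})


# Made-up sample in the shape of a debug CRT leak report.
sample_report = """Detected memory leaks!
Dumping objects ->
{152} normal block at 0x00395D38, 40 bytes long.
 Data: <                > CD CD CD CD CD CD CD CD CD CD CD CD CD CD CD CD
{147} normal block at 0x00395CE0, 24 bytes long.
 Data: <                > CD CD CD CD CD CD CD CD CD CD CD CD CD CD CD CD
Object dump complete.
"""

print(allocation_numbers(sample_report))  # [147, 152]
```

The numbers come back sorted, so they can be fed to _crtBreakAlloc one after another within a single debugging session.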
Why do we need a breakpoint inside crt0dat.c? Because we need to capture a memory allocation event before our main() function starts. Once main() is entered, all global variable initializations have already happened and we've lost the chance to track those allocation events, whether by hardcoding allocation numbers (using statements like "_crtBreakAlloc = 1234;") or by editing the _crtBreakAlloc watch value while the program is suspended in the debugger. The _initterm_e() function simply seems to be a good place for the breakpoint.

It is crucial to run your program with the same input conditions every time, so that memory allocation numbers stay unchanged between runs. Once you have fixed a leak, you'll need to repeat the whole process from the start, as the allocation numbers have likely changed.

Hope this info helps you fix your very own leaks.

Sunday, February 6, 2011

Tesseract OCR: Setting Up Interactive Debug Environment On Windows

The following are the step-by-step instructions for setting up and running Tesseract’s internal state viewer (called "ScrollView") on Windows.

Although there already exists a dedicated wiki article (and the instructions herein are based upon it), it can cause some confusion for Tesseract newbies and those who don’t feel comfortable with the technology mixture required for the setup.
  1. First off, you need to make sure you have Java Runtime Environment (or simply “Java”) installed. If you haven’t, then go to http://www.java.com/en/download/manual.jsp and download it. Most likely, an offline version for Windows will suit you well. After the download completes, run the downloaded executable, follow several wizard steps and wait until the installation is finished.
  2. Tesseract’s viewer requires a few JAR files which haven’t changed for years and are a bit of a hassle to get. So I decided to pack them all into a single archived installation suite along with the Tesseract 3.01 executable and the other minimal infrastructure required. You can grab it here: http://www.4shared.com/get/Z4gnbJdP/tess_debug.html
  3. Then create a folder, say C:\tess_debug, and extract all the files from the downloaded installation suite into it, preserving the folder structure.
  4. Launch the Windows Command Prompt and change the current directory to your folder by running the command
    cd C:\tess_debug
  5. Now you are ready to launch the Tesseract debug environment. My installation suite contains the test file phototest.tif so the command to display segmentation data for it would be
    tesseract phototest.tif test1 segdemo inter
    Type the above command in the Windows Command Prompt. The viewer window containing letter outlines should appear shortly.
    A few words on the command-line parameters used:
    • test1 indicates the name of the txt file which will be created as a result of Tesseract’s work. It will contain the recognized text.
    • segdemo and inter are config files required to run Tesseract in this kind of debug mode (segmentation debugging); you can see these within installation suite’s folder.
    • To run segmentation debugging with your file, indicate its name instead of phototest.tif. If your file is located outside installation suite’s folder then you’ll need to prefix the filename with the path.
    • The above command runs recognition using the default language file eng.traineddata. To use your own language file, specify it using the -l command-line argument e.g.
      tesseract image.tif test1 -l yourlang segdemo inter
      In order for this command to run successfully, the language file called yourlang.traineddata should be placed into the tessdata subfolder of the installation suite folder.
  6. The above paragraph describes how to debug the segmentation. Nearly the same technique is used to debug the classifier. To change the debugging mode, all you need to do is replace segdemo with matdemo in the command line, like this:
    tesseract phototest.tif test1 matdemo inter
    NOTE: The matdemo config file can also be found in the installation suite folder.
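If you launch the viewer often, the command lines above can be assembled by a tiny Python helper. This is only a convenience sketch: the function name and its parameters are my own, and it does nothing more than reproduce the argument order used in the commands above.

```python
def tesseract_debug_command(image, output_base,
                            configs=("segdemo", "inter"), lang=None):
    """Build the argument list for a Tesseract debug run.

    image       -- input image, e.g. "phototest.tif"
    output_base -- name of the output text file without ".txt", e.g. "test1"
    configs     -- config files; ("matdemo", "inter") selects classifier debugging
    lang        -- optional language name passed via -l
    """
    command = ["tesseract", image, output_base]
    if lang is not None:
        command += ["-l", lang]
    command += list(configs)
    return command


print(" ".join(tesseract_debug_command("phototest.tif", "test1")))
print(" ".join(tesseract_debug_command("image.tif", "test1", lang="yourlang")))
print(" ".join(tesseract_debug_command("phototest.tif", "test1",
                                       configs=("matdemo", "inter"))))
```

The returned list can be handed to subprocess.call() from a script started in the installation suite folder, which is assumed to be the current directory just as in step 4.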
This is all that can be said about installation of and launching the Tesseract viewer. For information on how to use Tesseract viewer’s user interface please refer to http://code.google.com/p/tesseract-ocr/wiki/ViewerDebugging

Monday, January 24, 2011

Yii: Getting to Understand Hierarchical RBAC Scheme

Yii is a very powerful PHP framework, and among other PHP frameworks it is distinguished by its great object-oriented design, MVC support, speed, flexibility and many other virtues. Recently I had some experience with the PRADO framework and chose Yii to build my image processing web application demo. My app spent some time in my home server sandbox and now it's mature enough to be revealed to the public. And at this point I came close to setting up my app's security.

Yii is also known for its comprehensive documentation and well-written tutorials. Authentication and Authorization is a good tutorial too. Among other topics, it describes basic aspects of Yii's RBAC implementation. That's what I needed to understand in order to start building my own primitive authorization system. But however hard I read the tutorial, I couldn't understand how exactly the hierarchy works. I found how to define authorization hierarchy, how business rules are evaluated, how to configure authManager, but almost nothing about how I should build my hierarchy, in what sequence its nodes are checked, when the checking process stops and what would be the checking result.

There was no other way for me but to dig into Yii's code and I would like to present my findings in this post. I have to mention that digging in Yii's code is not difficult at all, it's well-structured and everyone can do it, but the following info can save you a bit of time when you're a Yii newbie.

I must say it will be much easier for you to understand this article if you are familiar with the above-mentioned tutorial, especially with the topics starting from Role-Based Access Control.

Let's consider the hierarchy example from the tutorial (this example illustrates how security can be built for some blog system):


$auth=Yii::app()->authManager;
 
$auth->createOperation('createPost','create a post');
$auth->createOperation('readPost','read a post');
$auth->createOperation('updatePost','update a post');
$auth->createOperation('deletePost','delete a post');
 
$bizRule='return Yii::app()->user->id==$params["post"]->authID;';
$task=$auth->createTask('updateOwnPost','update a post by author himself',$bizRule);
$task->addChild('updatePost');
 
$role=$auth->createRole('reader');
$role->addChild('readPost');
 
$role=$auth->createRole('author');
$role->addChild('reader');
$role->addChild('createPost');
$role->addChild('updateOwnPost');
 
$role=$auth->createRole('editor');
$role->addChild('reader');
$role->addChild('updatePost');
 
$role=$auth->createRole('admin');
$role->addChild('editor');
$role->addChild('author');
$role->addChild('deletePost');

First of all I'd like to convert this to a more human-readable form:

Sample blog system authorization hierarchy
The turquoise boxes represent roles, the yellow box is a task, and operations, the most fine-grained level of the authorization hierarchy, are tan. Collectively roles, tasks and operations are called authorization items. You should keep in mind that functionally all auth item types are equal. It's completely up to you whether to make some auth item a role or a task - it would still do the same thing. Different types of auth items are introduced solely for naming convenience. You are not limited to three authorization levels: there can be multiple levels of roles, tasks and operations. (Getting back to our diagram, you can see this point illustrated by multiple levels of roles.) You may also skip any of these levels (the role author has the operation createPost as an immediate child). The only restriction is that in the auth hierarchy roles should stay higher than tasks and tasks should stay higher than operations.

Now let's take a quick look at what was on the blog system creator's mind. Everything seems to be quite logical. The weakest role is reader: the only thing he is allowed to do is to read. An author has a bit more power: he can also create posts and update his own posts. Editors can read posts and update (edit) all posts, not only their own (in fact, according to the hierarchy, editors can't create posts, so they have no posts of their own at all). And of course, the most powerful role is admin, which can do anything.

If you are familiar with the principles of object-oriented hierarchy, your prior knowledge may confuse you here. At every subsequent level of an object tree, objects obtain (inherit) all (or part) of the features of their parent (base) objects. As a result, the bottommost objects are the most "loaded" with features, while the root objects have only basic features. The opposite happens with the RBAC hierarchy in Yii. The bottommost items in the authorization hierarchy represent basic operations, while the topmost authorization items (usually roles) are the most powerful and compound ones in the whole authorization system.
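To see this "upside-down" accumulation of power in numbers, here is a small Python sketch (not Yii code) that mirrors the addChild calls of the sample hierarchy and collects the operations reachable from each role. The names CHILDREN, OPERATIONS and reachable_operations are my own, and business rules are deliberately ignored.

```python
# parent -> children, copied from the addChild calls of the sample hierarchy
CHILDREN = {
    "updateOwnPost": ["updatePost"],
    "reader": ["readPost"],
    "author": ["reader", "createPost", "updateOwnPost"],
    "editor": ["reader", "updatePost"],
    "admin": ["editor", "author", "deletePost"],
}
OPERATIONS = {"createPost", "readPost", "updatePost", "deletePost"}


def reachable_operations(item):
    """Collect every operation found below the given auth item."""
    found = set()
    stack = [item]
    while stack:
        current = stack.pop()
        if current in OPERATIONS:
            found.add(current)
        # Descend into the children; operations have none.
        stack.extend(CHILDREN.get(current, []))
    return found


for role in ("reader", "author", "editor", "admin"):
    print(role, sorted(reachable_operations(role)))
```

The higher the role sits, the longer its list gets. Note that author reaches updatePost only through the updateOwnPost task, whose business rule this sketch does not evaluate.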

So now that the idea behind the hierarchy is clear, let's understand how the access checking works. To check whether the current user is allowed to perform a particular action, you should call the checkAccess method, for example:

if(Yii::app()->user->checkAccess('createPost'))
{
    // create post
}

How does Yii use our hierarchy to check access? Although you are not required to read this to understand the rest of the article, for your reference I provide here the piece of Yii's code responsible for access checking (taken from CDbAuthManager, an implementation of CAuthManager for databases):

if(($item=$this->getAuthItem($itemName))===null)
 return false;
Yii::trace('Checking permission "'.$item->getName().'"','system.web.auth.CDbAuthManager');
if($this->executeBizRule($item->getBizRule(),$params,$item->getData()))
{
 if(in_array($itemName,$this->defaultRoles))
  return true;
 if(isset($assignments[$itemName]))
 {
  $assignment=$assignments[$itemName];
  if($this->executeBizRule($assignment->getBizRule(),$params,$assignment->getData()))
   return true;
 }
 $sql="SELECT parent FROM {$this->itemChildTable} WHERE child=:name";
 foreach($this->db->createCommand($sql)->bindValue(':name',$itemName)->queryColumn() as $parent)
 {
  if($this->checkAccessRecursive($parent,$userId,$params,$assignments))
   return true;
 }
}
return false;

When you call checkAccess, Yii begins to recursively climb along the authorization hierarchy and check each item's business rule. For instance, when you make a call like this
Yii::app()->user->checkAccess('readPost')

Yii first checks readPost's business rule (recall that an empty business rule is equivalent to a business rule that always returns true). Then it looks up all of readPost's parents (in our hierarchy that is just reader) and checks their business rules as well, then moves on to reader's parents, author and editor, and so on. A business rule evaluating to true does not stop the process; a branch is abandoned only when some rule returns false or when the top of the hierarchy is reached and there are no more parents to check.

So in what ways can the checkAccess method return true? There are two. First, the iteration can stop with a positive result when Yii encounters a so-called default role in the hierarchy - a role that is assigned by default to all authenticated users. For our blog system this can be the reader role. Default roles can be set up in the web app configuration file; how this is done is described thoroughly in the Using Default Roles section of the tutorial.

The second way to make checkAccess return true is explicitly creating an authorization assignment which is basically defining an <auth item>-<user> pair. In code, this can be done like this:

$auth->assign('reader','Pete');
$auth->assign('author','Bob');
$auth->assign('editor','Alice');
$auth->assign('admin','John');

which is semantically equivalent to assigning roles to users. You're not limited to assigning roles; individual tasks and operations can be assigned to users as well. In real life, it is more practical not to hard code all auth assignments but to store them in a database. You can implement this scenario using the CDbAuthManager component which is described in the Yii tutorial.

Let's get back to the checkAccess discussion. Before the hierarchy iteration begins, Yii collects all authorization items assigned to the current user, and at each iteration step it checks whether the auth item currently being visited is in that assignment list. If it is, the iteration stops and returns a positive result.

Assume we are implementing security for the "update post" user action. Whoever is logged into our blog system should pass our authorization check before he is able to edit a post. Therefore the most appropriate place to check the access is the beginning of the respective controller action:

public function actionUpdatePost()
{
 if(!Yii::app()->user->checkAccess('updatePost'))
  Yii::app()->end();

 // ... more code
}

Suppose the current user is Alice. Let's see how Yii processes the auth hierarchy. Although updateOwnPost is an immediate parent of updatePost and returns false, Yii quickly finds another parent auth item which returns true - the editor role. As a result, Alice is permitted to update the post. What happens if Bob logs in? In this case the branch of the hierarchy going through the editor item is also processed, but no item along it returns true. The only possible way for the access check to succeed is then to go through the updateOwnPost item.

But instead of an empty "always-true" business rule updateOwnPost has a more complex one (see the first code snippet at the beginning of the article) and for the evaluation it requires the post creator's ID. How can we supply it to the business rule? In the form of checkAccess's parameter. To achieve this we need to modify our controller action handler in the following way:

public function actionUpdatePost()
{
 // here we obtain $post, probably via active record ...

 if(!Yii::app()->user->checkAccess('updatePost', array('post'=>$post)))
  Yii::app()->end();

 // ... more code
}

Note that although updateOwnPost returns true for Bob, the iteration through the auth hierarchy still goes on. It stops and returns success only when it reaches the author item.

I think now you're able to figure out how Yii would check access given that Pete or John logged in.

Returning to the above code snippet, it may seem that we're providing the post parameter to the updatePost operation, whose business rule is empty and requires no parameters at all. This is true, but it is not the whole story. In fact Yii passes the same parameter set (there can be several parameters, as they are passed as an array) to every hierarchy item at every iteration. If an item's business rule requires no parameters, it simply ignores them. If it does require them, it takes only those that it needs.
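The whole walk can be modeled in a few lines of Python. This is a simplified sketch of the logic in the PHP fragment quoted earlier, not Yii code: the tables are hard-coded from the sample hierarchy, business rules are plain Python functions, assignment-level business rules are left out, and all names are my own.

```python
# child -> parents, i.e. the sample hierarchy turned upside down
PARENTS = {
    "readPost": ["reader"],
    "createPost": ["author"],
    "updatePost": ["updateOwnPost", "editor"],
    "deletePost": ["admin"],
    "updateOwnPost": ["author"],
    "reader": ["author", "editor"],
    "author": ["admin"],
    "editor": ["admin"],
}


def own_post_rule(user_id, params):
    """Counterpart of the updateOwnPost business rule from the tutorial."""
    post = params.get("post")
    return post is not None and post["authID"] == user_id


# Items without an entry have an empty rule, which always passes.
BIZ_RULES = {"updateOwnPost": own_post_rule}

ASSIGNMENTS = {"Pete": {"reader"}, "Bob": {"author"},
               "Alice": {"editor"}, "John": {"admin"}}


def check_access(item, user_id, params=None, default_roles=()):
    params = params or {}
    rule = BIZ_RULES.get(item)
    if rule is not None and not rule(user_id, params):
        return False  # a failed rule closes this branch of the hierarchy
    if item in default_roles or item in ASSIGNMENTS.get(user_id, set()):
        return True   # default role or explicit assignment: success
    # Otherwise climb to every parent, handing over the same params.
    return any(check_access(parent, user_id, params, default_roles)
               for parent in PARENTS.get(item, []))


bobs_post = {"authID": "Bob"}
print(check_access("updatePost", "Alice"))                     # True
print(check_access("updatePost", "Bob", {"post": bobs_post}))  # True
print(check_access("updatePost", "Pete", {"post": bobs_post})) # False
```

Every recursive call receives the same params dictionary, which is the Python counterpart of Yii passing one parameter array to every item it visits.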

This leads to two possible parameter passing strategies. The first one is to remember, for every auth item, which other auth items can be reached from it in the hierarchy, and to provide each call to checkAccess with exactly the parameters those items need. The advantage of this strategy is code brevity and probably efficiency. The other strategy is to always pass all parameters to every auth item, no matter whether they will actually be used for business rule evaluation. This is a "fire-and-forget" method which can help to avoid much trial and error while implementing your app's security. Its downside is possible code clutter and maybe a drop in script performance.

This is only basic information about the RBAC authorization model in Yii; much more advanced security models can be built using it. Please refer to The Definitive Guide to Yii and the Class Reference for more details. There are also a number of web interfaces implemented as extensions which can help you with Yii RBAC administration.

Monday, January 17, 2011

Visual C++ 2010: How To Fix The "Up-to-date Project Always Gets Rebuilt" Problem

Sometimes when you hit F5 or F7, Visual Studio acts as if something has changed in your project and rebuilds it, even immediately after a fresh rebuild. This might be very annoying and time-consuming, especially when debugging big projects. I'll try to summarize what can be done to eliminate this problem.

It's all about the new MSBuild build system. Something is fooling it, so instead of a message like this in the build output window:

========== Build: 0 succeeded, 0 failed, 1 up-to-date, 0 skipped ==========

you always see this:

========== Build: 1 succeeded, 0 failed, 0 up-to-date, 0 skipped ==========

The following can be done then:
  • Check your project settings regarding the intermediate output directory. If you have more than one project in your solution, no two projects can share the same intermediate output directory. No manually placed files should reside in this directory and no manual corrections to automatically generated files should be made. This directory should be exclusively under Visual Studio's control. Try to "Clean" the project or the entire solution using Visual Studio's command and then test the build behavior. If no luck then try to clean the directory manually and test again.
  • Your project might reference a non-existent file. MSBuild's up-to-date check mechanism will assume that a new build is required. Try to locate and remove non-existent files from the project.
  • Your project was converted from a project of a previous Visual Studio version and somehow became "broken" in the process. A project can also "break" in other mysterious ways during normal use, even if it was never converted. The cure is to re-create the project from scratch, however dull and tedious that may sound.
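Hunting for non-existent files by hand is tiresome in a big project, so here is a rough Python sketch that lists them. It assumes the usual .vcxproj layout (ItemGroup elements in the MSBuild 2003 XML namespace whose children carry an Include attribute with a path relative to the project file). The function name is my own; entries containing macros, wildcards or semicolon-separated lists are skipped rather than resolved.

```python
import os
import xml.etree.ElementTree as ET

MSBUILD_NS = "{http://schemas.microsoft.com/developer/msbuild/2003}"


def missing_project_files(vcxproj_path):
    """Return the Include values of a .vcxproj that point to missing files."""
    project_dir = os.path.dirname(os.path.abspath(vcxproj_path))
    missing = []
    root = ET.parse(vcxproj_path).getroot()
    for group in root.iter(MSBUILD_NS + "ItemGroup"):
        for item in group:
            include = item.get("Include")
            kind = item.tag.replace(MSBUILD_NS, "")
            # ProjectConfiguration items name build configurations
            # such as "Debug|Win32", not files.
            if include is None or kind == "ProjectConfiguration":
                continue
            # Macros, wildcards and item lists cannot be resolved here.
            if "$(" in include or "*" in include or ";" in include:
                continue
            # Project files store Windows-style separators.
            relative = include.replace("\\", os.sep)
            if not os.path.exists(os.path.join(project_dir, relative)):
                missing.append(include)
    return missing


# Usage (hypothetical path):
# print(missing_project_files(r"C:\projects\demo\demo.vcxproj"))
```

Anything the function reports is a candidate for removal from the project, after which the up-to-date check can be tested again.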
I have personally encountered all three of the above situations and managed to solve the problem each time. If you have something else to say regarding this topic, please let me know.