FreeStyle Handwriting Recognition

CereSoft solutions are powered by two extraordinary technologies: our freestyle recognition engine and our Universal Script that captures data from unstructured documents. The following white paper outlines our approach to recognizing unconstrained, free-flowing handwriting with neural net technologies. CereSoft incorporates our world-class FreeStyle Recognition engine into industry-specific applications to help organizations improve productivity and data accuracy. 

 

FreeStyle Handwriting Recognition

H. H. Chen and Songnian Qian 
CereSoft, Inc. 

Summary
Automatic freestyle handwriting recognition (FHR) has been studied without much success, until now. A reliable system of this kind would be the ultimate solution to the problem of laborious data entry. This article discusses the issues that a successful FHR system must face and how the FreeStyle™ recognition engine handles them. It also discusses how FHR relates to more traditional character recognition technologies such as ICR and OCR. The technology behind FreeStyle is CerebralNet™, a super-network of neural nets that integrates, in real time, a bottom-up character recognizer and a top-down linguistic model.

FreeStyle reads 10 words (or 50 characters) per second on a Pentium II processor. The user can work in a dictionary-assisted mode or without a dictionary. The dictionary is an important component: the default 120,000-word dictionary can raise the recognition engine's read rate by 5 to 15 percentage points, and average accuracy can reach 99% per word for reasonably clean handwriting. In addition, the same system recognizes handwriting and machine print seamlessly and automatically. This comprehensive capability is a natural consequence of the freestyle design: human handwriting ranges from very neat print to very sloppy cursive, so the system has been designed to encompass the whole range of styles.

Introduction
Automatic text reading by computers has made impressive progress in the past decade: reliable OCR software packages for full text recognition are widely available, and isolated handprint character recognition (ICR) systems have been commercially viable for about five years. A new Forms Processing industry has grown rapidly as it has proved it can dramatically cut labor costs for data entry services. However, until FreeStyle, no product could recognize freestyle handwriting and machine print text comprehensively. OCR products can only read machine-print text, and ICR products can only read isolated handprint. Handprint in these systems must be written in boxes or combs that are unnatural to the writer, so the forms they use have to be specially designed. A few companies offer connected digit recognition, but nothing beyond digits. Another company offers cursive word recognition based on a dictionary; unfortunately, without the dictionary it will not work at all. It also does not read machine print well, so it is not an FHR system. Worst of all, its recognition is so slow that adoption in a production environment is out of the question.

FreeStyle is designed to address these problems. It is comprehensive, which means it will read machine print as well as handwriting of any style whether it be isolated or touching print, cursive or mixed-writing. In the business forms environment, the most frequently occurring styles are machine print and mixed-writing that combines cursive and print. FreeStyle is ideal for these applications. 

Different Styles of Text
Machine-print
Machine-print text is much easier to recognize than handwriting, primarily because of its regularity. A given letter looks exactly the same each time it appears in the text. Words and characters can be isolated reliably with a few heuristic rules, and variations among fonts are very limited. Degraded images from faxes, copiers or scanners are now the major source of OCR errors. These errors can sometimes be remedied with linguistic information. Most vendors provide post-processing correction of OCR errors, using statistical N-grams, OCR error matrices or dictionaries to suggest correction candidates, much as a spelling checker does in a word processor. This off-line post-processing is effective only if the original character error rate is already low, which is exactly when there is little left to correct. This kind of context enhancement therefore yields no significant gain.
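
The dictionary-based post-processing described above can be illustrated with a minimal Python sketch. It is not taken from any vendor's product: the small lexicon, the function name correct_word and the 0.75 similarity cutoff are assumptions chosen for illustration, and the matching relies on the standard library's difflib module rather than on N-grams or OCR error matrices.

```python
import difflib

# Hypothetical mini-lexicon; a real system would load a far larger word list.
LEXICON = ["invoice", "patient", "payment", "balance", "account"]

def correct_word(ocr_word, lexicon=LEXICON, cutoff=0.75):
    """Return the closest lexicon entry, or the input unchanged if none is close."""
    word = ocr_word.lower()
    if word in lexicon:
        return word
    # get_close_matches ranks candidates by similarity ratio (0.0 to 1.0)
    # and drops any candidate scoring below the cutoff.
    matches = difflib.get_close_matches(word, lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else ocr_word

print(correct_word("paymcnt"))  # one misread character: close enough to repair
print(correct_word("pzymcrt"))  # three misread characters: no candidate is close
```

With a single misread character the nearest dictionary word is still recoverable. Once several characters are wrong, no candidate clears the cutoff and the word is passed through uncorrected, which is the limitation of off-line correction noted above.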

To achieve a more dramatic gain from context information, context checking should be done on-line, concurrently with recognition, before the OCR errors are made. This context-driven (versus image-driven) recognition approach is one of the guiding principles of CerebralNet, the architecture behind the FreeStyle handwriting recognition system.

Isolated Handprint
Besides machine-print, FreeStyle reads isolated handprint, touching handprint, cursive and mixed-writing (part cursive, part print). Right now, the most widely used ICR system is the isolated handprint recognition system. This system is popular because characters are written in pre-designed boxes and combs, eliminating the difficult word or character isolation problem. To help understand the intricacies of ICR systems, we will define the following two measuring parameters: 

Character Read Rate - Character read rate is the percentage of characters accepted by the recognizer as readable among all characters processed. The remaining characters are rejected as unreadable. 
Character Accuracy - Character accuracy is the percentage of those characters correctly recognized among all characters previously accepted as readable. 

To maintain the integrity of the read results, the accuracy of read characters should be high, say 99.5 percent. To achieve this, we may have to reject a fraction of correctly recognized characters as unreadable. Many character images are ambiguous and resemble more than one character label; even when such an image is recognized correctly, its confidence level is low, so it may be rejected to avoid a possible error. As a rule, the more we reject, the higher the accuracy but the lower the read rate. A good ICR system should have a steep accuracy-rejection curve, reaching high accuracy while rejecting relatively few characters.
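
The two measures and the trade-off between them can be made concrete with a short Python sketch. The confidence values and correctness flags below are invented for illustration and are not FreeStyle output; the function name read_rate_and_accuracy is likewise an assumption.

```python
def read_rate_and_accuracy(results, threshold):
    """results: list of (confidence, is_correct) pairs, one per character.

    Characters whose confidence is below the threshold are rejected as
    unreadable. Returns (read_rate, accuracy) as fractions; accuracy is
    None when every character is rejected.
    """
    accepted = [ok for conf, ok in results if conf >= threshold]
    read_rate = len(accepted) / len(results)
    accuracy = sum(accepted) / len(accepted) if accepted else None
    return read_rate, accuracy

# Made-up recognizer output for ten characters.
sample = [(0.99, True), (0.97, True), (0.95, True), (0.90, True),
          (0.80, True), (0.70, False), (0.60, True), (0.55, False),
          (0.40, False), (0.30, True)]

for t in (0.0, 0.5, 0.75):
    rate, acc = read_rate_and_accuracy(sample, t)
    print(f"threshold={t:.2f}  read rate={rate:.0%}  accuracy={acc:.0%}")
# threshold=0.00  read rate=100%  accuracy=70%
# threshold=0.50  read rate=80%  accuracy=75%
# threshold=0.75  read rate=50%  accuracy=100%
```

Raising the threshold trades read rate for accuracy. At the highest threshold the character with confidence 0.60 is rejected even though it was recognized correctly, which is the cost described above.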

FreeStyle is designed to achieve the steepest accuracy-rejection curve possible. In a benchmark test on digit data from the US government's National Institute of Standards and Technology (NIST), FreeStyle reached 99.9 percent accuracy with only 2.3 percent rejection. This rejection rate is about one-fifth that of the closest competitor in the worldwide study. FreeStyle achieved this performance by employing a sophisticated confidence algorithm. A good ICR system achieves a high read rate through good confidence modeling and high accuracy through good pattern classification. FreeStyle has both.

Besides character accuracy and read rate, the other performance measure of an ICR system is its read speed or throughput. FreeStyle uses multiple engines internally to achieve a throughput of more than 500 characters per second on a Pentium II machine. 

FreeStyle Handwriting - Touching-print, Cursive and Mixed-style Handwriting
When we go beyond isolated handprint recognition, we face a multitude of problems. One of the foremost is finding the boundaries of a word or character. A text line is composed of words, a word is composed of characters, and a character is composed of strokes or fragments. Once characters are no longer separated neatly by boxes or combs, we lose the crucial information about their number and location. A fragment or a wrong combination of fragments may be identified as a single character, or a wrong combination of characters may be identified as a single word. The identity of each character also becomes more ambiguous in freestyle writing: human readers tolerate a few sloppily written characters inside a word without losing its overall readability, so writers do not form every character carefully. Noise, fragments, and punctuation marks can further confuse the recognition process. All these difficulties indicate that a purely bottom-up, image-driven approach will fail to recognize freestyle handwriting reliably, except in the rare case where the handwriting consists of neatly printed characters.

There are several approaches to recognizing freestyle handwriting words. These are: 

Word level - In this approach every word has its own image models. Recognition is on each word as a single unit. Constituent characters are not segmented out in the recognition process. Recognition at this level may tolerate high sloppiness because no attempt has been made to recognize individual characters. However, a very large set of word models will be needed and new words are difficult to accommodate. 

Character level - At this level, word models are assembled from character models. Each character is recognized as a single unit. Fragments of characters are classified only as non-characters in the process. Since recognition of individual characters is important, the words must have reasonable legibility of their constituent characters. The advantage of this level is the relatively small number of character models and the on-the-fly construction of word models for new words. Since fragments are not classified, they are easy to manage. Hidden Markov Model systems belong to this level. 

Fragment level - Word models are assembled from character models, which are in turn assembled from fragment models. The recognition engine first segments out fragments, classifies them according to established fragment models, and then assembles them into characters and the characters into words. This level of recognition needs fragment models in addition to character models. The reliability of fragment or stroke construction, and of the modeling of those fragments, is difficult to control.

In FreeStyle, the assemblage of words and phrases from fragments and characters is done as a combination of bottom-up and top-down processes. The bottom-up process depends very much on the strength of FreeStyle's isolated letter recognizer. It returns highly reliable confidence values for the letter and fragment images sent to it for recognition. This ensures that the assembled words and phrases will also be very reliable. In addition, FreeStyle integrates this bottom-up assembly with a top-down guidance utilizing linguistic and geometric context. The whole process is accomplished using a very flexible super-structure of neural nets called CerebralNet. It works with a wide spectrum of writing styles ranging from machine-print to handprint to freestyle handwriting, with or without dictionary assistance. FreeStyle takes advantage of the mutual support of the many modules within CerebralNet and its flexibility to deliver the highest accuracy and throughput available in the world. 
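
The interplay of bottom-up scoring and top-down guidance can be sketched in Python as follows. This is a generic lexicon-driven dynamic program and not the CerebralNet architecture, whose internals this paper does not describe. The confidence table, the four-fragment example and the three-word lexicon are invented for illustration.

```python
import math
from functools import lru_cache

# Hypothetical bottom-up output: for each span of consecutive fragments
# (start, end), the recognizer's confidence that the span is a given letter.
SPAN_SCORES = {
    (0, 1): {"c": 0.6, "l": 0.3},
    (1, 2): {"l": 0.5, "i": 0.4},
    (0, 2): {"d": 0.7, "a": 0.2},  # fragments 0 and 1 read as one letter
    (2, 3): {"o": 0.8, "a": 0.1},
    (3, 4): {"g": 0.9, "y": 0.05},
}
N_FRAGMENTS = 4
LEXICON = ["dog", "clog", "day"]  # hypothetical top-down word list

def word_score(word, max_span=2):
    """Best log-confidence of explaining every fragment as the letters of word."""
    @lru_cache(maxsize=None)
    def best(frag, pos):
        if frag == N_FRAGMENTS and pos == len(word):
            return 0.0                      # fragments and letters both used up
        if frag == N_FRAGMENTS or pos == len(word):
            return -math.inf                # one ran out before the other
        top = -math.inf
        # Try grouping the next 1..max_span fragments into the next letter.
        for end in range(frag + 1, min(frag + max_span, N_FRAGMENTS) + 1):
            conf = SPAN_SCORES.get((frag, end), {}).get(word[pos], 0.0)
            if conf > 0.0:
                top = max(top, math.log(conf) + best(end, pos + 1))
        return top
    return best(0, 0)

def recognize(lexicon=LEXICON):
    """Score every lexicon word and return the best one with all scores."""
    scored = {w: word_score(w) for w in lexicon}
    return max(scored, key=scored.get), scored

best_word, scores = recognize()
print(best_word)  # dog
```

Read fragment by fragment, the first fragment looks most like "c". Once whole words compete, "dog" (0.7 x 0.8 x 0.9 = 0.504) outscores "clog" (0.6 x 0.5 x 0.8 x 0.9 = 0.216), so the segmentation is settled by word-level context rather than by the image alone.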

Data Field Objects
In real applications of FHR, we may have to recognize a phrase, a sentence or even a paragraph of text instead of just a single word. Having a higher-level understanding of such phrases or sentences is important in guiding and checking the recognition processes and their results. In a business forms processing environment, data is usually arranged into fields. A data field usually contains one or more words with a fixed set of vocabulary and grammar rules. For example, a date field containing "March 21, 1997" could be written in any of the following styles: 

3/21/97 
3-21-97 
03-21-97 
3-21-1997 
3.21.97 
Mar. 21, 97 
March 21, 1997 

and others. 

If we know beforehand that the data field is a date field, we can limit our recognition results to the above choices. Context from Data Field Objects is critical for FHR, much more so than for other types of recognition. In OCR or isolated-character ICR, Data Field Object context is not as critical; we can manage without it because other constraints limit the possibility of errors. In the case of freestyle handwriting, however, data field context provides the critical information needed to successfully recognize a piece of written data. FreeStyle has compiled a large set of these data field objects to cover the usage in a typical business forms processing environment.
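
As an illustration of how a date field's fixed vocabulary and grammar can constrain results, here is a minimal Python sketch that accepts the styles listed above and maps them to a single date. It is not FreeStyle's Data Field Object implementation. The regular expressions, the function names, and the rule that two-digit years belong to the 1900s are assumptions made for this example.

```python
import re
from datetime import date

MONTH_NAMES = ["january", "february", "march", "april", "may", "june", "july",
               "august", "september", "october", "november", "december"]

# Numeric styles such as 3/21/97, 03-21-97 or 3.21.97 (same separator twice).
NUMERIC = re.compile(r"^(\d{1,2})([/.\-])(\d{1,2})\2(\d{2}|\d{4})$")
# Textual styles such as "Mar. 21, 97" or "March 21, 1997".
TEXTUAL = re.compile(r"^([A-Za-z]{3,9})\.?\s+(\d{1,2}),?\s+(\d{2}|\d{4})$")

def month_number(word):
    """Map a month name or an abbreviation of 3+ letters to 1..12, else None."""
    word = word.lower()
    for number, name in enumerate(MONTH_NAMES, start=1):
        if name.startswith(word):
            return number
    return None

def expand_year(text, century=1900):
    """Assumption: two-digit years fall in the 1900s, as in the 1997 example."""
    return int(text) + century if len(text) == 2 else int(text)

def parse_date_field(text):
    """Return a date if the text fits a known date style, otherwise None."""
    text = text.strip()
    m = NUMERIC.match(text)
    if m:
        month, day, year = int(m.group(1)), int(m.group(3)), expand_year(m.group(4))
    else:
        m = TEXTUAL.match(text)
        month = month_number(m.group(1)) if m else None
        if month is None:
            return None
        day, year = int(m.group(2)), expand_year(m.group(3))
    try:
        return date(year, month, day)
    except ValueError:  # impossible dates such as month 13 or day 45
        return None

for style in ["3/21/97", "3-21-97", "03-21-97", "3-21-1997",
              "3.21.97", "Mar. 21, 97", "March 21, 1997"]:
    print(style, "->", parse_date_field(style))
```

A recognizer could use such a check to discard candidate readings that do not form a valid date, for example a reading of "13/45/97".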

 

 




©2006 CereSoft, Inc. All Rights Reserved.
CereSoft, Inc. 1738 Elton Road, Suite 121, Silver Spring, MD 20903, USA