
Custom AI models: the cornerstone of better handwriting OCR

We test custom OCR against Google Vision.

Oct 02, 2020 by Jakub Czerny & Jess McCuan

A deep dive into the data science behind good handwriting OCR

It’s tough to find a word or phrase that a computer cannot translate. Colleague sent an email in French? Zut alors! Where there was once only Google Translate to help you out, now a dozen credible French-to-English translation tools can easily decipher whole sentences. But one area where even the most sophisticated software still struggles is reading human handwriting. That’s not to say that plenty of companies aren’t trying. The market for handwriting OCR (optical character recognition) is increasingly competitive, with one report projecting it will be worth more than $13 billion by 2025.

And yet, even the biggest players still fail to produce handwriting OCR tools that can read and digest large volumes of handwritten documents accurately. The reason? The intricate nature of handwriting itself. 

In this blog, we’ll discuss why handwriting OCR that’s driven by deep learning makes the process more precise, and we’ll compare the benefits of a custom OCR model to out-of-the-box software.

Hairy, or Larry? Humans aren’t confused

First, handwriting comes with layers of complexity, including the country and culture where a person grew up, their writing style, how much time they had to fill out a form, the repetitiveness of the task, and how the document is processed next. If a document is for record-keeping only and rarely referenced again, chances are the writing in it won’t be very clear. And what’s clear to one person might not be clear to another, since we all read with different biases.

Below are two first names, and you’ll have no problem recognizing them, despite the handwriting not being clear. The first is “Larry,” but a machine, not knowing it was looking at a first name, might read the word as “harry” or “hairy,” because in fact, the “r’s” aren’t that clear either. It’s context and prior knowledge that make it easy for us to decipher, even if we can’t read it letter by letter.

Handwriting OCR driven by deep learning is ultimately much more precise

To make things more difficult, handwriting styles are influenced by culture and region. Below are different styles and ways of writing the German word “aber” (meaning “but”).

Because our handwriting OCR involves deep learning, it's more accurate than off-the-shelf software.

You might also think it would be impossible to mistake the letter “A” for the number 1, but remember: on paper, people make stray marks and mess up letters, and the paper they’re writing on may not be perfect to start with. With enough scanning artifacts and noise, it would be easy for a machine to get confused.

And, on handwritten forms when people write in cursive, there’s no separation between letters. That’s less of a problem for human translators, who rely on the context in which the text appears. A military clerk, for example, has no problem understanding that a soldier intends the “s” below as part of “serve,” instead of an “8”. 

The curves and connected characters in this sample make it difficult for handwriting OCR

A further complication for modern companies, especially those doing business around the globe: notation for dates and dollar amounts varies widely depending on where you’re from. Europeans commonly use a comma as the decimal separator, with no comma between thousands. Most Americans use a comma to mark thousands, and whole dollar amounts end in a period.

Domain-specific data for handwriting OCR

Deciphering tricky cases like the ones above is where a custom solution for handwriting OCR comes in handy. There are two main challenges an engine has to deal with. 

Off-the-shelf OCR software like Google Vision, for example, must start with the widest range of possibilities due to lack of context.

For a mark like the one above, the software would need to consider whether it is the letter “A”, some variation of the number 1, or maybe the Chinese character for “person.” This could be narrowed down by whitelisting allowed characters, but to the best of our knowledge, the Google Vision solution does not support such a feature.
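To make the whitelisting idea concrete, here is a minimal sketch: it filters an engine’s candidate readings to an allowed character set and keeps the most confident survivor. The candidate list, scores, and function name are hypothetical illustrations, not Google Vision output.

```python
# Character whitelisting: given candidate readings from an OCR engine
# (with confidence scores), keep only those whose characters fall inside
# the allowed set, then return the most confident survivor.

ALLOWED = set("0123456789")  # e.g. a field known to contain only digits

def pick_reading(candidates):
    """candidates: list of (text, confidence) pairs from an OCR engine."""
    valid = [(text, conf) for text, conf in candidates if set(text) <= ALLOWED]
    if not valid:
        return None  # nothing fits the whitelist; fall back to manual review
    return max(valid, key=lambda tc: tc[1])[0]

# The ambiguous mark could be "A", "1", or the Chinese character "人";
# whitelisting digits resolves it immediately.
print(pick_reading([("A", 0.41), ("1", 0.38), ("人", 0.21)]))  # → 1
```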

At Automation Hero, we build a custom OCR model trained on a domain-specific dataset, one that’s unique to an industry, company or country. By doing that, we not only limit the set of possible characters but also let the model learn structural patterns, such as characters at certain positions always being of a certain type. A good example is the IBAN (International Bank Account Number), where the first two characters indicate the country and are always letters, and they are always followed by two digits. Knowing this means we can prevent mistakes such as reading “BE18” as “8EIB”, even with messy handwriting.
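Our model learns such constraints from data, but the effect can be illustrated with a simple rule-based sketch. The confusion table below (8↔B, 1↔I, 0↔O, 5↔S, 2↔Z) is an assumption for illustration, not our actual implementation:

```python
# Use the IBAN's known structure (two letters, then two digits) to correct
# common OCR digit/letter confusions at fixed positions.

TO_LETTER = {"8": "B", "1": "I", "0": "O", "5": "S", "2": "Z"}
TO_DIGIT = {letter: digit for digit, letter in TO_LETTER.items()}

def fix_iban_prefix(raw: str) -> str:
    chars = list(raw)
    for i in range(2):              # positions 0-1 must be letters
        if chars[i].isdigit():
            chars[i] = TO_LETTER.get(chars[i], chars[i])
    for i in range(2, 4):           # positions 2-3 must be digits
        if chars[i].isalpha():
            chars[i] = TO_DIGIT.get(chars[i], chars[i])
    return "".join(chars)

print(fix_iban_prefix("8EIB"))  # → BE18
```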

Train AI models with a blast from your past

Where it makes sense, a dataset might even consist of your company’s own past documents. This ensures that the model’s training data covers a relatively narrow sphere of information, meaning our platform is much more likely to accurately interpret industry-specific words and their context. It’s simply better equipped to handle problem cases than software you can buy off the shelf.

The second challenge concerns data formatting, to ensure consistency across all records. As we mentioned, dates, prices and numbers can take different formats when written in different parts of the world. In process automation, these values might need to be compared with each other or simply added to a database, so they need to follow the same format. Even when Google Vision reads all of the above samples correctly, an extra formatting step is needed to bring the values into a common format. Our solution has that feature built in. In the case of dollar amounts, Automation Hero can build a model that normalizes to the correct notation for the country the company does business in, for example ignoring stray periods or consistently converting the decimal separator.
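In our platform this normalization happens inside the model itself; the standalone function below is only a sketch of the behavior, handling the two notations described above:

```python
# Normalize European-style (1.234,56) and American-style (1,234.56) amounts
# to one canonical form so values can be compared or stored consistently.

def normalize_amount(text: str, style: str) -> str:
    """style: 'eu' for 1.234,56 or 'us' for 1,234.56; returns '1234.56'."""
    text = text.strip().lstrip("$€£ ")      # drop currency signs and spaces
    if style == "eu":
        text = text.replace(".", "").replace(",", ".")
    else:
        text = text.replace(",", "")
    return f"{float(text):.2f}"

print(normalize_amount("1.234,56", "eu"))   # → 1234.56
print(normalize_amount("$1,234.56", "us"))  # → 1234.56
```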

Testing Google Vision

The result on the left is what happens if no “language” hint is provided to Google Vision. This highlights how important the context is. Dates can also be similarly flipped, depending on where a person is from. 

Handwriting OCR can be tricky for Google Vision, since it lacks cultural context

If we limit the set of characters to Latin letters plus digits, Google Vision gets it right.

With Automation Hero’s custom OCR, that hint gets built in from the start. 

Again, humans usually know the context in which a given text appears, so when we read, we implicitly limit the set of possible characters, which makes deciphering easy. For a general model such as Google’s, this is an extra step. Our custom OCR model has been trained on normalized dates, so it learns to read “14/2/20” as “14/02/20”, which reduces the need for further post-processing to achieve consistency. Internally, the model learns that each component of the date should have two digits, and if only one is present, the other should be a leading zero.
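The leading-zero rule the model learns can be sketched in a couple of lines (an illustration of the output behavior, not the model’s internals; day/month/year order is assumed from context):

```python
# Pad each slash-separated date component to two digits with a leading zero.

def normalize_date(raw: str) -> str:
    return "/".join(part.zfill(2) for part in raw.split("/"))

print(normalize_date("14/2/20"))  # → 14/02/20
```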

Similarly, when it comes to prices, the model has been trained to ignore the currency sign.

Frankly, Google Vision gives fair interpretations of the samples below, because it has been given no hints that the first is a date and the second is a dollar amount.

Custom OCR for companies: a no-brainer

Why would a company opt for off-the-shelf handwriting OCR software? To be sure, it’s widely available, and it may work well for some use cases. Such models are already trained and simply need to be implemented, and they are generally good at reading generic text, as they show no bias toward any specific format. That situation is rare, however, when trying to automate processes inside a complex enterprise.

With a large volume of handwritten material or intricate use cases that involve industry-specific jargon, a custom AI model for interpretation leads to higher accuracy rates and better data consistency across the board, which makes that data easier to query later. In an end-to-end solution, the post-processing is baked into the model itself, rather than tacked on as a second (or third) step. Our data science team can specify the kinds of text or a list of words that might cause trouble for the model, which means the model will be better equipped to handle problem cases in the end.
