
Data entry is a necessary evil – Do less of it!

January 11, 2007

We have thousands of paper documents with valuable information. Before we can use that information, someone needs to take the time to key in the data. If you work with people whose days are consumed with tedious data entry, consider simplifying their workday with automated data extraction. Data extraction allows you to reduce manual data entry, increase throughput and often even reduce errors. Such technology goes by names like OCR, ICR, and MICR. It is easy to see how someone interested in automatic data extraction can get lost in a sea of acronyms. Here is a brief overview of some data extraction technologies and how they may benefit you.

Quick Note: Data extraction technology does not completely eliminate data entry. If you get a slick salesperson who promises OCR will magically make your data entry needs disappear, do the following. Allow him to take you out to a free meal (order the lobster), smile and nod at everything he says and then never return his phone calls. It is the least you can do for someone who knowingly deceives you. What OCR technology will do is allow your people to do more work in less time – making them more productive.

Image capture is the first step in electronic data capture. Image capture is the process of converting a paper document into an electronic image. Usually these documents are stored as Tagged Image File Format (TIFF) or Portable Document Format (PDF) files. There are many benefits to document imaging besides automatic data extraction. I’ll cover those in later articles.

The image is typically captured with a scanner. There is a wide variety of scanners available – from single workstation (five pages / minute) to full-scale production scanners (fifty pages / minute). Of course the price reflects the features of the scanner.

Many companies have electronic fax servers such as RightFax or Biscom. These fax servers convert incoming faxes into images automatically. These solutions can be very costly. If you are looking for a low cost alternative to expensive fax servers consider email-based fax solutions such as eFax. These solutions send inbound faxes to an email address of your choosing.

Optical Character Recognition (OCR) software reads an image and converts the information into digital data. Such software is capable of processing machine-printed, hand-printed or even cursive text. OCR of handwritten text is often referred to as ICR (see below).

OCR of machine-printed text is largely considered a solved problem and yields high accuracy. Clean machine text may conservatively reach 95% character accuracy. In the real world, documents are rarely perfect when they are scanned. Lines running through text or smudged ink can reduce the accuracy level. Even so, significant productivity gains are typical.
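To put the 95% figure in perspective, here is a rough sketch of what character accuracy means for whole fields. It assumes each character is recognized independently of its neighbors, which is a simplification (real errors tend to cluster around smudges and stray lines), and the function name is just for illustration.

```python
def field_accuracy(char_accuracy: float, field_length: int) -> float:
    """Chance that every character in a field is read correctly.

    Assumes each character is recognized independently, which is a
    simplification of how real OCR errors behave.
    """
    return char_accuracy ** field_length


# How often a field comes through with no errors at 95% character accuracy.
for length in (4, 10, 20):
    rate = field_accuracy(0.95, length)
    print(f"{length:>2}-character field: {rate:.1%} read without error")
```

Under that assumption, a four-character field comes through clean roughly 81% of the time, and longer fields less often. That is why the review step described later matters.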

Intelligent Character Recognition (ICR), or Handwritten OCR, has come a long way in the last decade or so. Accuracy of handwritten data extraction is enhanced using constrained print fields. You may see recognition rates of 80 to 90%.

ICR implementations use constrained print fields to maximize recognition rates. These print fields encourage the user to separate each character and prevent written text from “running together”. Here are a couple of examples of constrained print fields.

Example Print Fields

Magnetic Ink Character Recognition (MICR) is used by the banking industry to facilitate the processing of checks. MICR characters are the odd-looking numbers and symbols printed at the bottom of all our checks – often called the MICR line. In addition to the special font, the MICR line is printed using special magnetic ink. The ink allows the text to be accurately captured – even if someone writes over the MICR line.

Example MICR Line

After data has been extracted, the results require review. This step is necessary to ensure the data is accurate. Suppose an OCR program needs to extract data from the following region. Notice the smudge in the first zero.

Example OCR Region

You and I can easily recognize the smudge and understand the zero is a zero. An OCR program, however, may not be so sure. The OCR application may recognize the smudged zero as an “8”.
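One way a review screen might point the reviewer at the doubtful character is sketched below. It assumes the OCR engine reports a confidence score for each character, which many engines do but which this article has not covered. The scores, the threshold and the function name are all made up for illustration.

```python
from typing import List, Tuple

# Hypothetical cutoff: anything the engine is less than 90% sure of gets a second look.
REVIEW_THRESHOLD = 0.90


def flag_for_review(chars: List[Tuple[str, float]],
                    threshold: float = REVIEW_THRESHOLD) -> List[int]:
    """Return the positions of characters whose confidence is below the threshold."""
    return [i for i, (_, confidence) in enumerate(chars) if confidence < threshold]


# Made-up engine output for the smudged "2100": the first zero was read as "8".
ocr_result = [("2", 0.99), ("1", 0.98), ("8", 0.55), ("0", 0.97)]

for position in flag_for_review(ocr_result):
    char, confidence = ocr_result[position]
    print(f"Check position {position + 1}: read as '{char}' ({confidence:.0%} confident)")
```

With these invented scores only the third character is flagged, so the reviewer's eye goes straight to the smudge instead of rereading the whole field.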

The person reviewing the extracted data will have an opportunity to change the “8” to a “0”. This should require only one keystroke – as opposed to the four needed to key in “2100”. This reduction in keystrokes is a primary source of productivity gains. There are other reasons these techniques increase productivity. I’ll cover these in later articles.
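The keystroke comparison can be sketched in a few lines. This is a simplified model: it assumes the OCR result and the correct value are the same length, and that fixing a wrong character costs exactly one keystroke (no cursor movement, no tabbing between fields). The function name is just for illustration.

```python
def correction_keystrokes(ocr_text: str, correct_text: str) -> int:
    """Count the characters a reviewer must overtype to fix an OCR result.

    Simplified model: both strings are the same length and each wrong
    character costs one keystroke.
    """
    if len(ocr_text) != len(correct_text):
        raise ValueError("this simple model only compares equal-length values")
    return sum(1 for read, actual in zip(ocr_text, correct_text) if read != actual)


ocr_text = "2180"      # what the OCR program read from the smudged region
correct_text = "2100"  # what is actually on the page

fix = correction_keystrokes(ocr_text, correct_text)
full_entry = len(correct_text)
saved = 1 - fix / full_entry
print(f"Review: {fix} keystroke, manual entry: {full_entry} keystrokes, saved: {saved:.0%}")
```

For this one field the reviewer types one character instead of four. Real savings depend on how many fields need fixing at all.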

If your people spend their days entering data, you should consider data capture as a strategy for increasing productivity. Such technology allows you to focus on the core business at hand – running your business – instead of pounding away on a keyboard. If you have any questions about how to best take advantage of OCR technology, feel free to ask.
