Reading the Fine Print
Enhancing OCR software helps a drug manufacturer handle print inconsistencies.
Enhanced software for reading ink-jet print runs on MGS Machine's Stealth cartoner.
Fast and programmable, ink-jet printers are used widely in the pharmaceutical industry. The nature of ink-jet printing and the wide variations in print quality, however, have given some users trouble during automatic reading of ink-jet print. According to Marcel Singleton, an independent engineer who works with Concepts In Computing (South Beloit, IL), the width, height, and spacing of ink-jet characters often vary; general variations in dot placement can alter overall character shape; characters are often printed at angles; ink often smears or spatters, connecting or degrading characters; and multiple lines of characters may be printed too closely. Optical character recognition (OCR) systems often have trouble reading such inconsistently printed characters.
Packaging equipment manufacturer and integrator MGS Machine Corp. (Maple Grove, MN) recently faced such a challenge. Its client, a major Midwestern drug manufacturer, needed to automatically read ink-jet-printed date codes, lot codes, and replacement part numbers that were printed in three lines.
The ink-jet-printed labels were relatively clean, exhibiting good contrast. However, over the course of long printing runs, characters printed in a 7 X 5 dot matrix format varied in placement. A random movement of the dots resulted in characters with altered shapes and varying sizes. The movement sometimes caused the loss of vertical separation between adjacent lines. As the speed of the line increased, the characters began printing at a slant such that the right edge of one character overlapped the left edge of the one beside it. The bottom rows of dots developed a wavy or rolling form, yielding characters with varying aspect ratios (the relationship of width to height). The print often rotated up to 3 degrees.
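To get a feel for why a rotation of only a few degrees matters, the vertical drift it causes along a printed line can be estimated with basic trigonometry. The sketch below is illustrative only: the 30 mm line length and 2.5 mm character height are assumed values, not figures from the MGS application, and the function name is hypothetical.

```python
import math

def vertical_drift_mm(line_length_mm: float, rotation_deg: float) -> float:
    """Vertical offset between the two ends of a line rotated by rotation_deg."""
    return line_length_mm * math.tan(math.radians(rotation_deg))

# Assumed, hypothetical dimensions (not taken from the article).
LINE_LENGTH_MM = 30.0
CHAR_HEIGHT_MM = 2.5

drift = vertical_drift_mm(LINE_LENGTH_MM, 3.0)
fraction_of_height = drift / CHAR_HEIGHT_MM
print(f"drift: {drift:.2f} mm ({fraction_of_height:.0%} of character height)")
```

Under these assumed dimensions, a 3-degree rotation shifts the far end of the line by roughly 1.6 mm, a large share of the character height, which is consistent with closely spaced lines losing their vertical separation.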
These print variations affected reading accuracy. Initially, the highest accuracy that could be achieved was 85%, and accuracy declined further whenever the variations occurred.
MGS and its customer tried another font recommended by the printhead manufacturer, but the new font proved to be even worse for OCR than the first font. At this point, all parties agreed that print quality variation was far too likely to result in similar characters being recognized incorrectly. Singleton suggested using the original 7 X 5 dot font and modifying the printer's software to maintain better control over line straightness and spacing.
Extensive on-line testing and analysis of the resulting data revealed that, to recognize ink-jet-printed characters, the OCR system needed to meet the following requirements:
- Be insensitive to normal and unavoidable variations in print.
- Not require preprocessing to merge individual dots or character segments together before recognition.
- Work just as well when the characters in a string are unevenly spaced.
- Be capable of reading characters that are rotated several degrees.
- Properly segment characters that touch or connect when ink smears.
- Feature a fielding function to enhance reading accuracy by resolving ambiguities using advance knowledge of how a particular character string is marked.
A reader with these capabilities, when used with illumination that produces sufficient contrast between characters and background, would generally read ink-jet print at acceptable levels of performance.
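The fielding function in the list above can be illustrated with a minimal sketch. It assumes the reader returns, for each character position, a set of candidate characters with match scores, and that the expected layout of the string is known in advance as a mask. The mask symbols, the `apply_fielding` name, and the scores are all hypothetical; this is not the EconoCR or SmartImage interface.

```python
from typing import Dict, List

# Hypothetical mask symbols: 'A' = letter, 'N' = digit.
# Any other symbol (for example '*') places no restriction on the position.
ALLOWED = {
    "A": set("ABCDEFGHIJKLMNOPQRSTUVWXYZ"),
    "N": set("0123456789"),
}

def apply_fielding(candidates: List[Dict[str, float]], mask: str) -> str:
    """Pick, for each position, the best-scoring candidate the mask allows.

    candidates[i] maps each candidate character at position i to a match
    score (higher is better). A position with no permitted candidate is
    returned as '?' so the caller can reject the read.
    """
    if len(candidates) != len(mask):
        raise ValueError("mask length must match number of characters")
    result = []
    for scores, field in zip(candidates, mask):
        allowed = ALLOWED.get(field)  # None means any character is allowed
        permitted = {c: s for c, s in scores.items()
                     if allowed is None or c in allowed}
        result.append(max(permitted, key=permitted.get) if permitted else "?")
    return "".join(result)

# Made-up scores for a code known to be two letters followed by three digits.
raw = [
    {"B": 0.90, "8": 0.88},
    {"O": 0.80, "0": 0.82},  # an unconstrained reader would pick '0'
    {"1": 0.70, "I": 0.75},  # an unconstrained reader would pick 'I'
    {"5": 0.85, "S": 0.84},
    {"0": 0.90, "O": 0.89},
]
print(apply_fielding(raw, "AANNN"))  # BO150
print(apply_fielding(raw, "*****"))  # B0I50
```

With the mask applied, look-alike pairs such as O/0 and I/1 are resolved by position rather than by score alone, which is the kind of ambiguity a fielding function is meant to remove.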
Every company involved took action. Domino Amjet (Gurnee, IL) modified its ink-jet printer software. Concepts In Computing enhanced its EconoCR Version 5 OCR software, which was incorporated into DVT Corp.'s (Norcross, GA) upgraded SmartImage sensor OCR tool. The resulting reader provides up to eight times the training data samples of previous versions, three resolution levels, simplified training, increased character specificity, greater sensitivity, and new noise reduction capabilities.
During initial testing, one instance of each character in the 7 X 5 dot font was taught. After a print run of about 1100 labels, the acceptance rate was 100%. A second run of 2500 randomly selected labels yielded an acceptance rate of 99.4%, with no rejections due to failed or misread characters. In fact, all failed reads were the result of positioning errors or some other non-OCR cause.
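As a quick check on what the second run implies, the reported acceptance rate can be converted to an approximate count of rejected labels. The calculation assumes the 99.4% figure applies to all 2500 labels and is a rounded value, so the count is an estimate; the helper name is hypothetical.

```python
def rejected_labels(total_labels: int, acceptance_rate: float) -> int:
    """Approximate number of rejected labels for a given acceptance rate."""
    return round(total_labels * (1.0 - acceptance_rate))

print(rejected_labels(2500, 0.994))  # 15
print(rejected_labels(1100, 1.0))    # 0
```

That works out to roughly 15 rejected labels in the second run, all of which the testing attributed to positioning errors or other non-OCR causes.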
Finding the right OCR engine proved to be key to achieving success for this challenging automatic identification application.