There is a lot of ongoing talk about artificial intelligence and machine learning and how they can provide superior results for document processing automation. Terms like “learning” and “training” are touted by many vendors to get attention, but what are the practical applications and expectations of machine learning for document processing? Is this technology any better than what you are currently using?

Let's take a critical look, using a loan documentation compliance example, to compare two approaches: machine learning and user-created algorithms. Regardless of the loan type (an auto, home, or home equity loan), many documents are needed to satisfy credit and compliance requirements. Verifying that documentation requirements are met and that individual documents contain the necessary data are time-consuming tasks. Many lending organizations struggle to make this process more reliable and efficient.

Traditional automation
Traditional automation starts with manual evaluation. A properly managed project requires an inventory of documentation for a given lender and loan type. Next, the staff collects examples of all document types from different sources. They organize hundreds of these documents by type and review them to identify unique characteristics for automatic identification in a typical loan approval workflow.

Once the staff identifies all the documents’ characteristics, an analyst encodes them as rules within a document capture system. The rules must then be tested to uncover any misclassifications that require adding new rules or fine-tuning existing ones. After testing and fine-tuning are complete, the rules go into the production workflow.
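As a minimal sketch of what human-encoded classification rules can look like, consider the following, assuming a simple keyword-matching approach; the document types, keywords, and function are hypothetical illustrations rather than any specific product's API.

```python
# Hypothetical keyword rules, one entry per document type. In a real
# deployment an analyst encodes far more characteristics than keywords.
CLASSIFICATION_RULES = {
    "promissory_note": ["promissory note", "promise to pay"],
    "drivers_license": ["driver's license", "date of birth", "dl no"],
    "pay_stub": ["gross pay", "net pay", "pay period"],
}

def classify(document_text: str) -> str:
    """Return the first document type whose keywords appear in the text."""
    text = document_text.lower()
    for doc_type, keywords in CLASSIFICATION_RULES.items():
        if any(keyword in text for keyword in keywords):
            return doc_type
    return "unclassified"  # unmatched documents route to manual review

print(classify("PAY PERIOD: 06/01-06/15  GROSS PAY: $4,200.00"))  # pay_stub
```

Testing then means running a labeled sample set through classify() and adding or tightening keywords wherever documents are misrouted.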

Next, it’s necessary to create rules that locate the needed data within the documents before that data can be extracted and validated. A similar process of analysis, testing, and tuning takes place to ensure that the maximum amount of data is extracted and to establish the accuracy thresholds that determine when manual review is needed. Since many documents are not standardized, a wide range of rules must be created.
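Here is a hedged sketch of such extraction rules, assuming regular-expression matching over OCR'd text; the field names and patterns are invented for illustration.

```python
import re

# Hypothetical hand-written location rules for a semi-structured document.
FIELD_RULES = {
    "loan_amount": re.compile(r"loan amount[:\s]+\$?([\d,]+\.\d{2})", re.I),
    "borrower": re.compile(r"borrower[:\s]+([\w. ]+)", re.I),
}

def extract_fields(document_text: str) -> dict:
    """Apply each rule; a None result flags the field for manual review."""
    results = {}
    for field, pattern in FIELD_RULES.items():
        match = pattern.search(document_text)
        results[field] = match.group(1) if match else None
    return results

sample = "Borrower: Jane Smith\nLoan Amount: $250,000.00"
print(extract_fields(sample))
# {'loan_amount': '250,000.00', 'borrower': 'Jane Smith'}
```

Because layouts vary, a production rule set multiplies quickly: alternate labels, line breaks between label and value, and per-lender formats each need their own pattern.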

Some industry solutions automate parts of this configuration, either as an initial step or during pre-production or production. The workflow involves locating samples of a given document type (maybe two or three) and stepping the software through the location of each field, typically via a point-and-click process. If a vendor solution includes a pre-built document type (e.g., an invoice), many rules for locating fields or data are predefined, and the user only identifies samples where data was not found.

Locating these fields “trains” the system. While this is not machine learning, the system incorporates the coordinates of those fields and either creates a new template for the document or adds general location rules. The result blends the benefits of a more traditional expert system with the efficiencies gained from the training function.
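A minimal sketch of what that training might produce, assuming page-relative bounding boxes; the template structure and field names are illustrative, not a vendor format.

```python
from dataclasses import dataclass

@dataclass
class FieldZone:
    """A field's location on the page, as fractions of page size."""
    name: str
    x: float      # left edge
    y: float      # top edge
    width: float
    height: float

# Clicking through two or three samples yields a stored template like this:
INVOICE_TEMPLATE = [
    FieldZone("invoice_number", x=0.70, y=0.05, width=0.25, height=0.03),
    FieldZone("total_due",      x=0.70, y=0.85, width=0.25, height=0.03),
]

def crop_zone(page_image, zone: FieldZone):
    """Cut the zone out of a NumPy-style image array, ready for OCR."""
    h, w = page_image.shape[:2]
    top, left = int(zone.y * h), int(zone.x * w)
    return page_image[top:top + int(zone.height * h),
                      left:left + int(zone.width * w)]
```

On a well-registered structured form, cropping by stored coordinates leaves little room for location errors, which is why template-based rules excel there.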

The pros:
  • Troubleshooting is straightforward. Because rules are identified and encoded by humans, it is clear why the system produces a given output under various circumstances.
  • Also, many documents are highly standardized “structured forms.” Applying template-based rules largely eliminates errors associated with locating data, so overall accuracy can be much greater than that of a machine learning system that attempts to generalize the location of data.
The cons:
  • One disadvantage of encoded rules is the time it takes to achieve an acceptable level of automation. Developing acceptable performance on semi-structured data, such as invoices, takes considerable time.
  • Another disadvantage is the need for a subject matter expert (SME) to encode the system. SMEs can be hard to come by, both initially and whenever the system must be revised due to changing requirements.
Training the machine
Here is where a gray area emerges, complicated by vendors' varying use of the terms "learning" and "training."

Let’s examine the same set of requirements using machine learning technologies. For initial document discovery, a technique called “clustering” can be used to automate the logical grouping of like documents: applications are grouped with applications, photos of driver’s licenses with other identification documents, and so on. The result is a set of documents grouped by likeness that can then be further evaluated. These groupings form the basis of document class samples that are fed into the system, where the software automatically identifies the features of each class and creates rules.
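As a hedged sketch of this discovery step, the following groups OCR'd document text by similarity using scikit-learn, an assumed tool choice since the article names no specific software.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# OCR'd text from four scanned loan-file documents (toy examples).
ocr_texts = [
    "loan application borrower name requested amount",
    "driver license state of colorado expires",
    "loan application co-borrower income employer",
    "driver license identification card date of birth",
]

# Vectorize the text and cluster; like documents land in the same group.
vectors = TfidfVectorizer().fit_transform(ocr_texts)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)
print(labels)  # e.g., [0 1 0 1]: applications together, licenses together
```

In practice the number of clusters is not known up front, so density-based methods or a reviewed pass over the resulting groups typically follows.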

Learning-on-the-fly for high-quality data results
Creating data extraction rules follows a similar approach. The biggest requirement is a larger number of samples, each paired with “ground truth data”: the actual values that extraction should produce. For example, a user supplies the system with a sample along with the actual field values that need to be extracted. Together, these automatically train the software to locate the matching data and derive positional rules for each data field. The software does this for each sample and then automatically creates algorithms based upon exact location, variation in placement across the examples, and position relative to other data.
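Here is a simplified sketch of how positional rules might be derived from ground truth, assuming each sample records where the true value was found; the data structures and padding heuristic are invented for illustration.

```python
from statistics import mean

# Each training sample pairs a ground-truth value with the bounding box
# where it appeared (left, top, right, bottom, as fractions of the page).
samples = [
    {"value": "$250,000.00", "box": (0.62, 0.30, 0.85, 0.33)},
    {"value": "$118,500.00", "box": (0.60, 0.31, 0.84, 0.34)},
    {"value": "$74,900.00",  "box": (0.63, 0.29, 0.86, 0.32)},
]

def derive_zone(samples, padding=0.02):
    """Average the observed boxes, padded to absorb placement variation."""
    boxes = [s["box"] for s in samples]
    left, top, right, bottom = (mean(edge) for edge in zip(*boxes))
    return (left - padding, top - padding, right + padding, bottom + padding)

print(derive_zone(samples))  # learned search zone for the loan amount field
```

Real systems also learn from relative position (e.g., below the label "Loan Amount"), but the principle is the same: more samples with ground truth yield tighter, more reliable rules.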

The pros:
  • Rule creation is simple, since the user never encodes rules by hand.
  • Additionally, adjustments down the line can be handled by adding samples of the new documents along with the associated ground-truth data.
  • There is less need for SMEs or for a technical person to encode the system.
The cons:
  • The drawbacks of this approach are mirror images of the expert system approach’s strengths.
  • Troubleshooting incorrect output is challenging and requires adding new samples and reviewing results to see if they have improved.
  • Letting a learning system develop its own field location rules can introduce more errors than simply providing coordinates, especially for structured forms.
Ultimately, the tradeoffs of both approaches must be fully considered before implementing a capture system. Knowing how these approaches work equips decision makers to make the best selection based upon their specific needs.

Greg Council is Vice President of Marketing and Product Management at Parascript, responsible for market vision and product strategy. He oversees all aspects of Parascript Artificial Intelligence software life cycles, leading the successful development and introduction of advanced machine learning technology to the marketplace. Contact him by visiting www.parascript.com or follow Parascript on Twitter @ParascriptLLC.