Modernization of Document Capture Technology

Plenty has been written, and will continue to be written, about the modernization of document capture technology. After all, it's not every day that tech giants Microsoft, Google and Amazon suddenly jump in and launch new products in seemingly mature legacy markets. But they did, and their cloud-based capture services are proving to be very popular. However, it's a little too easy, and in my opinion misleading, to see modern OCR and NLP tools as merely an upgrade on the old ones. At one level they are indeed upgrades, but they are much more than that: they herald the start of a capture revolution that opens up new markets, opportunities and approaches, and they enable innovative applications of capture technology that would previously have been impractical, if not impossible.

Traditional capture tools digitized paper documents, which is still, and likely always will be, a valuable and widely used practice. Cognitive capture, however, is about reading and understanding text, structures, symbols, images and sounds. In a sense, it has always been possible, if a bit unwieldy, to read a lot of things digitally. What has not been possible until now is understanding what has been read, its context, value and place, and then acting on those insights.

Artificial intelligence is transforming capture, but that is only the starting point of the transformation to come. A starting point is literally what capture is, be it traditional or cognitive. It could be a starting point for digitizing and processing invoices, or for something much more wide-ranging and ambitious. Indeed, the possibilities are endless and have yet to be explored. What underpins cognitive capture and makes it work is AI, often a particular form called deep learning. This form of AI relies on vast amounts of data, equally vast amounts of computer processing and, by extension, enormous amounts of electricity. It's brilliant and it's powerful, but for the same reasons only a few firms (namely Google, Microsoft, IBM and Amazon) have had anything close to the resources needed to build effective deep learning.

In practice, this has meant that although huge steps have been made in the world of capture over the past few years, there has been a reluctance to embrace or explore the opportunities cognitive capture brings. Among buyers, that reluctance can take the form of an inherent and long-standing wariness of entrusting enterprise data to these major firms. Among software vendors, it is a reluctance to become dependent on services from AWS or Azure, for example; few want to become resellers of a more prominent firm's technology. The use of advanced AI techniques will become the norm. Before it does, though, we need to debunk some seemingly immutable truths about deep learning that stand in the way of its practical use for information management in the enterprise. The first truth is that deep learning works best when it can learn by experience from millions of samples.

A well-known example is how a computer can learn to tell whether or not a photo contains a cat. Most business processes, on the other hand, involve unstructured or semi-structured business documents of incredible variety, with nowhere near that volume of samples. In some cases, the total available sample size may only be in the tens of thousands. This leads some cognitive capture vendors to argue that deep learning will never be commercially viable for applications such as invoice processing, mortgage loan files or contract intelligence.

The second truth is that deep learning can quickly run up rampant compute costs. There is a reason why Google, Microsoft, Amazon and IBM were the first to market with deep learning algorithms the rest of us could rent for our projects. The first three are members of the trillion-dollar market cap club, while the last bet the company on Watson. They can all afford the vast computing and data acquisition bills needed to generate their deep learning models. Who else can?

The third truth is that you need the cleverest data scientists and programmers to create deep learning models. With the cost of a data scientist skyrocketing, this again seems like a game at which only the largest and wealthiest companies can hope to compete.

None of the three is accurate any longer. In the past two months, we have spoken with several small software vendors who bring the power of deep learning to intelligent process automation and cognitive capture. These innovative companies have shown us their pre-trained models for standard business documents such as healthcare claims forms, lending documents, invoices, contracts and general company records, built with neural networks that train on as few as 50,000 samples. The models are handed to end users, who then have a running head start when training on their own samples. As the models run in production, the learning is further refined, and the lessons are applied to the next batch.
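To make that workflow a little more concrete, here is a minimal sketch of a model that arrives pre-trained and then keeps learning from a user's own labeled samples. It is an illustration only, not any vendor's method: it swaps the neural network for a tiny nearest-centroid classifier over word counts so that it runs on the Python standard library alone, and the class name CentroidClassifier, the labels and the sample texts are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; a stand-in for real feature extraction."""
    return text.lower().split()


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class CentroidClassifier:
    """Toy incremental classifier: one summed word-count vector per label."""

    def __init__(self):
        self.centroids = defaultdict(Counter)

    def learn(self, text, label):
        # Incremental update: each new sample is added to its label's counts,
        # so training can continue after the model has been handed over.
        self.centroids[label].update(tokenize(text))

    def predict(self, text):
        doc = Counter(tokenize(text))
        return max(self.centroids, key=lambda label: cosine(doc, self.centroids[label]))


# "Vendor" phase: pre-train on generic documents (hypothetical samples).
model = CentroidClassifier()
model.learn("invoice number total amount due payment terms", "invoice")
model.learn("invoice date bill to amount due vat", "invoice")
model.learn("agreement between the parties governing law term", "contract")
model.learn("this contract is entered into by the parties", "contract")

# "Customer" phase: a handful of the user's own labeled samples,
# including a document class the vendor model had never seen.
model.learn("claim form patient policy number diagnosis", "claim")
model.learn("purchase order po number remit payment to", "invoice")

prediction_invoice = model.predict("amount due on invoice 1234")
prediction_claim = model.predict("patient claim for policy 99")
print(prediction_invoice, prediction_claim)
```

The point of the sketch is the shape of the process rather than the algorithm: the same learn call serves both the vendor's pre-training and the customer's later refinement, which is why the customer needs only a few samples of their own to get started.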

That takes care of the hurdles of the enormous data set and the colossal cost. What about the colossal skill set hurdle? The software we've seen in each case is user-friendly and can be operated by a business analyst with no data science training and no programming skills. One company refers to its users as "data shepherds," meaning anyone capable of labeling and tagging unstructured data: records managers, subject matter experts of all kinds, data privacy managers, business analysts, compliance officers, legal staff and so on. Another company created what we're tempted to call "Deep Learning for Dummies," with step-by-step instructions that walk a novice user through the process of building a sustainable AI model to sort scanned mail or something equally prosaic. This is why we predict that deep learning models will disrupt the status quo of document capture and classification over the next 12-24 months, as customers discover that they can train an AI classifier with as few as five samples and deploy it in a matter of hours, without the need for Amazon, Google, Microsoft or IBM, and without the massive compute costs and data sets traditionally associated with deep learning.

Time will tell whether we are right, but change is definitely on the horizon. What still intrigues us the most is the future use cases yet to be uncovered, beyond tried-and-tested ones such as accounts payable and receivable. All we know for sure is that startups the world over are taking on investment to explore and develop solutions that use cognitive capture and deep learning, and that have the potential to bring real disruption, change and innovation.
