For all the focus on data and its critical role in gaining a competitive edge, most organizations today still struggle to access and use it at scale. The foundational “first mile” of data processing — getting it from disparate forms and documents filled out by customers, suppliers and other partners into a useable format for an enterprise system — remains overly complex and manual. Despite talk of being “digital first,” many global enterprises still rely on teams of data entry clerks sitting in front of two computer screens — one with a scanned or electronic document, the other with the fields where the data needs to be input — manually transcribing and entering data. This is slow, expensive, and error-prone, and organizations are spending over $60 billion each year to convert these documents into formats that can be used by various systems.
Even purely digital documents, like invoices that are rendered to PDF and sent via email, require some level of understanding of the underlying content. In the case of an invoice, for example, there is no standard format. There can be any number of dates (the date the items were purchased, the date the invoice was generated, the due date, etc.) or ways that invoice numbers are represented. Knowing which is which – and how to enter that information into a system – requires comprehension.
Where Automation Can Help
Bill Strogis is the Vice President of Worldwide Sales at Hyperscience, the automation company that enables data to flow within and between the world's leading firms in financial services, insurance, healthcare and government markets. He can be reached on LinkedIn or by email.
This is bad for businesses and customers. Inefficient, manual back-office operations can create an information bottleneck that affects all downstream processes, making it difficult to respond to customer needs (such as loan applications, medical claims or other services) in a timely fashion. It also prevents valuable employee resources from focusing on higher-level activities that drive a business forward.
Simply put, when organizations are stuck trying to keep up with what’s happening today — pushing paper, keying data, fixing errors — they can’t focus on delivering more to customers tomorrow.
So how did we end up here and what’s the best way forward?
Businesses typically handle two types of documents: those that are filled out by hand or typed, such as claims, mortgage applications or new bank account forms; and those, like invoices, that are generated on a computer and printed out or emailed as PDFs.
The challenge is that while both kinds of documents can be processed by humans, neither kind of document is inherently “machine-readable” by any computer system. Put another way, the information needed to process and extract data from these documents is not fully contained within the documents.
Anyone who has ever tried reading messy handwriting has had the experience of hitting a word, being unable to read it, skipping ahead, and then, with the benefit of further context, realizing what that hard-to-read word was. (Just think of the forms you’ve encountered with messy scrawl, crossed-out lines, text continuing outside the box or other real-world imperfections.)
Because machines, and legacy systems in particular, have traditionally struggled with comprehension, they have instead required rigid, structured data formats (e.g. due date in the top left corner, written as MM/DD/YY). In exchange for that rigidity, machines are incredibly fast and make very few errors.
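As a rough illustration of that rigidity, here is a minimal Python sketch of a rule that only accepts a due date written as MM/DD/YY. The function name, the regular expression and the sample strings are hypothetical, chosen for illustration rather than taken from any real system, and the sketch models only the date format, not the position on the page.

```python
import re
from datetime import date, datetime
from typing import Optional

# Hypothetical rule: a due date is only recognized when written as MM/DD/YY.
DUE_DATE_RULE = re.compile(r"\b\d{2}/\d{2}/\d{2}\b")


def extract_due_date(text: str) -> Optional[date]:
    """Return the first MM/DD/YY date found in the text, or None if the rule fails."""
    match = DUE_DATE_RULE.search(text)
    if match is None:
        return None  # The rule has no fallback for any other layout.
    try:
        return datetime.strptime(match.group(0), "%m/%d/%y").date()
    except ValueError:
        return None  # Matched the pattern but is not a real date, e.g. 13/45/24.


# Illustrative inputs: the same due date written three different ways.
samples = [
    "Due date: 03/15/24",
    "Payment due 15 March 2024",
    "Due 2024-03-15",
]
results = [extract_due_date(s) for s in samples]
```

Only the first sample satisfies the rule; the other two return None even though a person would read all three as the same date.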
But without a global standard for most document types, relying on a more rules-based, rigid approach is unrealistic and unscalable. Legacy systems are limited in their processing capabilities and produce unreliable results that require employees to go back and double-check fields or fix errors before the data can be used downstream. However, while humans may be slow and error-prone (particularly if they are entering data for hours on end), they bring context to the data and have vastly superior flexibility in understanding data as it appears in different document formats, even formats they’ve never seen before.
In the absence of a better alternative, and in a world where accuracy makes all the difference (and one incorrect digit within an account number or policy amount can have devastating consequences), organizations have continued to rely on humans to manually extract data from the plethora of diverse documents.
The good news is that ever-evolving advances in artificial intelligence — and machine learning, specifically — are bridging the gap between human understanding and machine processing, transforming the first mile of data processing and business operations in the process.
Whereas legacy software relies on explicit rules (“if this, then that”) and rigid data inputs, machine learning models train on real-world data and continue to learn in response to the data they are exposed to. This critical development changes everything. Instead of attempting the impossible task of writing enough software code to cover the seemingly infinite number of document nuances and text inputs, organizations can now turn to a machine that can train and teach itself to understand and process them.
Overcoming the data limitations of the first mile is a game-changer for businesses. Freed from the inefficiencies and cost of outdated, manual approaches, organizations can now extract and capture all of the data in a document. Companies can leverage these data insights to improve the customer experience, uncover new business opportunities, and enable a more agile, strategic organization that is primed for long-term success.