Big Data and analytics is a multibillion-dollar market with wide-ranging opportunities across infrastructure, software and services. IDC sizes the Big Data market at $13 billion and its superset business analytics market at $104 billion. The market consists of many segments where large information technology (IT) vendors, along with a very long tail of smaller specialists and start-ups, compete. There are vendors involved in data creation, extraction and integration and the preparation steps of the big data analytics (BDA) process; there are vendors that provide technology and services for data management in either relational or a range of non-relational repositories; there are vendors focused on analysis; and finally, there are vendors that provide services to apply the results of the analysis to all the relevant business processes to drive value out of Big Data.

Big Data solutions are radically changing how information is captured, accessed and analyzed to drive better decisions that are improving how organizations interact with customers, optimize operations, assess risks, comply with regulations and provide products or services as they pursue strategic goals. Success in achieving the promise of Big Data is dependent on the efficient access and analysis of all relevant data.

Yet, most organizations have significantly overinvested in improving access and analysis of only structured data, ignoring the lion's share of knowledge locked up in the remaining 90% of information residing in unstructured formats throughout the enterprise. Today, unlocking the value hidden in unstructured content is more critical than ever.

Although much of the public focus on Big Data has been on consumer behavioral data and content from social media applications, a significant source of Big Data comes from document and image repositories as well as the phenomenon of the Internet of Things. The latter refers to a wired or wireless network connecting devices, or "things," including printers and multi-function printers (MFPs) with embedded sensors. Connected MFPs serve as the on- and off-ramps to a digital world.

Organizations have the opportunity to develop additional competencies related to the extraction, management and analysis of insights from unstructured text as well as from other forms of rich media, such as images, audio and video. The same way that industrial equipment manufacturers "own" the source data produced by their connected equipment, machinery, and infrastructure, organizations with MFPs can leverage data produced in the digitization of documents and images—a rich source of new insight once analytics are applied to this data.

In fact, IDC research shows that the biggest difference in positive outcomes from BDA projects comes from the incorporation of unstructured content from enterprise repositories. A 2013 study segmented organizations into high and low achievers based on the outcomes of their BDA projects. High achievers are those whose BDA project outcomes met or exceeded expectations, and low achievers are those whose projects fell short of expectations or resulted in no benefits. Among high achievers, 46% integrated and analyzed text from content repositories (such as product descriptions, engineering or maintenance notes, research reports, patents, insurance claims, internal memos and competitive intelligence) along with at least one other type of data. That is more than twice the rate of low achievers.

The volume and variety of captured data will continue to grow as the digitization of content and physical things proceeds at a rapid pace. Unstructured content in the form of documents, images, audio and video already dominates the Digital Universe from the storage perspective. In the past, the analysis of this content has taken a backseat to the analysis of structured, transactional data, but this is starting to change.

Holly Muscolino is the research vice president of Document Solutions at IDC. She is responsible for all written research related to document services and the solutions that enable them, including managed print services, related software solutions, the scanning ecosystem and document outsourcing. Follow her on Twitter @hmuscolino.
