Image by: violetkaipa, ©2016 Getty Images
Do you have unstructured data? This term is used more and more each day. What is it? To best understand unstructured data, perhaps it is easier to define what it's not—structured.
All businesses have both structured and unstructured data
Structured data is enterprise information that resides in a traditional, row-and-column table database: data stored neatly in the fields of a database. Examples of structured information include the contents of enterprise resource planning (ERP) systems and Oracle and Microsoft SQL Server databases, as well as other data-centric information. The advantages of structured data, aside from the efficiency of storing records as data, include searchable and identifiable context about the documents. This context may be used to understand what the documents are, the value they provide to the business, and what meaningful information can be culled and reported to senior management to support effective business decisions.
Unstructured data is the opposite of structured data
Structured data generally resides in a relational database, and as a result, it is often called relational data. This type of data can be easily mapped into pre-designed fields and has a great deal of contextual information embedded in the database design and table relationships—in short, its structure.
By contrast, unstructured data is not relational and doesn't fit into pre-defined data models. The difference may initially seem trivial, but a look at the vast volumes of information stored in organizations today quickly reveals the challenge at the root of the difference between unstructured and structured data.
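The contrast can be illustrated with a minimal sketch. The table name, column names, invoice number, and email text below are all hypothetical. The same fact (an invoice amount) is read directly from a named field in the structured case, while in the unstructured case it has to be inferred from free text, here with a deliberately naive pattern match.

```python
import re
import sqlite3

# Structured: the amount lives in a pre-designed field, so the schema
# itself tells a system what the value means. (Hypothetical table.)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoices (invoice_id TEXT, customer TEXT, amount_usd REAL)"
)
conn.execute("INSERT INTO invoices VALUES (?, ?, ?)", ("INV-001", "Acme", 1250.0))
structured_amount = conn.execute(
    "SELECT amount_usd FROM invoices WHERE invoice_id = ?", ("INV-001",)
).fetchone()[0]

# Unstructured: the same fact is buried in human-written text, and the
# context has to be recovered from the content itself. (Hypothetical email.)
email_body = (
    "Hi team, Acme confirmed they will pay the 1,250 dollars "
    "for INV-001 next week."
)
match = re.search(r"([\d,]+) dollars", email_body)
unstructured_amount = float(match.group(1).replace(",", "")) if match else None
```

The pattern match works only because this one email happens to phrase the amount in a way the expression anticipates. A differently worded message would defeat it, which is the classification challenge in miniature.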
Organizations that can overcome the difference and identify, classify, and act on their structured and unstructured data will have a tremendous competitive advantage over companies that do not. Clearly, in order to leverage information, you need to be collecting the right data, and whether it is structured or not, it needs to be classified and have sufficient context applied so it can be recognized by systems that can use it effectively in your organization. Structured data often has that context built into the data model as additional tables, rows, and columns. Unstructured data does not, but that doesn’t mean that it cannot be a great benefit.
Global research firm Gartner defines unstructured data as "content that does not conform to a specific, pre-defined data model. It tends to be the human-generated, people-oriented, and document-centric content that does not fit neatly into database tables" and, thus, has limited contextual information associated with it; rather, the contextual information is within the content itself. "Within the enterprise, unstructured content takes many forms, chief amongst them are traditional business documents (reports, presentations, spreadsheets) email, and web content."
The rapid growth of unstructured data is putting greater pressure on businesses
In today’s digital enterprise environment, organizations are amassing huge amounts of unstructured content sitting in file shares, SharePoint sites, and other document management and collaboration platforms. These organizations attempt to apply metadata—or data about data—to unstructured information by storing the information in these document management and other content repositories. The difficulty in employing and maintaining information in this way is that it relies on humans to classify records and information at the time the information is being created.
Most staff members create information, not complete business records. Therefore, generating and maintaining metadata about what they are authoring is extremely problematic, and by the time the record is declared in the business process, multiple hands have been involved. Often, no one takes responsibility for adding the classification information, like record type, security classification, business area, or process, that will help manage and sort the information in the future.
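As a rough illustration, the classification information named above can be modeled as a small record whose gaps are easy to detect. This is a sketch only: the class name, field names, and example path are hypothetical and do not come from any particular document management product.

```python
from dataclasses import dataclass
from typing import List, Optional

# The four classification attributes mentioned in the text.
CLASSIFICATION_FIELDS = (
    "record_type",
    "security_classification",
    "business_area",
    "process",
)


@dataclass
class RecordMetadata:
    """Hypothetical metadata attached to one piece of unstructured content."""

    path: str
    record_type: Optional[str] = None
    security_classification: Optional[str] = None
    business_area: Optional[str] = None
    process: Optional[str] = None

    def missing_fields(self) -> List[str]:
        """Return the classification fields nobody has filled in yet."""
        return [
            name for name in CLASSIFICATION_FIELDS if getattr(self, name) is None
        ]


# A document created by one author and passed along without classification.
draft = RecordMetadata(path="shares/finance/q3-summary.docx", record_type="report")
unclassified = draft.missing_fields()
```

A report of which fields are still empty, per file, gives a simple measure of how much content entered the business process without anyone taking responsibility for classifying it.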
What does this mean for a business? This veritable jungle of information results in massive data bloat, makes information harder to find, and increases organizations’ compliance risks. In fact, unstructured data has become so prominent that it helped give rise to the term "Big Data." Identifying content that is not being managed in alignment with corporate records, legal, or information technology (IT) policies is an important step in increasing compliance.
Information analytics and file remediation solutions allow organizations to gain insight into unmanaged file content and can help to reduce overall risk by identifying, analyzing, and supporting the remediation of content for ongoing retention management.
Ask questions
To evaluate how adequately their records are being managed and whether (or, more likely, how much) unstructured data is bloating the enterprise, organizations should ask these nine questions:
1. What content resides in our storage systems, and what do we know about it?
2. Is it easily searchable, and can the content be retrieved and viewed when required?
3. Is it sensitive or confidential?
4. Can the organization comply with internal policies based on the knowledge of the content that exists?
5. What is the cost and/or risk of maintaining unmanaged, unstructured content as is?
6. What business records are unmanaged?
7. Do we have an archive strategy for records and information?
8. Can content without business value be identified successfully and safely deleted?
9. What information needs to be retained and for how long?
To succeed in identifying, understanding, and effectively managing unstructured content, enterprises must secure buy-in from the C-suite on the approaches to be used. People, process, and technology must all be considered holistically in order to effectively plan, manage, and execute on programs that will enable defensible outcomes for classifying an organization’s unstructured content. Today's digital enterprises can achieve success and attain the highest levels of efficiency in managing unstructured content through digital transformation strategies, one of which is classifying and acting on a large chunk of unstructured data through a file analysis and remediation strategy.
One way to manage unstructured data is file analysis and remediation
File share analysis and remediation strategically reduces the volume of files that need to be maintained, thus reducing the amount of storage space an enterprise uses. This reduction makes for optimized, less expensive file content management practices that benefit organizations in at least three critical ways:
- Allowing organizations to gain insight into unmanaged file content.
- Reducing risk by analyzing and remediating content for ongoing retention management.
- Enacting defensible disposition when the information is not needed to meet any legal, business, or regulatory requirements, including hold orders.
- Eliminating redundant, outdated, and trivial content based on an organization's retention rules, which allows organizations to compliantly reduce their overall volumes of information.
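The analysis half of this approach can be sketched in a few lines. The example below is a minimal illustration under stated assumptions, not a description of any commercial file analysis product. "Redundant" is taken to mean byte-identical files, "outdated" to mean files not modified within a chosen number of days, and "trivial" to mean empty files or files with a temporary-file suffix. The function names, suffix list, and default threshold are hypothetical, and the script only reports candidates; it deletes nothing.

```python
import hashlib
import os
import time
from collections import defaultdict

# Assumption: suffixes treated as trivial content. Adjust to local policy.
TRIVIAL_SUFFIXES = (".tmp", ".bak")


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files are not loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def analyze_share(root, stale_after_days=365 * 7, now=None):
    """Walk a file share and report redundant, outdated, and trivial files."""
    now = time.time() if now is None else now
    cutoff = now - stale_after_days * 86400
    by_hash = defaultdict(list)
    outdated, trivial = [], []
    for folder, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            info = os.stat(path)
            if info.st_size == 0 or name.lower().endswith(TRIVIAL_SUFFIXES):
                trivial.append(path)
                continue
            if info.st_mtime < cutoff:
                outdated.append(path)
            by_hash[sha256_of(path)].append(path)
    # Any hash shared by more than one path marks a group of exact copies.
    redundant = [sorted(paths) for paths in by_hash.values() if len(paths) > 1]
    return {
        "redundant": sorted(redundant),
        "outdated": sorted(outdated),
        "trivial": sorted(trivial),
    }
```

Calling `analyze_share("/path/to/share")` returns three lists of candidates for review. Whether any of them may actually be removed depends on the organization's retention rules and hold orders, which this sketch does not consult.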
Defensible disposition has prerequisites. Companies must not delete records and information without a retention policy and schedule and a firm destruction policy that contains associated procedures and guidelines. Furthermore, increasing concerns about privacy and security mean electronic data disposal must be carefully and systematically handled to minimize the risk of illegal or unauthorized access to information. Digital shredding techniques must be used to safely obliterate digital information that is being defensibly destroyed.
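The rule that nothing is deleted without a retention schedule can be expressed as a simple decision function. This is a sketch under assumptions: the record types and retention periods in the schedule are invented for illustration, the function names are hypothetical, and the secure destruction step itself is out of scope here.

```python
from datetime import date
from typing import Optional

# Hypothetical retention schedule: record type -> years to retain.
RETENTION_YEARS = {"invoice": 7, "meeting_notes": 2}


def add_years(start: date, years: int) -> date:
    """Return the same calendar day `years` later."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        # 29 February in a non-leap target year: roll forward to 1 March.
        return start.replace(year=start.year + years, month=3, day=1)


def disposition_decision(
    record_type: Optional[str], created: date, on_legal_hold: bool, today: date
) -> str:
    """Decide whether a record may be considered for disposition."""
    if on_legal_hold:
        return "retain: legal hold"
    if record_type not in RETENTION_YEARS:
        # No schedule entry means there is no defensible basis to delete.
        return "retain: no retention rule"
    if today < add_years(created, RETENTION_YEARS[record_type]):
        return "retain: retention period active"
    return "eligible for disposition"
```

The order of the checks is the design choice that matters: a hold order overrides everything, and an unclassified record is retained by default rather than deleted by default.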
The good news is that, for many businesses, unstructured data represents a treasure trove of information that, when organized, can enhance business decisions, tap new sources of revenue, or help provide better customer service. The ability to analyze this data effectively and to draw actionable knowledge from it is a key driver of information management today.
Brett Claffee is a Principal Consultant of Information Governance and Compliance from Paragon Solutions' Life Sciences Practice and has over 15 years of pharmaceutical industry experience in areas ranging from discovery research and BioPharm GxP manufacturing to records and document management. For more information, visit www.consultparagon.com or follow @consultparagon.