If left unmanaged, your data can become overwhelming, making it difficult to find the information you need when you need it. While software is designed to address archiving, e-discovery, compliance, etc., the overarching goal is almost always the same: to make managing and maintaining data a feasible task. Below, we look at two types of data you’re accustomed to working with, paying close attention to the differences between structured and unstructured data.

What is structured data?
Before getting into unstructured data, you need to have an understanding of its structured counterpart. Structured data is information, usually text, arranged in titled columns and rows, which can easily be ordered and processed by data mining tools. It can be visualized as a perfectly organized filing cabinet where everything is identified, labeled and easy to access. Most organizations are likely to be familiar with this form of data and are already using it effectively, so let’s move on to the hotter question.
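
As a minimal sketch of why titled columns and rows are easy to process, the Python below loads a tiny, made-up table and orders and filters it by column. The column names and values are hypothetical and chosen only for illustration.

```python
import csv
import io

# Hypothetical structured data: titled columns, one record per row.
RAW_TABLE = """customer,region,order_total
Acme,East,1200
Birch,West,450
Cobalt,East,980
"""

def load_rows(text):
    """Parse CSV text into a list of dicts keyed by column title."""
    return list(csv.DictReader(io.StringIO(text)))

def order_by(rows, column, numeric=False):
    """Return a new list of rows sorted by the given column."""
    if numeric:
        return sorted(rows, key=lambda r: float(r[column]))
    return sorted(rows, key=lambda r: r[column])

rows = load_rows(RAW_TABLE)
by_total = order_by(rows, "order_total", numeric=True)   # sort on one column
east_only = [r for r in rows if r["region"] == "East"]   # filter on another
```

Because every row shares the same titled fields, sorting and filtering need no knowledge of what the data means, only of which column to look at.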

What is unstructured data?
Believe it or not, your database of structured information doesn’t even contain half of the information available for your use. Seth Grimes, a leading industry analyst on the confluence of structured and unstructured data sources, published an article that stated, “80% of business-relevant information originates in unstructured form, primarily text.”

Unstructured data, often binary data in a proprietary format, has no identifiable internal structure. It can be visualized as a "level five" hoarder’s living room: a massive, unorganized collection of objects that are worthless until identified and stored in an organized fashion. Once this organizational process has taken place, the items can be searched and categorized (to an extent) to obtain insights. Email is a good example. Data mining tools might not be equipped to parse the information in email messages (however organized it may be), yet you may have very good reason to collect and categorize data from this source. This illustrates the importance and potential breadth of unstructured data.

Email has structure, right?

The term “unstructured” has faced major scrutiny for several reasons. One argument is that even when structure is not formally defined, it can still be implied, so the data should not be labeled “unstructured.” The counterpoint is that if data has some form of structure that is not helpful to the processing task at hand, it may still be characterized as “unstructured.” While email messages may contain information with some implied structure, we can label the information “unstructured” because normal data mining tools aren’t equipped to parse it. Alas, both sides of the argument persist.

Unstructured data types
Unstructured data is raw and unorganized, and organizations store it all. Ideally, all of this information would be converted into structured data; however, this would be costly and time-consuming. Also, not all types of unstructured data can easily be converted into a structured model. For example, an email holds information such as the time sent, subject and sender (all uniform fields), but the content of the message is not so easily broken down and categorized. Free-form content like this does not fit neatly into the fixed columns of a relational database system.
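
The email example can be sketched in a few lines of Python using the standard library's email package. The message below is invented for illustration, and the function name split_email is hypothetical. The point is only that the uniform fields come out as clean values, while the body remains one block of free text.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

# Hypothetical message, invented for illustration.
RAW_EMAIL = """\
From: alice@example.com
To: bob@example.com
Subject: Q3 contract renewal
Date: Mon, 06 Jan 2014 09:30:00 -0500

Hi Bob,

Legal wants the renewal signed before the end of the month.
"""

def split_email(raw):
    """Separate the uniform header fields from the free-text body."""
    msg = message_from_string(raw)
    fields = {
        "sender": msg["From"],
        "subject": msg["Subject"],
        "sent": parsedate_to_datetime(msg["Date"]),  # typed, sortable value
    }
    body = msg.get_payload()  # plain string for a non-multipart message
    return fields, body

fields, body = split_email(RAW_EMAIL)
```

The three header fields would map directly onto database columns. The body would have to be stored as a single text value, and any meaning inside it (who must sign, by when) would need separate analysis to extract.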

Here is a limited list of types of unstructured data:
  • Emails
  • Word processing files
  • PDF files
  • Spreadsheets
  • Digital images
  • Video
  • Audio
  • Social media posts
Looking at the list, you may be wondering what these files have in common. The files listed above can be stored and managed without the system understanding their format. Because the system does not need to interpret their contents, the files can be stored as they are, in an unstructured fashion.
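
To illustrate what storing a file without understanding its format can look like, here is a toy in-memory store written in Python. The class name BlobStore and the sample file names and bytes are hypothetical. It is a sketch under those assumptions, not a description of any particular product.

```python
import hashlib

class BlobStore:
    """Toy in-memory store that treats every file as opaque bytes."""

    def __init__(self):
        self._blobs = {}    # content hash -> raw bytes
        self._catalog = {}  # file name -> minimal metadata

    def put(self, name, data):
        """Store the bytes as-is and record only format-agnostic metadata."""
        digest = hashlib.sha256(data).hexdigest()
        self._blobs[digest] = data
        self._catalog[name] = {"sha256": digest, "size": len(data)}
        return digest

    def get(self, name):
        """Return the original bytes, unchanged."""
        return self._blobs[self._catalog[name]["sha256"]]

    def describe(self, name):
        """Return a copy of what the store knows about the file."""
        return dict(self._catalog[name])

store = BlobStore()
store.put("memo.pdf", b"%PDF-1.4 ...")          # placeholder bytes
store.put("logo.png", b"\x89PNG\r\n\x1a\n...")  # placeholder bytes
```

The store can save, retrieve and inventory a PDF and an image in exactly the same way, but it knows only a name, a size and a hash. It cannot answer any question about what the files contain.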

The "Big Data" industry is growing, and organizations have recognized the problem of unstructured data going unused. Better yet, technologies and services are being developed in response. As corporations get a handle on managing and storing Big Data, they are beginning to keep pace with the accelerating rate at which information is created. Big Data is no longer so intimidating.

Jeff Tujetsch is the VP of Notes product development and the senior consultant at Sherpa Software. He has 34 years of experience in information technology and has been involved with products since 1985. Follow him on Twitter @JeffAtSherpa.

 