Why We Need Metadata

This article appears in the Spring 2018 magazine issue of DOCUMENT Strategy.

In content management, metadata is used to uniquely identify content objects, improve search, and manage the life cycle of content. In some cases, it can even reference information that is not necessarily explicit in the content object, such as a project ID number that doesn’t appear within the text of a document. We use metadata to describe and provide context to our content. In essence, metadata is about language.

To describe content objects consistently and to facilitate retrieval, we must have vocabulary control. As we discussed last year, developing a taxonomy or classification scheme allows an organization to apply consistent vocabulary control for all content across the enterprise.

Just think about this: The average person has a vocabulary of 20,000 unique words. Without a controlled vocabulary, we could end up with a multitude of terms (some contradictory, inaccurate, or confusing), which creates more confusion than clarity. On the other hand, if we limit ourselves to 10% of that vocabulary, we are left with a much more manageable 2,000 terms.

The reality is that metadata can also be developed by users through a folksonomy (tagging without rules). After all, folksonomies are easy to develop, cheap, and the “metadata of the masses.” However, they also become ineffective over time due to inconsistencies in structure, vocabulary, and spelling. In this light, metadata by itself, without vocabulary control, could be meaningless.

In this age of powerful search engines, you might wonder why we need taxonomies at all. In fact, an executive at Mozilla asked this very question. I’ll tell you the same thing that I told him: Suppose an organization has two million documents in a system, and you’re looking for an invoice for a part that was ordered last year. Now, a full search for that invoice and part number might take a while, expend system resources, and return hundreds of documents (or none), with too many false positives because the parameters are too broad.

When using a hierarchical taxonomy or classification, the search is narrowed to “accounts payable,” then “invoice,” then the “previous year,” and then the “vendor.” The search would be fast and more accurate. However, for that search to be effective, the system and the search interface must include the right metadata.
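The narrowing described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Document` fields, the sample records, the vendor names, and the `narrow` helper are all hypothetical and do not come from any particular content management system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    # Hypothetical metadata fields, chosen to mirror the example in the text.
    doc_id: str
    department: str
    doc_type: str
    year: int
    vendor: str


def narrow(documents, **facets):
    """Return only the documents whose metadata matches every given facet."""
    return [
        doc for doc in documents
        if all(getattr(doc, field) == value for field, value in facets.items())
    ]


# Made-up sample records standing in for a much larger repository.
DOCS = [
    Document("D-001", "accounts payable", "invoice", 2017, "Acme"),
    Document("D-002", "accounts payable", "invoice", 2016, "Acme"),
    Document("D-003", "accounts payable", "purchase order", 2017, "Acme"),
    Document("D-004", "human resources", "policy", 2017, "n/a"),
    Document("D-005", "accounts payable", "invoice", 2017, "Globex"),
]

# Each facet removes documents before any text would need to be scanned.
hits = narrow(
    DOCS,
    department="accounts payable",
    doc_type="invoice",
    year=2017,
    vendor="Acme",
)
print([doc.doc_id for doc in hits])  # prints ['D-001']
```

The sketch only works because every record carries the same, consistently filled metadata fields, which is the point the paragraph above makes about the system and the search interface.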

Search engines such as Google, Yahoo, and Bing have spent a gazillion dollars developing taxonomies and indexes in powerful databases, as well as ranking algorithms to determine what appears in the results. The bottom line is that organizations need to invest in taxonomies to support navigation and findability.

The key takeaway is that metadata needs to be managed, normalized, and controlled. User metadata or tags can be mapped to the master data plan and used to improve and refine the plan. Metadata should be expected to add value above and beyond the content it describes.
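As a rough illustration of mapping user tags to a controlled vocabulary, consider the minimal Python sketch below. The controlled terms, the synonym table, and the function names are assumptions made up for this example; a real master data plan would define its own terms and mapping rules.

```python
# Hypothetical controlled vocabulary and synonym table, for illustration only.
CONTROLLED_TERMS = {"invoice", "purchase order", "contract"}
SYNONYMS = {
    "invoices": "invoice",
    "inv": "invoice",
    "bill": "invoice",
    "po": "purchase order",
    "purchase-order": "purchase order",
}


def normalize_tag(tag):
    """Map a free-form user tag to a controlled term, or None if unknown."""
    # Normalize case and whitespace so "  PO " and "po" are treated alike.
    cleaned = " ".join(tag.strip().lower().split())
    if cleaned in CONTROLLED_TERMS:
        return cleaned
    return SYNONYMS.get(cleaned)


def map_tags(tags):
    """Split user tags into mapped controlled terms and unmapped leftovers."""
    mapped, unmapped = set(), set()
    for tag in tags:
        term = normalize_tag(tag)
        if term is None:
            unmapped.add(tag)
        else:
            mapped.add(term)
    return mapped, unmapped


mapped, unmapped = map_tags(["Invoices", " PO ", "bill", "recipt"])
print(sorted(mapped))    # prints ['invoice', 'purchase order']
print(sorted(unmapped))  # prints ['recipt']
```

In this sketch, the unmapped leftovers are the interesting part: they are the candidates someone would review to decide whether the plan needs a new term, a new synonym, or nothing at all.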

Charmaine Brooks, CRM, is a Partner with IMERGE Consulting, Inc. and has 20-plus years of experience in the field of records and information management. Contact Charmaine at charmaine.brooks@imergeconsult.com.

Jim Just is a Partner with IMERGE Consulting, Inc., with over 20 years of experience in records and information management. Contact Jim at james.just@imergeconsult.com or follow him on Twitter @jamesjust10.