
No one likes to admit that their content is in chaos. Even those who do admit it rarely want to clean up the mess themselves or, frankly, see much value in the time and effort that would involve. It’s a bad, but perfectly understandable, situation. Many organizations are left with the following questions: Where would you start? What would it really achieve? Isn’t storage supposed to be cheap?

Can Artificial Intelligence Help With Content Sprawl?

The rich hire cleaners to maintain their homes, often preferring the work to be done while they're not there. That way, the house just stays magically clean and tidy. The same approach could be applied to content cleanup: professionally done, invisible, and driven by artificial intelligence (AI). If there is one thing that AI doesn’t like, it's dirty data. Redundant, incomplete, duplicated, non-compliant, and irrelevant data just makes AI work harder, since it has to navigate its way through the mess (and is oftentimes led astray). Conversely, AI is really good at identifying all that messy and redundant data. Furthermore, it can even add structure and organization to the content you actually need.

As affordable, easily available, and multi-purpose AI tools come to market, organizations will increasingly use this technology to bring order to their content chaos. Consider the following two examples:
  • Even if you're already suffering from severe content sprawl, you can use simple AI to improve the situation and, at the very least, prevent this syndrome from getting any worse. Let's consider a typical business process that consolidates invoices with statements, purchase orders, and delivery notes. Even in small organizations, such an exercise can produce dozens of duplicate copies via email exchanges. Using an AI-driven robotic process automation (RPA) tool, organizations can automate the collection and pairing of multiple related documents. This is a process improvement approach, but a side benefit is a dramatic reduction in duplication and sprawl.
  • AI can also cure existing content sprawl. Migration and federation vendors deploy tools that analyze your existing content stores and identify what is old, duplicate, unused, or just plain junk. They do this so that when you migrate to a new platform, you only migrate what you need rather than simply lift and shift a pile of junk from platform A to platform B. Typically, this is a one-and-done exercise, but the technology is also available to retain that analysis and, through simple machine learning, reapply it on a regular or even real-time basis as future problems arise.
The AI at work in both these instances is actually pretty basic but, nonetheless, highly effective. In simple terms, the AI mechanism is pre-configured to look for exceptions based on a set of rules. For example, has this file been accessed in the last 10 years? Is this a duplicate? Is it probably a duplicate?
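The three rule checks mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular vendor's product works: the `FileRecord` type, the `find_exceptions` function, and the 0.9 similarity threshold are all hypothetical, exact duplicates are detected by hashing file text, and "probably a duplicate" is approximated with the standard library's `difflib` similarity ratio.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta
from difflib import SequenceMatcher


@dataclass
class FileRecord:
    # Hypothetical record; a real tool would read these from the content store.
    path: str
    last_accessed: datetime
    text: str


def find_exceptions(files, now, stale_years=10, near_dup_ratio=0.9):
    """Return (stale, duplicates, probable_duplicates) for a list of FileRecord."""
    # Rule 1: has this file been accessed in the last N years?
    cutoff = now - timedelta(days=365 * stale_years)
    stale = [f.path for f in files if f.last_accessed < cutoff]

    duplicates, probable = [], []
    seen = {}    # content hash -> path of the first file with that content
    unique = []  # first-seen files, used for the fuzzy comparison
    for f in files:
        # Rule 2: is this a duplicate? (identical content hash)
        digest = hashlib.sha256(f.text.encode("utf-8")).hexdigest()
        if digest in seen:
            duplicates.append((seen[digest], f.path))
            continue
        # Rule 3: is it probably a duplicate? (assumed similarity threshold)
        for other in unique:
            ratio = SequenceMatcher(None, other.text, f.text).ratio()
            if ratio >= near_dup_ratio:
                probable.append((other.path, f.path))
                break
        seen[digest] = f.path
        unique.append(f)
    return stale, duplicates, probable
```

Calling `find_exceptions(records, datetime.now())` on a list of records returns three lists that a reviewer could inspect before anything is deleted. The pairwise comparison is quadratic in the number of files, so this approach only suits small collections.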

Over time, the system learns the behavioral patterns and becomes smarter and smarter, to the point where it can actually predict likely sprawl behavior and prompt users about it by noticing structures, behaviors, connections, and disconnections between newly created or captured files. To be clear, there are many products already available that use AI to help organizations, including (but certainly not limited to) iManage RAVN, OpenText Magellan, M-Files Apprento, Box Skills, and Office 365.

Taking the First AI Step for Content Cleanup

Common sense dictates that we should get rid of our junk and control sprawl, but where do we start? In truth, most organizations have no idea and are terrified that a ham-fisted cleanup attempt could destroy critical but hidden files and data. The result is that most organizations do absolutely nothing. However, it doesn’t have to be like this. There are some relatively affordable tools, strategies, and techniques that can go a long way toward bringing order to the chaos.

Alan Pelz-Sharpe is the Founder and Principal Analyst of Deep Analysis, an independent technology research firm focused on next-generation information management. He has over 25 years of experience in the information technology (IT) industry working with a wide variety of end user organizations and vendors. Follow him on Twitter @alan_pelzsharpe.