Image by: BrianAJackson, ©2016 Getty Images

Last month, I discussed the general parameters of a shared drive cleanup (remediation) and some of the problem areas you may face. This month, I’ll walk through a typical remediation decision tree with rules that may be applied. Shared drive remediation is as much art as it is science; a good problem-solving mindset will be important as you work through your content. Today, I focus on using available shared drive metadata. While content analysis can be useful or necessary, it usually requires technology tools.

In all cases, the level of risk and the value of the information will be the key drivers of decisions about that content. Legal opinions usually lean toward minimizing high-risk content by removing duplicates and adhering strictly to the retention schedule and policies. For government, reducing the volume of content reduces the effort and cost of meeting open records requests. For both government and commercial enterprises, shared drive remediation simplifies information governance and reduces costs (in particular, e-discovery costs), a strong win for any information technology (IT) or records department.

The general flow of content remediation looks like this:

Before starting any remediation process, be sure you have an up-to-date records retention schedule (hint: the more condensed and concise the schedule, the easier the remediation tasks), a records destruction review and approval process, a list of legal holds in place and buy-in from the owner(s) of the targeted content. If you plan to migrate content to an enterprise content management (ECM) system or SharePoint, or simply to reorganize it, you will want to develop a functional classification and link it to the retention schedule. I’ll address classifications in a later post, but simply stated, a functional classification is an enterprise taxonomy or information architecture that simplifies and improves findability and life cycle management of content.

Also, designate a quarantine area on the file share to hold likely redundant, outdated and trivial (ROT) content or content that you suspect contains personally identifiable information (PII) or other sensitive content. For now, it’s better to quarantine most content rather than delete it. Any reference to deleting content in this article assumes you are following your policy.

First, select the target content area (folder tree) using these rules:

1. Relatively unambiguous content in the folder tree: Don’t start at the root of a department (S:\HR or S:\Accounting, for instance); instead, focus on one area of the shared drive completely within the business area that has similar content.

2. Content that has fixed retention rules or, if retention is trigger-based, unambiguous metadata about the files. For instance, accounting files that have a seven-year retention from the fiscal year end date. There may be outliers (e.g., bond-related invoices, for which retention is seven years after bond maturity), but these can be easily assessed if sufficient metadata exists in folder and file names to make remediation decisions.

3. No legal or other holds are in place, or if they are, content on hold is easily segregated.

4. If the target content is high value and/or high risk, it will make the pilot effort more visible and beneficial to the organization, which will kickstart the next effort.
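
The retention arithmetic in rule 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the seven-year period is the example used above, and the "trigger date" is whatever your retention schedule specifies (fiscal year end for ordinary invoices, bond maturity for the bond-related outliers).

```python
from datetime import date


def disposal_eligible_on(trigger_date: date, retention_years: int = 7) -> date:
    """Return the date on which the retention period ends.

    trigger_date is the event that starts the clock (assumed here to be
    the fiscal year end date, or the bond maturity date for outliers).
    """
    target_year = trigger_date.year + retention_years
    try:
        return trigger_date.replace(year=target_year)
    except ValueError:
        # Feb 29 trigger landing in a non-leap year: fall back to Feb 28.
        return trigger_date.replace(year=target_year, day=28)


def is_past_retention(trigger_date: date, today: date, retention_years: int = 7) -> bool:
    """True only once the full retention period has elapsed."""
    return today > disposal_eligible_on(trigger_date, retention_years)
```

For example, `is_past_retention(date(2008, 6, 30), date(2016, 1, 1))` is true, while files from a fiscal year ending December 31, 2012 would still be inside a seven-year retention on that same day. A result of "past retention" only makes content a candidate for the destruction review and approval process; it does not account for legal holds.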

First-pass remediation focuses on content that is certain to be ROT. Gather subject matter experts (SMEs) together and review the folder structure as a group. Here are some of the things to look for during the first pass:
  • Temp files, Thumbs.db or other system-generated files.
  • Personal content: Wedding photos, music, videos, etc. can be deleted or quarantined, depending on your use policy and how nice you’re feeling that day.
  • Orphaned content: Personal subject folders (the digital equivalent of the desk drawer) for staff who have moved on. For example, Betty worked in finance as accounts payable manager, but Betty left the company in 2002, so the most recent "date accessed" attribute is 2002. Betty had no responsibilities that would require a retention of more than seven years. If none of the SMEs have searched the content and legal and records management agree Betty’s files have no value, then delete (or quarantine).
  • Duplicates: Not all duplicates are bad duplicates, but within the same folder structure, they usually have no value. The most common duplicates are “copy of” files or Word and PDF renditions of the same document. A determination will need to be made on the action to take: delete, leave as is, move to a “copy” folder, etc. In general, if there are multiple duplicates with Word and PDF versions, only the most recent Word version should be kept. If final versions (e.g., PDF) are intermingled with drafts, consider creating a “final version” folder, moving the final versions to it and then cleaning up drafts according to your rules. Doing so will simplify findability and eliminate version ambiguity.
  • Review document types: Sort each directory by "type" attribute and then scroll through looking for extensions that don’t belong (e.g., EXE, TXT, MPGx, EPS, ILL, DAT, ZIP). The invalid extension list will vary depending on the business area and business processes. For instance, you’d expect EPS files in marketing but probably not in accounts payable.
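
Several of the first-pass checks above rely only on file names, so they can be roughed out in Python. The sketch below is an assumption-laden illustration, not a complete rule set: the function names, flag labels and the specific name patterns ("Copy of ...", "... - Copy", Office "~$" lock files) are choices made for this example, and the unexpected-extension list is just a subset of the examples mentioned above and should be tuned per business area.

```python
from collections import defaultdict
from pathlib import PureWindowsPath

# Illustrative lists only; adjust them for each business area.
SYSTEM_NAMES = {"thumbs.db", "desktop.ini"}
TEMP_SUFFIXES = {".tmp"}
UNEXPECTED_SUFFIXES = {".exe", ".txt", ".eps", ".dat", ".zip"}


def first_pass_flags(filename, unexpected_suffixes=UNEXPECTED_SUFFIXES):
    """Return a list of first-pass review flags for a single file name."""
    path = PureWindowsPath(filename)
    name = path.name.lower()
    stem = path.stem.lower()
    suffix = path.suffix.lower()
    flags = []
    if name in SYSTEM_NAMES or suffix in TEMP_SUFFIXES or name.startswith("~$"):
        flags.append("system-or-temp")
    if stem.startswith("copy of ") or stem.endswith(" - copy"):
        flags.append("possible-duplicate")
    if suffix in unexpected_suffixes:
        flags.append("unexpected-type")
    return flags


def rendition_pairs(filenames):
    """Return base names that exist as both a Word file and a PDF.

    Expects the file names from ONE folder; names are compared
    case-insensitively and returned in lowercase.
    """
    suffixes_by_stem = defaultdict(set)
    for filename in filenames:
        path = PureWindowsPath(filename)
        suffixes_by_stem[path.stem.lower()].add(path.suffix.lower())
    return sorted(
        stem
        for stem, suffixes in suffixes_by_stem.items()
        if ".pdf" in suffixes and suffixes & {".doc", ".docx"}
    )
```

A call such as `first_pass_flags("Copy of Invoice 1042.docx")` returns `["possible-duplicate"]`, and a file with no flags returns an empty list. The flags are prompts for the SME review, not deletion decisions; matching on names alone will miss renamed duplicates and will occasionally flag legitimate files.
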
As part of this phase or the next, it will be useful to use a metadata extraction tool, such as Directory Lister Pro ($25) or Cathy (freeware). These tools extract metadata and output as CSV or other formats where analysis can be done much more rapidly than through a manual process and across larger content sets.
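
If a dedicated tool is not available, a similar inventory can be approximated with a short Python script. This is a minimal sketch using only the standard library; the column names and function names are assumptions made for this example, and it is not a description of how Directory Lister Pro or Cathy work. Note that "accessed" timestamps are not reliably maintained on every file system, so treat that column with caution.

```python
import csv
import os
from datetime import datetime

# Hypothetical column layout for the inventory file.
FIELDS = ["path", "extension", "size_bytes", "modified", "accessed"]


def inventory(root):
    """Yield one metadata row (a dict) for every file under root."""
    for folder, _subfolders, files in os.walk(root):
        for name in files:
            full_path = os.path.join(folder, name)
            try:
                info = os.stat(full_path)
            except OSError:
                continue  # skip files that vanish or cannot be read
            yield {
                "path": full_path,
                "extension": os.path.splitext(name)[1].lower(),
                "size_bytes": info.st_size,
                "modified": datetime.fromtimestamp(info.st_mtime).isoformat(timespec="seconds"),
                "accessed": datetime.fromtimestamp(info.st_atime).isoformat(timespec="seconds"),
            }


def write_inventory(root, csv_path):
    """Write the inventory of root to csv_path and return the row count."""
    count = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        for row in inventory(root):
            writer.writerow(row)
            count += 1
    return count
```

Write the CSV somewhere outside the folder tree being inventoried so the output file does not appear in its own listing. The resulting file can be sorted and filtered in a spreadsheet by extension, size or date to find candidates across much larger content sets than a manual review allows.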

Second-pass remediation considers more complex rules. The rules will depend on how consistently content has been identified in folder taxonomy (file plan) and file names.

1. Case and project files (could also be “folders” that represent aggregate documentation, such as a contract folder, personnel folder, etc.) are handled similarly to other folders. It is particularly important to de-duplicate, eliminate drafts and versions and be sure the “final” folder follows a standard folder taxonomy (work breakdown structure or file plan).

2. Folder names with date information where date information exceeds retention: Typically, this is a folder name with a designated year, such as 1995 Invoices or 1999 Applicants. Working with the SME, delete, archive or quarantine folders that exceed the records retention period.

3. As you review the content, other obvious content assessment triggers will present themselves. Follow the same rules of date analysis and retention, and then take action.
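
The date analysis in rule 2 can be sketched as follows. This is an illustration under assumptions, not a finished rule: the function names are hypothetical, the pattern only recognizes four-digit years beginning with 19 or 20, and the comparison is deliberately conservative in that it treats the folder's content as dated at the end of the named year.

```python
import re

# Four-digit year (19xx or 20xx) not embedded in a longer number.
YEAR_PATTERN = re.compile(r"(?<!\d)(?:19|20)\d{2}(?!\d)")


def folder_year(folder_name):
    """Return the first year found in a folder name, or None."""
    match = YEAR_PATTERN.search(folder_name)
    return int(match.group(0)) if match else None


def exceeds_retention(folder_name, retention_years, current_year):
    """True if the folder's year is more than retention_years old.

    Folders without a recognizable year are never flagged.
    """
    year = folder_year(folder_name)
    if year is None:
        return False
    return current_year - year > retention_years
```

With a seven-year retention and a current year of 2016, `exceeds_retention("1999 Applicants", 7, 2016)` is true and `exceeds_retention("2010 Invoices", 7, 2016)` is false. Flagged folders are candidates to review with the SME, since a year in a folder name says nothing about legal holds or trigger-based retention.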

In summary, the rules you use must consider content value, risk and the cost/effort required to remediate a content area. Remember, each organization’s decision tree will reflect the goals and outcomes it wishes to achieve.

Jim Just is a partner with IMERGE Consulting, Inc., with over 20 years of experience in business process redesign, document management technologies, business process management and records and information management. Follow him on Twitter @jamesjust10. For more information, visit