Image by: BrianAJackson, ©2016 Getty Images

    Shared drive remediation is a crucial activity for effective information governance (IG). IG helps to lower risks and costs by significantly reducing data volumes and providing accessibility and structure to unstructured data. Today’s discussion focuses on technologies available to help with the remediation and migration process. These products fall under the technology category of file analysis, classification and remediation (FACR). Some products go beyond FACR to include e-discovery, legal hold notification and archiving.

    Classification is a key requirement for effective IG and FACR—unstructured content, once classified, becomes structured and, therefore, findable, useable and manageable throughout its life cycle. Structuring shared drives using classification will move the organization a long way toward IG, but the use of content management systems, including properly deployed SharePoint, brings the greatest degree of operational effectiveness and life cycle control to achieve formal enterprise IG.

    The goal of shared drive remediation is to migrate clean content to a system or standard classification so that it can be found, used and managed through its life cycle. FACR solutions are many and varied; the kinds of content, the ultimate outcomes desired, volume of content and cost will help determine the options available to your organization.


    FACR systems have varying capabilities:
    • Metadata analysis looks only at the file system (and/or SharePoint) metadata (properties)
    • Text analytics further refines categorization of content and also targets personally identifiable information (PII) and identifies high-value content
    • Image analysis groups like-images using graphical pattern matching; it does not require optical character recognition (OCR)
    • Archive solutions perform the above analytics but also ingest target content into their repository for ongoing classification, analysis, discovery, hold and disposition
    • Some solutions are tightly integrated with SharePoint information architecture for bi-directional updates of taxonomies and metadata
    • Some solutions offer e-discovery, email migration and classification term-extraction
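
    As a rough illustration of the first capability, metadata analysis, the sketch below walks a folder tree and records file system properties without opening any file. The names `FileRecord` and `scan_metadata` are hypothetical and do not come from any FACR product; commercial tools collect far more properties (owner, permissions, SharePoint columns).

```python
import os
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class FileRecord:
    path: str
    extension: str      # lower-cased, including the leading dot ("" if none)
    size_bytes: int
    modified: datetime  # last-modified time, in UTC


def scan_metadata(root: str) -> list[FileRecord]:
    """Walk `root` and collect file system properties only (no file is opened)."""
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            try:
                info = os.stat(full_path)
            except OSError:
                continue  # unreadable entry: skipped in this sketch
            records.append(FileRecord(
                path=full_path,
                extension=os.path.splitext(name)[1].lower(),
                size_bytes=info.st_size,
                modified=datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
            ))
    return records
```
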
    To some degree, the solution capabilities beyond pure FACR are indicative of the product origin—products that started out as e-discovery solutions have strong capabilities in that area. Others originated as FACR solutions and excel at grouping, remediating and migrating content. Still others originated as archiving solutions and have expanded to encompass FACR and e-discovery capabilities. This article focuses on FACR, but there are many resources for e-discovery solutions through LegalTech and other organizations.

    While manual analysis or Excel spreadsheets can be useful for a high-level analysis of content, acting on content is a much greater challenge without a FACR solution. There are five main purposes for FACR solutions:

    1. Discover and cleanse content
    Analyze content, group it within classification schemes, remediate redundant, outdated or trivial (ROT) content and purge or quarantine content. This task can be completed across very high volumes of content and across multiple repositories for broad normalization of content. In addition, workflow is used for human identification of groups of content that cannot be automatically classified. Most FACR solutions use artificial intelligence (AI) to constantly improve classification accuracy; others require a "document corpus" to train the engine to recognize concepts. Extracted metadata can be rationalized and validated.
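
    A minimal sketch of a rule-based first pass over ROT candidates, using only metadata, might look like the following. The thresholds, the extension list and the function name `triage` are assumptions made for illustration; real values would come from the organization's retention schedule, and finding redundant content requires comparing file contents, which this sketch does not do.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical thresholds; real values come from the retention schedule.
OUTDATED_AFTER = timedelta(days=7 * 365)
TRIVIAL_EXTENSIONS = {".tmp", ".bak", ".log"}


def triage(extension: str, size_bytes: int, modified: datetime,
           now: datetime) -> str:
    """Return 'trivial', 'outdated' or 'review' for one file's metadata."""
    if size_bytes == 0 or extension.lower() in TRIVIAL_EXTENSIONS:
        return "trivial"
    if now - modified > OUTDATED_AFTER:
        return "outdated"
    # Everything else still needs classification or human review in workflow.
    return "review"
```

    Files that land in "review" are the ones a classification engine or a human reviewer would handle next.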

    Another valuable analysis task identifies migration issues dependent on the target system; for example, file names or document types not supported by SharePoint, encrypted files, password protected files, undocumented file extensions, etc. These anomalies can be queued in workflow for review or quarantined prior to initiating migration activities.
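
    The pre-migration check described above can be sketched as a simple rule set applied to each path. The character set, length limit and extension list below are illustrative assumptions, not the actual rules of SharePoint or any other target system; consult the target system's documentation for the real constraints. Detecting encrypted or password-protected files requires opening the files and is left out.

```python
import os

# Assumed rules for illustration only; check the target system's documentation.
INVALID_CHARS = set('"*:<>?|')
MAX_PATH_LENGTH = 400
KNOWN_EXTENSIONS = {".docx", ".xlsx", ".pptx", ".pdf", ".txt", ".msg"}


def migration_issues(relative_path: str) -> list[str]:
    """Return the reasons a path may fail to migrate (empty list if none)."""
    issues = []
    # Take the last path segment, accepting either slash style.
    name = relative_path.replace("\\", "/").rsplit("/", 1)[-1]
    if any(ch in INVALID_CHARS for ch in name):
        issues.append("invalid character in file name")
    if len(relative_path) > MAX_PATH_LENGTH:
        issues.append("path too long")
    ext = os.path.splitext(name)[1].lower()
    if not ext:
        issues.append("no file extension")
    elif ext not in KNOWN_EXTENSIONS:
        issues.append("undocumented file extension")
    return issues
```

    Paths that return a non-empty list would be queued for review or quarantined before migration starts.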

    2. Identify sensitive data or business-critical data
    Products that leverage text analytics use regular expressions (regex) to find Social Security numbers, credit card numbers and other PII or to locate tags that mark business-critical content, such as contracts and intellectual property.
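
    To make the regex approach concrete, here is a minimal sketch that scans text for U.S. Social Security numbers and candidate credit card numbers. The patterns and the function names `luhn_valid` and `find_pii` are simplified assumptions for illustration. The Luhn checksum is added here to cut down on false positives from other long digit strings; it is a common companion to card-number patterns, not something the products discussed are known to use in this exact form.

```python
import re

# Simplified patterns for illustration; production rule sets are more elaborate.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# 13 to 16 digits, optionally separated by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")


def luhn_valid(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, test mod 10."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def find_pii(text: str) -> dict[str, list[str]]:
    """Return SSN matches and Luhn-valid card numbers (digits only) found in text."""
    cards = []
    for match in CARD_PATTERN.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            cards.append(digits)
    return {"ssn": SSN_PATTERN.findall(text), "credit_card": cards}
```

    Content with any hit would typically be routed to quarantine or flagged for review rather than migrated directly.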

    3. Migration of content
    Once you have clean, classified content, it can be migrated, using business rules and considering IG policies, to a new, properly classified shared drive, an enterprise content management (ECM) solution, SharePoint or another repository. Content that is questionable can be queued in workflow for human analysis, and content with sensitive data can be migrated to quarantine, waiting for further analysis and action.

    4. Content rationalization
    Now that content is clean, categorized and has validated metadata, it can be further analyzed to extract business data or be reorganized to meet business needs (mergers and acquisitions, divestiture, discovery, etc.).

    5. Ongoing governance
    It is critical to monitor and maintain IG rules going forward to avoid facing the same mess a year or two down the road. FACR systems offer various ways of automating classification tasks or monitoring repositories for compliance with the new taxonomies.

    The following are examples of a FACR system user interface and analysis output, courtesy of Active Navigation, Inc.

    This example screen shows a snapshot of the targeted content across a variety of business rules and parameters.
    Active Navigation

    In this example, file extensions are grouped into common type groupings. The number of file extensions typically found (2,000 to 4,000) is overwhelming, so content is categorized by type, giving users the context to understand the kinds of records that exist.
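
    The grouping idea can be sketched with a lookup table that maps extensions to a handful of type groups. The mapping and the function name `group_by_type` are hypothetical; a real deployment would cover thousands of extensions.

```python
from collections import Counter

# Hypothetical mapping; a real deployment would cover thousands of extensions.
TYPE_GROUPS = {
    ".doc": "Documents", ".docx": "Documents", ".pdf": "Documents",
    ".xls": "Spreadsheets", ".xlsx": "Spreadsheets", ".csv": "Spreadsheets",
    ".jpg": "Images", ".png": "Images", ".tif": "Images",
    ".msg": "Email", ".pst": "Email",
}


def group_by_type(extensions):
    """Count files per type group; unmapped extensions fall under 'Other'."""
    return Counter(TYPE_GROUPS.get(ext.lower(), "Other") for ext in extensions)
```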

    This screen shows typical setup of regular expression definitions used to analyze either metadata or text. All vendors use regular expression analysis to identify data important for the target content.

    Graphical representation of analysis results provides the analyst with immediate feedback on data or migration issues, content anomalies and volume.

    A useful analysis, this report shows file volume by file size.
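
    A report of this kind amounts to counting files per size range. In the sketch below, the bucket boundaries and the names `bucket_for` and `volume_by_size` are assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical bucket boundaries in bytes (upper bound exclusive).
SIZE_BUCKETS = [
    (1024, "under 1 KB"),
    (1024 ** 2, "1 KB to 1 MB"),
    (100 * 1024 ** 2, "1 MB to 100 MB"),
]
LARGEST_BUCKET = "100 MB and over"


def bucket_for(size_bytes: int) -> str:
    """Return the label of the first bucket whose upper bound exceeds the size."""
    for upper_bound, label in SIZE_BUCKETS:
        if size_bytes < upper_bound:
            return label
    return LARGEST_BUCKET


def volume_by_size(sizes):
    """Count files per size bucket."""
    return Counter(bucket_for(size) for size in sizes)
```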

    This final example graphs the volume of duplication—a highly valuable way to “sell” the need to remediate shared drives. In our client interviews, “version confusion” is one of the most common complaints.
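
    One common way to measure duplication is to hash each file's content and count the files that share a digest. The sketch below does this with SHA-256; the function names `file_digest` and `duplicate_report` are hypothetical. It finds exact byte-for-byte copies only, so near-duplicate versions of a document, which drive much of the "version confusion," would need text or similarity analysis instead.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file's content, read in chunks to bound memory use."""
    sha = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            sha.update(chunk)
    return sha.hexdigest()


def duplicate_report(root: str) -> dict:
    """Summarize exact-duplicate volume under `root`."""
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            groups[file_digest(path)].append(path)

    redundant_files = 0
    redundant_bytes = 0
    for paths in groups.values():
        extra_copies = len(paths) - 1  # one copy per group counts as the original
        if extra_copies:
            redundant_files += extra_copies
            redundant_bytes += extra_copies * os.path.getsize(paths[0])
    return {
        "total_files": sum(len(paths) for paths in groups.values()),
        "redundant_files": redundant_files,
        "redundant_bytes": redundant_bytes,
    }
```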

    It has no doubt become obvious that there are many considerations when cleaning up content and migrating it. FACR tools, fortunately, formalize and automate the application of most business rules and IG policies. They manage content anomaly work processes to effect proper content groupings within a formal classification structure.

    For more information on FACR issues, efficient information organization, information governance, life cycle management and ongoing control, visit www.imergeconsult.com.

    Jim Just is a partner with IMERGE Consulting, Inc., with over 20 years of experience in business process redesign, document management technologies, business process management and records and information management. Contact him at james.just@imergeconsult.com or follow him on Twitter @jamesjust10.
