Back in the 1990s, large-scale digitization projects were common as organizations sought to work in a fully digital environment, eliminate file cabinets, reduce the risk of loss or misfiling and consolidate processes. Now, as cloud applications take over and operations become mobile, paper filing systems are seeing their last days. This has prompted many new initiatives to mass-scan older files, such as vendor invoices, registration documents, human resources files, meeting and board minutes, land and title documents, and engineering drawings. These initiatives must be planned carefully to produce accurate cost estimates for the digitization project and to give confidence in the quality and usability of the scanned images.
Planning the digitization project
The initial planning must confirm exactly which documents, files and folders require scanning, how they will be used or integrated in their digital format, how much storage the documents and their indexes will require, and the actual cost of scanning, complete quality checking and the eventual upload of the digitized documents.
Sizing and cost estimates
You need to know the exact number of files, pages and images to be scanned. Factors to consider when calculating cost include the following:
- One- or two-sided documents
- Color or black and white
- Office documents (print), photographic data or a combination
- Number of index fields per page, per document (multiple pages) or per folder
- Scanning resolution (e.g., 300 dpi as the office standard or 600 dpi for higher resolution)
- Whether optical character recognition (OCR) will be required to produce a searchable text version of each document, or whether an image-only version (like a fax) will suffice
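To see how these factors combine, the cost arithmetic can be sketched in Python. This is a minimal sketch under stated assumptions: the `Batch` model, the rates (in cents) and the pricing structure are hypothetical placeholders rather than vendor quotes, and resolution and document type are left out for brevity. What it does show is the rule implied by the list above: two-sided pages double the image count, and in this model the per-unit charges apply to images rather than sheets.

```python
from dataclasses import dataclass

# Hypothetical rates in cents; replace with actual vendor quotes.
RATE_IMAGE_BW = 5         # per black-and-white image
RATE_IMAGE_COLOR = 10     # per color image
RATE_OCR = 1              # per image sent through OCR
RATE_INDEX_FIELD = 2      # per index field keyed per document


@dataclass
class Batch:
    """A group of records with uniform characteristics (hypothetical model)."""
    pages: int          # physical sheets
    two_sided: bool     # duplex sheets yield two images each
    color: bool
    ocr: bool
    documents: int
    index_fields: int   # index fields captured per document


def image_count(batch: Batch) -> int:
    """Images produced: one per side scanned."""
    return batch.pages * (2 if batch.two_sided else 1)


def estimate_cost_cents(batch: Batch) -> int:
    """Estimated cost in whole cents; integers avoid floating-point rounding drift."""
    images = image_count(batch)
    cost = images * (RATE_IMAGE_COLOR if batch.color else RATE_IMAGE_BW)
    if batch.ocr:
        cost += images * RATE_OCR
    cost += batch.documents * batch.index_fields * RATE_INDEX_FIELD
    return cost


if __name__ == "__main__":
    hr_files = Batch(pages=1000, two_sided=True, color=False, ocr=True,
                     documents=200, index_fields=3)
    print(image_count(hr_files))                 # 2000
    print(estimate_cost_cents(hr_files) / 100)   # 132.0
```

Running the same model over each distinct group of records (for example, color engineering drawings versus black-and-white invoices) and summing the results gives a rough project total to compare against vendor proposals.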
Before documents can be put through a high-speed scanner, the documents in each file must be prepared. Examples of preparation steps include:
- Removal of clips and staples, and a way to keep multiple-page documents together (e.g., a separator or index sheet in front of each new document)
- Mounting of odd-sized notes (e.g., sticky notes) on standard-sized pages
- Removal of documents from folders
- Capture of information recorded on the folder tab or inside the folder
- Shading of seals (impressions in the paper) with a pencil, a common way to make the seal visible on the scanned image
A variety of scanning methodologies are available, many of them offered by specialized scanning service bureaus. Options include in-house scanning, scanning by a local service bureau with controlled and scheduled pick-ups and deliveries, and local scanning with offshore indexing.
Sample testing and verification
It is best practice to prepare a sample of files (e.g., 10 to 100) and run a test with a vendor to sort out all of the details before entering into a large scanning contract. Randomly select the sample from the physical files, for example, by picking one folder every six inches or one foot of file space. Complete the preparatory work needed to ready each file or document for scanning. Create a log sheet that records the file folder and document ID numbers, the number of pages, the number of images (two-sided pages produce two images) and the index terms for each folder and document.
When the scanned images are returned from the vendor, or after in-house scanning is complete, review the images and verify that the numbers of documents, pages and images match those recorded on the original log sheet and that the page sequence is identical to the original. Do not proceed with the full implementation until accuracy on the sample is 100%. This phase is critical to ensuring that the final digitized images are usable and preserve the integrity of the original file.
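Both halves of the sample test, picking folders at regular intervals along the shelf and reconciling the scanned output against the log sheet, can be sketched in Python. The folder positions, the `DocRecord` fields and the function names are hypothetical; a real project would read these from its own log sheet and from the vendor's delivery manifest or image files. Picking one folder per fixed interval is systematic sampling; choosing the starting offset at random (e.g., `random.uniform(0, interval)`) keeps it close to a truly random sample.

```python
from dataclasses import dataclass


def systematic_sample(folders, interval, start=0.0):
    """Pick the first folder at or past each shelf mark (start, start+interval, ...).

    folders: list of (folder_id, position_in_inches), sorted by position.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    sample = []
    mark = start
    for folder_id, position in folders:
        if position >= mark:
            sample.append(folder_id)
            while mark <= position:   # advance to the next mark past this folder
                mark += interval
    return sample


@dataclass
class DocRecord:
    """One document as recorded on the log sheet or found in the scan output."""
    pages: int
    images: int
    page_order: list   # page labels in the order they appear


def verify_sample(log, scanned):
    """Compare log-sheet entries with scanned results and return a list of problems.

    An empty list means the sample matched exactly (the 100% accuracy target).
    """
    problems = []
    for doc_id, expected in log.items():
        actual = scanned.get(doc_id)
        if actual is None:
            problems.append(f"{doc_id}: missing from scanned output")
            continue
        if actual.pages != expected.pages:
            problems.append(f"{doc_id}: {actual.pages} pages, log says {expected.pages}")
        if actual.images != expected.images:
            problems.append(f"{doc_id}: {actual.images} images, log says {expected.images}")
        if actual.page_order != expected.page_order:
            problems.append(f"{doc_id}: page sequence differs from original")
    # Documents that came back from scanning but were never logged.
    for doc_id in sorted(scanned.keys() - log.keys()):
        problems.append(f"{doc_id}: scanned but not on log sheet")
    return problems
```

Checking page order as well as counts matters because a batch can have the right number of pages and images while still having two pages swapped.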
Formal standards exist for quality assurance checking. In addition, bulk scanning service providers usually base their pricing on the type and frequency of quality checking. For example, during the actual scanning process, will you require quality checks on 100% of the images or only on 25%? Depending on the nature of the documents, this choice may affect costs, but it is critical to the integrity of the final set of scanned images that no blurring or equipment malfunctions (e.g., an accidental zoom during data capture) occur.
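The 100% versus 25% question comes down to how many images are pulled for inspection. A minimal sketch in Python, assuming image IDs are available as a list in scan order; the function name and the seeded random generator (so a review sample can be reproduced for an audit) are illustrative choices, not requirements of any formal QA standard.

```python
import random


def qa_sample(image_ids, percent, seed=None):
    """Return the images to inspect, in scan order.

    percent: whole-number inspection rate, 1-100 (e.g., 25 or 100).
    The sample size is rounded up, so any non-empty batch gets at
    least one image checked.
    """
    if not 1 <= percent <= 100:
        raise ValueError("percent must be between 1 and 100")
    k = (len(image_ids) * percent + 99) // 100   # integer ceiling of n * percent / 100
    rng = random.Random(seed)
    chosen = set(rng.sample(image_ids, k))
    return [img for img in image_ids if img in chosen]


if __name__ == "__main__":
    batch = [f"IMG{i:05d}" for i in range(1, 401)]
    print(len(qa_sample(batch, 25, seed=42)))    # 100
    print(len(qa_sample(batch, 100)))            # 400
```

Returning the sample in scan order lets an inspector step through it alongside the physical batch, which makes a jammed feed or a run of blurred pages easier to spot.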
Indexing of documents
The amount of indexing affects the cost of digitization. Some index information may be captured electronically (e.g., through optical character recognition) or drawn from system-supplied data. For example, if an employee's name and number already exist in a database, capturing only the employee number would allow the name and address to be extracted from other systems and used as indexes. These indexes are often referred to as metadata and should be embedded in the actual document. Whether to index at the folder, document or page level is another consideration. The more detailed the indexing, the higher the cost and the quality control requirements.
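The employee-number example can be sketched with Python's built-in `sqlite3` module standing in for the existing HR database. The table layout, column names and sample record are hypothetical; in practice the lookup would run against whatever system already holds the data, and the resulting fields would then be written into the document's metadata in a format-specific way that this sketch does not cover.

```python
import sqlite3


def make_demo_db():
    """In-memory stand-in for an existing HR system (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (number TEXT PRIMARY KEY, name TEXT, address TEXT)")
    conn.execute("INSERT INTO employees VALUES (?, ?, ?)",
                 ("E1001", "Jane Example", "1 Sample Street"))
    conn.commit()
    return conn


def enrich_index(conn, employee_number):
    """Expand a keyed employee number into a full set of index fields.

    Returns None when the number is not found, so the document can be
    routed for manual indexing instead of being filed with bad metadata.
    """
    row = conn.execute(
        "SELECT name, address FROM employees WHERE number = ?",
        (employee_number,),
    ).fetchone()
    if row is None:
        return None
    name, address = row
    return {
        "employee_number": employee_number,
        "employee_name": name,
        "employee_address": address,
    }


if __name__ == "__main__":
    db = make_demo_db()
    print(enrich_index(db, "E1001"))
    print(enrich_index(db, "E9999"))   # None
```

Only one field is keyed during indexing; the other two come from the lookup, which is where the saving in indexing cost comes from.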
Integrating scanned documents into the system
Once scanning is complete, the documents will need to be loaded into the system that will reference them. This may be a generic document management system or a specific application that uses the documents as backup. Loading images is often a slow process, and special software is frequently required to speed it up.
Managing ongoing document access and creation during the digitization process
Plans need to be made for accessing documents while they are being scanned. Scanning in batches usually minimizes the time that original documents are unavailable, and vendors can often return an “in-process” file within 24 hours or make the file available electronically. For processes that need continuous access to paper files, paper documents created during the scanning period can be kept in a temporary file. At the end of the project, these temporary files can be scanned before going live with the digitized version of the files.
Disposal of originals
After the digitized files have been uploaded and tested, a decision needs to be made about handling the originals. Most projects do not re-clip the documents or reassemble the folder contents; instead, they leave the originals as they are and set a disposal date for them, so that the digitized version serves as the official and only record.
So, the next time you are told to “just scan everything,” there is some homework to do to properly assess the cost, time and future value of the files you are scanning.
Paula Lederman is an information management consultant with IMERGE Consulting Inc. She has over 20 years of consulting experience in all aspects of information management. Contact her at Paula.Lederman@imergeconsult.com.