Quality Control and Image Processing
Consistent quality control that is firmly anchored in the workflow is vital. In particular:
- Minimum requirements should be defined for quality control on a project basis
- If possible, quality control should be performed by a second person rather than by the scan operators themselves, as errors are more likely to be discovered by two pairs of eyes
Control through workflow software
Although current workflow software usually performs quality control autonomously, this is limited to technical aspects (file format, resolution, colour management etc.). It is not a complete substitute for a visual check.
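The kind of technical check described above can be sketched as follows. This is a minimal illustration, not the behaviour of any particular workflow product: the `ScanInfo` record, the allowed format and the 300 dpi minimum are assumptions chosen for the example, and real values would come from the project's own minimum requirements.

```python
from dataclasses import dataclass

# Hypothetical project minimums; real values come from the project specification.
ALLOWED_FORMATS = {"TIFF"}
MIN_DPI = 300


@dataclass
class ScanInfo:
    """Technical metadata of one scanned page (illustrative structure)."""
    filename: str
    file_format: str       # e.g. "TIFF"
    dpi: int               # scan resolution
    has_icc_profile: bool  # is colour management information embedded?


def technical_issues(scan: ScanInfo) -> list[str]:
    """Return the technical problems found for one scan (empty list = pass)."""
    issues = []
    if scan.file_format.upper() not in ALLOWED_FORMATS:
        issues.append(f"unexpected file format: {scan.file_format}")
    if scan.dpi < MIN_DPI:
        issues.append(f"resolution below minimum: {scan.dpi} dpi")
    if not scan.has_icc_profile:
        issues.append("no embedded colour profile")
    return issues
```

A scan that passes all three checks can still show shadows, finger marks or a cropped text block, which is why the visual check remains necessary.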
Typical scanning errors:
- Missing or double pages
- Shadowing or finger marks on the digital image
- Slanting pages
- Cropped type area (text block on the page)
- Insufficient image definition
- Poor colour fidelity
- Image interference (e.g. moiré effect)
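Of the errors listed above, missing or double pages are the easiest to flag automatically, provided the files carry a running number. The sketch below assumes a hypothetical naming scheme in which each file name ends in a sequence number before the extension (for example `vol12_0003.tif`). It only detects gaps and repeats in that numbering, so a page that was skipped during scanning without leaving a gap in the numbers would go unnoticed.

```python
import re
from collections import Counter


def page_sequence_errors(filenames):
    """Return (missing, duplicated) sequence numbers found in a list of file names."""
    numbers = []
    for name in filenames:
        # Take the run of digits directly before the file extension.
        match = re.search(r"(\d+)\.\w+$", name)
        if match:
            numbers.append(int(match.group(1)))
    if not numbers:
        return [], []
    counts = Counter(numbers)
    missing = [n for n in range(min(numbers), max(numbers) + 1) if n not in counts]
    duplicated = sorted(n for n, c in counts.items() if c > 1)
    return missing, duplicated
```

The remaining errors in the list, such as shadows, slanted pages or moiré, concern the image content and are left to the visual check.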
Practical example – quality control E-Periodica
The journals scanned for E-Periodica are usually checked twice:
- The completeness, legibility and quality of the image files are checked immediately after scanning. Standard tools such as Adobe Bridge are used on calibrated monitors.
- An additional visual control is performed while the digital copies are being indexed with metadata.
Experience shows that errors discovered only during metadata entry take a great deal of effort to correct, as this involves:
- replacing the master file
- replacing the usage derivatives
- restoring the correct file name, etc.
Therefore, the initial manual control is of paramount importance.
Retrospective image processing is time-consuming and costly. In mass digitization especially, there is no guarantee that the effort involved in processing images individually will pay off. To ensure adequate image quality nonetheless, software is available that enables batch image processing based on preset standards. Examples are PageImprover by 4Digital Books and the open-source product ImageMagick.
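As an illustration of batch processing with preset standards, the following sketch builds a single ImageMagick call that applies the same settings to every file. It assumes ImageMagick 7 is installed and reachable as `magick` (under ImageMagick 6 the command is `mogrify` on its own). The chosen operations and the 40% deskew threshold are example settings, not the ones used for E-Periodica.

```python
import subprocess


def build_batch_command(files, output_dir, deskew_threshold="40%"):
    """Build one ImageMagick call that applies the same settings to every file."""
    return [
        "magick", "mogrify",
        "-path", str(output_dir),      # write results here, leave the originals untouched
        "-deskew", deskew_threshold,   # straighten slanted pages
        "-normalize",                  # stretch contrast to the full tonal range
        *[str(f) for f in files],
    ]


def run_batch(files, output_dir):
    """Run the batch; raises CalledProcessError if ImageMagick reports a failure."""
    subprocess.run(build_batch_command(files, output_dir), check=True)
```

Writing the results to a separate directory with `-path` keeps the master files unchanged, so a poorly chosen setting can be corrected by rerunning the batch.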
Practical example – image processing E-Periodica
The scanners used by ETH-Bibliothek have device-specific, multi-purpose image processing software. For the digitization of journals, this has drawbacks as well as advantages: as the processing options differ from one scanner to the next and not every device can satisfy every need, separate professional image processing software (PageImprover) is used for subsequent processing.
Semi-automatic image processing
Image processing usually takes place in batch mode. The optimisation parameters are first determined manually on individual sample pages; the automatic processing then applies these settings to the whole batch.
The specific optimisation tasks in journal digitization are:
- Alignment of the document
- Reduction of disruptive background information (show-through on translucent pages)
- Increase in the contrast between the text and the background in greyscale scans
The original scan (left) compared to the optimised image file (right).
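The semi-automatic procedure described above, with parameters chosen by hand on sample pages and then applied unchanged to the batch, can be sketched for the contrast and background tasks as a simple levels adjustment. This is an illustrative assumption and not PageImprover's method: pages are represented as flat lists of grey values from 0 (black) to 255 (white), and `black_point` and `white_point` stand for the values an operator would pick on a sample page.

```python
def apply_levels(pixels, black_point, white_point):
    """Map grey values so that black_point becomes 0 and white_point becomes 255."""
    if white_point <= black_point:
        raise ValueError("white_point must be greater than black_point")
    scale = 255 / (white_point - black_point)
    # Values outside the chosen range are clipped to pure black or pure white.
    return [min(255, max(0, round((v - black_point) * scale))) for v in pixels]


def process_batch(pages, black_point, white_point):
    """Apply the manually chosen settings to every page in the batch."""
    return [apply_levels(page, black_point, white_point) for page in pages]


# Example: text at 30, a mid grey at 60, faint show-through at 200, paper at 230.
print(apply_levels([30, 60, 200, 230], black_point=40, white_point=200))
# prints [0, 32, 255, 255]
```

Setting the white point at or below the grey value of the show-through turns it white along with the paper, while the darker text is pushed towards black, which increases the contrast between text and background.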