File Formats, Master and Usage Formats
Digital images can be stored in various file formats. Each format has properties that determine what the files can be used for (presentation on the internet, long-term archiving etc.). The storage space required also depends on the file format selected.
Some file formats reduce file sizes by compressing the images. A distinction is drawn between lossless and lossy compression. Lossless compression allows the original file to be reconstructed exactly. Lossy compression discards image information and cannot be undone; in other words, the original file cannot be reconstructed after compression.
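The difference between the two kinds of compression can be illustrated with a minimal sketch. It uses Python's standard zlib module on synthetic pixel values, not on a real scan, and the "lossy" step simply discards the four low-order bits of each pixel. This is an assumption chosen for illustration only; formats such as JPEG use a different and far more sophisticated method.

```python
import zlib

# Synthetic 8-bit greyscale pixel values standing in for a scanned image
# (assumption: a real workflow would read these from an image file).
pixels = bytes((x * 7 + y * 13) % 256 for y in range(64) for x in range(64))

# Lossless: compressing and decompressing returns exactly the original bytes.
packed = zlib.compress(pixels)
restored = zlib.decompress(packed)
lossless_ok = restored == pixels

# Lossy (illustrative only): throw away the 4 low-order bits of every pixel
# before compressing. The discarded detail cannot be recovered afterwards.
quantised = bytes(p & 0xF0 for p in pixels)
packed_lossy = zlib.compress(quantised)
restored_lossy = zlib.decompress(packed_lossy)
lossy_ok = restored_lossy == pixels

print("lossless round trip identical:", lossless_ok)
print("lossy round trip identical:   ", lossy_ok)
print("compressed sizes (bytes):", len(packed), "vs", len(packed_lossy))
```

The lossless round trip reproduces the input byte for byte, whereas the lossy variant returns data of the same length that no longer matches the original.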
Common file formats used in digitization: TIFF, JPEG, JPEG 2000, PDF, PDF/A.
A detailed description of file formats is provided by the Koordinationsstelle für die dauerhafte Archivierung elektronischer Unterlagen (KOST).
Master and usage formats
A master file can be described as the unprocessed scanning product or original file. Master files of greyscale and colour digital copies are usually stored as uncompressed TIFF files.
As an alternative to TIFF, JPEG 2000 requires less storage space and, unlike the JPEG format, supports lossless compression and progressive image build-up. It can also record metadata. In the DFG Practical Guidelines on Digitisation (2013), however, JPEG 2000 is not recommended as a storage format for master files, mainly because the file format is not (as yet) very widespread.
Derivatives are files generated from the master files and optimised for a particular use, above all for displaying digital copies on the internet. To this end, compressed files are produced from the master files; they take up less storage space and can be loaded quickly by browsers. The browser-friendly formats JPEG and PNG are especially suitable for internet applications. For download functions or additional archiving purposes, the use of PDF/A is recommended.
Practical example – usage derivatives for retro.seals.ch
For presentation on the internet, two derivatives in JPEG format are required: one version with a reduced width for the overall view and one version in the original size for the zoom function. These are created automatically with the Agora Workflow Client.
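As a rough illustration of the two derivative sizes, the following sketch computes the pixel dimensions of a reduced-width overview while preserving the aspect ratio. The function name, the 800-pixel target width and the master dimensions are hypothetical; the source does not state which values the Agora Workflow Client uses, and this is not its implementation.

```python
def overview_size(width_px: int, height_px: int, target_width_px: int) -> tuple[int, int]:
    """Scale (width, height) down to a target width, keeping the aspect ratio."""
    if target_width_px >= width_px:
        # Never enlarge: the overview is at most as large as the master.
        return width_px, height_px
    scale = target_width_px / width_px
    return target_width_px, max(1, round(height_px * scale))


# Hypothetical master: roughly an A4 page scanned at 300 dpi.
master = (2480, 3508)

overview = overview_size(*master, target_width_px=800)  # reduced width
zoom = master                                           # original size

print("overview derivative:", overview)
print("zoom derivative:    ", zoom)
```

Only the dimensions are calculated here. Resampling the image and saving it as a JPEG would be done with an imaging library or tool.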
Memory and Long-Term Archiving
On average, a colour TIFF file with a resolution of 300 dpi requires 25 MB of storage space per scanned page. A storage medium with a capacity of 1 TB therefore holds roughly 40,000 pages, which means that only around 180 books can be stored on it.
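The arithmetic behind these figures can be checked with a short sketch. It assumes decimal units (1 TB = 1,000,000 MB). The number of pages per book is not given in the text; the value computed here is merely what the stated figures imply.

```python
PAGE_SIZE_MB = 25          # average colour TIFF page at 300 dpi (from the text)
CAPACITY_MB = 1_000_000    # 1 TB in decimal units (assumption)
BOOKS_PER_TB = 180         # figure given in the text

# How many pages fit on the medium?
pages_per_tb = CAPACITY_MB // PAGE_SIZE_MB

# Average book length implied by the two figures above.
implied_pages_per_book = pages_per_tb / BOOKS_PER_TB

print("pages per TB:", pages_per_tb)
print("implied pages per book:", round(implied_pages_per_book))
```

Under these assumptions, the figure of 180 books corresponds to an average of a little over 220 pages per book.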
Essentially, two processes can be adopted to reduce data volumes:
- Storing the files (master and usage derivatives) in a compressed form
- Optimising the file size through suitable scanning parameters (resolution, colour depth etc.)
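How strongly the scanning parameters affect the file size can be estimated with a minimal sketch. It calculates the uncompressed size of a page from its dimensions, resolution and colour depth. The A4 page size and the function name are assumptions made for illustration, and file headers and metadata are ignored.

```python
def uncompressed_size_mb(width_in: float, height_in: float,
                         dpi: int, bits_per_pixel: int) -> float:
    """Raw image size in MB (decimal), ignoring file headers and metadata."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel / 8 / 1_000_000


# Assumption: an A4 page, about 8.27 x 11.69 inches.
A4 = (8.27, 11.69)

colour_300 = uncompressed_size_mb(*A4, 300, 24)  # 24-bit colour at 300 dpi
grey_300 = uncompressed_size_mb(*A4, 300, 8)     # 8-bit greyscale at 300 dpi
colour_150 = uncompressed_size_mb(*A4, 150, 24)  # 24-bit colour at 150 dpi

print(f"colour, 300 dpi:    {colour_300:.1f} MB")
print(f"greyscale, 300 dpi: {grey_300:.1f} MB")
print(f"colour, 150 dpi:    {colour_150:.1f} MB")
```

Halving the resolution cuts the raw size to about a quarter, and switching from 24-bit colour to 8-bit greyscale cuts it to a third. Whether such a reduction is acceptable depends on the material and the intended use.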
Long-term digital archiving
Planning the long-term storage of original and master files is a central aspect of digitization projects. Cooperative solutions (e.g. sharing an efficient IT infrastructure) may help to make long-term archiving efficient and secure.
Further information on the topic: Digital Data Curation at ETH Zurich