17 Digital Preservation Part I
Hema C and Jagdish Arora
I. Objectives
The objective of this module is to impart knowledge on the following aspects of digital preservation:
• Need, relevance, problems and challenges of digital preservation;
• Principles that guide digital preservation actions;
• Factors that are involved in long-term digital preservation;
• Digital preservation strategies; and
• Impact of intellectual property rights and digital rights management on digital preservation.
II. Learning Outcome
After going through this lesson, learners will understand the needs, problems and challenges of digital preservation. They will learn about the nature of digital contents and the problems and challenges faced in accessing them. They will also learn important digital preservation strategies, the factors involved, and the technical standards used in the process of digitization.
III. Structure
1. Introduction
2. Definitions
3. Needs of Digital Preservation
4. Problems and Challenges of Digital Preservation
5. Principles of Preservation as Applied to Digital Preservation
6. Factors of Digital Preservation
7. Digital Preservation Strategies
8. Digital Rights Management (DRM) and Digital Preservation
9. Summary
1. Introduction
The widespread transition of knowledge from print to electronic format began in the 1980s, when libraries started acquiring documents accompanied by 5¼ inch and 3½ inch floppy discs. These floppy discs, and the floppy drives required to read them, have disappeared completely. However, the information recorded on them is still relevant to libraries and their users. Likewise, CD-ROMs, DVD-ROMs and magnetic tape cartridges that are used as low-cost storage media in libraries may also move towards extinction as storage technology evolves. Acquisition of electronic content in libraries is increasing every day with the addition of new “born digital” resources through different channels of communication, such as e-journals, e-books and online bibliographic databases subscribed to by libraries from publishers, vendors and aggregators. Furthermore, individual institutions are themselves producing their research output and other knowledge resources in electronic format. Libraries, with their mandate to provide long-term access to the resources they hold, are concerned that if the media and technology used for preserving digital content become obsolete, they may fail to provide access to digital data preserved for posterity. The preservation of digital contents is therefore a matter of concern, with technologies, standards and formats in a continuous state of flux.
In the past few years, significant developments have been made in digital preservation, with several new projects and programmes involving national and international institutions of high repute. Libraries, archives, and other cultural institutions are eager to adapt and adopt the tenets of digital preservation in order to avoid the risk of losing digital contents to rapid changes in technology. Preservation of digital data requires substantial new investments and commitments by organizations, institutions and agencies, which must adapt their economic and administrative policies for funding and managing digital preservation practice.
2. Definitions
Digital preservation deals with the management of digital information over a long period of time. It is a set of processes and activities that ensure continued long-term access to information from all kinds of records, both scientific and cultural heritage, that exist in digital form. According to Trusted Digital Repositories (TDR, 2002), “digital preservation encompasses a broad range of activities designed to extend the usable life of machine readable computer files and protect them from media failure, physical loss and obsolescence”. Kelly (1999) defines digital preservation as “storage, maintenance, and accessibility of digital object (include any digital material such as a text document, an image file, a multimedia CD-ROM or a database) over long-term, usually as a consequence of applying one or more digital preservation strategies”. In short, digital preservation is the active management of digital content over time to ensure ongoing access.
Kirchhoff (2008) defines digital preservation as “series of management policies and activities necessary to ensure the enduring usability, authenticity, discoverability, and accessibility of content over the very long-term”. Digital preservation refers to a series of managed activities designed to ensure continuing access to all kinds of records in digital formats for as long as necessary and to protect them from media failure, physical loss and obsolescence (Cornell University Library, 2005).
Wikipedia (2014) defines “digital preservation” as “the series of managed activities necessary to ensure continued access to digital information for as long as necessary”. Digital preservation involves the planning, resource allocation, and application of preservation methods and technologies to ensure that digital information of continuing value remains accessible and usable. It combines policies, strategies and actions to ensure access to reformatted and born-digital content regardless of the challenges of media failure and technological change.
3. Needs of Digital Preservation
Libraries should have a clear understanding of their purpose in digitizing and preserving digital material. Fundamental needs for digital preservation include:
• Exponential growth in digital information available in libraries and its ephemeral nature;
• Increased complexity of digital objects (incorporating text, images, audio, video, GIS formats, etc.) and their increasing dependency on the software required to read and use them;
• Rapid flux of technology, standards and formats;
• Multiplicity of standards and formats;
• Absence of widely accepted standards that will assure access over time;
• Need to ensure usability, durability and intellectual integrity of the digital information; and
• Rapid changes and obsolescence of storage media (e.g., limited life span of storage media).
4. Problems and Challenges of Digital Preservation
The challenges in maintaining access to digital resources over time stem from notable differences between digital and paper-based material. The initial problem with digital preservation is the content itself (Chen, 2001). Digital contents are complex and dynamic in nature, and accessing them requires specific software and up-to-date technologies. The economic challenges of digital preservation are also enormous. Preservation programmes require significant upfront investment to set up, along with ongoing costs for data ingest, data management, data storage, and staffing. Graham (1998) grouped the problems and challenges of digital preservation into three distinct categories, namely: i) Longevity of Physical Storage Media; ii) Technology-related Issues, including Technological Obsolescence, Hardware and Software Dependence and Multitude of Formats; and iii) Intellectual Preservation Issues, including integrity and authenticity of information. Specific challenges that need to be addressed while preserving digital contents are as follows:
4.1 Dynamic Nature of Digital Contents
Preservation in the analogue world involves static objects like printed documents, manuscripts and other artifacts. Collecting and storing these items in some form is a simple and straightforward process. Preserving digital contents, by contrast, requires reconsidering the meaning and purpose of preservation. Digital information exists in several forms and types. Many digital documents are true replicas of their print counterparts, such as books, reports and correspondence. However, other types of digital material differ greatly from traditional forms, for example interactive Web pages, geographic information systems and virtual reality models. Web sites often change dynamically. As an object grows and changes over time, new questions emerge about what it means to preserve a digital object. Internet users are all familiar with the link failure syndrome that plagues the Web.
4.2 Machine Dependency
Digital contents are machine-dependent. It may not be possible to access the information without the appropriate hardware and associated software. Access to digital contents may require the specific hardware and software that were used to create them. Since computers and storage technologies are in a continuous state of flux, the time frame available for migrating digital contents to new software or hardware is generally very short, typically 3 to 5 years, as opposed to the decades or even centuries that may be available for preserving traditional materials. Technological obsolescence is considered the greatest technical threat to ensuring continued access to digital contents.
4.3 Fragility of the Media
The media used for storing digital contents are inherently unstable and fragile: magnetic and optical media deteriorate rapidly and can fail suddenly because of exposure to heat, humidity, airborne contaminants, or faulty reading and writing devices (Hedstrom and Montgomery, 1998). Magnetic storage media are highly sensitive to dust, heat, humidity and other climatic conditions. Without suitable storage conditions and proper management, most storage devices may deteriorate very quickly without showing any external signs of damage. Deterioration of storage media may corrupt digital files in such a way that the corrupted portion of the contents is not easy to identify.
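Corruption that leaves no visible trace is commonly detected with a fixity check: a cryptographic digest is recorded while the file is known to be good and recomputed later. The following is a minimal sketch using Python's standard hashlib module. The function names and sample content are assumptions made for illustration and do not come from any system cited in this module.

```python
import hashlib


def sha256_digest(data: bytes) -> str:
    """Return the SHA-256 digest of a byte string as hexadecimal text."""
    return hashlib.sha256(data).hexdigest()


def is_intact(data: bytes, recorded_digest: str) -> bool:
    """Compare a freshly computed digest with the one recorded at ingest."""
    return sha256_digest(data) == recorded_digest


# Hypothetical content, with its digest recorded while the file was known to be good.
original = b"Census table, district 12, 1991"
recorded = sha256_digest(original)

# Simulate media decay: a single bit flips, with no outward sign of damage.
decayed = bytearray(original)
decayed[5] ^= 0x01
damaged = bytes(decayed)

print(is_intact(original, recorded))  # True
print(is_intact(damaged, recorded))   # False
```

A digest only reports that something changed. It does not locate or repair the damage, which is why fixity checks are usually paired with additional copies of the file.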
4.4 Technological Obsolescence
Technological obsolescence can affect hardware, software and file formats. Not only are computers continually superseded by faster and more powerful versions, but the media used to store digital contents also become obsolete within two to three years, replaced by newer and denser versions of the same medium or by new types of media that are smaller, denser, faster, and easier to read. Digital materials stored on older media could be lost because the hardware or software needed to read them may become obsolete. Although the media may physically survive for years, the technology to read and interpret them may exist for only a brief period of time. As a result, even if the storage media are kept in the best condition, it may still not be possible to access the information they contain. Obsolescence also affects the software used to create, manage, or access digital contents, since software is superseded by newer versions or newer generations with more capabilities. Loss of backward compatibility is a constant threat for digital contents that were created using older versions of software. Similarly, file formats are superseded by newer versions, and newer versions of software may not read files in older formats. Although some file formats are largely independent of specific software (for example ASCII and Unicode), most are tied to individual programs or related groups of software. Proprietary software with associated file formats represents some of the most enduring and successful software in use. Commercial software developers regularly release new versions of their software and associated file formats with added features and functionality in order to entice users to upgrade.
4.5 Shorter Life Span of Digital Media
One of the important concerns of digital preservation is the relatively short life span of digital media and the high rate of obsolescence of the hardware and software used for accessing digital records. Rapid change in the IT industry, and the move from science-based development to commercial development of software and hardware systems, has resulted in media becoming inaccessible at a faster pace.
4.6 Formats and Styles
Information contents that were earlier confined to traditional formats like books, maps, photographs, and sound recordings are increasingly available in a diversity of digital formats. New formats have emerged, such as hypertext, multimedia, dynamic pages, geographic information systems and interactive video. Each format or style poses distinct challenges for digital preservation relating to its encoding and compression.
4.7 Copyright and Intellectual Property Rights (IPR) Issues
Legal issues, in particular the process of obtaining copyright clearance for preservation and access of archived material, can contribute significantly to the cost and complexity of digital preservation. It is an area where the wider preservation community often needs to make its case with government and other legislators.
Andrew Charlesworth (2012) emphasized that while a number of legal issues colour contemporary approaches to, and practices of, digital preservation, it is arguable that intellectual property law, represented principally by copyright and its related rights, has been by far the most dominant, and often intractable, influence. It is essential for those engaging in digital preservation to understand the letter of the law and to be able to identify and implement practical and pragmatic strategies for handling legal risks in the pursuit of preservation objectives. Moreover, those engaging in digital preservation need to advance a coherent and cogent message to rights holders, policymakers and the public with regard to the relationship between intellectual property law and digital preservation. It is in the long-term interests of all stakeholders that modern intellectual property law permits the implementation of effective and efficient mechanisms of digital preservation.
5. Principles of Preservation as Applied to Digital Preservation
The basic principles of preservation that are practiced for analogue media are also applicable to preservation in the digital world. In essence, digital preservation defines priorities for extending the life of digital information resources. Conway (1996) identified five principles, i.e. longevity, choice, quality, integrity, and accessibility, that are practiced for preservation of analogue media and can be extended to digital preservation.
The following principles guide digital preservation actions:
5.1 Longevity: The density with which media record information has increased exponentially over time, while the longevity of those media has decreased proportionately. The graph given below (Conway, 1996) plots ten “writing” media on the “X” axis in chronological order, with their corresponding capacity to record information on the “Y” axis on a logarithmic scale. It can be observed that the capacity to record information increases at each level by a factor of ten. The longevity of digital contents depends on the life expectancy of the access system, including hardware and software. Storage media are likely to have a longer life span than the computer systems used to retrieve and interpret the data stored on them. Libraries must always be prepared to migrate valuable digital contents, indexes, and software to future generations of computers and storage devices. Migration of digital contents will remain a continuing activity for ensuring perpetual availability of digital information. Libraries must ensure continuing institutional commitment to support long-term migration strategies.
Figure 1: Information Density vs. Life Expectancy of Storage Media
5.2 Accessibility: Digital preservation activities must be performed with a shared understanding that long-term access is the primary goal. Access to digital collections should be supported to the best ability of available technology and resources. Acquisition of non-proprietary hardware and software components can help to ensure perpetual access to digital resources.
5.3 Selection: Selection of digital material for preservation is an ongoing process intimately connected to the active use of the digital files. Selection and value judgment are involved every time a decision is made to convert documents from paper to digital image, or to migrate them from one storage and access system to another, so as to continue preserving the information. Only a rare collection of digital files can justify the media and other costs of a comprehensive migration strategy (Conway, 1996).
Selection of digital contents for preservation should reflect the broader institutional mission. Moreover, as with analogue documents, the main criteria in the selection of digital contents for preservation should be their authenticity, significance and lasting cultural value in reflecting subject matter.
5.4 Quality: Quality in the digital world is concerned with the usefulness and usability of digital contents, and is essentially governed by the limitations of capture and display technology. Imaging technology, for example, facilitates scanning at a resolution of 3000 dpi; however, printing and display technologies have their limitations. The quality of the digital object, including the richness of both the image and the associated indexes, is the heart and soul of preservation in the digital world. This means maximizing the amount of data captured in the digital scanning process, documenting image enhancement techniques, and specifying file compression routines that do not result in the loss of data during telecommunication (Conway, 1996).
5.5 Integrity and Authenticity: Digital preservation is concerned with physical as well as intellectual integrity of digital contents. In terms of digital preservation, the physical integrity of a digital image file is determined in terms of loss of information that occurs when a file is created in the process of scanning, and compressed mathematically for storage or transmission across the networks. The metadata (descriptive or structural) that describes intellectual contents of an image file or its organization is an integral part of the digital file, which must be preserved along with the digital image files themselves. The preservation of intellectual integrity also involves authentication procedures to make sure files are not altered intentionally or accidentally (Lynch, 1994).
5.6 Discoverability: Digital content must have associated bibliographic metadata so that the content can be found by end-users through time.
5.7 Usability: The intellectual content of the item should remain usable via the delivery mechanism of current technology.
5.8 Sustainability: Digital preservation activities must be planned and implemented in ways that allow resources to be managed and sustained into the future. Future access to digital resources cannot be assured without institutional commitment to the resources that digital preservation requires.
6. Factors of Digital Preservation
There are many factors involved in long-term digital preservation. They can be grouped into the following six categories, each of which tends to affect the others:
6.1 Cultural Factors: There is a lack of awareness amongst large groups of people within society, including planners and decision makers, about the historical value and significance of their digital documentary heritage. This, in turn, leads to neglect of adequate and proper record keeping, with a consequent loss of heritage. Although digital information production is considered valuable, there is not enough awareness about its preservation. In 2003, a US survey carried out by the Cornell University Library found that the main threat to digital materials was the lack of policies and plans inside institutions to carry out this task. In developing countries, the situation could be worse.
6.2 Technological Factors: Technological factors are mainly related to obsolescence of computers, storage devices and media; changes in operating systems, formats, programs, interfaces, and reading and reproducing devices; emerging standards; and lack of interoperability among computing devices. Moreover, issues related to information security must also be addressed. These concern the relationship among threats, risks, vulnerabilities, impacts, and control measures on digital objects. Libraries, with limited technologies and technological dependence, should gear up to cope with these obsolescence and security problems.
6.3 Legal Factors: Preservation of digital contents is not an easy task for libraries. It is strongly associated with legal factors such as copyright and IPR. Some of the questions that need to be answered include: Who is legally responsible for keeping every document collection or archive for the future? Who has the legal eligibility or competence to perform that task? Will it be possible to make these documents accessible in future? National libraries and archives are currently trying to balance their responsibilities of receiving, keeping and providing access to documents against the growing restrictions on distributing them, mostly in electronic formats.
6.4 Methodological Factors: These factors are associated with the tools and standards that are used for appraisal of different materials, selection and disposal, logical storage and future retrieval of documents. A digital document with only a simple set of descriptive metadata, such as author, title and keywords, is not enough for proper future retrieval. A richer set of metadata is needed that allows hyperlinks and contextualizes the description of the document in relation to other documents, enhancing its reuse, search, linking, weighting, integration, data mining and interoperability with other programs that might be used in future. If these factors are not taken into consideration, the technological preservation effort will be of limited use in spite of the complexity and cost involved.
6.5 Economic Factors: Preservation is an ongoing process. Therefore, current, short-term and long-term costs and funding are important issues to deal with before and during a preservation project, in order to maintain its feasibility in the long term. These include: cost of digitizing (cost of scanning and/or producing a digital original), cost of editing (to prepare, assemble, alter, adapt, refine or bring a digital document into conformity with a standard), cost of registering (adding a set of metadata pertinent to the digital object), cost of storing (cost of maintaining a digital object in online or offline storage devices for a given time) and cost of updating (cost to copy, update, refresh, convert, and reshape digital documents to fulfill new requirements).
6.6 Social Factors: These factors are associated with the usability, accessibility and security aspects of digital preservation. Future generations should have effective and efficient access to the information that is preserved. There is no use in preserving digital documents if no one, or just a few users, will have access to them. Assuming copyright, privacy rights, and other legal issues are observed, the future challenge will be how to make this information available to as many people as possible through several generations. Social issues should be addressed and taken into consideration while defining a digital preservation policy.
Figure 2: Factors of Digital Preservation
7. Digital Preservation Strategies
The goal of digital preservation strategies is to achieve consistency in the management of digital records. The purpose is to ensure that access to digital archives can be maintained indefinitely. The preserved digital objects should be identical in all essential respects to the original digital objects. It is important to understand what is ‘essential’ in order to protect those aspects of a digital record and to measure the success of preservation interventions. UNESCO’s Guidelines for the Preservation of Digital Heritage (2003) group these strategies under the following four categories:
7.1 Short-term Strategies
Short-term digital preservation strategies are likely to work for a short period of time only. These strategies include:
Figure 3: Short-term Strategies
7.1.1 Bit-stream Copying
Bit-stream copying is also referred to as “backing up data” or “mirror image backup”. It involves backing up all areas of a computer hard disk drive or another type of storage medium, making an exact duplicate of a digital object. Bit-stream copying is not a long-term maintenance technique, since it deals only with the question of data loss due to hardware and media failure, whether resulting from normal malfunction and decay, malicious destruction or natural disaster. It should be considered the minimum maintenance strategy for even the most lightly valued, short-lived data.
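An exact duplicate is only useful if it can be shown to be exact. As a minimal sketch, the copy can be verified byte for byte immediately after it is made, using Python's standard shutil and filecmp modules. The file names and the helper function are hypothetical.

```python
import filecmp
import shutil
import tempfile
from pathlib import Path


def bitstream_copy(source: Path, target: Path) -> bool:
    """Copy a file and confirm that the copy is byte-for-byte identical."""
    shutil.copyfile(source, target)
    # shallow=False forces a comparison of contents, not just size and timestamps.
    return filecmp.cmp(source, target, shallow=False)


with tempfile.TemporaryDirectory() as workdir:
    source = Path(workdir) / "report.dat"          # hypothetical file names
    target = Path(workdir) / "report_backup.dat"
    source.write_bytes(bytes(range(256)) * 4)
    copy_ok = bitstream_copy(source, target)

print(copy_ok)  # True
```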
7.1.2 Refreshing
Refreshing is the transfer of data between two storage media of the same type, with no change whatsoever in the bit-stream, for example transferring census data from an old CD to a new one. This strategy may need to be combined with migration when the software or hardware required to read the data is no longer available or is unable to understand the format of the data. Refreshing is a necessary component of any successful digital preservation project. It potentially addresses both decay and obsolescence issues related to the storage media.
7.1.3 Replication
Replication is a method of creating duplicate copies of data on one or more systems. Data that exists as a single copy in only one location is highly vulnerable to software or hardware failure, intentional or accidental alteration, and environmental disasters such as fire and flooding. Digital data is more likely to survive if it is replicated in several locations. Thus, the intention of replication is to enhance the longevity of digital documents while maintaining their authenticity and integrity through copying and the use of multiple storage locations.
Bit-stream copying is a form of replication. LOCKSS (Lots of Copies Keeps Stuff Safe) is a consortial form of replication, while peer-to-peer data trading is an open, free-market form of replication. CLOCKSS supports the traditional model of preservation whereby individual libraries build and maintain local collections of journals.
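The benefit of holding several copies can be illustrated with a simplified majority vote: copies are compared by digest, and any copy that disagrees with the majority is overwritten from a good one. This is a sketch of the general idea only. It is not the actual LOCKSS polling protocol, and the site names and function are hypothetical.

```python
import hashlib
from collections import Counter


def audit_and_repair(replicas: dict[str, bytes]) -> list[str]:
    """Repair replicas that disagree with the majority; return their names."""
    digests = {name: hashlib.sha256(data).hexdigest() for name, data in replicas.items()}
    majority_digest, votes = Counter(digests.values()).most_common(1)[0]
    if votes <= len(replicas) // 2:
        raise ValueError("no majority: copies cannot be repaired automatically")
    good_copy = next(data for name, data in replicas.items()
                     if digests[name] == majority_digest)
    repaired = [name for name in replicas if digests[name] != majority_digest]
    for name in repaired:
        replicas[name] = good_copy
    return repaired


# Hypothetical storage sites holding the same article.
replicas = {
    "site_a": b"Volume 12, Issue 3, full text",
    "site_b": b"Volume 12, Issue 3, full text",
    "site_c": b"Volume 12, Issue 3, fu?l text",  # damaged copy
}
repaired = audit_and_repair(replicas)
print(repaired)  # ['site_c']
```

With a single copy there is nothing to vote against, which is the practical argument for replicating in several locations.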
7.1.4 Technology Preservation or Computer Museum
Technology preservation is the maintenance of the hardware and software platforms that support a digital resource. If adopted as a preservation strategy, it needs to be accompanied by a regular cycle of media refreshing. Maintaining obsolete technology in usable form requires a considerable investment in equipment and personnel. It is also called the “computer museum” solution.
7.2 Medium- to Long-term Strategies
Figure 4: Medium to long term Strategies
7.2.1 Migration
Migration is the process of transferring digital information from one hardware and software setting to another, or from one computer generation to subsequent generations, without change in its intellectual content. The purpose of migration is to preserve the integrity of digital objects and to retain the ability of clients to retrieve, display, and use them in the face of constantly changing technology. For example, moving files from an HP-based system to a SUN-based system involves accommodating the differences between the two operating environments. Migration can also be format-based, for example moving image files out of an obsolete file format or into a format that increases their functionality.
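A small format-based migration can be sketched with character encodings, chosen here purely for illustration: a text record is moved from a legacy encoding to UTF-8. The bit-stream changes, but the characters, and hence the intellectual content, do not. The function name and sample text are assumptions.

```python
def migrate_text(data: bytes, source_encoding: str, target_encoding: str = "utf-8") -> bytes:
    """Re-encode text: the bytes change, the characters do not."""
    return data.decode(source_encoding).encode(target_encoding)


# Hypothetical record stored in the legacy ISO 8859-1 (Latin-1) encoding.
legacy = "Bibliothèque nationale, dépôt légal".encode("latin-1")
migrated = migrate_text(legacy, "latin-1")

bits_changed = legacy != migrated
content_preserved = legacy.decode("latin-1") == migrated.decode("utf-8")
print(bits_changed, content_preserved)  # True True
```

This is the key contrast with refreshing and bit-stream copying, where the bit-stream itself must remain unchanged.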
7.2.2 Canonicalization
Canonicalization relies on a canonical form for a class of digital objects that, to some extent, captures the essential characteristics of that type of object in a highly deterministic fashion when it is converted from one format to another. This form can be used to verify algorithmically that a converted file has not lost any of its essence. In particular, it provides a language or framework for understanding the effects of file translation. Unlike text, an image can be stored in many ways: by rows or by columns; in planes; compressed or uncompressed. Different formats in use today make different choices, but all of them can be mapped to the relevant canonical format. As long as the canonical form is used, integrity and authenticity can be managed independently of the peculiarities of specific representations and their choices about how to store the image. Clifford Lynch (1999) is recognized as the first person to introduce the idea of canonicalization.
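The row-versus-column example can be made concrete with a short sketch. Two hypothetical storage layouts of the same small greyscale image are mapped to one canonical serialisation (width, height, then pixels row by row), and the digests of the canonical forms are compared. The canonical form chosen here is an assumption for illustration, not a published standard.

```python
import hashlib

Grid = list[list[int]]


def canonical_from_rows(rows: Grid) -> bytes:
    """Serialise a row-major pixel grid as width, height, then pixels row by row."""
    height, width = len(rows), len(rows[0])
    pixels = bytes(value for row in rows for value in row)
    return width.to_bytes(4, "big") + height.to_bytes(4, "big") + pixels


def canonical_from_columns(columns: Grid) -> bytes:
    """Transpose a column-major grid to rows, then reuse the same canonical form."""
    rows = [list(row) for row in zip(*columns)]
    return canonical_from_rows(rows)


def digest(canonical: bytes) -> str:
    """Return the SHA-256 digest of a canonical serialisation."""
    return hashlib.sha256(canonical).hexdigest()


# The same 3x2 greyscale image stored in two different ways (hypothetical data).
by_rows = [[0, 128, 255], [10, 20, 30]]
by_columns = [[0, 10], [128, 20], [255, 30]]

same_image = digest(canonical_from_rows(by_rows)) == digest(canonical_from_columns(by_columns))
print(same_image)  # True
```

If a conversion altered even one pixel, the canonical digests would differ, which is how a canonical form supports algorithmic verification after file translation.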
7.2.3 Emulation
Emulation is a method of preservation that can preserve the functionality and the ‘look and feel’ of digital objects, which migration may not be able to achieve. This method attempts to simplify digital preservation by eliminating the need to keep old hardware working. Emulation combines software and hardware to reproduce, in all essential characteristics, the performance of another computer of a different design, allowing programs or media designed for a particular environment to operate in a different, usually newer, environment. Emulation requires the creation of emulator programs that translate code and instructions from one computing environment so that they can be properly executed in another. It is a cost-effective solution in certain circumstances, because producing one emulator could be much cheaper than migrating every digital object in an archive.
Intellectual Property Rights (IPR) issues are also involved in emulating either operating systems or applications. There is a need for a trusted organization that can undertake the work and make it available for others to use for effective emulation.
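The idea of translating instructions from one environment so that they run in another can be shown with a toy example. The instruction set below belongs to an invented stack machine, not a real computer, and real emulators are far more elaborate. The sketch only shows that the preserved program is left untouched while a new interpreter executes it.

```python
def emulate(program: list[tuple], inputs: dict[str, int]) -> list[int]:
    """Interpret instructions of a hypothetical legacy stack machine."""
    stack: list[int] = []
    output: list[int] = []
    for instruction in program:
        opcode, *operands = instruction
        if opcode == "PUSH":
            stack.append(operands[0])
        elif opcode == "LOAD":
            stack.append(inputs[operands[0]])
        elif opcode == "ADD":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif opcode == "MUL":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        elif opcode == "PRINT":
            output.append(stack.pop())
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return output


# A preserved "legacy program" that computes price * quantity + 5.
legacy_program = [
    ("LOAD", "price"), ("LOAD", "quantity"), ("MUL",),
    ("PUSH", 5), ("ADD",), ("PRINT",),
]
result = emulate(legacy_program, {"price": 12, "quantity": 3})
print(result)  # [41]
```

One emulator of this kind serves every program written for the old machine, which is the source of the cost argument made above.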
Figure 5: Issues in Emulation
7.3 Investment Strategies
Investment preservation strategies involve investment of efforts at the time of archiving digital materials. Such strategies include:
Figure 6: Investment Strategies
7.3.1 Restricting Range of Formats and Standards
Preservation programmes may decide to store data only in a limited range of formats and standards. This can be achieved either by accepting material only in specified formats or by converting material from other formats before storage. All digital objects of a particular type within an archival repository (e.g., colour images, structured text) can be converted into a single chosen file format that is thought to embody the best overall compromise amongst characteristics such as functionality, longevity, and preservability. For example, most textual and graphical information can be converted into PDF format. The UK Archaeology Data Service (ADS), for example, specifies a preferred (but not exclusive) range of formats for deposit and provides guidelines for depositors on creating or preparing materials for submission.
The strategy does not necessarily solve the access problem unless the obsolescence of the formats and standards used is handled effectively through some other strategy. This strategy imposes serious restrictions on the range of materials that a preservation programme can accept. Moreover, the process of conversion from the original format may cause some loss of essential elements.
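An ingest policy of this kind can be sketched as a check of each file's leading “magic” bytes against a list of preferred formats. The byte signatures for PDF, TIFF and PNG below are the well-known ones. The policy itself, the function names and the decision labels are hypothetical.

```python
from typing import Optional

# Leading "magic" bytes of a few well-known formats.
SIGNATURES = {
    "PDF": [b"%PDF-"],
    "TIFF": [b"II*\x00", b"MM\x00*"],  # little-endian and big-endian variants
    "PNG": [b"\x89PNG\r\n\x1a\n"],
}

# Hypothetical repository policy: only these formats are stored as deposited.
PREFERRED_FORMATS = {"PDF", "TIFF"}


def identify_format(data: bytes) -> Optional[str]:
    """Return the format name whose signature matches the start of the data."""
    for name, prefixes in SIGNATURES.items():
        if any(data.startswith(prefix) for prefix in prefixes):
            return name
    return None


def ingest_decision(data: bytes) -> str:
    """Decide what to do with a deposited file under the policy above."""
    fmt = identify_format(data)
    if fmt in PREFERRED_FORMATS:
        return "accept"
    if fmt is not None:
        return "convert before storage"
    return "reject or review manually"


print(ingest_decision(b"%PDF-1.4 sample"))           # accept
print(ingest_decision(b"\x89PNG\r\n\x1a\n sample"))  # convert before storage
print(ingest_decision(b"unrecognised bytes"))        # reject or review manually
```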
7.3.2 Reliance on Standards
Reliance on standards seeks to “harden” the encoding and formatting of digital objects by adhering to well-recognized standards and favouring them over more impenetrable and less well-supported ones. It is to software what durable media is to hardware. This preservation strategy involves the use of open, widely available and well-supported standards and file formats that are likely to remain stable for a long period of time, in preference to proprietary or less-supported standards. For example, if JPEG2000 becomes a widely adopted standard, the sheer volume of users will guarantee that software to encode, decode, and render JPEG2000 images will be upgraded to meet the demands of new operating systems, CPUs, etc. Similarly, the majority of digitization programmes choose TIFF (Tagged Image File Format) as an open, stable and widely supported standard for the creation of preservation master images, and most publishers use PDF as the de facto standard for electronic distribution of their research articles, owing to the availability of PDF readers for all platforms. Like many of the strategies described here, reliance on standards may lessen the immediate threat to a digital document from obsolescence.
7.3.3 Data Abstraction and Structuring
Data abstraction involves analysing and tagging data so that the functions, relationships and structure of specific elements can be described. Using data abstraction, the representation of content can be liberated from specific software applications and achieved using different applications as technology changes. The technique requires extensive development of tools and methods for analysis and processing in order to represent and tag each type of data correctly. Once tagged, a document becomes application-independent, which simplifies the transport of data between platforms and over generations of technology.
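As a rough illustration of tagging, the Python sketch below describes the structure of a simple document in XML using the standard library. The element names (`document`, `title`, `author`, `body`, `paragraph`) and the function `abstract_record` are assumptions made for this example, not a schema prescribed by any preservation standard.

```python
import xml.etree.ElementTree as ET
from typing import List


def abstract_record(title: str, author: str, paragraphs: List[str]) -> str:
    """Tag the parts of a document so its structure is explicit."""
    doc = ET.Element("document")
    ET.SubElement(doc, "title").text = title
    ET.SubElement(doc, "author").text = author
    body = ET.SubElement(doc, "body")
    for text in paragraphs:
        ET.SubElement(body, "paragraph").text = text
    return ET.tostring(doc, encoding="unicode")


# The tagged text can be read back by any XML-aware application,
# independently of the software that originally produced the document.
xml_text = abstract_record(
    "Annual Report", "A. Librarian", ["First paragraph.", "Second paragraph."]
)
restored = ET.fromstring(xml_text)
print(restored.findtext("title"))
print([p.text for p in restored.iter("paragraph")])
```

Because the tags describe what each element is rather than how a particular program displays it, a future application can choose its own rendering.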
7.3.4 Encapsulation
Encapsulation involves retaining a digital object in its original form as a bit stream, and encapsulating it along with instructions and whatever else might be necessary to maintain access to it in the future. Encapsulation is considered a key element of emulation. The information that has to be encapsulated comprises the document and its software environment. Central to the encapsulation is the digital document itself, consisting of one or more files representing the original bit stream of the document as it was stored and accessed by its original software. In addition, the encapsulation contains the original software for the document, itself stored as one or more files representing the original executable bit stream of the application program that created or displayed the document. A third set of files represents the bit streams of the operating system and any other software or data files comprising the software environment in which the document’s original application software ran. Rothenberg (1995) provides a diagram which shows how much needs to be encapsulated:
Figure 7: Encapsulation
The Open Archival Information System (OAIS) model represents a form of encapsulation in which the digital object is packaged together with the Representation Information needed to interpret the bits appropriately for access, and with Preservation Description Information, which includes information on provenance, context, reference and fixity.
Figure 8: Open Archival Information System (OAIS) Model
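The packaging idea can be sketched as a small Python data structure. This is a simplified illustration and not the OAIS specification: the class and field names are chosen for the example, and the use of a SHA-256 digest as the fixity value is an assumption (OAIS does not prescribe a particular algorithm).

```python
import hashlib
from dataclasses import dataclass


@dataclass
class PreservationDescription:
    provenance: str   # where the object came from and what happened to it
    context: str      # how it relates to other objects
    reference: str    # identifier used to locate the object
    fixity: str       # assumed here to be a SHA-256 hex digest of the content


@dataclass
class InformationPackage:
    content: bytes              # the original bit stream
    representation_info: str    # what is needed to interpret the bits
    pdi: PreservationDescription

    def fixity_ok(self) -> bool:
        """Check that the stored bit stream still matches its recorded digest."""
        return hashlib.sha256(self.content).hexdigest() == self.pdi.fixity


def build_package(content: bytes, representation_info: str,
                  provenance: str, context: str, reference: str) -> InformationPackage:
    """Bundle the content with its descriptive information and a fresh digest."""
    digest = hashlib.sha256(content).hexdigest()
    pdi = PreservationDescription(provenance, context, reference, digest)
    return InformationPackage(content, representation_info, pdi)


pkg = build_package(b"hello", "plain text, ASCII encoded",
                    "deposited by author", "part of a sample collection", "id-0001")
print(pkg.fixity_ok())
```

Any later change to the stored bits, whether accidental or deliberate, makes `fixity_ok()` return `False`, which is the purpose of recording fixity information alongside the object.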
7.3.5 Software Re-engineering
The application software associated with the digital preservation process is the component most affected by changes in technology during regular migration. Software re-engineering offers a number of strategies for transforming software and data formats, each of which requires considerable time and effort. Possibilities include:
• adjusting and re-compiling the source code for a new platform;
• re-coding in another programming language;
• reverse-engineering compiled code into higher-level code and porting that to the new platform; and
• translating compiled binary instructions for one platform directly into binary instructions for another platform.
7.3.6 Universal Virtual Computer
A Universal Virtual Computer (UVC) is a virtual machine (VM) specially designed for the preservation of digital objects, based on emulation. This method allows digital objects to be reconstructed in their original appearance at any time in the future and is completely independent of the architecture of the computer on which it runs. Users could create and save digital files using the application software of their choice, but all files would also be backed up in a way that could be read by the universal computer. The UVC-based preservation method rests on the following four components:
i) Universal Virtual Computer;
ii) UVC program (format decoder);
iii) Logical Data Schema (LDS) with information type description; and
iv) Logical Data Viewer.
A UVC program decodes the file format of a digital object. This format decoder program runs on the UVC, which is the platform-independent layer, independent of future hardware and software changes. Executing the format decoder delivers element tags, which hold specific information about the content of the data in a technology-independent manner. These elements build the Logical Data View (LDV) of the data, which is quite similar to XML. The LDV is a visible representation of the LDS, describing the structure and meaning of the tags as parts of a specific information type.
All these components are controlled by a Logical Data Viewer, simply called the viewer (Figure 9). For reconstruction, the viewer starts the UVC and feeds the data of the digital object to a format decoder running on top of the UVC. In return, it retrieves an LDV and reconstructs a specific representation of the original object’s meaning.
Figure 9: UVC-based preservation method
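The flow between viewer, UVC and format decoder can be imitated with a toy Python sketch. Everything in it is invented for illustration: the two-byte header image format, the tag names and the function names are hypothetical, and the stand-in `run_on_uvc` merely calls the decoder, whereas a real UVC is a fully specified virtual machine.

```python
from typing import Callable, List, Tuple

Element = Tuple[str, object]  # one (tag, value) pair of the logical data view


def toy_image_decoder(data: bytes) -> List[Element]:
    """Format decoder for an invented format: width, height, then pixel bytes."""
    width, height = data[0], data[1]
    pixels = list(data[2:2 + width * height])
    return [("width", width), ("height", height), ("pixels", pixels)]


def run_on_uvc(decoder: Callable[[bytes], List[Element]], data: bytes) -> List[Element]:
    """Stand-in for the UVC: it simply executes the decoder program."""
    return decoder(data)


def viewer(decoder: Callable[[bytes], List[Element]], data: bytes) -> str:
    """Start the 'UVC', collect the tagged elements and rebuild a rendering."""
    ldv = dict(run_on_uvc(decoder, data))
    width = ldv["width"]
    pixels = ldv["pixels"]
    rows = [pixels[i:i + width] for i in range(0, len(pixels), width)]
    return "\n".join("".join("#" if p else "." for p in row) for row in rows)


image = bytes([2, 2, 1, 0, 0, 1])  # a 2 x 2 image in the invented format
print(viewer(toy_image_decoder, image))
```

The viewer never needs to know the original file format; it works only with the tagged elements returned by the decoder, which is what keeps it independent of the software that created the object.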
7.4 Alternative strategies
Alternative strategies to digital preservation include taking analogue backups of documents (print or microfilm) and recovering data from obsolete digital media.
7.4.1 Analogue Backups
Figure 10: Alternative Strategies
Analogue backups involve converting digital objects into analogue form, e.g., by taking high-quality printouts or creating silver halide microfilm from digital images. An analogue copy of a digital object can, in some respects, preserve its content and protect it from obsolescence, but at the cost of sacrificing its digital qualities. Text and monochromatic still images are the most amenable to this kind of transfer.
Given the high cost and limitations of analogue backups, and their relevance to only certain classes of documents, the technique makes sense only for documents whose contents merit the highest level of redundancy and protection from loss.
7.4.2 Digital Archaeology or Data Recovery
Digital archaeology involves retrieving data from obsolete software or hardware environments, or from the wealth of removable media that have been used since the earliest days of computing. A growing number of specialist third-party services offer to carry out digital archaeology, and it has been shown to be technically possible to recover bit streams from damaged and obsolete media. Only trained specialists are able to extract data in this way, using special hardware and software. For instance, in order to extract data from relatively recent, damaged media, the British Library makes use of ‘forensic’ hardware designed for law enforcement, intelligence, corporate and military agents who need to recover digital evidence from hardware in a way that ensures its authenticity.
Digital archaeology is an emergency recovery strategy, not a pro-active and preventative approach to long-term preservation, because:
• It is much more costly than the other major preservation strategies and is unlikely to be more cost-effective for any other than the most highly valued digital resources;
• Relying on digital archaeology means that the digital material that is not necessarily highly valued (yet might still be useful to some researchers or have important evidential value) might not be rescued;
• If there is no accompanying metadata or documentation, it may be impossible to assess the value or usefulness of obsolete digital resources until after rescue has taken place, which may turn out to be a waste of resources;
• Digital archaeology techniques are unlikely to be successful in all cases; and
• It requires a certain amount of technology preservation (see above).
7.5 Combinations
Even with good planning, a single preservation strategy may fail, leaving the programme with no means of access. Several digital preservation strategies may therefore be used together to cover the range of objects and characteristics to be preserved.
For example:
• Standards such as TIFF for image collections are often chosen in preparation for eventual migration to other standard formats over the long-term;
• The VERS strategy couples the use of standards (PDF, XML) to the future use of viewers and the likely migration of XML encoded metadata in the future;
• Persistent archives (Moore, 2000) use data abstraction with the view to eventual migration – migration of the data, the mark up system and the supporting software, and upgrading of hardware;
• The Universal Virtual Computer (UVC) approach combines data abstraction with rules for migration of data objects at the point of access, and an emulation approach for software objects. The “durable encoding” approach adds the use of fundamental standards for encoding data, including encoding that could be understood by the UVC.
8. Digital Rights Management (DRM) and Digital Preservation
8.1 Copyright and Other Intellectual Property Rights (IPR)
Content owners hold copyright in their content, and this has a substantial impact on digital preservation. The IPR issues for digital materials are more complex and significant than for traditional media. If these issues are not addressed, they can hinder or even prevent preservation activities. Simply copying (refreshing) digital materials onto another medium, encapsulating content and software for emulation, or migrating content to new hardware and software all involve activities that can result in infringement of IPR unless statutory exemptions exist or specific permissions have been obtained from rights holders. As both migration and emulation involve manipulation and change presentation and functionality to some degree, it is important to establish a dialogue with rights holders so that they are fully aware of these issues, and so that the rights required to ensure the preservation of selected items are obtained from the copyright holders.
8.2 Access and Security
Some of the additional complexity in IPR issues relates to the fact that electronic materials are easily copied and re-distributed. Rights holders are, therefore, particularly concerned with controlling access and preventing potential infringements of copyright. Technology developed to address these concerns and to provide copyright protection can also inhibit or prevent actions needed for preservation. These concerns over access, infringement and preservation need to be understood by organizations preserving digital materials and addressed by both parties when negotiating rights and procedures for preservation.
8.3 Stakeholders, Contract & Grant Conditions, and Moral Rights
Resources in electronic formats are the result of substantial investment by funding agencies, publishers, individual scholars and authors. Each of these stakeholders may have an interest in preservation. Archiving organizations are required to seek permissions from them in order to safeguard and maximize the financial investment in, and the intellectual and cultural value of, the work for future generations. Such interests may be manifested through contract, license and grant conditions, or through statutory provisions such as “moral rights” for authors.
8.4 Privacy and Confidentiality
Digital objects may be subject to confidentiality agreements or to privacy legislation, such as the Data Protection Act, that protects information held on individuals. Privacy and confidentiality concerns may affect how digital materials can be managed within the repository or by third parties, and how they can be made accessible for use.
8.5 Business Models and Licensing
Business models for the dissemination of electronic materials, and the range of stakeholders who own the IPR, have an impact on digital preservation. In most cases, subscribers to electronic resources, particularly electronic journals, do not have physical possession of them. Subscribers are, therefore, concerned that publishers should consider the archiving and preservation of these works and include archiving and perpetual access to back issues in the licensing of these works.
8.6 Legal Deposit
Legal deposit libraries obviously bear a major responsibility for the digital preservation of documents deposited with them. In the UK, the Legal Deposit Libraries Act 2003 (United Kingdom, 2003) is enabling legislation, which will be implemented over time by a series of further Regulations. UK legal deposit law should thus, over time, be extended to cover digital publications as well as their preservation. Unusually, this law also includes provisions to allow legal deposit libraries to carry out the activities necessary to acquire, preserve and make accessible digital publications. Other countries are increasingly extending their legislation. Initially, new laws tended to cover only tangible digital publications (for example, magnetic tape, diskettes and optical discs) or so-called “static” online publications.
Figure 11: Rights Management
9. Summary
The module introduces the challenges and problems of digital preservation, with technologies, standards and formats in a continuous state of flux. It defines digital preservation, its scope, the need for it and the processes involved in ensuring the long-term accessibility and usability of digital information. The module elaborates on the problems and challenges of digital preservation, which can be grouped into three distinct categories, namely: i) Longevity of Physical Storage Media; ii) Technology-related Issues, including Technological Obsolescence, Hardware and Software Dependence and Multitude of Formats; and iii) Intellectual Preservation Issues, including the integrity and authenticity of information. Principles that guide digital preservation actions, with the ultimate goal of providing long-term access to digital content, are described briefly. The module also elaborates on the factors involved in long-term digital preservation. These factors can be grouped into six categories, namely: i) Cultural Factors; ii) Technological Factors; iii) Legal Factors; iv) Methodological Factors; v) Economic Factors; and vi) Social Factors.
The goal of a digital preservation strategy is to achieve consistency in the management of digital records so as to ensure long-term access to digital archives. These strategies can be grouped under four categories, namely: i) Short-term strategies; ii) Medium to long-term strategies; iii) Investment strategies; and iv) Alternative strategies. The module also discusses the use of combinations of digital preservation strategies so as to cover a range of objects and characteristics to be preserved. Lastly, the module discusses the impact of intellectual property rights and digital rights management on digital preservation.
References
1. Arora, Jagadish (2004). Building digital libraries: An overview. DESIDOC Bulletin of Information Technology, 21(6).
2. Ayre, Catherine and Muir, Adrienne (2004). The Right to Preserve: The Rights Issues of Digital Preservation. D-Lib Magazine, 10(3). Available at www.dlib.org/dlib/march04/ayre/03ayre.html
3. Charlesworth, A. (2012). Intellectual Property Rights for Digital Preservation. Digital Preservation Coalition Technology Watch Report, 12-02.
4. Chen, S. S. (2001). The paradox of digital preservation. Computer, 34(3), 24-28.
5. Conway, Paul (1997). Preservation in digital world. Microform and Imaging Review, 25(4), 156-171. Also available online (http://www.clir.org/pubs/reports/conway2/)
6. Cornell University Library (2005). Tutorial on “Digital preservation management: Implementing short-term strategies for long-term problems”. (http://www.library.cornell.edu/iris/tutorial/dpm/index.html)
7. Dartmouth Digital Library Program: Policies (2011). “A Report from the Digital Projects and Infrastructure Group (DPIG)”. Available at http://www.dartmouth.edu/~library/digital/about/policies/preservation.html
8. Graham, Peter S. (1998). “Long-Term Intellectual Preservation”. Collection Management, 22(3/4), 81-98.
9. Granger, Stewart (2000). Emulation as a Digital Preservation Strategy. D-Lib Magazine, 6(10).
10. Hedstrom, M. and Montgomery, S. (1998). Digital Preservation needs and requirements in RLG Member Institutions. Mountain View, CA: RLG. (http://www.rlg.org/preserv/digpres.html)
11. Jones, M., & Beagrie, N. (2001). Preservation management of digital materials: a handbook (p. 67). London: British Library.
12. Kirchhoff, Amy J. (2008). “Digital preservation: challenges and implementation”. Learned Publishing, 21, 285-294.
13. Ludäscher, B., Marciano, R., & Moore, R. (2001). Preservation of digital data with self-validating, self-instantiating knowledge-based archives. SIGMOD Record, 30(3), 54-63.
14. Lynch, Clifford (1999). Canonicalization: A Fundamental Tool to Facilitate Preservation and Management of Digital Information. D-Lib Magazine, 5 (9).
15. Lynch, Clifford (1994). The integrity of digital information: Mechanics and definitional issues. Journal of the American Society for Information Science, 45, 737-44.
16. Moore R. et al (2000). Collection-based persistent digital archives – Part 2. D-Lib Magazine, 6(4). (http://www.dlib.org/dlib/april00/moore/04moore-pt2.html)
17. Moore, R. et al (2000). Collection-based persistent digital archives – Part 1. D-Lib Magazine, 6(3). (http://www.dlib.org/dlib/march00/moore/03moore-pt1.html)
18. Rothenberg, Jeff (1995). Ensuring the Longevity of Digital Documents. Scientific American, 272(1), 24–29.
19. Russell, Kelly (1999). “Digital Preservation: Ensuring Access to Digital Materials Into the Future.” CEDARS, www.leeds.ac.uk/cedars/Chapter.htm
20. Smith, Abby (2003). Digital Preservation: An Individual Responsibility for Communal Scholarship. EDUCAUSE Review. Available online at https://net.educause.edu/ir/library/pdf/erm0338.pdf
21. Trusted Digital Repositories: Attributes and Responsibilities. RLG/OCLC Report (2002). Available online at http://www.oclc.org/content/dam/research/activities/trustedrep/repositories.pdf?urlm=161690
22. UKOLN (2008). “An Introduction to Digital Preservation: Supporting The Cultural Heritage Sector”. Available at www.ukoln.ac.uk/cultural-heritage/documents/briefing…/briefing-31.doc
23. UNESCO’s Guidelines for the Preservation of Digital Heritage (2003). Available online at http://unesdoc.unesco.org/images/0013/001300/130071e.pdf
24. United Kingdom (2003). Legal Deposit Libraries Act 2003. Available online at http://www.legislation.gov.uk/ukpga/2003/28/contents
25. van der Hoeven, J. R., van Diessen, R. J., & van der Meer, K. (2005). Development of a Universal Virtual Computer (UVC) for long-term preservation of digital objects. Journal of Information Science, 31(3), 196-208.
26. Voutssas, Juan (2012). “Long-term digital information preservation: challenges in Latin America”. Aslib Proceedings, 61(1), 83-96.
27. Wikipedia. Digital preservation (http://en.wikipedia.org/wiki/Digital_preservation) (last visited on 1st March, 2014)