9 Digital Library Planning and Implementation
H G Hosmani and Yatrik Patel
I. Objectives
The objective of this module is to impart knowledge of the following aspects of digital library planning and implementation:
• Development of project plans and budgets;
• Identification of sources for funding and grants;
• Strategic planning for collection development;
• Infrastructure planning and implementation;
• Manpower skill development; and
• Implementation of digitization and its sustainability.
II. Learning Outcomes
After going through this lesson, learners will understand the steps involved in planning a digital library, including identification of requirements, the feasibility study, an implementation strategy covering collection development, infrastructure and human resource planning, and financial planning covering the costs of subscriptions, hardware and services. Learners will also gain knowledge of digital library implementation, including hosting options, provision and promotion of services, and handling IPR issues.
III. Structure
1. Introduction
2. Planning Digital Library
2.1 Identification of Requirement
2.2 Feasibility
2.3 Implementation Strategy
2.3.1 Collection Development
2.3.1.1 Digital Surrogates of Resources/Metadata
2.3.1.2 Born Digital Resources
2.3.1.3 Third Party Data Sources
2.3.2 Project Plan
2.3.2.1 Managerial Planning
2.3.2.2 Hardware and Software Infrastructure Planning
2.3.2.3 Human Resource Planning
2.4 Financial Planning
2.4.1 Planning and Consulting Costs
2.4.2 Purchase of Hardware, Software and Networking Equipment
2.4.3 Telecommunications Costs
2.4.4 Digitization Costs
2.4.5 Subscription Costs
2.4.6 Operating (ongoing) Costs
2.4.7 Training Costs
2.4.8 Developing a Preliminary Budget
3. Digital Library Implementation
3.1 Techniques for Digitization
3.2 Hosting Platforms
3.2.1 Self-Hosting
3.2.2 Mirrored Hosting
3.2.3 Cloud/Shared Service
3.3 Promotion and Provision of Services
3.4 Intellectual Property Rights
4. Summary
1. Introduction
Developments in ICT, especially the World Wide Web and the Internet, have led to the creation of a large number of digital library projects worldwide. Because of the wide range of resources available on the Web, accessing the Internet has become part of daily life, and as the number of resources grows, users are accessing indexed and non-indexed collections as well as full-text resources. Given this information explosion, new technology must be adopted to support search and indexing functionality as well as the storage of full-text content. The basic purpose of a digital library is to provide seamless access to digitally stored information, with efficient and effective search. Careful planning and implementation of a digital library project is essential to meeting this purpose.
Digital libraries differ significantly from traditional libraries because they facilitate online access to electronic full-text documents, images and audio-visual content. From the user's point of view, digital libraries are systems that provide a community of users with coherent access to a large, organized repository of information and knowledge, without any physical restrictions.
Planning is a systematic process of creation or development of a project. In relation to digital library planning, a thorough study of the library’s existing collection as well as the library’s vision is necessary to facilitate preparation of a good technology plan and project proposal.
2. Planning Digital Library
Objectives of digital libraries in the context of planning and implementation can be summarised as follows:
• Selecting the content to be added to the digital library collection and digitizing it in an appropriate digital format;
• Assigning metadata to the digital documents being added to the collection;
• Indexing and storing documents and their associated metadata to enable efficient search and retrieval;
• Assigning a unique identifier to each digital object available in the digital library; and
• Developing a web-based interface to enable browsing, searching, retrieving and viewing the contents of the digital library.
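The relationship between metadata, unique identifiers and indexing listed above can be illustrated with a minimal Python sketch. The class names, the metadata fields and the use of a random UUID as identifier are assumptions made for illustration only; a real digital library would follow its chosen metadata schema and identifier scheme.

```python
import uuid
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    """One item in the collection; the field names are illustrative only."""
    title: str
    creator: str
    file_format: str
    # A random UUID stands in for whatever identifier scheme the library adopts.
    identifier: str = field(default_factory=lambda: str(uuid.uuid4()))


class Catalogue:
    """Stores objects by identifier and keeps a simple inverted index on titles."""

    def __init__(self):
        self.objects = {}               # identifier -> DigitalObject
        self.index = defaultdict(set)   # title word -> set of identifiers

    def add(self, obj):
        """Store the object and index every word of its title."""
        self.objects[obj.identifier] = obj
        for word in obj.title.lower().split():
            self.index[word].add(obj.identifier)
        return obj.identifier

    def search(self, term):
        """Return all objects whose title contains the given word."""
        ids = self.index.get(term.lower(), set())
        return [self.objects[i] for i in sorted(ids)]
```

For example, after adding an object titled "Thesis on Digital Preservation", calling `search("thesis")` returns that object, while a term that appears in no title returns an empty list.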
2.1 Identification of Requirement
Before commencing a digital library project, it is essential to articulate its requirements and purpose as concisely as possible. Its need and relevance, the targeted community of users and the process involved in building the digital library need to be spelled out clearly. Once the basic premise for setting up a digital library is established, it is possible to move on to a number of practical questions that must be answered. These questions include:
• What type of resources will it contain?
• How is it expected to grow?
• Who is going to use it and how?
• How can resources be protected against accidental or intentional modification?
• How will access and intellectual property rights be managed?
• What systems does it need to interact with?
• What special capabilities does it need?
• What resources will be required to create and maintain it?
Answers to these questions will broadly clarify how the digital library will work, the functionality required, and the key players and expertise necessary to move the project forward. It is critical to define the purpose of the digital library as precisely as possible before attempting to identify an appropriate technical infrastructure or to acquire materials, funding and staff resources. Although the availability of finance and technical know-how will finally determine whether a digital library project is executed successfully, the overall vision and its articulation are the foundation for building a digital library.
Just as physical libraries require a robust collection development process, so does a digital library. The subject area of the digital documents, their formats, the target user community, user interfaces, etc., inevitably affect how the digital library will be used and how its materials will be accessed and used over a period of time.
2.2 Feasibility
At the time of planning, and before implementation, it is necessary to conduct a feasibility study of the digital library project. Feasibility should be established not only in terms of the availability of tools and expertise, but also in terms of other factors such as the volume of documents, the target audience, demand for the material to be digitized, and users' expectations and requirements. The study should also assess whether the library can take up the project in-house or whether it should be outsourced to external vendors.
2.3 Implementation Strategy
Planning involves identifying the various tasks related to the digital library collection, including i) development strategies; ii) defining financial, infrastructural and manpower resources; and iii) interface development, and formulating a timeline for accomplishing these tasks. If the digital project is large, a feasibility study must be conducted to assess the viability of the project before detailed planning. The outcome of the feasibility study could be a formal proposal for obtaining management approval or a grant for the project.
2.3.1 Collection Development
The first step in planning a digital library collection development project is to specify the need for creating the digital library collection, its purpose and target user community. The purpose could be preservation and improved access to rare, fragile or deteriorating materials, improving visibility of certain material or facilitating re-use of documents.
Establishing a collection development policy is a critical step that is often overlooked. Because digital resources ultimately consist of files containing a series of 0s and 1s on a hard disk, there is a tendency to consider them as mutually compatible "electronic information." However, different types of digital resources present different challenges. As such, a digital library must satisfactorily address the issues presented by the resources it will host. Otherwise, there is a substantial risk that digital objects hosted in the digital library may not remain accessible for long, and the information could even be lost over a period of time.
There is a need to define the types of resources in the digital library collections (e.g. staff publications, working papers, theses, dissertations, project reports, audio and video lectures, songs and musical scores) and the key attributes of these sources. There is also a need to specify what portion of the material is to be digitized and to assess the copyright restrictions that apply to it. The format types of digital objects, and the operations that users can execute on them, must also be defined.
Converting source materials available in hard copy into a digital format is a major task. There should be a clear statement of the digitization requirements and processes, i.e. how the source material is to be converted into the required digital format. The workflow involved in digitizing the source material should also be defined and documented.
The way the project is going to be implemented, its major milestones and its time requirements also need to be defined.
2.3.1.1 Digital Surrogates of Resources/Metadata
It is theoretically possible to digitize all existing collections available in a library. However, the scale of operation, copyright and value are three factors that call for selecting materials for digitization and delivering subsets of collections in the following three forms:
• Surrogates of rare items;
• Digitized surrogate collections assembled from multiple repositories; and
• Collections assembled specifically to be digitized.
The creation of a digital archive will significantly improve the accessibility and usability of the information contained within the collection whilst conserving the original materials, which may deteriorate over a long period of time. Thus, a collection being built for use by future researchers is made immediately available, using digitization as a deliberate strategy in acquisition.
2.3.1.2 Born Digital Resources
The number and scale of born digital resources are growing, ranging from scholarly journals to new fiction and from data sets and satellite images to digital video and computer-generated graphics, and several of them are being preserved. It is arguable that these digital resources remain individual items rather than forming a coherently built digital collection. Collections of learning objects being created by several universities are an example of digital resources that are not created in a planned way to be part of a coherently built digital library. Boezerooy (2003), for example, gave a comprehensive overview of the Australian experience, which demonstrates that existing born digital resources are not always created with library advice or assistance, or indeed even with long-term preservation in mind. Born digital resources should be assessed and considered at the time of planning the collection building.
2.3.1.3 Third Party Data Sources
Third-party data, as the name implies, is data acquired from other sources and websites. Many companies and publishers sell third-party data, and it is accessible through many different avenues, such as subscriptions to e-journals and e-books and open access e-resources. There are a number of data-management firms that aggregate information from sites across the web and deliver it to users based on their interest in particular topics. Appropriate consideration should be given to such sources at the time of collection development, keeping user requirements in mind.
2.3.2 Project Plan
After selecting collections for digitization, assessing the strategic advantages of digitization and evaluating the costs to the institution, the next activity to be undertaken is the development of detailed project plans.
The planning process is the crucial 'first step' on the digitization path and includes articulating the project's goals and objectives, outlining workflows and developing a budget. These activities inform subsequent attempts to gain funding and set long-term objectives for digitization programmes.
The planning process should be a collaboration of many stakeholders, including users, information providers and staff from all parts of the organization. A broad consultative process at the outset will shape the direction of the project and build consensus.
Stakeholder studies, such as user assessments, are important tools in developing project plans, as are risk assessments, institutional and infrastructure inventories, selection criteria for digitization and job descriptions. The first objective is to develop a detailed project plan, which will help the implementer to assess whether or not the project can be accomplished and will provide an evidence base in support of the project. This document should clearly define the objectives, goals, deliverables and priorities of the project, and express the vision of the project as it fits into the strategic objectives of the institution as a whole.
2.3.2.1 Managerial Planning
Managerial planning essentially involves sequencing the various tasks, managing their time and monitoring the project. The activities that need managerial planning may include conducting the feasibility study, procuring equipment, recruiting manpower, digitization, handling IPR and rights management issues, integrating and organising content, identifying the market, and launching and marketing services. Flow diagrams, PERT, CPM, SWOT analysis and other management techniques may be deployed at this stage.
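As an illustration of the critical path method (CPM) mentioned above, the following Python sketch finds the longest chain of dependent tasks, which determines the minimum project duration. The task names are loosely based on the activities listed in this section, while the durations (in weeks) and the dependencies are hypothetical figures chosen only for the example. The sketch assumes that the dependencies contain no cycles.

```python
from functools import lru_cache

# Hypothetical tasks: name -> (duration in weeks, prerequisite tasks).
TASKS = {
    "feasibility": (4, []),
    "procurement": (6, ["feasibility"]),
    "recruitment": (8, ["feasibility"]),
    "digitization": (20, ["procurement", "recruitment"]),
    "content_organisation": (6, ["digitization"]),
    "launch": (2, ["content_organisation"]),
}


def critical_path(tasks):
    """Return (total duration, ordered task list) of the longest dependency chain."""

    @lru_cache(maxsize=None)
    def longest_to(name):
        # Longest chain that ends with the task `name`.
        duration, prereqs = tasks[name]
        best_len, best_path = 0, ()
        for p in prereqs:
            length, path = longest_to(p)
            if length > best_len:
                best_len, best_path = length, path
        return best_len + duration, best_path + (name,)

    total, path = max(longest_to(name) for name in tasks)
    return total, list(path)
```

With the figures above, the critical path runs through the feasibility study, recruitment, digitization, content organisation and launch, for a total of 40 weeks. Any delay in a task on this path delays the whole project, whereas procurement has two weeks of slack.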
There is a need to identify and designate a project manager to lead the implementation of the digital library project. For large digital library projects, it is essential to have a full-time project manager for the project period.
The project team must be committed to fulfilling the defined objectives and goals of setting up the digital library and to meeting the expectations of users and institutional authorities. The project team should have a well-defined plan for escalation and decision-making. Members should participate in discussions and should be prepared to face conflict, which should be discussed openly and honestly. Handling the activities related to planning a digital library project therefore requires appropriate skills and levels of authority. Team members should communicate effectively and frequently with each other as well as with people outside the project team. They should respect team confidence and resolve conflicts professionally. Cohesive teams should communicate all project information, be it expected or not, good news or bad.
The team should establish ground rules at the start of the project and conscientiously follow these rules throughout the life of the project. Rules represent a 'contract' which states how team members will work together. These rules should clearly define the communication, attitudes and behaviours that team members should value.
2.3.2.2 Hardware and Software Infrastructure Planning
The hardware and software requirements for the server and network components may be worked out along with their financial implications. The connectivity and bandwidth required for hosting the digitized collection may also be planned. For details of the software and hardware infrastructure required for a digital library, refer to Module No. 5, Technical Infrastructure of a Digital Library, of the paper Digital Libraries.
2.3.2.3 Human Resource Planning
Human resource requirements have to be worked out in terms of the staff time involved, training of existing staff and recruitment of new staff with the desired skills. Human resource planning depends on whether the library is going for in-house digitization or outsourcing the digitization process. Project management continues to be an important issue even if the digitization work is outsourced. The management of the project may be divided among groups with well-defined responsibilities. Communication channels between the groups and a reporting structure may be laid down to facilitate unambiguous communication among the groups and the staff.
Since developing or maintaining a digital library is highly skilled work, there should be no compromise on the quality of manpower selected for the job. Even when good-quality manpower is in place, staff usually need training to upgrade and sharpen their skills. As such, the necessary training should form an important component of the execution of the project.
2.4 Financial Planning
Financial planning, with a cost estimate that is close to the real implementation scenario, is a very important aspect of the overall planning of a digital library. During financial planning, the following aspects must be considered:
• Planning and consulting costs;
• Purchase of system hardware and software;
• Purchase of network-specific hardware, software, and cabling;
• Site preparation including cabling, furnishing, partitioning, etc.;
• Telecommunications;
• Conversion of manual records into machine-readable form or processing of existing electronic data for use in the new system;
• Access, and subscriptions where appropriate, to external databases and systems;
• Internet access;
• Ongoing operating costs;
• Augmenting the existing system hardware and software; and
• Initial and ongoing training for system operators and library staff.
When the system is shared, it is standard practice to allocate these costs between libraries or among consortium members.
2.4.1 Planning and Consulting Costs
Planning and consulting costs include both direct and indirect costs. A consultant can be hired to assist with long-range technology planning and to involve the staff in preparing for and participating in all aspects of the process.
2.4.2 Purchase of Hardware, Software and Networking Equipment
Initial purchase costs include the costs of acquiring the initial system hardware and software, training manpower and preparing the site or sites for the equipment.
• Hardware covers the cost of server or servers, disk drives, workstations, printers, routers, switches, and machine peripherals.
• Software covers the licensing of the system vendor's software that provides the functionality for setting up the digital library.
• Site preparation includes identifying space for the equipment and assuring proper room ventilation and, as necessary, air conditioning, etc.
Vendor-provided training costs must also be considered when the system is first installed, as well as the costs of connecting to the Internet.
Purchase of network-specific hardware, software and cabling requires the design and implementation of a local area network (LAN) on which the system will run. This includes selection of appropriate wiring, a network architecture, a network operating system compatible with the system selected, and firewall hardware and software. Most digital libraries also provide access to the wider world of information available via the Internet.
2.4.3 Telecommunications Costs
Telecommunications costs are no longer a concern only for shared systems or multi-branch sites; all libraries must now factor in the costs of being a gateway to global information resources. In addition to telephone company line connections, there are expenses associated with equipment, such as switches, routers and hubs, needed to connect to the Internet and to external databases from specific vendors. When a system is shared by multiple users at different sites, this equipment is also used to link each site's local area network into a wide area network for access to the system's servers and workstations.
2.4.4 Digitization Costs
The costs of digitization are associated with the scanning of text, metadata creation, staff, etc., or with outsourcing the entire project to an external agency and monitoring it. There are also costs involved in content migration, i.e. when moving from an old system to a new one. These costs include the processing of existing content so that it can be used by the new system.
2.4.5 Subscription Costs
Databases and systems external to the library are now accessible on the Internet. These databases are easily and seamlessly searchable. They contain not only metadata but also the full text of articles and books, pictures and other images, and audio and full-motion video. The cost of accessing these databases, including subscription and other fees, must be factored into the budget at the planning stage.
2.4.6 Operating (ongoing) Costs
Ongoing operating costs include hardware and software maintenance fees and the costs of utilities, miscellaneous supplies and telecommunications. Major ongoing costs are the salaries and benefits of staff assigned or hired to manage and run the system. If maintenance of the system, or part of it, is contracted out to an outside agency, these costs should also be accounted for. The cost of Internet bandwidth and other ongoing costs should also be included in the calculations.
Additions and augmentation to the existing computing infrastructure may also be required to maintain performance specifications, accommodate new users or allow for additional functionality.
2.4.7 Training Costs
Training costs extend beyond those associated with vendor-provided training on the integrated system. Typically, vendors expect their library clients to maintain certain levels of technological competency. For example, staff members being trained must be familiar with the Windows environment, while library staff who are the system operators must, at a minimum, know how to install, maintain and troubleshoot network servers and workstations. Libraries must be prepared to fund such training initially and to budget for the continuing education of both library staff and system operators.
If a system’s costs are shared by two or more users, these costs may be divided equally or assigned on a proportional basis determined by a mutually agreed-upon formula.
In a consortium, the responsibility for some of these expenses, such as the purchase of local workstations, printers and telecommunications devices, is borne by the individual library, and others are borne by the consortium. These consortium expenses are then divided among the individual members as annual assessments through a cost allocation formula. Traditionally, cost allocation formulas were based upon activity or usage levels, represented by such factors as circulation count, number of users or system utilization. Formulas based on these criteria can be difficult to develop and maintain because they rely on variables that are subject to frequent change. An alternative is to base the membership assessment on annual budgetary targets determined by the participants, or on a formula driven by less subjective variables, such as the number of workstations operating on the system.
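A proportional cost allocation formula of the kind described above can be sketched in a few lines of Python. The function name, the member names and the figures are hypothetical; the sketch assumes that the shared cost is divided strictly in proportion to the number of workstations each member operates on the system.

```python
def allocate_costs(total_cost, workstations):
    """Split total_cost among members in proportion to their workstation counts."""
    total_units = sum(workstations.values())
    if total_units == 0:
        raise ValueError("at least one member must have a workstation")
    # Each share is rounded to two decimals, so the shares may differ from
    # the total by a small rounding remainder.
    return {
        member: round(total_cost * count / total_units, 2)
        for member, count in workstations.items()
    }
```

For instance, if a shared annual cost of 100000 is divided among three libraries operating 10, 30 and 60 workstations, their assessments are 10000, 30000 and 60000 respectively. Dividing the cost equally is the special case in which every member is given the same count.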
2.4.8 Developing a Preliminary Budget
The technology plan should also include a proposed budget, which can be the basis for preparing the annual budget, taking into account the financial resources that are available. If the required financial resources are not available, the proposed budget will form the basis for a special request to the funding source.
The cost information gathered for planning makes it possible to present general budget estimates for each proposed component of the plan and to document the cost proposal in detail as it is reviewed by funding authorities.
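A preliminary budget estimate can be built up by separating one-time costs from recurring costs and projecting the latter over the planning period. The short Python sketch below shows the arithmetic; the function name, the cost categories and all figures are hypothetical and serve only to show how the components discussed in this section combine.

```python
def project_budget(one_time, annual, years):
    """Total cost over `years`: one-time items once, recurring items every year."""
    return sum(one_time.values()) + years * sum(annual.values())


# Hypothetical figures, in any single currency unit.
one_time_costs = {"hardware": 50000, "digitization": 30000}
annual_costs = {"subscriptions": 10000, "maintenance": 5000}
three_year_total = project_budget(one_time_costs, annual_costs, years=3)
```

With these figures, the one-time costs amount to 80000 and the recurring costs to 15000 per year, giving a three-year estimate of 125000.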
The identification of technological options may involve a variety of activities such as:
• Reading journals and other reference works;
• Having informal discussions or meetings with potential providers of services and systems;
• Visiting other libraries and talking with other librarians;
• Commissioning a consultant’s report;
• Gathering information through the use of formal requests to vendors; and
• Gathering information through the web.
In general, discussions with other librarians via telephone or online and, if necessary, visits to other libraries, are most useful in identifying realistic options and costs for the library. If a system or service is already in use at another comparable library, one can note how it performs there and how much it costs. However, each variation in circumstances affects the cost; the basic statistical profile prepared for the library will make it easier to identify how it differs from the other libraries examined.
A consultant's report can be a valuable source of information on options and their approximate costs. Consultants are frequently used at this stage and are generally worth the investment if affordable. If employing a consultant is not affordable, one can still do a good job of identifying options and costs, and the extra time spent gathering this information will increase knowledge and understanding of the various technologies.
Once general options have been identified, the next step is to begin gathering more specific information from potential vendors.
Finally, there is no guarantee that the funding source will provide the resources to implement the plan. However, there is little chance of receiving new resources without a well-prepared and well-organized technology plan.
3. Digital Library Implementation
Setting up a digital library mainly requires sources of content in digital form, whether digitized or born digital. In most applications, a digital library becomes an integral part of the services of a library. Unlike traditional library services, it provides access to digital collections that are built, managed and made accessible so that they are readily and economically available for use.
Digitization addresses the following three main needs of libraries:
• Preserving the documents;
• Making the documents more accessible; and
• Reusing the documents.
The planning and implementation phases of a digitization project are crucial to its eventual success (Ming, 2000), because decisions made during these phases play a key role in determining the sustainability and usefulness of the electronic resources created. Basically, the planning phase consists of identifying the tasks related to building digital library collections, defining strategies for performing these tasks, identifying the required resources and determining a timeline for carrying out the tasks. The implementation phase, on the other hand, involves the actual steps required to set up a digital library collection. Freely available open source digital library software packages can be used to build a digital library.
The implementation phase involves the following steps (Sitts, 2000; Smith, 2001):
• Establishing the digitization team;
• Setting up the information technology infrastructure;
• Procuring and installing digital library software packages;
• Finalizing policies and specifications;
• Completing arrangement of workflow for digitization;
• Creating the online digital library collection;
• Obtaining copyright permissions; and
• Providing access to the digital library collection.
3.1 Techniques for Digitization
Digitization of documents for a digital library is generally accomplished in six stages: registering, scanning, optical character recognition (OCR), proof-reading, formatting and producing the final version (Smith, 2001).
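To keep track of where each document stands in this workflow, the six stages can be modelled as an ordered enumeration. The Python sketch below is only an illustration: the names `Stage` and `advance` are hypothetical, and the sketch assumes that every document passes through the stages strictly in the order given, without skipping or repeating any of them.

```python
from enum import IntEnum


class Stage(IntEnum):
    """The six digitization stages, numbered in workflow order."""
    REGISTERING = 1
    SCANNING = 2
    OCR = 3
    PROOF_READING = 4
    FORMATTING = 5
    FINAL_VERSION = 6


def advance(stage):
    """Return the next stage, or raise if the document is already final."""
    if stage is Stage.FINAL_VERSION:
        raise ValueError("document has already reached the final version")
    return Stage(stage + 1)
```

A document that has just been registered moves to scanning on the first call to `advance` and reaches the final version after five calls. A workflow that sends documents back from proof-reading to OCR would need additional transitions beyond this sketch.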
During digitization, it should be noted that although most smoothing algorithms reduce background noise and improve the appearance of scanned documents, they are destructive to text and other data on a document image. Therefore, colour document images need to be handled using specific smoothing filters for both lossy and lossless compression.
OCR translates images into a text document format, but in doing so it loses the layout, images and colour of the original image. In a searchable PDF, the textual content extracted via OCR is placed behind the page image so that search indexers can see it and Acrobat Reader allows it to be selected as text. The wide adoption of PDF in digital library software packages makes searchable PDFs an ideal format for storing digitized paper documents.
Automatic classification of scanned images reduces the time spent on each image. RAW formats retain image information that is normally lost when capturing to common lossy image formats, which allows post-processing with minimal loss of quality. However, RAW files cannot be viewed by most imaging applications and require much more space than common image formats, making them difficult to transmit over networks or by e-mail. They may be preferred for rare and valuable documents. For further details on this aspect, please refer to Module No. 10, Digitization Part-I, of the paper Digital Libraries.
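The storage implications of unprocessed images can be approximated with simple arithmetic: the number of pixels is the product of the page dimensions and the square of the scanning resolution, and each pixel occupies a fixed number of bits. The Python sketch below computes this estimate for an uncompressed bitmap. It is an approximation under stated assumptions, not the exact size of any particular RAW file, and the function name and default bit depth are chosen for illustration.

```python
def uncompressed_size_bytes(width_in, height_in, dpi, bits_per_pixel=24):
    """Approximate uncompressed size of one scanned page in bytes."""
    width_px = round(width_in * dpi)     # pixels across the page
    height_px = round(height_in * dpi)   # pixels down the page
    return width_px * height_px * bits_per_pixel // 8
```

For an 8 x 10 inch page scanned at 300 dpi in 24-bit colour, the estimate is 2400 x 3000 pixels x 3 bytes = 21,600,000 bytes, or roughly 21.6 MB per page. The same page scanned as a 1-bit black-and-white image needs about 900,000 bytes, which shows why bit depth and resolution should be fixed during planning.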
3.2 Hosting Platforms
The institute should plan to host the digital library content on its own server or through another mechanism that reduces cost and Internet traffic. The options for hosting a digital library are as follows:
3.2.1 Self-Hosting
The institute can plan to host the digital content on its own premises; however, this requires the help of a consultant or expertise within the institution. Self-hosting requires preparation of the site in terms of power backup, environment control, fire protection, access control, etc. The hosting infrastructure (servers, storage, networking and bandwidth) should also be planned after analysing service requirements, expected load and sustainability. Self-hosting also requires a plan for day-to-day maintenance as well as qualified manpower to handle these operations.
3.2.2 Mirrored Hosting
The institute can also mirror the digital library contents, with the replica hosted at another site or on servers maintained by commercial vendors. This helps to reduce costly Internet traffic. Mirror sites also increase the speed with which files or websites can be accessed: users can download files more quickly from a server that is geographically closer to them.
3.2.3 Cloud/Shared Service
Cloud-based services provide highly scalable infrastructure and technical expertise to deliver a flexible and reliable platform for hosted applications. All the infrastructure assets required to bring a client's applications to market are provided and maintained, including the data centre, Internet connectivity, private network, servers, storage, firewalls and load balancers. Such a service provides a robust platform for growth and ensures that the client's hosted applications are delivered in a timely manner with the expected quality of service and security.
3.3 Promotion and Provision of Services
The digital library collection should be visible on the web and easily accessible. Proper metadata creation, with links to the full-text articles or contents, is a basic requirement, as are well-defined display formats and other related online services in the organization. In addition to, or in the absence of, remote online access to the digital collection, there is a need to explore other modes of providing access to it. These may include:
• Setting up local public access computers on the library Local Area Network;
• Provision of e-mail based services;
• Optical Media (CD/DVD ROM) based distribution of the collection;
• Internet based distribution of collection; and
• Awareness services and promotion through Web 2.0 technologies (social media, forums, RSS).
3.4 Intellectual Property Rights
Implementers of digital libraries should be aware of the relevant intellectual property rights (IPR) that apply to the creation, storage and dissemination of digital information sources. Intellectual property rights are like any other property right. They allow creators, or owners, of patents, trademarks or copyrighted works to benefit from their own work or investment in a creation. These rights are outlined in Article 27 of the Universal Declaration of Human Rights, which provides for the right to benefit from the protection of moral and material interests resulting from authorship of scientific, literary or artistic productions.
4. Summary
Digital libraries enable users to access electronic versions of full-text documents and their associated images, and they bring significant benefits to users. However, setting up a digital library requires planning and an implementation strategy that includes the budget, collection development, selection of content, manpower skill development, digitization and sustainability, etc. This module reviews the steps involved in digitization projects and proposes solutions to improve the efficiency of digital library software packages. Considering that digitization projects take a lot of time, effort and money, the solutions proposed in this module may prove to be a useful source of reference.
References
- Tedd, L. A., & Large, A. (2005). Digital libraries: Principles and practice in a global environment. K. G. Saur.
- Witten, I. H., et al. (2010). How to build a digital library. Morgan Kaufmann.
- Hughes, L. M. (2004). Digitizing collections: Strategic issues for the information manager. Facet Publishing.
- Andrews, J., & Law, D. (2004). Digital libraries: Policy, planning and practice. Ashgate.
- Gopal, K. (2000). Digital libraries in electronic information era. Authorspress.
- Smith, K. (2007). Planning and implementing electronic records management: A practical guide. Facet Publishing.
- Alhaji, I. U. (2009). Digitization of Library Resources and the Formation of Digital Libraries: A Practical Approach.
- Saffady, W. (1995). Digital library concepts and technologies for the management of library collections: an analysis of methods and costs. Library technology reports, 31(3), 221-380.
- Savanur, K. P., & Nagaraj, M. N. (2004). Design and Implement of digital library: An overview. In 4th ASSIST National Seminar, Kuvempu University, Shimoga, Karnataka, India, 30 April-1 May 2004
- Smith, A. (2001). Strategies for building digitized collections. Washington, D.C.: Digital Library Federation, Council on Library and Information Resources.
- Waters, D. J. (1998). What are digital libraries. CLIR issues, 4(1), 5-6.