25 Digital Library Projects in USA
Kannan P
I. Objectives
After reading this module, you will be able to
• Know the pioneers of the digital library initiatives in USA
• Understand early digital library projects and their impact
• Describe the Digital Library Initiatives DLI-1 and DLI-2
• Appreciate the major digital library projects in USA and their achievements
II. Learning Outcomes
After going through this lesson, learners will attain knowledge about major flagship digital library initiatives taken by various organizations in the USA, including early digital library projects such as Project Gutenberg, Mercury, the Chemistry Online Retrieval Experiment (CORE) and the Association for Computing Machinery (ACM) digital library. Learners will also acquire knowledge about the US Digital Library Initiatives Phase-1 and Phase-2, their funding partners, outcomes and achievements.
III. Module Structure
1. Introduction
2. Pioneers in Digital Library
3. Early Digital Library Projects
3.1 Project Gutenberg
3.2 Early Experiments in Digitization of Journal Articles
3.2.1 Mercury
3.2.2 Chemistry Online Retrieval Experiment (CORE)
3.2.3 Association for Computing Machinery (ACM)
3.3 American Memory
4. Digital Library Initiative (1994-1998)
4.1 University of Michigan
4.2 University of California, Berkeley (The Environmental Electronic Library)
4.3 University of California, Santa Barbara (The Alexandria Digital Library)
4.4 Carnegie Mellon University (The Informedia Digital Video Library)
4.5 University of Illinois, Urbana Champaign (Federated Repositories of Scientific Literature)
4.6 Stanford University (Infobus)
5. Digital Library Initiative Phase-2 (DLI-2) 1999-2004
6. Major Digital Library Projects
6.1 The Networked Digital Library of Theses and Dissertations (NDLTD)
6.2 National Science Digital Library System (NSDL)
6.3 Digital Library for Earth System Education (DLESE)
6.4 ArXiv
6.5 CiteSeer (ResearchIndex)
6.6 Networked Computer Science Technical Reference Library (NCSTRL)
6.7 NASA Technical Report Server (NTRS)
6.8 OAIster
7. International Collaborative Projects
8. Summary
9. References
1. Introduction
The major objectives of every library are to collect, process, preserve and disseminate information based on user needs. In the 1990s, the WWW became a favourite medium of information delivery due to its simplicity and use in all fields, particularly for sharing information in the scholarly community. The digital library provides access to scholarly content through distributed networks to the user's desktop, which is not possible with the traditional library. Project Gutenberg, initiated by Michael Hart in 1971, is truly considered the first digital library. From the late 1980s, different working groups conducted workshops on the impact of information technology and the role of electronic resources in the academic community. In late 1993, DLI Phase-1 was formally announced with funding support from the National Science Foundation (NSF), the Defense Advanced Research Projects Agency (DARPA) and the National Aeronautics and Space Administration (NASA). This gave a real impetus to web-based digital library initiatives the world over. This module elaborates on various early digital library projects and their challenges, the role of the National Science Foundation in promoting digital library research activity, and the major digital library projects and their achievements.
2. Pioneers in Digital Library
Vannevar Bush, one of President Roosevelt’s advisers during World War II, in his seminal article published in the “Atlantic Monthly” in 1945, conceptualized the Memory Extender, or “MemEx”, a thinking machine in which an individual could store pieces of information and link them together. However, given the state of digital technology at that time, the MemEx was essentially proposed as an analogue machine that would store information on microfilm with a mechanical linking process.
Vannevar Bush’s microfilm-based “MemEx”, in turn, inspired Ted Nelson and Douglas Engelbart to carry forward the underlying concept behind MemEx. In 1962, Engelbart started work on the Augment Project, which aimed to produce tools to aid human capabilities and productivity. He developed NLS (oN-Line System) that allowed researchers in Augment Project to access all stored working papers in a shared “journal” which eventually had 100,000 items in it, and was one of the largest early digital libraries. Engelbart is also responsible for inventing a pointing device (mouse) in 1968. Ted Nelson designed “Xanadu System” in 1965 and coined the word “Hypertext” and proposed a system wherein all publications in the world would be deeply inter-linked.
The most significant development in the history of Internet and digital library was the invention of World Wide Web (WWW) by Tim Berners-Lee at the CERN Laboratory in 1991. The crucial underlying concept behind World Wide Web (WWW) is hypertext that has its origin in Ted Nelson’s Project Xanadu, and Douglas Engelbart’s oN-Line System (NLS).
3. Early Digital Library Projects
3.1 Project Gutenberg (http://www.gutenberg.org/)
Project Gutenberg was launched in 1971 at the Materials Research Lab of the University of Illinois by Michael Hart. The main objective of the project was to create electronic access to the humanities literature available at the University of Illinois. A pioneer project in a number of ways, Project Gutenberg was the first information provider on the Internet and is the oldest digital library. When the Internet became popular in the mid-1990s, the project got a boost and gained an international dimension. The number of electronic books in Project Gutenberg rose from 1,000 (in August 1997) to 5,000 (in April 2002), 10,000 (in October 2003), 15,000 (in January 2005), 20,000 (in December 2006), 25,000 (in April 2008) and 44,844 (in February 2014), with a current production rate of around 340 new books each month. Project Gutenberg promotes digitization in “text format”, meaning that a book can be copied, indexed, searched, analyzed and compared with other books. Contrary to other formats, the files are accessible for low-bandwidth use. Apart from ASCII, other formats such as HTML, PDF, EPUB, MOBI and Plucker are also included so that the books can be accessed from various devices. In 2000, a non-profit corporation, the Project Gutenberg Literary Archive Foundation, Inc., was chartered in Mississippi to handle the project’s legal needs, with Dr Gregory Newby as its founding CEO. Also in 2000, Charles Franks founded Distributed Proofreaders, which allowed the proofreading of scanned texts to be distributed among many volunteers over the Internet. Since the project is based in the USA, in the beginning all the books were in English.
Multilingual texts began appearing in 1997; books are now available in 60 languages, with eleven main languages (6).
3.2 Early Experiments in Digitization of Journal Articles
3.2.1 Mercury
During 1988-1992, Carnegie Mellon University attempted to create a campus-based electronic library. One of the objectives was to host scanned images of journal articles, using materials licensed from publishers. Four publishers (ACM, IEEE, Elsevier, and Pergamon) published sixteen of the twenty computer science journals that were most heavily used on campus. During the project, Pergamon was taken over by Elsevier. None of the publishers had machine-readable versions of their journals, but they gave permission to convert the printed materials into digital format for use in the library. Thus, an important part of the work was the conversion, storage, and delivery of page images over the campus network. The fundamental paradigm of Mercury was searching a text database to identify the information to be displayed. An early version of Z39.50 was chosen as the protocol for sending queries between the clients and the server computers on which the indexes were stored. Mercury introduced the concept of a reference server, which keeps information about what is stored on the other servers, the fields that can be searched, the indexes, and access restrictions. Carnegie Mellon already had a mature set of network services, known as Andrew, and Mercury was able to use standard Andrew services for authentication and printing. Electronic mail was used to dispatch information to other computers (5).
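Mercury's reference-server idea can be illustrated with a small sketch (all names and hosts below are hypothetical; Mercury's actual Z39.50 implementation differed): a directory records which server holds each collection and which fields it can search, and a client consults the directory before dispatching a query.

```python
# Hypothetical sketch of a reference server: a directory describing where
# each collection lives and which fields are searchable on it.
REFERENCE_SERVER = {
    "acm-journals": {"host": "index1.example.edu",
                     "fields": ["title", "author", "abstract"]},
    "ieee-journals": {"host": "index2.example.edu",
                      "fields": ["title", "author"]},
}

def route_query(collection, field, term):
    """Return the (host, query) pair a client would send, after checking
    with the reference server that the collection and field exist."""
    entry = REFERENCE_SERVER.get(collection)
    if entry is None:
        raise KeyError(f"unknown collection: {collection}")
    if field not in entry["fields"]:
        raise ValueError(f"field {field!r} not searchable in {collection}")
    return entry["host"], f"{field}={term}"

host, query = route_query("acm-journals", "author", "Knuth")
print(host, query)  # index1.example.edu author=Knuth
```

The point of the design is that clients need to know only the reference server; the locations and capabilities of the index servers can change without reconfiguring every client.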
3.2.2 Chemistry Online Retrieval Experiment (CORE)
CORE was a joint project by Bellcore, Cornell University, OCLC, and the American Chemical Society that ran from 1991 to 1995. The project converted about 400,000 pages, representing four years of articles from twenty journals published by the American Chemical Society. CORE included two versions of every article, a scanned image and a text version marked up in SGML. The scanned images ensured that when a page was displayed or printed, it had the same design and layout as the original paper version. The SGML text was used to build a full-text index for information retrieval and for rapid display on computer screens. Two scanned images were stored for each page, one for printing and the other for screen display. The CORE collections, though they comprised only twenty journals, occupied some 80 gigabytes of storage (5).
3.2.3 Association for Computing Machinery (ACM)
In 1993, the ACM decided that its future production process would use a computer system that creates a database of journal articles, conference proceedings, magazines and newsletters, all marked up in SGML. Subsequently, ACM also decided to convert large numbers of its existing journals to build a digital library covering its publications from 1985 onwards. One use of the SGML files is as a source for printed publications. However, the plan was much more progressive: the ACM planned for the day when members would retrieve articles directly from the online database, sometimes reading them on the screen of a computer, sometimes downloading them to a local printer. Libraries would be able to license parts of the database or take out a general subscription for their patrons. The collection went online during 1997. It uses a web interface that offers readers the opportunity to browse through the contents pages of the journals, and to search by author and keyword (5).
3.3 American Memory
The Library of Congress, which is the world’s biggest library, has a huge number of special collections of unique or unpublished materials. Among the library’s treasures, rare books, pamphlets, and papers provide valuable material for the study of historical events, periods, and movements. Millions of photographs, prints, maps, musical scores, sound recordings, and moving images in various formats reflect trends and represent people and places. American Memory was a pilot program that, from 1989 to 1994, reproduced selected collections for national dissemination in computerized form. Collections were selected for their value for the study of American history and culture, and to explore the problems of working with materials of various types, such as prints, negatives, early motion pictures, recorded sound, and textual documents (5).
4. Digital Library Initiative (1994-1998)
The Digital Libraries Initiative (DLI) was the result of a community-based process which began in the late 1980s with informal discussions between researchers and funding agencies in the USA. These discussions progressed to planning workshops designed to develop research values and agendas, and culminated in the National Science Foundation (NSF)/Defense Advanced Research Projects Agency (DARPA)/National Aeronautics and Space Administration (NASA) Research in Digital Libraries Initiative announced in late 1993. The Digital Library Initiative Phase-1 (DLI-1) started in 1994 with funding support from the NSF, DARPA and NASA. The US Government spent around $24 million during the four years from 1994 to 1998 on projects assigned to six major universities. These universities concentrated on developing digital library architectures, technologies and standard procedures for capturing, processing and organising information, and on developing search, browse and visualisation interfaces (2).
Six major universities participated in the first phase of DLI-1 (1994-1998) as follows (3):
4.1 University of Michigan
This project focused on collections in the earth and space sciences and was intended to serve a variety of users. The key contribution of the project was an agent-based digital library architecture, in which each agent represents an element of the DL, i.e., a collection or service. The external collaborators were IBM, Elsevier Science, UMI International and Kodak.
4.2 University of California, Berkeley (The Environmental Electronic Library)
The UC Berkeley Digital Library project was part of the NSF/ARPA/NASA Digital Library Initiative and part of the California Environmental Resource Evaluation System. The project’s goal was to develop technologies for intelligent access to massive, distributed collections of photographs, satellite images, maps, full-text documents, and “multivalent” documents.
4.3 University of California, Santa Barbara (The Alexandria Digital Library)
The goal of the project was a distributed digital library for geographically referenced information. The information was indexed by geographical location in addition to other attributes. A web prototype called the Alexandria Digital Library was developed to search the data sets using textual and visual query languages. The collection includes digitised maps, images and other geographical information related to Santa Barbara, Ventura and Los Angeles. Project partners included the National Center for Geographical Analysis, Digital Equipment Corporation and Xerox.
4.4 Carnegie Mellon University (The Informedia Digital Video Library)
The focus of this project was to develop search and discovery interfaces for the video medium. The project, called the Informedia Digital Video Library, aimed to integrate speech, language and image understanding technologies to support both the creation of, and retrieval from, the digital library. Software called Sphinx-II, developed by Carnegie Mellon University, was used to recognise speech and automatically transcribe narratives and dialogues from each video. The project was supported by industrial partners such as Bell Atlantic, Intel Corporation and Microsoft Corporation.
4.5 University of Illinois, Urbana Champaign (Federated Repositories of Scientific Literature)
The Digital Libraries Initiative (DLI) project at the University of Illinois at Urbana-Champaign was aimed at developing the information infrastructure to effectively search technical documents on the Internet. A large test bed of scientific literature was developed for evaluating its effectiveness under significant use and for researching enhanced search technology. The test bed of engineering and physics journals was based in the Grainger Engineering Library. The National Center for Supercomputing Applications (NCSA) developed software for the Internet version in an attempt to make server-side repository search widely available. The research section of the project used NCSA supercomputers to compute indexes for new search techniques on large collections, to simulate the future world, and to provide new technology for the test bed section.
4.6 Stanford University (Infobus)
The Stanford Digital Libraries project was one of the participants in the four-year, $24 million Digital Library Initiative started in 1994. In addition to its ties with the five other universities that were part of the initiative, Stanford also had a large number of partners. Each university project addressed a different aspect of the overall effort, with Stanford focusing on interoperability. The collections were primarily computing literature, with a strong focus on networked information sources, meaning that the vast array of topics found on the World Wide Web was accessible through this project as well. At the heart of the project was the test bed running the “InfoBus” protocol, which provided a uniform way to access a variety of services and information sources through “proxies” acting as interpreters between the InfoBus protocol and the native protocol.
5. Digital Library Initiative Phase-2 (DLI-2) 1999-2004
Based on the recognized achievements of DLI and the promise of additional Federal investment in digital libraries, Digital Libraries Initiative phase-II (DLI-2) was announced in the year 1998. DLI-2 was a multiagency initiative to provide leadership in research fundamental to the development of the next generation of digital libraries, to advance the use and usability of globally distributed, networked information resources, and to encourage existing and new communities to focus on innovative applications areas. The success of the original DLI-1 programme and the continued IT research interest allowed the NSF to continue to spearhead the development of the DLI-2 research programme. More sponsoring agencies joined with DARPA, NASA and the NSF in the DLI-2 programme, including the National Library of Medicine (NLM), the Library of Congress (LOC), the National Endowment for the Humanities (NEH), the National Archives and Records Administration (NARA), the Smithsonian Institution (SI), and the Institute of Museum and Library Services (IMLS).
While the intent of the first phase was to concentrate on the investigation and development of underlying technologies, the second phase (1999-2004) was intended to look more at applying those technologies in real-life library situations. The second phase aimed at an intensive study of the architecture and usability issues of digital libraries, including vigorous research on: a) human-centred DL architecture; b) content- and collections-based DL architecture; c) systems-centred DL architecture; and d) development of DL test beds. Under DLI-2, 77 large and small projects in various categories were developed with the support of academic institutions in the US.
6. Major Digital Library Projects
6.1 The Networked Digital Library of Theses and Dissertations (NDLTD) (http://thumper.vtls.com:6090/)
The Networked Digital Library of Theses and Dissertations (NDLTD) is an international organization dedicated to promoting the adoption, creation, use, dissemination, and preservation of electronic theses and dissertations (ETDs). NDLTD supports electronic publishing and open access to scholarship in order to enhance the sharing of knowledge worldwide. The concept of electronic theses and dissertations (ETDs) was first discussed during a meeting in 1987 at Ann Arbor, Michigan, organized by UMI and attended by representatives from Virginia Tech and the University of Michigan. The result of several years of intense collaborative work, the ETD-db software that emerged from Virginia Tech in 1996 provided a complete ETD submission package from beginning to end. Maintaining its leadership role, Virginia Tech also coordinated the development and implementation of a distributed digital library system, so that ETDs from all participating institutions could be accessed easily. The system that was developed allowed browsing and searching based on institution, date, author, title, keywords, and full text, as well as downloading for local reading or printing of ETDs worldwide. This early effort to create a global digital library provided the conceptual framework for what became the Networked Digital Library of Theses and Dissertations (7).
The National Digital Library of Theses and Dissertations was established in 1996, directed by an informal steering committee. As its scope became international, the organization kept the acronym NDLTD, but changed its name to the Networked Digital Library of Theses and Dissertations. In 1998 interested institutions began meeting annually for what would become a series of symposia on electronic theses and dissertations sponsored by NDLTD and designed to help universities initiate ETD projects (8).
In 2003, the NDLTD incorporated as a non-profit charitable organization, with a set of bylaws. Today, the NDLTD’s members include more than 200 universities around the world, as well as partner organizations, including Adobe, the American Library Association, the Association of Research Libraries, the Coalition for Networked Information, the Joint Information Services Committee, OCLC Online Computer Library Center, Proquest / UMI, and Theses Canada—all working toward the goal of unlocking the benefits of shared knowledge for all (8).
6.2 National Science Digital Library System (NSDL)(https://nsdl.org/)
To stimulate and sustain continual improvements in the quality of science, mathematics, engineering, and technology (SMET) education, the National Science Foundation (NSF) launched the National Science, Mathematics, Engineering, and Technology Education Digital Library (NSDL) program in 2000. The resulting digital library is intended to serve the needs of a broad audience of learners, from kindergarten through senior secondary school to undergraduate, graduate, and lifelong learning, in both formal and informal settings. Envisioned as the premier portal to current and future high-quality SMET educational content and services, this virtual facility will enable seamless access to a rich array of interactive learning materials and resources, distinguished by the depth and breadth of the subject matter addressed, and valued for its authority and reliability (7).
Initial development of the NSDL program began in late 1995 with an internal concept paper for the NSF Division of Undergraduate Education (DUE). Mogk and Zia examined the opportunities and challenges in evaluation and dissemination that would be implied by a national digital library for science education (10). Subsequently, the idea was explored and developed further through a series of workshops and planning meetings over the next several years. Beginning in 1998, two rounds of prototype projects were supported through the Special Emphasis: Planning Test beds and Applications for Undergraduate Education program conducted under the auspices of the multi-agency Digital Libraries Initiative – Phase 2 (DLI-2) program.
More than 60 projects have been funded since 1998 in three areas (11):
a. The collection track, for offering content (e.g., a national biology digital library, a digital mathematics library, an experimental economics digital library);
b. The service track, for providing technologies and services (e.g., the University of Arizona’s GetSmart e-learning concept map system);
c. The core integration track, for linking all content and services under a unified framework.
Open Archive Initiative (OAI) based content creation and metadata harvesting is one of the critical components in NSDL, which has the potential for improving the standards and sustainability of all projects involved. The NSDL programme takes a grass-roots approach to inviting community input and consensus building through various committees and working groups.
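An OAI-PMH harvest is carried out through simple HTTP requests. The sketch below (the repository base URL is hypothetical) builds a ListRecords request for Dublin Core metadata, the kind of request a harvester such as NSDL's core integration services issues against each contributing collection:

```python
from urllib.parse import urlencode

def oai_list_records(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build an OAI-PMH ListRecords request URL. A harvester issues this
    repeatedly, following resumptionToken values until the list is exhausted."""
    if resumption_token:
        # Per the protocol, resumptionToken is an exclusive argument and
        # must not be combined with metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)

# Hypothetical repository base URL, for illustration only:
url = oai_list_records("http://repository.example.org/oai")
print(url)  # http://repository.example.org/oai?verb=ListRecords&metadataPrefix=oai_dc
```

The repository answers with an XML document of metadata records; because every compliant repository supports the same six verbs and the oai_dc format, one harvester can aggregate many collections.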
6.3 Digital Library for Earth System Education (DLESE)(http://www.dlese.org/library/index.jsp)
The Digital Library for Earth System Education (DLESE), launched in 2001, has the mission to “improve the quality, quantity, and efficiency of teaching and learning about the Earth System, by developing, managing, and providing access to high-quality educational resources and supporting services through a community-based, distributed digital library” (12). DLESE emerged to support the specific educational needs of the geoscience community within the larger NSDL network. The National Science Foundation provided funding for the development of DLESE, which is now operated by the National Center for Atmospheric Research (NCAR) Computational and Information Systems Laboratory and the NCAR Library. DLESE serves as a vehicle for the geoscience community to respond to the challenges of systemic educational reform and the changing technological landscape.
DLESE provides educational discovery features that enable users to search by grade level, educational resource type, and keyword. It also contains a resource cataloguer, and community-oriented services, such as discussion forums for working groups and a community-posting tool. To ensure interoperability with the NSDL, support for the Open Archives Initiative harvesting protocol has been implemented. The DLESE collections grow through community contributions from individuals or institutions. The DLESE Program Center (DPC) enables the community to consciously and actively shape the intellectual framework of the DLESE collection by providing tools, components, and services that reflect DLESE policy, assure collection quality, and promote pedagogical innovation.
DLESE has established relationships with science and science education professional societies, including the American Association for the Advancement of Science (AAAS), the American Geological Institute (AGI), the American Geophysical Union (AGU), the Incorporated Research Institutions for Seismology (IRIS), the National Science Teachers Association (NSTA), and the emerging Center for Ocean Sciences Education Excellence (COSEE) and EarthScope efforts (7). These partners provide outreach opportunities for DLESE through their conferences and workshops.
6.4 ArXiv (http://arxiv.org/)
Started in August 1991, arXiv, also known as the Los Alamos National Laboratory (LANL) e-print service, is a fully automated electronic archive and distribution server for research papers. ArXiv is owned, operated and funded by Cornell University and partially funded by the National Science Foundation (7). It covers un-refereed articles self-archived by their authors. The areas covered include physics and related disciplines such as mathematics, nonlinear sciences, computer science and quantitative biology. The contents of arXiv conform to Cornell University academic standards. Authors can submit their papers to the archive using the World Wide Web interface. They may also update their submissions, though previous versions remain available. Users can retrieve papers from the archive either through an online interface, or by sending commands to the system via e-mail. Users can also register to automatically receive an e-mail listing of newly submitted papers in areas of interest to them. Facilities to view recent submissions and to search old submissions are also provided via the World Wide Web interface.
Involvement of arXiv in the OAI: The Open Archives Initiative (OAI) developed from a meeting held in Santa Fe in 1999, which was initiated by Paul Ginsparg (arXiv, Los Alamos National Lab.), Rick Luce (Los Alamos National Lab.) and Herbert Van de Sompel (University of Ghent, Los Alamos National Lab.). ArXiv has continued to be actively involved in both management of the initiative and technical development of the protocol (7).
6.5 CiteSeer (ResearchIndex) (http://citeseerx.ist.psu.edu/index)
CiteSeer also known as the ResearchIndex is a scientific literature digital library and search engine that focuses primarily on the literature in computer and information science. It contains freely available, full-text research articles (journal pre-prints and papers where available, conference proceedings, technical reports) downloaded from the web. It indexes PostScript and PDF research articles. CiteSeer uses search engines, crawling and document submissions to harvest papers.
The articles are indexed by an Autonomous Citation Indexing (ACI) system which links the records together through references cited within, and citations made to, an article. It provides links to related articles and can identify the context of a citation, allowing researchers to see what their peers have said about the cited work. CiteSeer computes citation statistics and related documents for all articles cited in the database, not just the indexed articles. CiteSeer locates related documents using citation and word based measures and displays an active and continuously updated bibliography for each document. It shows the percentage of matching sentences between documents. CiteSeer provides the context of how query terms are used in articles instead of a generic summary, improving the efficiency of search. Other services include full-text, Boolean, phrase and proximity search. It provides automatic notification of new citations to given papers, and new papers matching a user profile.
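The core of autonomous citation indexing is recognizing that differently formatted reference strings denote the same work. A toy version of that matching step (a crude illustration, not CiteSeer's actual algorithm) canonicalizes each reference string and buckets the collisions:

```python
import re
from collections import defaultdict

def normalize(ref):
    """Crudely canonicalize a reference string: lowercase it and collapse
    all punctuation and whitespace, so variant formattings of one work
    map to the same key."""
    return re.sub(r"[^a-z0-9]+", " ", ref.lower()).strip()

def group_citations(references):
    """Bucket reference strings by their normalized form, mimicking (very
    loosely) the matching step of an Autonomous Citation Indexing system."""
    groups = defaultdict(list)
    for ref in references:
        groups[normalize(ref)].append(ref)
    return dict(groups)

refs = [
    "Bush, V. As We May Think. Atlantic Monthly, 1945.",
    "BUSH V: As we may think; Atlantic Monthly (1945)",
]
print(len(group_citations(refs)))  # 1: both variants collapse to one work
```

Once variant strings are grouped, the citation graph and per-work citation counts fall out directly; a production system additionally parses author, title and venue fields and tolerates misspellings.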
CiteSeer aims to improve the dissemination and feedback of scientific literature and to provide improvements in functionality, usability, availability, cost, comprehensiveness, efficiency and timeliness in the access of scientific and scholarly knowledge. Rather than creating just another digital library, CiteSeer provides algorithms, metadata, services, techniques and software that can be used in other digital libraries.
6.6 Networked Computer Science Technical Reference Library (NCSTRL)(http://www.ncstrl.org/)
The Networked Computer Science Technical Reference Library (NCSTRL) is an international collection of computer science technical reports from computer science departments and industrial and government research laboratories, made available for non-commercial and educational use. Most NCSTRL institutions are universities that grant PhDs in Computer Science or Engineering, with some industrial or government research laboratories. NCSTRL is based on two previous technologies for technical report libraries. Dienst is a protocol and implementation for distributed digital library servers. WATERS is a system that links distributed FTP report repositories via a centralized index. The NCSTRL architecture combines the power and flexibility of Dienst with the ease of installation of WATERS. The technology underlying NCSTRL is a network of interoperating digital library servers. The digital library servers provide three services: repository services that store and provide access to documents; index services that allow searches over bibliographic records; and user interface services that provide the human front-end for the other services. Search requests from users generate parallel protocol requests to the distributed index servers.
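The NCSTRL pattern of a user interface service fanning a query out to many index services in parallel and merging the bibliographic hits can be sketched as follows (in-memory stub indexes and made-up report titles stand in for the distributed Dienst servers):

```python
from concurrent.futures import ThreadPoolExecutor

# Stub index services standing in for distributed Dienst index servers;
# each maps a query term to matching bibliographic records (titles made up).
INDEX_SERVERS = {
    "cornell": {"dienst": ["TR94-1514: Dienst protocol overview"]},
    "cmu":     {"dienst": ["CMU-CS-95-101: Distributed report libraries"],
                "mercury": ["CMU-CS-92-011: The Mercury project"]},
}

def search_one(site, term):
    """Query a single index service (here, just a dictionary lookup)."""
    return INDEX_SERVERS[site].get(term, [])

def federated_search(term):
    """Send the query to all index services in parallel and merge the hits,
    as NCSTRL's user interface service did over the Dienst protocol."""
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda s: search_one(s, term), INDEX_SERVERS))
    return [rec for hits in results for rec in hits]

print(federated_search("dienst"))
```

In the real system each lookup is a network request to a remote server, so issuing them concurrently keeps the response time close to that of the slowest server rather than the sum of all of them.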
6.7 NASA Technical Report Server (NTRS)(http://data.nasa.gov/)
NASA’s history with web-based DLs dates back to 1993, when a WWW interface was provided for the Langley Technical Report Server (LTRS). Prior to this, LTRS was simply an anonymous FTP server that distributed technical reports authored and sponsored by NASA Langley Research Center. However, LTRS provided access to reports only from NASA Langley Research Center, and not from other NASA centres and institutes. Beginning in 1994, the software used to create LTRS was shared with other NASA installations and “LTRS-like” DLs were set up. In 1995 the NASA Technical Report Server (NTRS) was set up to provide integrated searching across the various NASA web-based DLs (7).
NASA’s technical information is available via the NASA Technical Report Server (NTRS) to provide students, educators and the public with access to over 500,000 aerospace-related citations, over 300,000 full-text online documents, and over 500,000 images and videos. The types of information include conference papers, journal articles, meeting papers, patents, research reports, images, movies, and technical videos: scientific and technical information (STI) created or funded by NASA. It is a part of the NASA Scientific and Technical Information (STI) Program, whose mission is to collect, archive and disseminate NASA aerospace information, and to locate domestic and international STI pertinent to NASA’s missions and Strategic Enterprises. NTRS also collects scientific and technical information from sites external to NASA to broaden the scope of information available to users. NTRS’s Simple Search searches NASA information only, and its Advanced Search can search NASA and non-NASA information. It also facilitates browsing and weekly updates.
The NTRS integrates the following three NASA collections and enables search and retrieval through a common interface (14):
• NACA Collection: Citations and reports from the National Advisory Committee for Aeronautics (NACA) period, lasting from 1915 to 1958.
• NASA Collection: Citations and documents created or sponsored by NASA starting in 1958 and continuing to the present.
• NIX Collection: Citations and links to the images, photos, movies, and videos from the discontinued NASA Image eXchange (NIX).
The information found in the NTRS was created or funded by NASA and is unlimited, unclassified, and publicly available.
6.8 OAIster (http://oaister.worldcat.org/)
OAIster is a union catalogue of millions of records representing open access digital resources that was built by harvesting from open access collections worldwide using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). OAIster began at the University of Michigan in 2002 funded by a grant from the Andrew W. Mellon Foundation and with the purpose of establishing a retrieval service for publicly available digital library resources provided by the research library community. During its tenure at the University of Michigan, OAIster grew to become one of the largest aggregations of records pointing to open access collections in the world.
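The harvesting step behind a union catalogue like OAIster can be sketched with the two OAI-PMH operations involved: issuing a ListRecords request to a repository and extracting Dublin Core metadata from the XML response. The repository URL below is illustrative, and the parsing runs against a canned, abridged response rather than a live network call; only the protocol parameters (`verb=ListRecords`, `metadataPrefix=oai_dc`) and the XML namespaces come from the OAI-PMH specification itself.

```python
# Sketch of OAI-PMH harvesting as used to build OAIster. The base URL is
# hypothetical; SAMPLE is an abridged stand-in for a real server response.
import urllib.parse
import xml.etree.ElementTree as ET

DC = "{http://purl.org/dc/elements/1.1/}"  # Dublin Core element namespace

def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build the ListRecords request URL a harvester would send."""
    query = urllib.parse.urlencode(
        {"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"

def harvest_titles(response_xml):
    """Pull dc:title values out of a ListRecords response."""
    root = ET.fromstring(response_xml)
    return [t.text for t in root.iter(DC + "title")]

# A minimal, abridged ListRecords response for demonstration.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><metadata>
      <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                 xmlns:dc="http://purl.org/dc/elements/1.1/">
        <dc:title>A digitized manuscript</dc:title>
      </oai_dc:dc>
    </metadata></record>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("https://repository.example.edu/oai"))
print(harvest_titles(SAMPLE))
```

A production harvester would additionally follow OAI-PMH resumption tokens to page through large repositories and would normalise the harvested records before merging them into the union catalogue.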
In 2009, OCLC formed a partnership with the University of Michigan to provide continued access to open access collections aggregated in OAIster. OCLC is evolving OAIster to a model of self-service contribution for all open access digital repositories to ensure the long-term sustainability of this rich collection of open access materials. Today, OAIster includes more than 30 million records representing digital resources from more than 1,500 contributors. Additionally, the OAIster records are included in search results for those libraries with WorldCat Local and WorldCat Local quick start.
The records of the open access digital resources available via OAIster lead to a wide range of materials and include (15):
• Digitized (scanned) books, journal articles, newspapers, manuscripts and more
• Digital text
• Audio files (wav, mp3)
• Video files (mp4, QuickTime)
• Photographic images (jpeg, tiff, gif)
• Data sets (downloadable statistical information)
• Theses and research papers
7. International Collaborative Projects
International digital library research projects aim to develop systems that can operate across multiple languages, formats, media, and social and organisational contexts. The main objectives of these projects are to avoid duplication of effort among different agencies and to share scientific knowledge and scholarly data across the world. The major international collaborative projects are as follows:
• National Science Foundation (NSF) – Joint Information Systems Committee (JISC) (US- UK): International Digital Libraries Collaborative Research and Application Test beds
• NSF- German Research Foundation (DFG) (US-Germany): International Digital Libraries Research
• Network of Excellence on Digital Libraries (DELOS) / NSF Working Group: Reference Models for Digital Libraries: Actors and Roles
• NSF/European Union (EU) Digital Libraries: Future Directions for a European Research Programme
8. Summary
This module discussed various digital library initiatives taken by the US for the benefit of the academic community. The module elaborated on various aspects of the first digital library, Project Gutenberg, and other early projects such as Mercury, the Chemistry Online Retrieval Experiment (CORE), and the Association for Computing Machinery (ACM). You also acquired knowledge about Digital Library Initiative Phase-1 and Phase-2, their funding partners and the test bed projects. The role of funding agencies such as the National Science Foundation (NSF), the Defense Advanced Research Projects Agency (DARPA) and the National Aeronautics and Space Administration (NASA) has been vital in developing technical architectures and promoting digital library research activities in the US. You have also learned about the major digital library projects, their outcomes and achievements.
9. References
1. Chowdhury, G. G. and Chowdhury, Sudatta (2003). Introduction to Digital Libraries. London: Facet Publishing.
2. Madalli, Devika P. (2003). Digital Libraries and Digital Library Initiatives. In Digital Libraries: Theory and Practice. Bangalore: DRTC.
3. Srinivasan, Padmini (1997). Digital Library Projects in the United States. DESIDOC Bulletin of Information Technology, Vol. 17, No. 6, pp. 15-21.
4. Challenges for digital libraries (http://www.unc.edu/~elese/diglib/history.html)
5. Arms, William (2000). Digital Libraries. M.I.T. Press. 1999 manuscript available at (http://www.cs.cornell.edu/wya/diglib/ms1999/Chapter3.html)
6. Project Gutenberg (1971-2008) (http://www.gutenberg.org/cache/epub/27045/pg27045.html)
7. Digital Libraries in Education, Science and Culture: An Analytical Survey (http://iite.unesco.org/pics/publications/en/files/3214660.pdf)
8. The Networked Digital Library of Theses and Dissertations (http://www.ndltd.org/about)
9. National Science Digital Library System (https://nsdl.org)
10. Mogk, David W. and Zia, Lee L. (1996). Addressing Opportunities and Challenges in Evaluation and Dissemination through Creation of a National Library for Undergraduate Science Education. Invited symposium in Proceedings of the 31st Annual Meeting of the Geoscience Information Society, October 28-31, 1996, Denver, CO. Available at (http://gdl.ou.edu/rp1.html)
11. Zia, Lee L. (2000). The NSF National Science, Mathematics, Engineering, and Technology Education Digital Library (NSDL) Program: A Progress Report. D-Lib Magazine, October 2000. Available at (http://www.dlib.org/dlib/october00/zia/10zia.html)
12. DLESE: Digital Library for Earth System Education (http://nsdl.org/partners/detail/PATH-000-000-000-007)
13. NASA Technical Reports Server (http://data.nasa.gov/about/)
14. NASA STI Program (http://www.sti.nasa.gov/find-sti/#ntrsharvest)
15. OAIster (http://www.oclc.org/oaister/about.en.html)