12 Digital Library and Semantic Web

Dinesh Pradhan and Yatrik Patel

 

I.  Objectives

 

Semantic digital libraries extend the usefulness of traditional digital libraries by exposing their content through a structured and interrelated web, so that it can be used easily by both machines and users. This module discusses:

 

•    The core services that can be enabled using a semantic digital library;

•    The architecture of semantic digital library from various perspectives; and

•    Examples of some semantic digital library initiatives.

 

 

II.   Learning Outcomes 

 

After going through this lesson, learners will understand how the usefulness and services of a traditional digital library can be augmented into a semantic digital library, and which architecture and services characterize a semantic digital library. Learners will also gain knowledge of ongoing semantic digital library initiatives.

 

 

III.   Structure 

 

1              Introduction

2             Semantic Digital Library Services

2.1           Search, Browsing, and Recommendation Services

2.2          Services for Augmenting Resources

2.3          Dissemination and Notification Services

2.4          Security and Policy Assurance Services

2.5          Services Providing Interoperability

2.6          Preservation Services

2.7          Quality Assurance Services

2.8          Integrated Documentation

3              Architecture of Semantic Digital Libraries

3.1           Metadata Perspective

3.2           User Perspective

3.3           Layered Architecture

3.3.1        Client Layer

3.3.2        Data Presentation

3.3.3        Data Preparation

3.3.4        Data Manipulation

3.3.5        Data Abstraction Layer

3.3.6        Data Source Layer

3.4           Vertical Architecture of Business Logic Services

3.5           Stack of Core Technologies Architecture

3.5.1        Service Oriented Architecture

3.5.2        Peer-to-Peer

3.5.3        Grid

4              Semantic Library Projects

4.1            JeromeDL Project (http://sourceforge.net/projects/jeromedl/)

4.2           BRICKS Digital Library Infrastructure (http://www.brickscommunity.org/)

4.3           MarcOnt Initiative

4.4           Semantic Interoperability of Metadata and Information in unLike Environments (SIMILE) Project (http://simile.mit.edu/)

5               Summary

 

 

 

 

 

1. Introduction 

 

Most digital libraries support searching either through an RDBMS that stores the metadata of the digital library content or through a full-text index of the content itself. These methods are helpful for browsing the digital library or for retrieving results when the query is based on terms that are indexed from the digital objects. They fail, however, for queries in which the query term is not among the indexed terms or the metadata of the digital objects. For example, a user looking for a book on “orchids” authored by “Rolfe” will not get the desired result from a digital library that holds a book on “Vanilla” (a genus of orchids) authored by “Rolfe”, whereas a semantic digital library will also return this record in the search results.

 

This is where the addition of semantics to a library can be useful. Semantic information, represented by metadata attached to each object and by one or more ontologies that provide semantic context for searches, can be used to resolve just such queries. The additional flexibility and search capability of this technique make a semantic digital library much more valuable than a traditional digital library for an average user.
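The orchid example above can be sketched in a few lines. This is a minimal illustration and not the method of any particular system: the tiny `BROADER` ontology, the `CATALOGUE` records, and the function names are all assumptions made up for this sketch. It expands a subject query with the narrower concepts recorded in the ontology before matching it against the metadata.

```python
# Hypothetical mini-ontology: each concept points to its broader concept.
BROADER = {
    "Vanilla": "orchids",
    "Dendrobium": "orchids",
    "orchids": "flowering plants",
}

# Hypothetical catalogue records (metadata only).
CATALOGUE = [
    {"title": "Vanilla", "author": "Rolfe", "subject": "Vanilla"},
    {"title": "Roses of the World", "author": "Smith", "subject": "roses"},
]


def narrower_concepts(concept):
    """Return the concept plus every concept that is (transitively) narrower."""
    found = {concept}
    changed = True
    while changed:
        changed = False
        for child, parent in BROADER.items():
            if parent in found and child not in found:
                found.add(child)
                changed = True
    return found


def keyword_search(subject, author):
    """Traditional search: the query term must literally match the metadata."""
    return [r for r in CATALOGUE
            if r["subject"] == subject and r["author"] == author]


def semantic_search(subject, author):
    """Semantic search: also match records whose subject is a narrower concept."""
    subjects = narrower_concepts(subject)
    return [r for r in CATALOGUE
            if r["subject"] in subjects and r["author"] == author]
```

With these sample records, `keyword_search("orchids", "Rolfe")` finds nothing, while `semantic_search("orchids", "Rolfe")` returns the book on Vanilla because the ontology records Vanilla as a narrower concept of orchids.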

 

Sebastian Ryszard Kruk pointed out three problems faced by digital library users:

 

•    Missing a librarian: problems with information discovery and with understanding complex metadata

•    Missing peers: users cannot share experience with other users visiting the library

•    Missing connection with other sources: library resources cannot become a part of the information processing workflow

 

A semantic digital library makes digital libraries more useful in today’s interlinked web by making unstructured data machine-readable.

 

2. Semantic Digital Library Services 

 

The major goal of a semantic digital library is to integrate information related to digital library objects that is available from various sources, such as bibliographic descriptions, user profiles, bookmarks, and controlled vocabularies. This can be achieved by using semantics to provide meaningfully connected information. A semantic digital library should also enable interoperability with other systems, including other digital libraries and services. Kruk and McDaniel list the important services that can be enabled using semantic digital libraries as follows:

 

2.1. Search, Browsing, and Recommendation Services 

 

The major objective of a semantic digital library is to provide information discovery that is superior to the services available in current digital libraries. It should help users find interconnected information about resources available in the digital library while browsing, filtering, or looking for similar information objects. Query refinement engines should use complex semantic relations between results to provide search results that match users’ profiles. A semantic digital library should also provide recommendation services, e.g., based on the context and resource annotations or on collaborative filtering. The search engine should be able to exploit information about different media types, complex objects, and streaming and spatio-temporal resources. For resources with complex annotations, it is important to support content-based search together with similarity search algorithms. In heterogeneous, competitive networks of content providers, the semantic digital library should implement query trading algorithms to support users in their searching.
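As an illustration of recommendation by collaborative filtering, the following minimal sketch scores resources by how similar their other readers are to the target user. It assumes that each user is represented only by a set of bookmarked resource ids and that similarity is measured with the Jaccard coefficient; both choices, and all names, are assumptions for this example rather than the approach of a specific library system.

```python
def jaccard(a, b):
    """Similarity of two sets: size of the overlap divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target, bookmarks, top_n=3):
    """Recommend resources bookmarked by users similar to `target`.

    `bookmarks` maps a user name to the set of resource ids that user saved.
    """
    own = bookmarks[target]
    scores = {}
    for user, items in bookmarks.items():
        if user == target:
            continue
        weight = jaccard(own, items)
        # Only resources the target has not bookmarked yet are candidates.
        for item in items - own:
            scores[item] = scores.get(item, 0.0) + weight
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, score in ranked[:top_n] if score > 0]
```

A user who shares two bookmarks with another reader is recommended that reader's remaining bookmarks, while resources saved only by users with no overlap receive a score of zero and are left out.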

 

2.2.  Services for Augmenting Resources 

 

Semantic digital libraries can augment the stored digital objects by providing additional annotations along with the basic metadata supplied during the upload process. This can be achieved by both automated and user-based annotations. User-based annotations can draw on the power of social networking sites, where community annotation, tagging, and ranking are done by the users. The annotation services should be flexible enough to adapt to various user groups and to the content being augmented, e.g., time-tagging stream media or annotating regions of interest (ROI) in images and geo-spatial resources. User-based annotations are the key technology for actively engaging users in the process of sharing knowledge. Semantic digital library services should ensure that users who provide annotations benefit from better recommendations and search results.
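One possible way to represent such annotations is sketched below. The `Annotation` record, its field names, and the `tags_at` helper are hypothetical and chosen only for illustration; the point is that a single structure can carry either a time range (for stream media) or a region of interest (for images).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Annotation:
    resource_id: str
    author: str
    tag: str
    # Optional anchor inside the resource; both fields are illustrative.
    time_range: Optional[Tuple[float, float]] = None     # seconds, for stream media
    region: Optional[Tuple[int, int, int, int]] = None   # x, y, width, height in an image


def tags_at(annotations, resource_id, second):
    """Tags whose time range covers `second` in the given stream resource."""
    return sorted(
        a.tag for a in annotations
        if a.resource_id == resource_id
        and a.time_range is not None
        and a.time_range[0] <= second <= a.time_range[1]
    )
```

A player could call `tags_at` while a lecture recording is running to show the community tags that apply to the current position.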

 

2.3. Dissemination and Notification Services 

 

Semantic digital libraries should enable users to access the metadata in any format and through any service. Users should be able to access the content’s metadata by constructing mash-ups of services and content. The metadata should be provided through APIs, as customizable RSS and Atom feeds, as SIOC (Semantically-Interlinked Online Communities) descriptions, as RDF graphs, and in JSON format. To enhance the user experience, extensible and customizable notification services can be provided as a part of the information retrieval process.
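To make the idea of offering the same metadata in several formats concrete, the sketch below serialises one record both as JSON and as N-Triples using the Dublin Core element namespace. The record, the resource URI, and the function names are assumptions for illustration, and the N-Triples writer handles only plain string literals.

```python
import json

DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace


def to_json(record):
    """Serialise a metadata record as JSON with keys in a stable order."""
    return json.dumps(record, sort_keys=True)


def to_ntriples(uri, record):
    """Serialise the same record as N-Triples, one literal-valued triple per field."""
    lines = []
    for field in sorted(record):
        # Escape backslashes, double quotes, and newlines inside the literal.
        value = (record[field].replace("\\", "\\\\")
                              .replace('"', '\\"')
                              .replace("\n", "\\n"))
        lines.append(f'<{uri}> <{DC}{field}> "{value}" .')
    return "\n".join(lines)
```

Both functions read from the same record, so a new output format can be added without touching the stored metadata.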

 

2.4. Security and Policy Assurance Services 

 

A semantic digital library should adapt to various policy enforcements and provide flexible authentication and access control mechanisms for the convenience of the users.
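A very small role-based check illustrates what a flexible access control mechanism might look like. The `POLICY` table, the role names, and the collection names below are invented for this sketch; a real deployment would load its policies from configuration and combine them with authentication.

```python
# Hypothetical policy table: which roles may perform which action on a collection.
POLICY = {
    ("theses", "read"): {"guest", "member", "librarian"},
    ("theses", "annotate"): {"member", "librarian"},
    ("theses", "approve"): {"librarian"},
}


def is_allowed(roles, collection, action, policy=POLICY):
    """Allow the action if any of the user's roles is listed for it; deny by default."""
    permitted = policy.get((collection, action), set())
    return bool(permitted & set(roles))
```

Because the policy is plain data, it can be swapped per institution without changing the checking code, and any action that is not listed is denied.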

 

2.5. Services Providing Interoperability 

 

The semantic digital library should support interoperability standards for content, metadata, services, and protocols, with both backward and forward compatibility. It should provide services compatible with both legacy standards, e.g., Dienst, Z39.50, and OAI-PMH, and modern protocols, e.g., SPARQL or OAI-ORE.
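The two styles of protocol can be contrasted with a short sketch. The first part builds an OAI-PMH `ListRecords` request, which is an HTTP request with a `verb` and a `metadataPrefix` parameter; the second part is a SPARQL query held as text. The base URL, the set name, and the queried author are placeholders assumed for this example, and nothing is sent over the network.

```python
from urllib.parse import urlencode


def oai_list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# A SPARQL query a client of a semantic library might send instead
# (plain text only; it is not executed in this sketch).
TITLES_BY_AUTHOR = """
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?book ?title
WHERE {
  ?book dc:creator "Rolfe" ;
        dc:title ?title .
}
"""
```

Harvesting with OAI-PMH copies whole metadata records to the client, whereas a SPARQL endpoint lets the client ask for exactly the statements it needs.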

 

2.6. Preservation Services 

 

A semantic digital library should ensure versioning and archiving (backup and recovery), as well as provenance tracking (especially in the context of the open-world approach of semantic and social technologies) and keeping track of the history of events related to information objects.
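Versioning and provenance tracking can be pictured as an append-only history per record. The classes below are a minimal sketch under that assumption; the names `Version` and `VersionedRecord` and their fields are hypothetical and do not come from any particular preservation system.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Version:
    number: int
    metadata: Dict[str, str]
    changed_by: str
    note: str


@dataclass
class VersionedRecord:
    """Keeps every past state of a record instead of overwriting it."""
    versions: List[Version] = field(default_factory=list)

    def update(self, metadata, changed_by, note):
        # Store a copy so later edits to the caller's dict cannot alter history.
        version = Version(len(self.versions) + 1, dict(metadata), changed_by, note)
        self.versions.append(version)
        return version

    def current(self):
        return self.versions[-1].metadata

    def history(self):
        """Provenance trail: who changed the record, in what order, and why."""
        return [(v.number, v.changed_by, v.note) for v in self.versions]
```

Since earlier versions are never modified, any past state can be restored and every change can be traced to the person who made it.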

 

2.7. Quality Assurance Services 

 

A semantic digital library has to ensure the efficiency, security, and semantics of the metadata it maintains.

 

2.8. Integrated Documentation 

 

Semantic digital libraries and their related services are mostly complex in nature. From the perspective of library usability, it is therefore vital to deliver documentation integrated within the digital library system.

 

3. Architecture of Semantic Digital Libraries 

 

Kruk et al. have proposed three perspectives on the architecture of the semantic digital library: a top-down layered architecture, a vertical architecture of core services, and a stack of enabling infrastructures. The proposed architecture is based on the metadata perspective and the user perspective, as discussed below:

 

3.1. Metadata Perspective 

 

Semantic digital libraries should make the metadata more open, unstructured, and highly interlinked to assist users in information discovery in the interconnected information space. The semantic digital library architecture should include annotations contributed by the users in addition to the metadata descriptions provided in a traditional digital library. Ontologies are one of the key technologies of semantic digital libraries: they enable reasoning over interconnected concepts and give meaning to the information objects and to the relations between them. As such, the metadata becomes a network of meaningful, interlinked concepts that the digital library management services can use to increase the usefulness of the content from the perspective of the end users.

 

3.2. User Perspective 

 

The introduction of highly interconnected, open, and largely unstructured metadata to semantic digital libraries creates requirements for new information discovery and management technologies, and enables features such as highly extensible faceted navigation, natural language-based interaction, and community-driven collaborative filtering. Contemporary user interaction models, e.g., Web 2.0 or instant messaging, have to be supported for the users. The benefits that users can derive from semantic digital libraries have to be delivered as enriched versions of the services that users already have in contemporary solutions.
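Faceted navigation, mentioned above, amounts to counting the values of a metadata field across the current result set and then narrowing the set by the values the user picks. The following is a minimal sketch of that idea; the sample field names and the helper names are assumptions for illustration.

```python
from collections import Counter


def facet_counts(records, facet):
    """Count how many records carry each value of one facet (metadata field)."""
    return Counter(r[facet] for r in records if facet in r)


def narrow(records, **selected):
    """Keep only the records that match every selected facet value."""
    return [r for r in records
            if all(r.get(name) == value for name, value in selected.items())]
```

An interface would show the output of `facet_counts` next to the result list and call `narrow` each time the user clicks on a facet value. Because the facets are ordinary metadata fields, new facets appear as soon as new properties are added to the records.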

 

In the semantic digital library, the variety of content views and user perspectives should be reflected in the metadata layer of the library. User profiles can be captured with a special ontology that describes users’ characteristics, the groups they belong to, and their relationships with other users. A semantic DLMS can select only the desired part of the information and process it to deliver a given functionality for a specific user; this becomes especially feasible with the underlying unstructured metadata and ontologies.

 

3.3. Layered Architecture 

 

Kruk has proposed a layered architecture for the Digital Library Management System (DLMS) that is composed of six layers. The layered architecture is presented in Figure 1.

 

Figure 1: Layered Architecture for Semantic Digital Library proposed by Kruk et al.

 

The lower layers in the figure are more data-oriented, while the upper layers address the requirements of end users and external services. A service-oriented architecture is proposed to specify, organize, and execute the services delivered by each abstraction layer.

 

3.3.1. Client Layer 

 

The client layer represents the applications used to access the digital library content. These include graphical user interfaces available through web or mobile browsers, programmes that process the content for their own purposes, and mash-ups that consume the content and provide it through an integrated portal or service.

 

3.3.2. Data Presentation 

 

The data presentation layer represents the endpoints that deliver the raw content to client layer applications and services. The data in this layer is expressed at different levels depending on the encoding standard: some formats require more information and processing, while others can be sent directly from the underlying RDF storage. This layer shows the benefit of using the RDF object model and ontologies to define the meaning of concepts and relations. In most cases the particular schema does not need to be known in advance; if necessary, the specification required for a transformation can be derived from these ontologies. In some cases it is just a matter of “reformatting” the information to another standard such as JSON. This layer ensures that information modeled using the concepts of one ontology can be translated and tailored to fit the understanding of other users and client services.
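The “reformatting” case can be illustrated with a small sketch that groups RDF-style triples by subject and emits them as JSON. The triples are written as plain Python tuples with shortened identifiers, and the function name is an assumption; a real endpoint would read the triples from the RDF storage.

```python
import json


def triples_to_json(triples):
    """Group (subject, property, value) triples by subject and emit JSON."""
    grouped = {}
    for subject, prop, value in triples:
        # A property may occur several times, so values are kept in lists.
        grouped.setdefault(subject, {}).setdefault(prop, []).append(value)
    return json.dumps(grouped, sort_keys=True, indent=2)
```

No schema is needed in advance: whatever properties occur in the triples become keys in the JSON document.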

 

3.3.3. Data Preparation 

 

This layer is quite different from the previous one and acts as a checkpoint for the delivered content. It does not manipulate the data directly; instead, it shapes the data according to policies based on user requirements, ensures the consistency of the data, and augments it with additional information from this library or from other services.

 

3.3.4. Data Manipulation 

 

The data manipulation layer is the most developed layer of the semantic digital library model. It holds the core mechanisms responsible for delivering particular library functionality to the end user. The layers above process the data to fit the needs of the respective client, while the layers below provide the means to perform the operations needed to fulfill a given service goal.

 

3.3.5. Data Abstraction Layer 

 

The data abstraction layer provides atomic data operations for the data manipulation layer. It is the last level before the actual databases are accessed and is responsible for data consistency, proper preservation, and access. It delivers an abstract object model that allows seamless access to and management of various data models, e.g., RDF, a full-text index, or a distributed index.
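The role of an abstract object model can be sketched as a common interface with one implementation per kind of storage. The classes below are simplified in-memory stand-ins written for this illustration; the interface name `Store` and its two methods are assumptions and do not reproduce the interface of any existing system.

```python
from abc import ABC, abstractmethod


class Store(ABC):
    """Uniform interface the data manipulation layer would program against."""

    @abstractmethod
    def put(self, object_id, data): ...

    @abstractmethod
    def find(self, term): ...


class TripleStore(Store):
    """Stand-in for an RDF store: keeps (subject, property, value) triples."""

    def __init__(self):
        self.triples = set()

    def put(self, object_id, data):
        for prop, value in data.items():
            self.triples.add((object_id, prop, value))

    def find(self, term):
        return sorted({s for s, _, o in self.triples if o == term})


class FullTextIndex(Store):
    """Stand-in for a full-text index: maps each lower-cased word to object ids."""

    def __init__(self):
        self.postings = {}

    def put(self, object_id, data):
        for value in data.values():
            for word in value.lower().split():
                self.postings.setdefault(word, set()).add(object_id)

    def find(self, term):
        return sorted(self.postings.get(term.lower(), set()))


def find_everywhere(stores, term):
    """Query every store through the same interface and merge the ids."""
    hits = set()
    for store in stores:
        hits.update(store.find(term))
    return sorted(hits)
```

The caller of `find_everywhere` does not need to know which data model sits behind each store, which is the point of the abstraction.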

 

3.3.6. Data Source Layer 

 

The data source layer represents the data repositories used by the semantic digital library, which include RDF storage for the metadata and conventional storage for the binary form of the resources. It also includes full-text index storage and the separate services required for managing distributed and local data.

 

3.4. Vertical Architecture of Business Logic Services 

 

The services of the data manipulation layer of a semantic digital library are arranged in four vertical columns, as shown in Figure 1. The columns are based on the role the services play in the interaction with the end users of the semantic digital library:

 

3.4.1. Information discovery (access, search, browsing) services are the services that end users interact with most often. The services implemented in this column range from enhanced search and browsing, natural language interfaces, and faceted navigation to community services, such as annotations, ranking, collaborative filtering, and collaborative search. Additional services support context-based operations, user profiling, and overall quality of service. End users can also interact with the DLMS to compose workflows that support their business needs.

 

3.4.2. Advanced management services are mostly accessible by the librarians and library administrators. They deliver service, user, collection, and streaming media management. Librarians can also manage the set of ontologies deployed in the semantic digital library; they can also invoke ontology learning processes on the corpus of recently added resources.

 

3.4.3. Basic services support core operations of the semantic digital library, such as content and metadata management, ensuring access and policy management. A set of basic services support librarians in the classification process by indexing, feature extraction and recognition operations.

 

3.4.4. Interoperability services are furthest from the end user but closest to external services. To support interoperability, semantic digital libraries can offer different solutions, ranging from simply exposing metadata and SPARQL endpoints, through metadata harvesting solutions (e.g., OAI-PMH), to service and metadata mediation services.

 

3.5. Stack of Core Technologies Architecture 

 

While the previous sections present the architecture of the semantic digital library from the perspective of metadata, users, and services, this section presents it from the perspective of the technologies on which all of these operate.

 

3.5.1. Service Oriented Architecture 

 

A service-oriented architecture provides the services of semantic digital libraries as web services, which makes it easy to integrate digital library services into external online services or mash-ups. This includes both REST and SOAP services.

 

3.5.2. Peer-to-Peer 

 

A peer-to-peer network infrastructure is used to distribute data processing. It is commonly used to create a distributed index for data stored on different machines.
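A distributed index can be pictured as a set of peers, each holding the index entries for the terms assigned to it. The sketch below assigns terms to peers with a simple hash-and-modulo rule and keeps all shards in one process; this is a deliberate simplification for illustration, since real peer-to-peer systems use more robust schemes (such as distributed hash tables) and communicate over the network. All names are assumptions.

```python
import hashlib


def peer_for(term, peers):
    """Pick the peer responsible for `term` from a stable hash of the term."""
    digest = hashlib.sha256(term.lower().encode("utf-8")).hexdigest()
    return peers[int(digest, 16) % len(peers)]


class DistributedIndex:
    """Each peer holds the postings only for the terms that hash to it."""

    def __init__(self, peers):
        self.peers = list(peers)
        self.shards = {peer: {} for peer in self.peers}

    def add(self, term, object_id):
        shard = self.shards[peer_for(term, self.peers)]
        shard.setdefault(term.lower(), set()).add(object_id)

    def lookup(self, term):
        # Only the one responsible peer has to be asked.
        shard = self.shards[peer_for(term, self.peers)]
        return sorted(shard.get(term.lower(), set()))
```

Because the responsible peer is computed from the term itself, a lookup contacts a single peer instead of searching every machine.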

 

3.5.3. Grid 

 

Grid networks are mostly used for heavy-weight data processing tasks. The grid infrastructure can support the digital library in executing its most demanding operations, both those that must run at query time, such as query processing, and those that can run in the background, such as converting resources, processing annotations, and indexing. The semantic digital library should not deliver new features at the cost of lowered system efficiency; therefore, grid networks can take over the processing of annotations and of Semantic Web query languages to maintain overall system performance.

 

4. Semantic Library Projects

 

Several semantic digital library prototypes and tools have been developed in different parts of the world. A few are listed as follows:

 

4.1. JeromeDL Project (http://sourceforge.net/projects/jeromedl/) 

 

JeromeDL is a digital library that deploys Semantic Web technology for user management and search, with the aim of providing personalized services based on users’ profiles. The FOAF vocabulary is used for user profile management, and semantic descriptions are utilized in the search procedure. JeromeDL is implemented in Java and is available under an open-source license. The main components of the JeromeDL system are:

 

• Resource management: Each resource is described by semantic descriptions according to the JeromeDL core ontology. Additionally, a full-text index of the resource’s content and MARC21 and BibTeX bibliographic descriptions are provided. Each user is able to add resources via a web interface. To ensure the quality of the delivered content, each resource uploaded through the web interface has to be approved for publication. The administrative interface for librarians (JeromeAdmin) allows them to manage resources and the associated metadata (MARC21, BibTeX, and semantic annotations) as well as to approve user submissions.

 

• Retrieval features: JeromeDL provides searching and browsing features based on Semantic Web data through a three-step search algorithm.

 

• User profile management: In order to provide an additional semantic description of resources, scalable user management based on FOAF is utilized.

 

• Communication link: Communication with the outside world is enabled by searching in a network of digital libraries. The content of the JeromeDL database can be searched not only through the web pages of the digital library but also from other digital libraries and other web applications. A special web services interface based on the Extensible Library Protocol (ELP) has been developed for this purpose.
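To give an impression of what a FOAF-based user profile looks like, the sketch below renders a minimal profile as Turtle text using the FOAF terms `foaf:Person`, `foaf:name`, and `foaf:knows`. It is an illustration only and not JeromeDL code (JeromeDL is written in Java); the function name and the user URIs are hypothetical, and the sketch assumes that the name contains no double quotes.

```python
FOAF_PREFIX = "@prefix foaf: <http://xmlns.com/foaf/0.1/> ."


def foaf_profile(user_uri, name, knows=()):
    """Render a minimal FOAF description of one user as Turtle text."""
    lines = [f"<{user_uri}> a foaf:Person ;", f'    foaf:name "{name}"']
    for friend_uri in knows:
        # Close the previous statement with ';' before adding another one.
        lines[-1] += " ;"
        lines.append(f"    foaf:knows <{friend_uri}>")
    lines[-1] += " ."
    return FOAF_PREFIX + "\n\n" + "\n".join(lines)
```

The `foaf:knows` links between profiles form the social network that a library can use, for example, to weight the annotations and bookmarks of a user's acquaintances.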

 

4.2. BRICKS Digital Library Infrastructure (http://www.brickscommunity.org/) 

 

BRICKS (Building Resources for Integrated Cultural Knowledge Services) is a European project that aims at enabling integrated access to distributed resources in the cultural heritage domain. The target audience for BRICKS is broad and heterogeneous and involves cultural heritage and educational institutions, the research community, industry, and the general public. The architecture of the BRICKS digital library system is distributed in nature: rather than managing content and metadata in a centralized system, institutions can install a BNode to bring their content and metadata into the BRICKS network. Each BNode instance provides a set of services that cover the major functionalities (e.g., DRM, user management) required by each institution.

 

4.3. MarcOnt Initiative 

 

The MarcOnt Initiative aims to create a new bibliographic description standard in the form of an ontology, together with related tools that utilize semantic technologies. Its goal is to create an ontology that has the potential to become a standard for describing data in semantically enabled digital library systems. It was created to address the following requirements:

 

•    The ontology should be compatible with at least the MARC21 Concise Format for Bibliographic Data,

•    The MarcOnt ontology should be written using OWL DL (Web Ontology Language, Description Logics),

•    Semantic descriptions should be easily translated to MARC21 or Dublin Core formats (and possibly others),

•    It should be possible to merge the MarcOnt ontology with other (bibliographic) ontologies.

The MarcOnt Initiative is one of the ongoing projects aiming at creating a new bibliographic description standard (MarcOnt) and mediation services to support different legacy bibliographic formats. The outcome of the project is yet to come.
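The requirement that descriptions be translatable between formats can be illustrated with a small field mapping from MARC21 to Dublin Core. The mapping table below is a simplified assumption made for this example; it is neither the official Library of Congress crosswalk nor the MarcOnt mediation service, and the sample record in the usage note is invented.

```python
# Illustrative mapping from (MARC21 tag, subfield code) to a Dublin Core element.
MARC_TO_DC = {
    ("245", "a"): "title",
    ("100", "a"): "creator",
    ("260", "b"): "publisher",
    ("260", "c"): "date",
}


def marc_to_dc(fields):
    """Translate a list of (tag, subfield, value) tuples into a Dublin Core dict.

    Unmapped fields are skipped; repeated elements are collected in lists.
    Leading and trailing spaces and the punctuation marks / : , ; are stripped.
    """
    record = {}
    for tag, code, value in fields:
        element = MARC_TO_DC.get((tag, code))
        if element is not None:
            record.setdefault(element, []).append(value.strip(" /:,;"))
    return record
```

For an invented record with the fields `("245", "a", "Vanilla /")` and `("260", "b", "Example Press,")`, the function yields a title of `Vanilla` and a publisher of `Example Press`. An ontology-based mediation service generalizes this idea by describing such correspondences declaratively instead of hard-coding them.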

 

4.4. Semantic Interoperability of Metadata and Information in unLike Environments (SIMILE) Project (http://simile.mit.edu/) 

 

The Semantic Interoperability of Metadata and Information in unLike Environments (SIMILE) Project was jointly conducted by the MIT Libraries and MIT CSAIL (founding partners also included HP Laboratories and the World Wide Web Consortium) with support from the Andrew W. Mellon Foundation. SIMILE sought to enhance interoperability among digital assets, schemata/vocabularies/ontologies, metadata, and services. A key challenge it addressed was making collections that are distributed across individual, community, and institutional stores interoperable, by drawing on the assets, schemata/vocabularies/ontologies, and metadata held in such stores.

 

5. Summary

 

Semantic digital libraries are the next step in the evolution of current digital library management systems. They offer services that exploit the notion of semantics and cater to online communities of library users, in order to keep up with the growing demand for quality in the services provided.

 

 

 

References

 

  • Kruk, Sebastian Ryszard, and Bill McDaniel. Semantic Digital Libraries. Springer, 2008.
  • Kruk, Sebastian Ryszard. Semantic Digital Libraries: Improving Usability of Information Discovery with Semantic and Social Services. Available online at http://www.slideshare.net/skruk/semantic-digital-libraries
  • Kruk, Sebastian Ryszard, and Bill McDaniel. “Goals of semantic digital libraries.” Semantic Digital Libraries. Springer Berlin Heidelberg, 2009. 71-76.
  • Kruk, Sebastian Ryszard, Adam Westerski, and Ewelina Kruk. “Architecture of semantic digital libraries.” Semantic Digital Libraries. Springer Berlin Heidelberg, 2009. 77-85.
  • Kruk, Sebastian Ryszard, Stefan Decker, and Lech Zieborak. “JeromeDL - adding semantic web technologies to digital libraries.” Database and Expert Systems Applications. Springer Berlin Heidelberg, 2005.
  • Synak, Marcin, and Sebastian Ryszard Kruk. “MarcOnt Initiative - the ontology for the librarian world.” 2nd European Semantic Web Conference ESWC 2005. 2005.
  • Haslhofer, Bernhard, and Predrag Knežević. “The BRICKS digital library infrastructure.” Semantic Digital Libraries. Springer Berlin Heidelberg, 2009. 151-161.