14 Semantic Web, Invisible Web and Deep Web

Dr Aditiya

I. Objectives

• To attain knowledge about the semantic web, its definition, concept, history and architecture.

• To become familiar with the various components of the semantic web.

• To learn about the major tools used for developing the semantic web.

II. Learning Outcomes

After studying this lesson, learners will understand the semantic web: its definition, concept, history and architecture. They will know the components of a semantic web service, namely the register, decomposer, reasoner, invoker and matchmaker. They will also learn about the major tools used for developing the semantic web, viz. Extensible Markup Language (XML), Resource Description Framework (RDF), Web Ontology, etc.

III. Module Structure

1. Introduction

2. Semantic Web

2.1 Definition

3. History of Semantic Web

4. Architecture of Semantic Web

5. Semantic Web : components and tools

5.1 Components of Semantic Web Service

5.2 Tools for developing Semantic Applications

5.2.1 Extensible Markup Language (XML)

5.2.2 Resource Description Framework (RDF)

5.2.3 Modeling data in RDF

5.2.4 Web Ontology

5.2.5 Web Ontology Language (OWL)

6. Issues and Challenges

7. Promises of Semantic Web

8. Implication and Applicability in Real World

9. Impact on trinity of libraries

10. Summary

11. References

1. Introduction

In an era of information explosion, retrieving information, managing what has been retrieved and picking out the relevant items from an ocean of information are difficult tasks. Finding exactly the information one needs on the Web is particularly hard. New technologies emerge every day to address these problems. The Semantic Web is one such idea: it aims to improve existing information retrieval systems by letting machines do more of the work and so reduce human effort.

The Semantic Web is an initiative promoted by the World Wide Web Consortium (W3C) to make information on the Web machine-processable. It makes it possible to organize Web information resources and to use them not only through syntactic and structural methods but also through the semantics of the concepts involved. It is an abstract representation of World Wide Web resources based on a framework known as RDF (Resource Description Framework).

2. Semantic Web

The concept of the Semantic Web has come a long way since its inception. It is envisaged that applications (search engines or intelligent agents) will not only understand the semantics of the available information but will also make devices communicate with one another as and when required. The Semantic Web therefore holds considerable promise.

2.1 Definition

According to Tim Berners-Lee, “The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation.”

Semantics is the study of the meaning expressed by the elements of a language, where a language can be characterized as a symbolic system. The Semantic Web uses technologies that help machines understand information on the Web, covering both the visible web (information that is indexed in the databases of search engines) and the invisible web (information that search engines cannot reach). It provides search results that are better defined, more meaningful and easier to understand. The Semantic Web can also link databases and applications together with the information they contain, so that the user obtains the richest and most relevant sources.

Thus, the Semantic Web can be defined as a Web that provides semantic search results. In other words, it can understand the meaning of a searched linguistic element by analyzing it and can present the results in a well-defined way.

3. History of Semantic Web

Tim Berners-Lee invented the Web at CERN in 1989, and it has come a long way since then. His original vision of the Web was much more ambitious than the reality of the existing (syntactic) Web. It was also Berners-Lee who envisaged the Semantic Web: “A new form of Web content that is meaningful to computers will unleash a revolution of new possibilities”. Historically, the concept emerged from successive versions of the Web. The early Web, also called the static web or Web 1.0, gave way to a more interactive version that was labelled Web 2.0. The history of semantic applications, or of the Semantic Web, is thus part of the development of the Web itself. Web 2.0, in which applications are connected to one another through web-based ontologies and metadata, is a primitive kind of semantic application that is expected to lead to the full-blown Semantic Web, or Web 3.0. Berners-Lee further observed that if the interaction between person and hypertext could be so intuitive that the machine-readable information space gave an accurate representation of the state of people’s thoughts, interactions, and work patterns, then machine analysis could become a very powerful management tool.

Hence, there is a strong need for a system of Web objects in which machines can analyze the search text, understand its meaning and context, and interpret it accordingly in order to automate various procedures.

4. Architecture of Semantic Web

The Semantic Web has a seven-layered architecture. It comprises seven main functions, each of which corresponds approximately to one layer, as follows:

Fig. 1: Architecture of Semantic Web

Starting from the bottom of the diagram above, the first layer consists of Unicode and URI. Unicode is a universal character encoding standard that can represent any written script. URI stands for Uniform Resource Identifier, a standardized form that allows resources to be identified uniquely. Variants of URI include URN (Uniform Resource Name), PURL (Persistent URL), URL (Uniform Resource Locator) and so on. URIs provide an understandable identification of all resources in a distributed Internet system.
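The distinction between URI variants can be illustrated with a minimal Python sketch that splits two identifiers into their parts using the standard library. The function name describe_uri and the sample ISBN URN are assumptions chosen for illustration only.

```python
from urllib.parse import urlparse


def describe_uri(uri: str) -> dict:
    """Split a URI into the generic parts shared by URLs and URNs."""
    parts = urlparse(uri)
    return {
        "scheme": parts.scheme,      # e.g. "http" or "urn"
        "authority": parts.netloc,   # host name; empty for a URN
        "path": parts.path,          # location or name within the scheme
    }


# A URL says where a resource is located.
url_parts = describe_uri("http://www.w3.org/TR/owl-guide/")

# A URN only names a resource (illustrative ISBN, not taken from the lesson).
urn_parts = describe_uri("urn:isbn:0451450523")
```

In both cases the scheme tells a program how the rest of the identifier should be interpreted, which is what lets every resource be identified uniquely.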

The next layer is Extensible Markup Language (XML), a language that, like HTML (Hypertext Markup Language), describes resources in a nested structure. XML is used to define namespaces and to develop XML schemas in a standard syntax, which forms the very basis of the Semantic Web.

The next layer is the Resource Description Framework (RDF), which specifies how knowledge, an idea or an object is represented as a triple, i.e. Subject-Predicate-Object.

RDF incorporates metadata about WWW resources and, through RDFS (RDF Schema), provides a mechanism for defining taxonomies and ontologies. These taxonomies and ontologies form the basic constructs for semantic services in the form of classes and their respective properties.

A language named Web Ontology Language (OWL) is provided for developing large-scale ontologies. OWL is derived from description logic and offers more constructs than RDFS. A construct is an architectural unit, and here the constructs appear as standardized vocabularies. These vocabularies create a knowledge structure over which reasoning can then be carried out using rules and logic.

Rule Interchange Format (RIF) and Semantic Web Rule Language (SWRL) provide a layer above the ontology layer for reasoning among the various concepts represented as knowledge constructs. In addition, a query layer, SPARQL (SPARQL Protocol and RDF Query Language), is used to query the whole underlying architecture using RDF statements and resources. SPARQL queries the RDF data structure (knowledge base) woven together by RDFS and OWL.

Above all this is an execution layer that yields proof of the logic applied and builds trust in the relation between the input given and the output received. Finally, the user interface is built on top of all these layers.
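The idea behind the query layer can be sketched in a few lines of Python. This is not SPARQL itself, only a hedged illustration of how a triple pattern with wildcards selects statements from a knowledge base. The "ex:" names, the sample data and the match function are all hypothetical.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical knowledge base; the "ex:" names are made up for this sketch.
KNOWLEDGE_BASE: List[Triple] = [
    ("ex:book1", "ex:author", "ex:tripathi"),
    ("ex:book1", "ex:title", "RDF Tutorial"),
    ("ex:book2", "ex:author", "ex:smith"),
]


def match(
    triples: List[Triple],
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
) -> List[Triple]:
    """Return every triple that agrees with the given parts; None is a wildcard."""
    return [
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]


# Roughly what the SPARQL pattern { ?book ex:author ex:tripathi } asks for.
books_by_tripathi = [
    s for (s, _, _) in match(KNOWLEDGE_BASE, predicate="ex:author", obj="ex:tripathi")
]
```

A real SPARQL engine adds variables shared across several patterns, filters and a network protocol, but the core operation is still matching patterns against stored triples.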

5. Semantic Web: components and tools

5.1 Components of Semantic Web Service

The Semantic Web is basically a concept in which the system performs tasks that normally require human intelligence. The system analyzes the search terms, understands their meaning and interprets them, rather than simply presenting results to the user. Computer-based intelligent systems will take the place of human intellect, with humans supplying only the input in the form of data. The machine understands the meaning of the data and, after processing and arranging it, gives it a structured format so that reasoning can be performed to produce more meaningful and understandable output. The following are the physical components of Semantic Web services:

Register: A register is the place where factual data is stored collectively in the form of resources or objects. The data here is in its raw state; no processing has yet been applied to the objects.

Decomposer: This component initiates processing by breaking an object down into its constituent parts. The resulting parts are arranged sequentially.

Reasoner: This is the most important component of any semantic application. It is a collection of rules used for analyzing the object. Basically, it is a rule-based system that applies the rules to the collected data or object. This reasoning is used for problem solving.

Invoker: This is the triggering component that initiates the search or the action of the service. It starts with the client's request for the service.

Matchmaker: This is the main execution module, which finds the most suitable result for the client's request.
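How the five components might cooperate can be shown with a toy Python sketch. It is only one possible reading of the descriptions above, not a standard implementation: the service names, the rules and the word-overlap scoring are all assumptions made for illustration.

```python
from typing import Dict, List, Optional

# Register: raw, unprocessed descriptions of available services (hypothetical data).
REGISTER: Dict[str, str] = {
    "flight-service": "book flight ticket between cities",
    "hotel-service": "reserve hotel room in city",
}

# Rules used by the reasoner: map a term to a normalised concept (hypothetical rules).
RULES: Dict[str, str] = {"plane": "flight", "airline": "flight", "lodging": "hotel"}


def decompose(text: str) -> List[str]:
    """Decomposer: break an object into its parts, kept in their original order."""
    return text.lower().split()


def reason(terms: List[str]) -> List[str]:
    """Reasoner: apply the rules to every term; terms without a rule pass through."""
    return [RULES.get(term, term) for term in terms]


def matchmake(request_terms: List[str]) -> Optional[str]:
    """Matchmaker: return the registered service sharing the most terms with the request."""
    best_name, best_score = None, 0
    for name, description in REGISTER.items():
        service_terms = set(reason(decompose(description)))
        score = len(set(request_terms) & service_terms)
        if score > best_score:
            best_name, best_score = name, score
    return best_name


def invoke(request: str) -> Optional[str]:
    """Invoker: triggered by the client's request, it starts the whole process."""
    return matchmake(reason(decompose(request)))
```

For example, invoke("find a plane ticket") selects the flight service because the rule set maps "plane" to "flight", while a request that shares no term with any registered description returns None.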

5.2 Tools for developing Semantic Applications

5.2.1 eXtensible Markup Language (XML)

XML describes and exchanges data on the Web. It allows creators to build pages in their own vocabulary, in which elements carry their own meaning and description. The tags used in XML are more meaningful than those used in HTML. For example, XML can use the tag <Author> where HTML would use a tag such as <H1>; Author is more meaningful and self-describing than H1. HTML tags are predefined, whereas XML tags are chosen by the creator. XML lets the creator decide what information goes between the tags, and because that information has a hierarchical structure, a user can easily understand what it means. (Aditya Tripathi, 2003)
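The benefit of self-describing tags can be seen in a short Python sketch that reads a small XML record with the standard library. The element names Book, Title and Year are assumptions chosen for illustration; only Author comes from the text above.

```python
import xml.etree.ElementTree as ET

# A creator-defined vocabulary: each tag names the kind of data it holds.
RECORD = """<Book>
  <Title>Resource Description Framework: A Tutorial for Developing Web Ontology</Title>
  <Author>Aditya Tripathi</Author>
  <Year>2003</Year>
</Book>"""

root = ET.fromstring(RECORD)

# Because the tag carries the meaning, a program can ask for the author directly.
author = root.find("Author").text

# The nesting gives a hierarchy: every child of <Book> is a property of the book.
fields = {child.tag: child.text for child in root}
```

With HTML, the same program would only see that some text is a heading; it could not tell an author from a title without guessing.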

5.2.2 Resource Description Framework (RDF)

RDF is a language for representing information about resources on the Web. It identifies things by using URIs and describes them with simple statements (triples). It is a domain-independent technology that provides a way to build an object model from which the actual data can be referenced.

The development of RDF started with the PICS (Platform for Internet Content Selection) project in 1995. PICS was a mechanism for rating the content of web pages. The idea was to filter out unwanted web pages containing foul language, pornographic material, violence, etc. Once the project was under way, it was found that the same mechanism could be used to describe the content of a web page in a form that machines could understand. The extension of the PICS project was PICS-NG (PICS Next Generation), which later came to be called RDF (Resource Description Framework).

5.2.3 Modeling data in RDF

Representing data in RDF is easy because it follows a triple of Resource, Property and Value. A simple RDF model has three parts.

i. Resource (Subject): Any entity that is to be described is known as a ‘Resource’, which is equivalent to the ‘Subject’ in English grammar. It can be a ‘webpage’ on the Internet, a ‘person’ in a society or any other object.

ii. Property (Predicate): Any characteristic or attribute of a ‘Resource’ that is used to describe it is known as a ‘Property’, which is equivalent to the ‘Predicate’ in English grammar. For example, a webpage can be recognized by its ‘Title’ and a person by his or her ‘Name’. Both are attributes for recognizing the resources ‘webpage’ and ‘person’ respectively.

iii. Value (Object): A Property must have a ‘Value’, which is equivalent to the ‘Object’ in English grammar.
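The three parts can be modelled directly in Python. The sketch below is a minimal illustration, assuming a plain-literal value and no character escaping; the Triple class and the to_ntriples helper are hypothetical names. The property URI is the Dublin Core "title" element, and the page described is the OWL Guide listed in the references.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    resource: str  # the Subject: what is being described
    prop: str      # the Predicate: which attribute is being stated
    value: str     # the Object: the value of that attribute


def to_ntriples(t: Triple) -> str:
    """Write one triple as an N-Triples line, treating the value as a plain literal.

    Simplification: quotes or backslashes inside the value are not escaped.
    """
    return f'<{t.resource}> <{t.prop}> "{t.value}" .'


# "The webpage http://www.w3.org/TR/owl-guide/ has the title ..."
page = Triple(
    resource="http://www.w3.org/TR/owl-guide/",
    prop="http://purl.org/dc/elements/1.1/title",
    value="OWL Web Ontology Language Guide",
)

line = to_ntriples(page)
```

Using URIs for both the resource and the property is what allows statements written by different people to refer unambiguously to the same thing.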

5.2.4 Web Ontology

At present, search engines search the indexes stored in their databases using pattern-matching algorithms. Such a search does not capture the concept behind the search term. This inherent problem does not arise from any shortcoming of the search engines; rather, it arises from the way data is represented in web pages using Hyper Text Markup Language (HTML), the language of the Web. Hence, a mechanism has been envisaged for representing the data of web pages in another language, Extensible Markup Language (XML), together with a standard data description framework called the Resource Description Framework (RDF). Each individual web page can be considered an entity with its own attributes or characteristics. Based on these properties, pages can be grouped, and the groups can in turn form relations with other web pages or groups of web pages. This gives rise to a web-based ontology for web documents, also known as a Web ontology, although the original idea of an ontology remains the same. The framework uses standard vocabularies such as Resource Description Framework Schema (RDFS) and Web Ontology Language (OWL) to describe concepts and their relations with other concepts. Search engines extract the data from a web page and preserve the relations within the data, so that meaningful results can be generated.

5.2.5 Web Ontology Language (OWL)

The Web Ontology Language (OWL) is a language for creating Web ontologies. A Web ontology follows an object-oriented approach and hence supports the description of classes, properties and their instances. Such ontologies have formal semantics, which specify how logical consequences are derived. An ontological structure may represent a single web object as well as a collection of web objects.

OWL ontologies make it possible to develop agents that can reason. These agents provide generic support that is not tied to any particular subject domain. A standard method of constructing ontologies would encourage third-party agents, both commercial and public-domain. These agents will in turn build services that ultimately benefit users.
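What "deriving logical consequences" from classes and instances means can be sketched with a tiny forward-chaining reasoner in Python. It applies only two RDFS-style rules (subclass transitivity and type inheritance), far fewer than an OWL reasoner would. The "ex:" class and instance names are hypothetical.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical ontology: two class axioms and one instance.
FACTS: Set[Triple] = {
    ("ex:Thesis", "rdfs:subClassOf", "ex:Document"),
    ("ex:Document", "rdfs:subClassOf", "ex:Resource"),
    ("ex:thesis42", "rdf:type", "ex:Thesis"),
}


def infer(facts: Set[Triple]) -> Set[Triple]:
    """Apply two rules repeatedly until no new triple appears.

    Rule 1: A subClassOf B and B subClassOf C  =>  A subClassOf C
    Rule 2: x type B       and B subClassOf C  =>  x type C
    """
    closed = set(facts)
    while True:
        new: Set[Triple] = set()
        for (a, p1, b) in closed:
            for (c, p2, d) in closed:
                if b != c or p2 != "rdfs:subClassOf":
                    continue
                if p1 in ("rdfs:subClassOf", "rdf:type"):
                    new.add((a, p1, d))
        if new <= closed:
            return closed
        closed |= new


entailed = infer(FACTS)
```

Nothing in FACTS says that ex:thesis42 is an ex:Resource, yet the reasoner derives it. An agent built on such inference can answer questions about any subject domain for which an ontology is supplied.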

The Species of OWL

The OWL language provides three increasingly expressive sublanguages designed for use by specific communities of implementers and users. (Michael K, 2004)

OWL Lite supports those users primarily needing a classification hierarchy and simple constraint features. For example, while OWL Lite supports cardinality constraints, it only permits cardinality values of 0 or 1. It should be simpler to provide tool support for OWL Lite than its more expressive relatives, and provide a quick migration path for thesauri and other taxonomies.

OWL DL supports those users who want the maximum expressiveness without losing computational completeness (all entailments are guaranteed to be computed) and decidability (all computations will finish in finite time) of reasoning systems. OWL DL includes all OWL language constructs with restrictions such as type separation (a class cannot also be an individual or property, a property cannot also be an individual or class). OWL DL is so named due to its correspondence with description logics, a field of research that has studied a particular decidable fragment of first order logic. OWL DL was designed to support the existing Description Logic business segment and has desirable computational properties for reasoning systems.

OWL Full is meant for users who want maximum expressiveness and the syntactic freedom of RDF with no computational guarantees. For example, in OWL Full a class can be treated simultaneously as a collection of individuals and as an individual in its own right. Another significant difference from OWL DL is that an owl:DatatypeProperty can be marked as an owl:InverseFunctionalProperty. OWL Full allows an ontology to augment the meaning of the pre-defined (RDF or OWL) vocabulary. It is unlikely that any reasoning software will be able to support every feature of OWL Full.
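Two of the restrictions mentioned above lend themselves to a simple check, sketched here in Python. It covers only the cardinality limit of OWL Lite and the type separation of OWL DL as stated in this section; the real species definitions contain many more conditions, and both function names are hypothetical.

```python
from typing import Dict, Iterable, Set


def lite_cardinalities_ok(cardinalities: Iterable[int]) -> bool:
    """OWL Lite permits only the cardinality values 0 and 1."""
    return all(value in (0, 1) for value in cardinalities)


def type_separation_ok(roles: Dict[str, Set[str]]) -> bool:
    """OWL DL type separation: a name plays at most one role.

    `roles` maps each name to the roles it is used in, drawn from
    "class", "property" and "individual".
    """
    return all(len(used_as) <= 1 for used_as in roles.values())


# Hypothetical usage: ex:Eagle is used both as a class and as an individual,
# which OWL Full allows but OWL DL does not.
usage = {
    "ex:Eagle": {"class", "individual"},
    "ex:hasWingspan": {"property"},
}
fits_dl = type_separation_ok(usage)
fits_lite = lite_cardinalities_ok([0, 1, 1])
```

Here fits_dl is False, so under these two checks alone the ontology could only be expressed in OWL Full.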

6. Issues and Challenges

The Semantic Web is still a concept; it does not yet exist in a physical form. Developing a machine with human intellect is a difficult task, and the work is still evolving. The major problem with the Semantic Web is its implementation. Some major issues are:

• Non-standard tags are used for creating content, which makes the meaning difficult to interpret. Creators can choose tags at their convenience, so different tags are used for the same context. For example, the writer of a book may be tagged as creator or as author, and as a result there is no standardization.

• The lack of a common ontology may cause problems in creating databases. There should be a top-level ontology accepted by all; so far, no such ontology has been worked out.

• Multilingualism can create problems for exact searching, because terms common to different languages can have different meanings.

• Practical implementation of the Semantic Web is difficult, because designing a system capable of thinking, analyzing and decision making is a tough job.

• New information emerges every second, and anyone can publish views, ideas and concepts. The Semantic Web has to handle this large volume of data. Moreover, reasoning has to be performed over this large database on the fly for each user request, which is a challenge in itself.
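The first issue in the list, differing tags for the same concept, can be made concrete with a small Python sketch. It assumes a hand-written mapping from creator-chosen tags to one agreed term; the mapping table and the normalise function are hypothetical, and building such a table for the whole Web is exactly the unsolved standardization problem.

```python
from typing import Dict

# Hypothetical agreement: every tag for "the writer of a book" maps to "creator".
CANONICAL_TAG: Dict[str, str] = {
    "author": "creator",
    "writer": "creator",
    "creator": "creator",
}


def normalise(record: Dict[str, str]) -> Dict[str, str]:
    """Rename known tags to their agreed form; unknown tags are only lower-cased."""
    return {
        CANONICAL_TAG.get(tag.lower(), tag.lower()): value
        for tag, value in record.items()
    }


# Two creators describe the same book with different tags.
record_a = {"Author": "Aditya Tripathi", "Title": "RDF Tutorial"}
record_b = {"creator": "Aditya Tripathi", "title": "RDF Tutorial"}

same_after_mapping = normalise(record_a) == normalise(record_b)
```

Without the mapping the two records look unrelated to a machine; with it they are identical. A shared ontology plays the role of this table on a much larger scale.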

7. Promises of Semantic Web

• The Semantic Web can represent information in a more categorized way.

• A taxonomy is a more standardized way of representing knowledge.

• Data can be linked so that meaningful inferences can be drawn.

• The whole concept of the Semantic Web would function within the web browser; users need not look for new software and technologies.

• It offers new mechanisms for finding more reliable and trusted information.

• New services or agents will be developed that are intelligent enough to pull in useful data from other sources.

• Web surfing on the Semantic Web is more targeted, and its browsers can provide more customized searching.

8. Implication and Applicability in Real World

As society develops, people move towards literacy, gain information and become knowledgeable. Humans have developed various technologies and are moving into a technological era. With the advent of information technology, the modernization of society is advancing day by day. The information society is becoming machine-oriented, and many tasks are now performed with the help of machines. Humans developed computers to reduce their physical effort, yet their needs remain, and they continue to pursue new inventions and technologies. The growth of artificial intelligence and robotics is a good example: a robot may have almost all the capabilities of a human, but it does not have the capabilities of thinking, analyzing and decision making. Tim Berners-Lee started working in this direction, trying to develop a system with these capabilities, and the concept of the Semantic Web evolved as a result.

At present, many applications and services on the Web help users locate the most suitable result. Flight-booking agents, travel-planner agents and social networking sites all work within the scope of semantic technology. This is just a beginning; in future there will be more and more intelligent applications.

9. Impact on trinity of libraries

Libraries are changing, and changing radically. Whether an agent will one day replace the librarian is a question worth pondering. Though this sounds impossible, near solutions can be reached. Libraries are being fully automated, with minimal intervention from staff. Semantic technology is going to make its mark in information discovery and retrieval. This may reduce the mental effort and workload of the staff and further benefit users. A user can have all the required information delivered to the doorstep simply by submitting a query.

The Semantic Web is also beneficial in document management. Although there are no shelves, concept maps of classification schemes are going to be a big help in constructing ontologies, especially for retrieval.

10. Summary

The Semantic Web is in its initial phase, and the current focus is on developing its basic, static infrastructure. The next step will be to realize active components on top of this infrastructure that provide intelligent services to users. Efforts are under way to provide various high-level services by mechanizing tasks that currently require human effort, such as searching for vendors, products and services; comparing and combining products; and forming coalitions of vendors. However, most of the way still lies ahead.

11. References

1. Semantic Web introduction (http://infomesh.net/2001/swintro/)

2. Berners-Lee, T., Hendler, J. and Lassila, O. The Semantic Web. Scientific American 284(5) (2001)

3. Introduction to RDF and RDFS (http://www.xml.com/pub/a/2001/01/24/rdf.html)

4. Ontologies and Semantic Web (http://www.obitko.com/tutorials/ontologies-semanticweb/ontologies.html)

5. Abran, A., and Moore, J.W. (Exec. Eds.), Bourque, P. and Dupuis, R. (Eds.) Guide to the Software Engineering Body of Knowledge (2004)

6. Advanced Computing: An International Journal (ACIJ), Vol.2, No.6, November 2011-12

7. World Wide Web Consortium (W3C) Semantic Web activity’s homepage. (http://w3c.org/sw.)

8. The Semantic Web by Eric Miller (http://www.w3c.org)

9. IEEE Std 610.12-1990, IEEE Standard Glossary of Software Engineering Terminology, IEEE, 1990.

10. Yajing Zhao, Jing Dong, Senior Member, IEEE, and Tu Peng. Ontology Classification for Semantic-Web-Based Software Engineering, IEEE TRANSACTIONS ON SERVICE COMPUTING, VOL. 2, NO. 4, OCTOBER-DECEMBER 2009

11. H.-J. Happel and S. Seedorf. Applications of ontologies in software engineering. (2nd Workshop on Semantic Web Enabled Software Engineering (SWESE 2006) at ISWC 2006, Galway, Ireland, November 11-15, 2006)

12. Ian Horrocks and Alan Rector. The Semantic Web : Ontologies and OWL

13. Nigam Shan. An introduction to OWL and its alternatives, National Centre for Biomedical Ontologies

14. Ying Ding, Dieter Fensel, Michel Klein and Borys Omelayenko. The Semantic web : Yet Another Hip, Data Knowledge Engineering, 2002, 6.10.01.

15. OWL Web Ontology Language Guide, Michael K. Smith, Chris Welty, and Deborah L. McGuinness, Editors, W3C Recommendation, 10 February 2004, http://www.w3.org/TR/owl-guide/

16. Aditya Tripathi. Resource Description Framework: A Tutorial for Developing Web Ontology. DRTC Workshop on Semantic Web, 8th – 10th December, 2003, DRTC, Bangalore http://drtc.isibang.ac.in:8080/bitstream/handle/1849/120/D_aditya-sematic_web.pdf?sequence=2