12 Advanced Course in Information Storage and Retrieval II: Semantic Web

Biswanath Dutta

I. Objectives

To study the advanced semantic techniques and technologies in view of information storage and retrieval.

II. Learning Outcomes

After going through this module the students:

• Will know the meaning of Semantic Web and its objectives.

• Will know about knowledge representation and knowledge modelling framework.

• Will learn about various semantic tools and techniques.

• Will know the W3C recommended semantic web languages for semantic representation of data and knowledge.

• Will know the potential applications of natural language processing technologies in Semantic Web.

III. Structure

1. Introduction

2. Semantic Web

3. Semantic Web Components

3.1 Extensible Markup Language (XML)

3.1.1 Salient Features of XML

3.1.2 Issues with XML

3.2 Resource Description Framework (RDF)

3.2.1 RDF Triplets

3.2.2 Salient features of RDF

3.3 Resource Description Framework Schema (RDFS)

3.3.1 RDF vs. RDF Schema

3.3.2 Issues with RDF Schema

3.4 Ontology

3.5 Logic and Ontology Language

3.5.1 Description Logics

3.5.1.1 Description Logic Components

3.6 Web Ontology Language (OWL) and its species

3.6.1 OWL Full

3.6.2 OWL DL

3.6.3 OWL Lite

3.7 Proof Layer

3.8 Trust Layer

4. Semantic Web and Natural Language Processing

5. Summary

6. References

1. Introduction

The Web is a success story, both in terms of the availability of information and the growing number of users [4]. Today people use the Web for many purposes, including knowledge acquisition, sharing thoughts, business and entertainment. However, the results retrieved from the Web of documents by present-day search engines such as Google and Yahoo are often irrelevant and noisy. The problem is that most of the information available on the Web is meant for human consumption and interpretation, not for machines to consume, interpret and process. Search engines do not really understand the meaning of a query. They treat the query as a set of strings, match those strings against their indices, and retrieve results accordingly.

The Semantic Web, an extension of the present Web, is characterized by associating machine-accessible formal semantics with Web content. The motivation behind the Semantic Web is to automate the processing of information on the Web and to improve interoperability among Web-based information systems. The goal is to deliver meaningful and relevant information to users.

In this module we discuss Semantic Web (SW) techniques and technologies. We also discuss the potential uses of Natural Language Processing (NLP) in Semantic Web (see module no. for discussion on NLP).

2. Semantic Web

Tim Berners-Lee, the inventor of the World Wide Web (WWW), first envisioned the Semantic Web (SW). The Semantic Web is not a new Web; rather, it is “an extension of the current Web in which information is given well-defined meanings to enable computers and people to work in cooperation” [5]. Well-defined meanings and an explicit representation of data will enable the Web to provide a qualitatively new level of service. Antoniou and van Harmelen [3] define the Semantic Web as “a vision of the next generation web, which enables Web applications to automatically collect Web documents from diverse sources, integrate and process information and interoperate with other applications in order to execute sophisticated tasks for humans.”

The Semantic Web proposes a set of semantic technologies and techniques (discussed below) that allow machines to automatically process logically connected data on the Web and to infer new information. Through a rich knowledge representation model such as the Resource Description Framework (RDF), the Semantic Web provides highly structured data. Application developers can share their structured data on the Web, and software agents can infer knowledge from the different kinds of structured and logically connected data available there. RDF is built on an elementary pointer mechanism, the Uniform Resource Identifier (URI) (discussed below). In the traditional Web, a URI is mainly used to refer to documents and their parts through the hypertext mechanism. The Semantic Web uses URIs to name anything: abstract concepts (e.g., color, test, dream), physical objects (e.g., person, location, mountain), and electronic objects, also known as information objects (e.g., the home page of an institution). RDF uses URIs to name the relationships between objects as well as the objects themselves [8].

In the following sections we discuss the semantic techniques and technologies.

3. Semantic Web Components

Figure 1 shows the Semantic Web technology stack, which describes the Semantic Web design and vision. It is built as a layered structure so that the Semantic Web vision can be implemented step by step. One reason, as stated in [3] [9], is that it is easier to achieve consensus on small steps, whereas it is much harder to get everyone on board if too much is attempted at once. Another reason is that achieving the vision of the Semantic Web does not require implementing the entire technology stack. Instead, the choice of technologies to implement is guided by the objectives of the system.

Fig. 1: Technology stack of Semantic Web [5]

In building the Semantic Web in a layered manner, Antoniou and van Harmelen [3] discuss the following two principles:

i. Downward Compatibility: agents (agents are software programmes that work autonomously and proactively), fully aware of one layer, should also be able to interpret and use information written at lower levels. For example, an agent aware of the semantics of OWL can take full advantage of information written in RDF and RDF Schema.

ii. Upward Partial Understanding: agents fully aware of one layer should also be able to take at least partial advantage of information at higher levels. For example, an agent aware of only RDF and RDF Schema semantics can interpret partial knowledge written in OWL, by disregarding those elements that go beyond RDF and RDF Schema.
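The second principle can be illustrated with a small sketch. The triples, the prefixed names and the helper `partial_view` below are hypothetical and chosen only for illustration; a real agent would work on parsed RDF graphs rather than on plain strings.

```python
# Vocabulary prefixes that an RDF/RDFS-only agent is assumed to understand.
KNOWN_PREFIXES = ("rdf:", "rdfs:")

# A small, hypothetical knowledge base mixing RDFS and OWL statements.
triples = [
    ("ex:Professor", "rdfs:subClassOf", "ex:Faculty"),
    ("ex:john", "rdf:type", "ex:Student"),
    ("ex:Professor", "owl:disjointWith", "ex:Student"),
]

def partial_view(statements, known_prefixes=KNOWN_PREFIXES):
    """Keep only the statements whose predicate the agent understands."""
    return [t for t in statements if t[1].startswith(known_prefixes)]

understood = partial_view(triples)
for s, p, o in understood:
    print(s, p, o)
```

The agent still benefits from the two RDF/RDFS statements and simply disregards the OWL statement it cannot interpret.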

3.1 Extensible Markup Language (XML)

In Figure 1, XML (eXtensible Markup Language) and XML Schema sit at the bottom of the Semantic Web layers. XML is a subset of the Standard Generalized Markup Language (SGML). XML has become a universal meta-language for defining markup and is an important technology in the development of the Semantic Web. It allows users to create their own tags to describe data in a nested structure. However, it says nothing about what the structures mean. Generally speaking, XML is a suitable format for sending documents and exchanging information on the Web.

3.1.1 Salient Features of XML

Some of the salient features of XML as discussed in [3] are:

i. Extensible: tags can be created and defined by users and extended to many different applications.

ii. Machine accessibility: an XML document is easily accessible to machines because every piece of information is described. Moreover, the relations between pieces of information are defined through the nesting structure. For example, if the <title> tags appear within the <book> tags, they describe a property of that particular book. Software processing the XML document can deduce that the title element refers to the enclosing book element.

iii. Separation of content from formatting: XML separates content from formatting. The same content can be displayed in multiple ways without creating multiple copies of it; we need to change only the template, not the content. The content can also be used for purposes other than display.

iv. A meta-language for markup: XML does not have a fixed set of tags; instead, it allows users to create their own tags.
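The machine-accessibility feature can be sketched with Python's standard `xml.etree.ElementTree` module. The `<library>` wrapper element and the title values are assumptions made for this example only.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: element names and values are illustrative only.
doc = """
<library>
  <book>
    <title>First Title</title>
  </book>
  <book>
    <title>Second Title</title>
  </book>
</library>
"""

root = ET.fromstring(doc)

# Each <title> is reached through its enclosing <book>, so the nesting
# itself tells the program which book a title belongs to.
titles = [book.findtext("title") for book in root.findall("book")]
print(titles)
```

The program can navigate the structure, but nothing in the XML tells it what a "title" or a "book" actually is; that limitation is the subject of the next section.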

3.1.2 Issues with XML

Like many other technologies, XML is not free from issues. Some of its well-known limitations are [3]:

i. Interoperability: XML does not enforce a standard vocabulary, so element names are open to interpretation. For example, one person can annotate a document with an element named “Author”, while another uses “Creator” for the same purpose. A human being can work out that both element names refer to the same thing, but how can a software programme understand this? The lack of standardized terms creates an obstacle to data sharing between systems.

ii. Tag nesting: The nesting of tags does not convey any standard meaning; it is up to each application to interpret it. For example, consider the statement: David John is a lecturer of Thermodynamics. This piece of information can be represented in XML in various ways. Two of the possibilities are:

<course name="Thermodynamics">
  <lecturer>David John</lecturer>
</course>

<lecturer name="David John">
  <teaches>Thermodynamics</teaches>
</lecturer>

Notice that the two representations above nest the elements in different ways, although they represent the same information. In the first case, the course is treated as the primary element and the lecturer element is nested within it; in the second case, the lecturer is treated as the primary element and the nested element teaches refers to the course name. Hence, there is no standard way of assigning meaning to tag nesting.

iii. Domain-specific markup languages: since XML allows users to define their own tags, many domain-specific markup languages have evolved in the recent past, for instance MathML [10] and CML (Chemical Markup Language) [11]. The main issue with these domain-specific markup languages is the non-standardization of the terms used to describe objects. On the other hand, removing the flexibility to create domain-specific markup languages may lead to inadequate resource description. Hence, this flexibility is better treated as a feature than as a limitation.
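The tag-nesting issue can be made concrete with a minimal sketch. The two XML strings and both reader functions are assumptions for illustration: each reader hard-codes one nesting convention, which is exactly the application-specific interpretation described above.

```python
import xml.etree.ElementTree as ET

# Two hypothetical encodings of "David John teaches Thermodynamics".
course_first = '<course name="Thermodynamics"><lecturer>David John</lecturer></course>'
lecturer_first = '<lecturer name="David John"><teaches>Thermodynamics</teaches></lecturer>'

def read_course_first(xml_text):
    """Extract (lecturer, course) when <course> is the outer element."""
    course = ET.fromstring(xml_text)
    return course.findtext("lecturer"), course.get("name")

def read_lecturer_first(xml_text):
    """Extract (lecturer, course) when <lecturer> is the outer element."""
    lecturer = ET.fromstring(xml_text)
    return lecturer.get("name"), lecturer.findtext("teaches")

fact_a = read_course_first(course_first)
fact_b = read_lecturer_first(lecturer_first)
print(fact_a == fact_b)

# Applying a reader to the other nesting silently yields a wrong answer.
print(read_course_first(lecturer_first))
```

Both documents carry the same fact, yet each needs its own reader; a reader written for one nesting misreads the other without raising any error.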

Next we discuss RDF, the next layer of the Semantic Web technology stack in Figure 1.

3.2 Resource Description Framework (RDF)

The Resource Description Framework is designed as a metadata data model; it is not a language. It is a framework for expressing information about resources on the Web. A resource can be anything, including physical objects (e.g., people, organizations, locations) and abstract objects (e.g., numbers, events) [12] [27]. RDF is primarily intended to make information machine processable, rather than only to display it to humans. RDF is based on the idea of identifying anything using a URI (Uniform Resource Identifier).

3.2.1 RDF Triplets

RDF allows us to express information about the resources as a statement. Each statement is expressed as a triple and has three basic components: <subject> <predicate> <object>. Each triple is like the subject, verb and object of an elementary sentence.

An example of a statement:

David John is the author of the webpage http://drtc.isibang.ac.in/~David. This statement can be represented diagrammatically as shown in Figure 2 [4]:

Fig. 2: RDF statement

In the above the subject, predicate and object are [12]:

i. Subject: is any resource which has to be described. The resources can be any entity, for instance, a webpage, an article, a person, an organization or a place. In the above example, the subject is the Webpage of David.

ii. Predicate: is any characteristic of a resource or its attribute, which is used for the description of the resource. For example, an article can be described by Title or a person can be described by his Name. In the above example, author is the predicate.

iii. Object: is any value for a predicate of a resource. For instance, the title of DRTC webpage is Documentation Research and Training Centre, or, name of a Person is John Smith. In the above example, David John is the object (the value) for the predicate author of the resource, i.e., the webpage of David.
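The subject, predicate and object roles can be sketched as a simple data structure. The `Triple` type, the unqualified predicate name `author` and the `match` helper are hypothetical; in real RDF the predicate would itself be identified by a URI.

```python
from collections import namedtuple

# A statement is modelled as a (subject, predicate, object) triple.
Triple = namedtuple("Triple", ["subject", "predicate", "object"])

# The predicate name "author" is a hypothetical, unqualified label.
statement = Triple(
    subject="http://drtc.isibang.ac.in/~David",
    predicate="author",
    object="David John",
)

def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples matching the given parts; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.object == obj)
    ]

authors = [t.object for t in match([statement], predicate="author")]
print(authors)
```

Querying by pattern, with some positions left open, is the basic idea behind triple-based retrieval.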

The triples can be expressed in XML or in other serialization formats such as Notation 3 (N3) [28], N-Triples [29] and Turtle [30]. The XML representation of the statement in Figure 2 is as follows.

<?xml version="1.0" encoding="UTF-16"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:mydomain="http://mydoamin.org/schema/">
  <rdf:Description rdf:about="http://drtc.isibang.ac.in/~David">
    <mydomain:author>David John</mydomain:author>
  </rdf:Description>
</rdf:RDF>

The first line specifies that we are using XML version 1.0. The declaration xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" specifies the XML namespace for RDF, and xmlns:mydomain="http://mydoamin.org/schema/" specifies the XML namespace for our own vocabulary, mydomain. An XML namespace [31] is a collection of names, identified by a URI reference [RFC2396], which are used in XML documents as element types and attribute names. The syntax for declaring an XML namespace is xmlns:namespace-prefix="namespace". The rdf:Description element makes a statement about the resource http://drtc.isibang.ac.in/~David, which is a webpage. Within the description, the author element acts as the predicate that describes the resource, and the value of the predicate is David John.
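As a rough illustration of how namespaces turn element names into full identifiers, the sketch below reads a statement of this shape with Python's standard XML parser. It is not a general RDF/XML parser: the function `extract_triples` is a hypothetical helper that handles only `rdf:Description` elements with literal-valued properties, and the XML declaration is omitted so that the text can be parsed from a Python string.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

rdf_xml = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:mydomain="http://mydoamin.org/schema/">
  <rdf:Description rdf:about="http://drtc.isibang.ac.in/~David">
    <mydomain:author>David John</mydomain:author>
  </rdf:Description>
</rdf:RDF>"""

def extract_triples(text):
    """Yield (subject, predicate, object) for simple literal-valued properties."""
    root = ET.fromstring(text)
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            # ElementTree writes a qualified name as "{namespace}local".
            namespace, local = prop.tag[1:].split("}")
            yield subject, namespace + local, prop.text

triples = list(extract_triples(rdf_xml))
print(triples)
```

The prefix `mydomain:` disappears after parsing; what remains is the namespace URI joined with the local name, which is how the predicate is identified.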

3.2.2 Salient features of RDF

Some of the salient features of RDF are:

i. The framework, or data model, is designed in a generic manner, i.e., independent of any particular domain. Hence data belonging to any domain has the same representation. This makes it possible to use the same piece of data in different ways in different applications.

ii. RDF is based on the idea of identifying things on the Web using URIs.

iii. RDF can be seen as a directed graph with labeled nodes and arcs as in object-oriented programming.

iv. RDF is designed as a metadata data model.

3.3 Resource Description Framework Schema (RDFS)

As discussed above, the RDF data model allows us to make statements about resources. However, the data model makes no assumptions about what a resource IRI stands for [12]. To make the meaning of the information explicit, in practice the data is expressed in RDF in combination with vocabularies. To support the definition of vocabularies, RDF provides the RDF Schema language [12].

RDF Schema uses the notion of a class to specify the categories that can be used to classify resources. RDFS provides a base-level schema, which can be extended through subclass refinement. With RDF Schema, we can build hierarchies of classes and properties. The relation between an instance and a class is expressed by a property called type. Further, RDF Schema lets us state the types of the subject and object of a triple through the properties domain and range. An example of the use of the type property and of the domain and range properties is shown in Figure 3.
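A minimal sketch of how a program might use such a schema is given below. In RDFS, domain and range declarations support inference: from a statement that uses a property, the types of its subject and object can be derived. The class names, the property name, the individuals and the dictionary-based encoding are all assumptions made for illustration.

```python
# Hypothetical schema and data; all names are illustrative.
SUBCLASS_OF = {"Professor": "Faculty", "Faculty": "Staff"}
DOMAIN = {"supervisorOf": "Faculty"}
RANGE = {"supervisorOf": "Student"}

facts = [("a_supervisor", "supervisorOf", "john")]

def superclasses(cls):
    """Return cls together with all its ancestors in the class hierarchy."""
    result = [cls]
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        result.append(cls)
    return result

def infer_types(statements):
    """Derive (instance, class) pairs from domain and range declarations."""
    types = set()
    for s, p, o in statements:
        if p in DOMAIN:
            types.update((s, c) for c in superclasses(DOMAIN[p]))
        if p in RANGE:
            types.update((o, c) for c in superclasses(RANGE[p]))
    return types

inferred = infer_types(facts)
print(sorted(inferred))
```

From a single supervisorOf statement, the sketch derives that the subject is a Faculty member (and therefore Staff) and that the object is a Student.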

3.3.1 RDF vs. RDF Schema

Figure 3 illustrates the layers involved in RDF and RDFS for the representation of the statement: F. Guinchiglia is a supervisor of John. The schema for this statement may contain classes such as faculty, associate professor, professor, graduate student and undergraduate student, and properties such as supervisorOf and academicAssignment. Note that the property academicAssignment (not shown in the figure) is a super-property of supervisorOf. In the figure, the rectangles represent properties, the ellipses above the dashed line represent classes, and the ellipses below the dashed line represent instances.

Fig. 3: RDF and RDFS layers

3.3.2 Issues with RDF Schema

RDF Schema allows us to represent lightweight ontological [20] knowledge. It allows us to create hierarchies of classes and subclasses, and of properties and sub-properties. However, RDF Schema is limited in scope. Its expressivity is restricted, and hence it is not sufficient for designing a formally defined ontology that would allow us to infer implicit knowledge from the knowledge base [4]. Some of the functionalities that are not supported by the RDF Schema language are [18]:

i. Local scope of properties: rdfs:range defines the range of a property globally, for all classes. In RDFS, we cannot declare range restrictions that apply only to some classes. For example, we cannot say that Assistant Professors supervise only Master’s dissertations, while Full Professors supervise PhD dissertations.

ii. Disjointness of classes: sometimes we may wish to state that classes are disjoint. For instance, Assistant Professor and Associate Professor are disjoint. This cannot be expressed in RDF Schema.

iii. Boolean combinations of classes: sometimes we may need to create new classes by combining existing classes using union, intersection or complement. For example, we may need to define the class Academic Staff as the disjoint union of the classes Assistant Professor, Associate Professor and Professor. This cannot be done in RDFS.

iv. Cardinality restrictions: sometimes we may need to restrict how many distinct values a property may or must take. For instance, we may wish to state that a Person has exactly two parents, or that a PhD candidate is supervised by at least one full professor. This cannot be expressed in RDFS.

v. Special characteristics of properties: in RDFS, we cannot express characteristics of properties such as transitive, inverse or functional. In knowledge representation, such meta-properties have proved useful for reasoning and inferring knowledge. For instance, if resource A is part of resource B, resource B is part of resource C, and partOf is declared to be a transitive property, then a reasoner can conclude that resource A is also part of C.
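The partOf example in item v can be sketched as a small computation. The function below is a naive fixed-point computation of a transitive closure, written for clarity rather than efficiency; the pair-based encoding of the partOf assertions is an assumption for illustration.

```python
def transitive_closure(pairs):
    """Return all (a, c) pairs implied by treating the relation as transitive."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

# Hypothetical partOf assertions: A is part of B, B is part of C.
part_of = {("A", "B"), ("B", "C")}
closure = transitive_closure(part_of)
print(("A", "C") in closure)
```

The pair (A, C) was never asserted; it follows only because the relation is declared transitive, which is the kind of declaration RDFS cannot express.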

From the above, we can say that building a well-defined formal ontology requires a language that is more expressive than RDFS. The language we choose must offer the above features and more. However, it must also strike a good balance between expressive power and efficient reasoning support (in terms of tools). In general, the richer the language, the harder it is to build an efficient reasoner for it. Hence, we need a language that is expressive enough to represent large classes of ontologies and knowledge, but that also has reasonably efficient reasoning tool support.

3.4 Ontology

The term “ontology” originated more than two thousand years ago in metaphysics, a branch of philosophy, and more specifically in Aristotle’s theory of categories [22], where an ontology is a systematic account of existence. The purpose was to provide a categorization of all existing things in the world. Ontologies have since been adopted in several other fields, such as Library and Information Science (LIS), Artificial Intelligence (AI), and more recently Computer Science (CS). Many definitions of ontology have been proposed. In Information Science and Computer Science, an ontology is considered an engineering artefact and refers to a formal naming and definition of the types, properties, and interrelationships of the entities that really or fundamentally exist for a particular domain of discourse [32]. The most prominent definition was provided by Gruber in 1993 [21]. According to Gruber, an ontology is an “explicit specification of a conceptualization”. In 1998, Studer et al. [22] extended Gruber’s definition by stating that “an ontology is a formal, explicit specification of a shared conceptualization”. In simple words, an ontology is formally represented knowledge of a domain of discourse (also known as the universe of discourse) based on a shared conceptualization. Here, conceptualization refers to an abstraction: a simplified view of the domain of discourse, motivated by some purpose. The formal and explicit specification of the conceptualization makes the constituents of an ontology machine interpretable [4].

The primary objective of an ontology is to share the knowledge it represents [1]. Some of the significant uses of an ontology are [3]:

i. An ontology provides a shared understanding of a domain;

ii. An ontology is useful for representing domain knowledge and for facilitating its sharing between humans and automatic agents;

iii. An ontology can be used for organizing and navigating websites;

iv. An ontology is useful for improving the accuracy of search queries. Search engines can exploit generalization and/or specialization information.

3.5 Logic and Ontology Language

Logic is a branch of philosophy. It is not the study of truth as such; rather, it concerns the truth of a statement in relation to other statements. Logic plays an important role in representing knowledge. It is primarily concerned with the study and valid use of reasoning. Logic helps to establish consistency and correctness in data sets and to infer new pieces of knowledge that are not explicitly stated but are required by, or consistent with, known data. Logics can be characterized as follows [3, 8]:

i. Language: logic provides a high-level language with high expressive power, in which knowledge can be expressed in a transparent way.

ii. Formal semantics: logic has a well-understood formal semantics, which assigns an unambiguous meaning to logical statements.

iii. Reasoning: automated reasoners can deduce conclusions from the given knowledge, thus making implicit knowledge explicit. For example:

a. X is a Bear

b. a Bear is a Mammal

c. a Mammal has fur

Therefore, X has fur.

iv. Inferred knowledge explanation: with the proof systems, it is possible to trace the proof that leads to a logical consequence. In this sense, we can say that logics can provide the explanations for the inferred knowledge.
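The reasoning example in item iii can be sketched as a tiny forward-chaining procedure. The rule encoding below is a deliberate simplification and an assumption of this example; it is not how a production reasoner represents knowledge.

```python
# Each rule reads: whatever satisfies the premise also satisfies the conclusion.
rules = [
    (("is_a", "Bear"), ("is_a", "Mammal")),
    (("is_a", "Mammal"), ("has", "fur")),
]
facts = {("X", "is_a", "Bear")}

def forward_chain(facts, rules):
    """Apply the rules repeatedly until no new fact can be derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for (rel_in, val_in), (rel_out, val_out) in rules:
            for subject, rel, val in list(derived):
                new_fact = (subject, rel_out, val_out)
                if (rel, val) == (rel_in, val_in) and new_fact not in derived:
                    derived.add(new_fact)
                    changed = True
    return derived

conclusions = forward_chain(facts, rules)
print(("X", "has", "fur") in conclusions)
```

The fact that X has fur is never stated; it becomes explicit only after the two rules are applied in sequence, and the intermediate fact that X is a Mammal records part of the proof.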

Nevertheless, adding logic to the Web needs care: the Web has several characteristics that can lead to problems when existing logics are used [8]. Adding logic to the Web presupposes the use of rules to make inferences, choose courses of action, etc. The logic deployed must be powerful enough to describe complex objects, but it must not be so complex and inflexible that software agents are led into contradictions while inferring knowledge.

A number of knowledge representation paradigms have emerged that provide languages for representing ontologies, in particular description logics (discussed below) and frame logics. The Web Ontology Language (OWL) is one such language and is based on Description Logics (DL). Other logic-based knowledge representation languages include the Knowledge Interchange Format (KIF) [24] and Simple Common Logic (SCL) [13], both of which are based on first-order logic rather than on description logics.

3.5.1 Description Logics

Description Logics (DL) are a family of logic-based knowledge representation formalisms. Most DLs are structured, decidable fragments of First Order Logic (FOL). Research on DL began as an effort to overcome the computational problems of reasoning in FOL by studying fragments of different complexity. It started under the label terminological systems, to emphasize that the representation language was used to establish the basic terminology adopted in the modeled domain [14], and later continued under the label concept languages. DL has now become a foundation of the Semantic Web through its use in designing ontologies.

DL became popular when the focus moved towards the properties of the underlying logical systems. Research on DL has covered the theoretical foundations as well as the implementation of knowledge representation systems and the development of applications in several fields. Examples include reasoning about database conceptual models, schema representation in information integration systems, metadata management, and the logical foundation of ontology languages [14].

Description logics are formal logics with well-defined semantics. The semantics of DL is model-theoretic: it formulates the relationship between the language syntax and the models of a domain. In designing a DL, emphasis is placed on the decidability of key reasoning problems and on the provision of sound and complete reasoning algorithms. A key feature of DLs is their ability to represent relationships between concepts beyond the is-a relationship [14].

3.5.1.1 Description Logic Components

In DL, the important notions of a domain are described by concept descriptions that are built from concepts, roles and individuals. Concepts are the unary predicates, while the roles are the binary predicates. It is also possible to state facts about the domain in the form of axioms which act as constraints on the interpretations in a DL knowledge base [15].

A DL Knowledge Base (KB) has two main components: the TBox (Terminological Box) and the ABox (Assertional Box). The TBox contains intensional knowledge in the form of a terminology and is built through declarations that describe concepts and their properties. In other words, it contains sentences describing concept hierarchies, i.e., the relations between concepts, and the various properties of the concepts. The ABox contains extensional, or assertional, knowledge that is specific to the individuals of the domain of discourse [14].
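The division of labour between the TBox and the ABox can be sketched as follows. The concept names, the individual `mary` and the dictionary-based encoding are hypothetical, and the sketch covers only a plain concept hierarchy, which is far less than what a DL reasoner handles.

```python
# TBox: terminological knowledge (a concept hierarchy). Names are hypothetical.
tbox = {"Professor": {"Faculty"}, "Faculty": {"Staff"}}

# ABox: assertions about individuals.
abox = {"mary": {"Professor"}}

def subsumers(concept):
    """Return the concept and every concept that subsumes it in the TBox."""
    found, stack = set(), [concept]
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(tbox.get(current, ()))
    return found

def is_instance(individual, concept):
    """Instance check: does the KB imply that the individual belongs to the concept?"""
    return any(concept in subsumers(c) for c in abox.get(individual, ()))

print(is_instance("mary", "Staff"))
```

The ABox only states that mary is a Professor; that she is also Staff follows from combining this assertion with the terminology in the TBox.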

3.6 Web Ontology Language (OWL) and its species

As discussed above, RDFS has limited expressivity and cannot be used to represent complex knowledge. For instance, we cannot express that male and female are two disjoint classes, or that a person has exactly two parents. To make it possible to represent complex knowledge, a richer ontology language called the Web Ontology Language (OWL) was developed and recommended by the Web Ontology Working Group of the W3C. Initially, researchers in Europe designed an ontology language called the Ontology Inference Layer (OIL), and in the United States the Defense Advanced Research Projects Agency (DARPA) independently designed another ontology language called the DARPA Agent Markup Language (DAML) [34]. Later these two were merged into a single ontology language, DAML+OIL.

DAML+OIL later became the starting point for the W3C Web Ontology Working Group in defining OWL. Description logics are the logical foundation of the OWL ontology language. OWL is built on top of RDF and RDF Schema. It adds more vocabulary for describing properties and classes, including relations between classes (e.g., disjointness), cardinality (e.g., exactly one), equality, richer typing of properties, characteristics of properties (e.g., symmetry), and enumerated classes [16].
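To give a feel for what a disjointness axiom adds, the sketch below checks a small set of type assertions against a declared disjointness between two classes. The class names, the individuals and the function `inconsistent_individuals` are assumptions for illustration; an OWL reasoner performs this kind of consistency checking over a much richer language.

```python
# Hypothetical axiom: the classes Male and Female are disjoint.
disjoint_pairs = [("Male", "Female")]

# Hypothetical type assertions for two individuals.
types = {"alex": {"Male"}, "sam": {"Male", "Female"}}

def inconsistent_individuals(types, disjoint_pairs):
    """Return the individuals asserted to belong to two disjoint classes."""
    return sorted(
        individual
        for individual, classes in types.items()
        if any(a in classes and b in classes for a, b in disjoint_pairs)
    )

print(inconsistent_individuals(types, disjoint_pairs))
```

Without the disjointness axiom, nothing would be wrong with the assertions about sam; with it, the knowledge base can be recognized as inconsistent.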

The primary goal of OWL is to provide machine-processable semantics for resources, that is, to make the machine representations of resources more closely resemble their intended real-world counterparts [17]. To add the capabilities listed below to ontologies, OWL uses both URIs for naming and the description framework for the Web provided by RDF [16]. The added capabilities are:

i. Ability to be distributed across many systems

ii. Scalability to Web needs

iii. Compatibility with Web standards for accessibility and internationalization

iv. Openness and extensibility

The OWL 1.0 ontology language consists of three sub-languages [33], namely OWL Full, OWL DL and OWL Lite. These sub-languages differ in their expressive power, as discussed below.

3.6.1 OWL Full

OWL Full is the complete language and is more expressive than OWL DL and OWL Lite. It uses all the OWL language primitives [16] [33] and allows free mixing of OWL with RDF Schema. OWL Full does not enforce a strict separation between classes, properties, individuals and data values, which is the source of its higher expressivity.

Advantage: Since OWL is designed to sit on top of RDF and RDFS, OWL Full is fully upward-compatible with RDF, both syntactically and semantically. This means that any legal RDF document is also a legal OWL Full document. Similarly, any valid RDF/RDF Schema conclusion is also a valid OWL Full conclusion.

Disadvantage: The primary disadvantage of OWL Full is that it is undecidable, a consequence of its greater expressive power. Hence OWL Full is impractical for applications that require complete and efficient reasoning support. In general, a more expressive knowledge base makes reasoning more complex, and software programmes need more time to process a query. Because of this expressive power, complete and efficient reasoning support for OWL Full is unlikely.

3.6.2 OWL DL

OWL DL is a sublanguage of OWL Full. It includes all OWL language constructs, but they can be used only under certain conditions. For example, while a class may be a subclass of many classes, a class cannot be an instance of another class; that is, classes and individuals are kept separate in OWL DL. OWL DL provides maximum expressivity while retaining computational completeness (all conclusions are guaranteed to be computable) and decidability (all computations finish in finite time). OWL DL corresponds to the description logic SHOIN(D) [14].

Advantage: It supports efficient reasoning because classes, properties and individuals are strictly separated. It is a decidable language and hence is useful for applications that require complete and efficient reasoning.

Disadvantage: Full compatibility with RDF is lost. OWL DL is also relatively complex to use compared with OWL Lite.
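The separation between classes and individuals required by OWL DL can be illustrated with a simple check over a set of statements. This is only a sketch under strong simplifying assumptions: the statements, the name `JobTitle` and the function `punned_names` are hypothetical, and only two predicates are considered.

```python
# Hypothetical statements; "rdf:type" links an instance to a class.
statements = [
    ("Professor", "rdfs:subClassOf", "Faculty"),
    ("mary", "rdf:type", "Professor"),
    ("Professor", "rdf:type", "JobTitle"),  # a class used as an instance
]

def punned_names(statements):
    """Return the names that occur both as a class and as an instance."""
    classes, instances = set(), set()
    for s, p, o in statements:
        if p == "rdfs:subClassOf":
            classes.update((s, o))
        elif p == "rdf:type":
            instances.add(s)
            classes.add(o)
    return sorted(classes & instances)

print(punned_names(statements))
```

A document in which such a name appears falls outside OWL DL, although it remains a legal OWL Full document.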

3.6.3 OWL Lite

OWL Lite is a sublanguage (alternatively, a lighter version) of OWL DL: it is OWL DL with more restrictions. OWL Lite supports only a subset of the OWL language constructs; for instance, it excludes enumerated classes, disjointness statements and arbitrary cardinality (it permits only the cardinality values 0 and 1). It corresponds to the description logic SHIF(D).

Advantage: It is easy to understand and use, and easy for tool developers to implement. Because of its simple structure and lighter expressivity, it provides a quick migration path for thesauri and taxonomies.

Disadvantage: OWL Lite has restricted expressivity. It is not suitable for representing and reasoning over complex knowledge.

Table 1 shows an OWL ontology (expressed in RDF/XML) for the schema shown in Figure 3. The ontology was produced using the ontology editing tool Protégé [35].

Table 1: OWL Ontology

<?xml version="1.0"?>
<!DOCTYPE rdf:RDF [
    <!ENTITY owl "http://www.w3.org/2002/07/owl#" >
    <!ENTITY xsd "http://www.w3.org/2001/XMLSchema#" >
    <!ENTITY rdfs "http://www.w3.org/2000/01/rdf-schema#" >
    <!ENTITY myvoc "http://vocabularyexamole.org/int-voc#" >
    <!ENTITY rdf "http://www.w3.org/1999/02/22-rdf-syntax-ns#" >
]>
<rdf:RDF xmlns="http://example.org/educand-and-educators/1.0#"
         xml:base="http://example.org/educand-and-educators/1.0"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:myvoc="http://vocabularyexamole.org/int-voc#"
         xmlns:owl="http://www.w3.org/2002/07/owl#"
         xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <!-- Ontology header -->
    <owl:Ontology rdf:about="http://example.org/educand-and-educators/1.0">
        <rdfs:label xml:lang="en">Institutional ontology</rdfs:label>
        <rdfs:comment rdf:datatype="&xsd;string">This is an institutional ontology.</rdfs:comment>
    </owl:Ontology>
    <!-- Object properties -->
    <owl:ObjectProperty rdf:about="&myvoc;hasSupervisor">
        <rdf:type rdf:resource="&owl;InverseFunctionalProperty"/>
        <rdfs:label xml:lang="en">has supervisor</rdfs:label>
        <rdfs:range rdf:resource="http://example.org/educand-and-educators/1.0#Faculty"/>
        <rdfs:domain rdf:resource="http://example.org/educand-and-educators/1.0#Student"/>
    </owl:ObjectProperty>
    <owl:ObjectProperty rdf:about="&myvoc;isSupervisorOf">
        <rdfs:label xml:lang="en">is supervisor of</rdfs:label>
        <rdfs:domain rdf:resource="http://example.org/educand-and-educators/1.0#Faculty"/>
        <rdfs:range rdf:resource="http://example.org/educand-and-educators/1.0#Student"/>
    </owl:ObjectProperty>
    <!-- Classes -->
    <owl:Class rdf:about="http://example.org/educand-and-educators/1.0#AdminStaff">
        <rdfs:subClassOf rdf:resource="http://example.org/educand-and-educators/1.0#Staff"/>
    </owl:Class>
    <owl:Class rdf:about="http://example.org/educand-and-educators/1.0#AssistantProfessor">
        <rdfs:subClassOf rdf:resource="http://example.org/educand-and-educators/1.0#Faculty"/>
    </owl:Class>
    <owl:Class rdf:about="http://example.org/educand-and-educators/1.0#AssociateProfessor">
        <rdfs:subClassOf rdf:resource="http://example.org/educand-and-educators/1.0#Faculty"/>
    </owl:Class>
    <owl:Class rdf:about="http://example.org/educand-and-educators/1.0#Dean">
        <rdfs:subClassOf rdf:resource="http://example.org/educand-and-educators/1.0#AdminStaff"/>
    </owl:Class>
    <owl:Class rdf:about="http://example.org/educand-and-educators/1.0#EducantAndEducators"/>
    <owl:Class rdf:about="http://example.org/educand-and-educators/1.0#Faculty">
        <rdfs:subClassOf rdf:resource="http://example.org/educand-and-educators/1.0#Staff"/>
    </owl:Class>
    <owl:Class rdf:about="http://example.org/educand-and-educators/1.0#GraduateStudent">
        <rdfs:subClassOf rdf:resource="http://example.org/educand-and-educators/1.0#Student"/>
    </owl:Class>
    <owl:Class rdf:about="http://example.org/educand-and-educators/1.0#Professor">
        <rdfs:subClassOf rdf:resource="http://example.org/educand-and-educators/1.0#Faculty"/>
    </owl:Class>
    <owl:Class rdf:about="http://example.org/educand-and-educators/1.0#Staff">
        <rdfs:subClassOf rdf:resource="http://example.org/educand-and-educators/1.0#EducantAndEducators"/>
    </owl:Class>
    <owl:Class rdf:about="http://example.org/educand-and-educators/1.0#Student">
        <rdfs:subClassOf rdf:resource="http://example.org/educand-and-educators/1.0#EducantAndEducators"/>
    </owl:Class>
    <owl:Class rdf:about="http://example.org/educand-and-educators/1.0#UndergraduateStudent">
        <rdfs:subClassOf rdf:resource="http://example.org/educand-and-educators/1.0#Student"/>
    </owl:Class>
</rdf:RDF>
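To see how an application might read such an ontology, the following minimal Python sketch (standard library only) extracts the class hierarchy from a four-class excerpt modelled on Table 1. The excerpt is hand-trimmed, with entity references written out in full. The helper names `class_parents` and `ancestors` are assumptions for illustration, and the sketch assumes every class is identified with rdf:about and has at most one superclass; a real application would normally use a dedicated RDF library instead.

```python
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
RDFS = "{http://www.w3.org/2000/01/rdf-schema#}"
OWL = "{http://www.w3.org/2002/07/owl#}"
BASE = "http://example.org/educand-and-educators/1.0#"

# Hand-trimmed excerpt in the style of Table 1 (four classes only).
ONTOLOGY = f"""<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="{BASE}EducantAndEducators"/>
  <owl:Class rdf:about="{BASE}Staff">
    <rdfs:subClassOf rdf:resource="{BASE}EducantAndEducators"/>
  </owl:Class>
  <owl:Class rdf:about="{BASE}Faculty">
    <rdfs:subClassOf rdf:resource="{BASE}Staff"/>
  </owl:Class>
  <owl:Class rdf:about="{BASE}Professor">
    <rdfs:subClassOf rdf:resource="{BASE}Faculty"/>
  </owl:Class>
</rdf:RDF>"""


def local(uri):
    """Return the fragment after '#', e.g. 'Faculty'."""
    return uri.split("#", 1)[-1]


def class_parents(rdf_xml):
    """Map each class name to the list of its directly asserted superclasses."""
    parents = {}
    for cls in ET.fromstring(rdf_xml).iter(OWL + "Class"):
        name = local(cls.get(RDF + "about"))
        parents[name] = [local(sub.get(RDF + "resource"))
                         for sub in cls.findall(RDFS + "subClassOf")]
    return parents


def ancestors(name, parents):
    """Follow subClassOf links upwards (single inheritance assumed)."""
    chain = []
    while parents.get(name):
        name = parents[name][0]
        chain.append(name)
    return chain


if __name__ == "__main__":
    hierarchy = class_parents(ONTOLOGY)
    print(ancestors("Professor", hierarchy))
```

Following the subClassOf links from Professor yields Faculty, Staff and EducantAndEducators, in that order.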

3.7 Proof Layer

The idea of this layer is to explain to the user the chain of reasoning that was used to derive a conclusion. In this sense, a user can be provided with explanations for the knowledge inferred by an inference engine. The proof can be produced by tracing back the firing of the rules that resulted in the assertion [6].
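The idea of tracing back rule firings can be sketched as follows. This is only a toy illustration in Python, not how any particular inference engine works: it applies a single hypothetical rule (if x is of type C and C is a subclass of D, then x is of type D) and records, for every derived fact, the premises that fired the rule. All facts and names (`alice`, `forward_chain`, `explain`) are assumptions made for the example.

```python
# Facts are (subject, predicate, object) triples; all names are hypothetical.
FACTS = {
    ("Professor", "subClassOf", "Faculty"),
    ("Faculty", "subClassOf", "Staff"),
    ("alice", "type", "Professor"),
}


def forward_chain(facts):
    """Apply the rule (x type C) and (C subClassOf D) => (x type D) to a fixpoint.

    Returns the full set of facts and, for each derived fact, the two
    premises that caused the rule to fire.
    """
    known = set(facts)
    proof = {}
    changed = True
    while changed:
        changed = False
        for (x, p1, c) in list(known):
            if p1 != "type":
                continue
            for (c2, p2, d) in list(known):
                if p2 == "subClassOf" and c2 == c:
                    new = (x, "type", d)
                    if new not in known:
                        known.add(new)
                        proof[new] = [(x, "type", c), (c, "subClassOf", d)]
                        changed = True
    return known, proof


def explain(fact, proof, depth=0):
    """Return the derivation of a fact as indented lines, premises below conclusions."""
    suffix = "" if fact in proof else "  [asserted]"
    lines = ["  " * depth + " ".join(fact) + suffix]
    for premise in proof.get(fact, []):
        lines.extend(explain(premise, proof, depth + 1))
    return lines


if __name__ == "__main__":
    known, proof = forward_chain(FACTS)
    print("\n".join(explain(("alice", "type", "Staff"), proof)))
```

The printed explanation shows that the statement "alice type Staff" rests on the derived statement "alice type Faculty" together with the asserted subclass relations, which is the kind of justification the proof layer is meant to expose.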

3.8 Trust Layer

As shown in Figure 1, the trust layer sits at the top of the Semantic Web layer cake. It is an important component of any semantic application. The purpose of this layer is to earn the user's trust in web applications. Here, trust refers both to the operation (i.e., the security) of the applications and to the quality of the information they provide. The trust layer can be implemented through the use of digital signatures and other kinds of information, for instance, ratings and recommendations made by trusted agents, certification agencies and/or customer bodies [18].
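As a rough illustration of how a signature lets a consumer detect tampering with published statements, consider the sketch below. It is written under strong assumptions: Python's standard library offers no public-key signatures, so a keyed hash (HMAC-SHA256 with a shared secret) is used here as a stand-in for a true digital signature, and the triples are serialised by simple sorting rather than by a proper RDF canonicalisation algorithm. The key, the triples and the function names are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical statements published by a source.
TRIPLES = [
    ("ex:alice", "rdf:type", "ex:Professor"),
    ("ex:alice", "ex:isSupervisorOf", "ex:bob"),
]


def canonical(triples):
    """Serialise triples in sorted order so equal sets give equal bytes."""
    return "\n".join(" ".join(t) for t in sorted(triples)).encode("utf-8")


def sign(triples, key):
    """Return a hex tag over the triples (HMAC as a stand-in for a signature)."""
    return hmac.new(key, canonical(triples), hashlib.sha256).hexdigest()


def verify(triples, key, signature):
    """Check the tag in constant time."""
    return hmac.compare_digest(sign(triples, key), signature)


if __name__ == "__main__":
    key = b"shared-secret-for-demo-only"
    tag = sign(TRIPLES, key)
    print(verify(TRIPLES, key, tag))
    tampered = TRIPLES + [("ex:mallory", "rdf:type", "ex:Dean")]
    print(verify(tampered, key, tag))
```

The first check succeeds and the second fails, because adding a statement changes the signed content.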

In summary, each layer of the Semantic Web layer cake is built on the layer below it. Each layer can be developed and made operational relatively independently. As we move from the bottom to the top, the layers become progressively more specialized and more complex. Also note that building a Semantic Web application does not necessarily require all the technologies of the layer cake (Figure 1). Depending on the requirements of the application, we can stop at any layer. For instance, if an application requires only a lightweight ontology, similar to a thesaurus, we need not write the ontology in OWL; we may use RDFS instead.

4. Semantic Web and Natural language Processing

In this section we explore the possibility of using NLP technologies in the Semantic Web.

As stated in [25], it is entirely appropriate, indeed highly desirable, to apply NLP methods to the foundations of the Semantic Web. If this happens, the dream of the Semantic Web may soon come true. Dini [26] stated that NLP can help the Semantic Web in two phases: in the acquisition phase (i.e., at the time of building the Semantic Web) and in the retrieval phase (i.e., at the time of accessing the Semantic Web). In the acquisition phase, building the Semantic Web requires very accurate tagging algorithms. In the retrieval phase, NLP could help by providing simple but smart search interfaces for querying semantic resources.

Dini [26] stated that “Tagging has always been one of the most popular tasks in NLP experiments, and it is obviously tempting to assume that the final result of a completely tagged Web could be achieved only by applying tagging algorithms.” Some progress has already been made towards this goal, although complete accuracy is yet to be achieved. A number of recent papers have focused on the possibility of automatically tagging webpages with RDF descriptions. Automatic classification has already reached a satisfactory degree of accuracy, and this technology can be borrowed to extract RDF descriptions of resources (e.g., person, organization, location, event). Nevertheless, classification alone is not enough. In the context of the Semantic Web, it is not sufficient to say that a certain webpage is about an institution. A tagging application needs to go further and describe the resource (here, the institution) with the information available on that page, such as the year of establishment, the courses offered, the location of the institute, etc. Also, to achieve completeness, tagging applications should be able to gather missing information from other websites and create links between the different resources [26].
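The step from "this page is about an institution" to a description of that institution can be sketched with a deliberately naive example. The Python code below uses two regular expressions to pull a year of establishment and a location out of a sentence and emits them as N-Triples-style statements. It is a toy under stated assumptions, not an NLP tagger: the text, the subject URI, the vocabulary `http://example.org/vocab#` and the patterns are all invented for illustration, and real systems rely on trained models rather than hand-written patterns.

```python
import re

# Hypothetical page text and identifiers, invented for this example.
TEXT = ("Example Institute of Technology was established in 1962. "
        "It is located in Bangalore.")
SUBJECT = "http://example.org/resource/ExampleInstitute"
VOC = "http://example.org/vocab#"

# One hand-written pattern per property; each captures the value in group 1.
PATTERNS = {
    "yearOfEstablishment": re.compile(r"established in (\d{4})"),
    "location": re.compile(r"located in ([A-Z][a-z]+)"),
}


def extract_triples(text, subject):
    """Return N-Triples-style lines for every pattern that matches the text."""
    triples = []
    for prop, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            triples.append(f'<{subject}> <{VOC}{prop}> "{match.group(1)}" .')
    return triples


if __name__ == "__main__":
    for line in extract_triples(TEXT, SUBJECT):
        print(line)
```

Each matched pattern yields one statement about the institution; linking the location value to a resource described on another website would be the further step that the text identifies as necessary for completeness.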

Potential applications of NLP in the Semantic Web include building knowledge bases, constructing ontologies, and ontology learning. Note that research on the use of natural language processing technologies in the Semantic Web is still at an early stage, and a great deal of work is currently under way in this area.

5. Summary

In this module we have discussed various semantic techniques and technologies, such as RDF, RDFS, OWL, ontology and logic. These techniques and technologies are essential for organizing, representing and retrieving information meaningfully. The semantic representation of information allows us to infer new knowledge from the knowledge already present in the knowledge base. In this module we have also discussed the potential applications of natural language processing in the Semantic Web.

6. References

1. Dutta, B., Chatterjee, U. and Madalli, Devika P. (2013). From Application Ontology to Core Ontology. In Proceedings of the International Conference on Knowledge Modelling and Knowledge Management (ICKM 2013), Bangalore, India. ISBN: 978-93-5137-765-8.

2. Semantic Web Made Easy. http://www.w3.org/RDF/Metalog/docs/sw-easy

3. Antoniou, Grigoris and Harmelen, Frank van. A semantic web primer. London: MIT Press, 2004.

4. Dutta, B. and Prasad, A. R. D. Semantic e-learning system: theory, implementation and applications. Germany: LAP, 2013, pp. 216, ISBN 978-3-659-18318-8.

5. Berners-Lee, T., Hendler, J. and Lassila, O. (2001). The Semantic Web: a new form of Web content that is meaningful to computers will unleash a revolution of new possibilities. Scientific American. http://www.scientificamerican.com/article.cfm?id=the-semantic-web

6. Expert system. https://en.wikipedia.org/wiki/Expert_system

7. Dutta, B. (2008). Semantic Web Services: A Study of Existing Technologies, Tools and Projects. DESIDOC Journal of Library and Information Technology, 28 (3), pp. 47-55.

8. Berners-Lee, T., Connolly, D., Kagal, L., Scharf, Y. and Hendler, J. (2006). N3Logic: a logical framework for the World Wide Web. http://www.dig.csail.mit.edu/2006/Papers/TPLP/n3logic-tplp.pdf

9. Davies, J., Fensel, D. and Harmelen, Frank van. Towards the semantic web. West Sussex: John Wiley, 2003.

10. MathML. http://www.w3.org/Math/

11. Chemical Markup Language (CML). http://cml.sourceforge.net/

12. Resource Description Framework (RDF) Model and Syntax Specification: W3C Recommendation, 22 Feb. 1999. http://www.w3.org/TR/1999/REC-rdf-syntax-19990222/#intro

13. Altheim, M., Anderson, B., Hayes, P., Menzel, C., Sowa, J. F., and Tammet, T. SCL: Simple Common Logic. http://www.ihmc.us/users/phayes/CL/SCL2004.html

14. Description Logic Handbook: Theory, Implementation and Applications. Ed. by F. Baader, D. Calvanese, D.L. McGuinness, D. Nardi, P.F. Patel-Schneider. Cambridge University Press, 2003.

15. Agarwal, S. (2007). Formal Description of Web Services for Expressive Matchmaking. Doctoral thesis. http://www.digbib.ubka.ukarlsruhe.de/volltexte/documents/2531.

16. Web Ontology Language. http://www.w3.org/2004/OWL/

17. RDF primer, 2004. http://www.w3.org/TR/REC-rdf-syntax/#richerschemas

18. Lassila, O. Towards the semantic web. http://www.w3c.rl.ac.uk/pastevents/TowardsTheSemanticWeb.pdf

19. Aristotle’s Categories, 2007. http://plato.stanford.edu/entries/aristotle-categories/

20. Giunchiglia, F., Dutta, B. and Maltese, V. (2009). Faceted lightweight ontologies. Conceptual Modeling: Foundations and Applications, Alex Borgida, Vinay Chaudhri, Paolo Giorgini and Eric Yu (Eds.), LNCS 5600 Springer.

21. Gruber, T. R. (1993). A translation approach to portable ontology specifications. Knowledge Acquisition, 5(2), pp. 199–220.

22. Studer, R., Benjamins, V. R. and Fensel, D. (1998). Knowledge engineering: principles and methods. http://www.das.ufsc.br/~gb/pg-ia/KnowledgeEngineering-PrinciplesAndMethods.pdf

23. Wilks, Yorick and Brewster, Christopher (2009). Natural Language Processing as a Foundation of the Semantic Web. Foundations and Trends in Web Science, 1(3–4), 199‐327. doi: http://dx.doi.org/10.1561/180000000

24. Garnham, A. Artificial intelligence: an introduction. London: Routledge & Kegan Paul, 1988.

25. Wilks, Yorick and Brewster, Christopher (2009). Natural Language Processing as a Foundation of the Semantic Web. Foundations and Trends in Web Science, 1(3–4), 199‐327. doi: http://dx.doi.org/10.1561/180000000

26. Dini, Luca (2004). NLP technologies and the semantic web: risks, opportunities and challenges. Intelligenza Artificiale 1(1), pp. 67-71.

27. RDF 1.1 Concepts and Abstract Syntax. W3C recommendation 25 February 2014. https://www.w3.org/TR/rdf11-concepts/

28. Notation3 (N3): a readable RDF syntax. W3C Team Submission 28 March 2011. https://www.w3.org/TeamSubmission/n3/

29. RDF 1.1 N-Triples: a line-based syntax for an RDF graph. W3C Recommendation 25 February 2014. https://www.w3.org/TR/n-triples/

30. RDF 1.1 Turtle: Terse RDF Triple Language. W3C Recommendation 25 February 2014. https://www.w3.org/TR/turtle/

31. The “xml:” namespace. 26 October 2009. https://www.w3.org/XML/1998/namespace

32. Dutta, B. (2014). Symbiosis between An Ontology and Linked Data. Librarian. Vol. 21, no. 2, pp. 15-24. ISSN: 0972-3978.

33. OWL Web Ontology Language Reference. W3C Recommendation 10 February 2004. https://www.w3.org/TR/owl-ref/

34. Web Ontology Language. https://en.wikipedia.org/wiki/Web_Ontology_Language

35. Protege. http://protege.stanford.edu/

36. Dutta, B., Chatterjee, U. and Madalli, D. P. (2015). YAMO: Yet Another Methodology for Large-scale Faceted Ontology Construction. Journal of Knowledge Management. Vol. 19, no. 1, pp. 6-24. (Impact factor 2013: 1.257).

37. Dutta, B., Nandini, D. and Shahi, G. (2015). MOD: Metadata for Ontology Description and publication. In Proceedings of DCMI International Conference on Dublin Core and Metadata Applications (DC-2015), Sao Paulo, Brazil, 1-4 September 2015, pp. 1-9. Available at: http://dcevents.dublincore.org/IntConf/dc-2015/schedConf/presentations