The Semantic Web

The World Wide Web (or WWW) presents "web pages" (digital documents) connected by hyperlinks. Related pages are grouped into a structure called a "website", organized around a home page that serves as the central point of navigation.

The central idea of the Semantic Web initiative is to make the meaning (the signified) of web content accessible to and processable by machines. The Semantic Web defines machine-interpretable languages based on cognitive science. This narrows the gap between the human approach and the machine approach, the latter being able to process millions of data items in very little time. It allows the development of sophisticated tools and systems that provide much richer functionality for supporting human activities on the Web.

The Semantic Web relies on the combination of the following technologies:

  • Explicit metadata: it allows web pages to carry both representation (signifier) and meaning (signified).
  • Ontologies: descriptions of the main concepts of a domain and their relationships.
  • Logical reasoning: drawing conclusions from the combination of (meta)data and ontologies.

HTML: HyperText Mark-up Language

HTML is a "mark-up" language whose role is to formalize the representation of a document by means of formatting "tags". It is an application of the Standard Generalized Markup Language (SGML). The language is based on a set of predefined tags that control the appearance of a web page (bold, italics, numbered or unnumbered lists, line breaks, etc.) and the links the page establishes with other documents.

HTML documents can be read on the Internet from different machines thanks to the HTTP protocol, which gives network access to documents identified by a unique address called a URL (Uniform Resource Locator).
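As an illustration of how predefined tags carry both formatting and links, the following minimal sketch uses Python's standard html.parser module to collect the hyperlinks of a page. The page content, the example.org address and the LinkCollector class name are assumptions made for this example only.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag encountered."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


# Hypothetical page: <b> controls appearance, <a> links to another document.
PAGE = '<p>See <b>the</b> <a href="https://example.org/about.html">about page</a>.</p>'

collector = LinkCollector()
collector.feed(PAGE)
print(collector.links)
```

The parser can tell that a link exists and where it points, but nothing in the tags says what the linked document is about.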

XML: Extensible Mark-up Language

XML is a domain-independent meta-language for mark-up, that is, a language used to define mark-up languages. XML allows users to define their own tags describing the structure of a document, which can then be processed by a machine. Unlike HTML tags, XML tags do not describe the appearance of web pages. XML separates content from formatting, a useful property for producing different representations and views of the same data on different devices.

XML actually comprises a family of languages that support various activities around the core language:

  • DTDs (Document Type Definitions) and XML Schema: two languages that allow users to define their own vocabulary.
  • XPath: a language supporting access to parts of XML documents. Access is the necessary prerequisite for querying XML documents. 
  • XQuery: a query language for XML. 
  • XSLT: a language defining transformations from XML to HTML, or between XML representations. Thus XSLT is a key tool for the syntactic manipulation of XML documents. 

In the design of the Semantic Web, XML provides the basic layer for syntactic manipulation. While XML is a universal language for defining mark-up, it does not provide any means of talking about the semantics (meaning) of data. For example, there is no intended meaning (signified) associated with the nesting of tags; it is up to each application to establish an association between representation and meaning when importing or exporting data.
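The point that XML gives structure but not meaning can be sketched with Python's standard xml.etree.ElementTree module, which supports a limited subset of XPath. The catalog document and its tag names are hypothetical, user-defined vocabulary invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: <catalog>, <book>, <title>, <price> are user-defined tags.
CATALOG = """
<catalog>
  <book id="b1"><title>Book A</title><price>12.50</price></book>
  <book id="b2"><title>Book B</title><price>40.00</price></book>
</catalog>
""".strip()

root = ET.fromstring(CATALOG)

# Path expressions give access to parts of the document.
titles = [t.text for t in root.findall("./book/title")]
second = root.find("./book[@id='b2']/title").text

# The parser only sees nesting. That <price> holds an amount of money is
# knowledge built into this application, not into the XML itself.
prices = [float(p.text) for p in root.iter("price")]

print(titles, second, sum(prices))
```

Another application reading the same file would have to be told separately what each tag means, which is the gap the layers above XML aim to fill.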

RDF: Resource Description Framework

RDF and RDF Schema provide the basic core languages for the Semantic Web.

RDF is a language for describing resources, with an XML-based syntax. Its basic building block is a statement: a triple consisting of an entity (called a resource in Web terminology), a property, and a value (which may be another resource). Essentially, a statement is a fact P(a,b), where P is a binary property and a and b are resources. In the Semantic Web design, RDF defines a layer residing on top of XML.
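The triple model can be sketched in a few lines of plain Python. This is not an RDF library; the Triple type, the helper functions and the ex: identifiers are all hypothetical names chosen for illustration.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # the entity (resource)
    predicate: str  # the property
    obj: str        # the value, possibly another resource


# Hypothetical statements; "ex:" stands for an invented namespace.
triples = [
    Triple("ex:page1", "ex:author", "ex:alice"),
    Triple("ex:page1", "ex:title", '"Home page"'),
    Triple("ex:alice", "ex:worksFor", "ex:acme"),
]


def holds(p: str, a: str, b: str) -> bool:
    """Return True if the fact P(a, b) has been stated."""
    return Triple(a, p, b) in triples


def values(subject: str, predicate: str) -> list:
    """Return every value stated for a given resource and property."""
    return [t.obj for t in triples
            if t.subject == subject and t.predicate == predicate]


print(holds("ex:author", "ex:page1", "ex:alice"))
print(values("ex:alice", "ex:worksFor"))
```

Because a value can itself be a resource (here ex:alice), statements chain together into a graph.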

RDF is domain-independent in that it makes no assumption about a particular domain of use. It is up to users to define their own terminology in a schema language called RDF Schema, which constitutes a primitive ontology language offering the following features:

  • Organisation of objects into classes.
  • Subclass and sub-property relationships.
  • Domain and range restrictions on properties.

The expressive power of RDF and RDF Schema is deliberately very limited: RDF is (roughly) limited to binary ground predicates, and RDF Schema is (roughly) limited to subclass and sub-property hierarchies, with domain and range restrictions of properties.
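To make this limited expressive power concrete, here is a minimal sketch of the kind of inference RDF Schema supports: subclass transitivity, type inheritance, and typing through domain and range. It implements only these few rules, not the full RDFS entailment regime, and the ex: terms are hypothetical.

```python
def rdfs_closure(triples):
    """Apply a few RDFS-style rules until no new triple can be derived."""
    inferred = set(triples)
    while True:
        new = set()
        for s, p, o in inferred:
            if p == "rdfs:subClassOf":
                for s2, p2, o2 in inferred:
                    # Subclass transitivity: s < o and o < o2 gives s < o2.
                    if p2 == "rdfs:subClassOf" and s2 == o:
                        new.add((s, "rdfs:subClassOf", o2))
                    # Type inheritance: an instance of s is an instance of o.
                    if p2 == "rdf:type" and o2 == s:
                        new.add((s2, "rdf:type", o))
            elif p == "rdfs:domain":
                # The subject of any statement using property s has type o.
                for s2, p2, o2 in inferred:
                    if p2 == s:
                        new.add((s2, "rdf:type", o))
            elif p == "rdfs:range":
                # The value of any statement using property s has type o.
                for s2, p2, o2 in inferred:
                    if p2 == s:
                        new.add((o2, "rdf:type", o))
        if new <= inferred:
            return inferred
        inferred |= new


# Hypothetical schema and data.
facts = {
    ("ex:Professor", "rdfs:subClassOf", "ex:Staff"),
    ("ex:Staff", "rdfs:subClassOf", "ex:Person"),
    ("ex:teaches", "rdfs:domain", "ex:Professor"),
    ("ex:teaches", "rdfs:range", "ex:Course"),
    ("ex:alice", "ex:teaches", "ex:logic101"),
}

closure = rdfs_closure(facts)
print(("ex:alice", "rdf:type", "ex:Person") in closure)
```

Nothing here can say, for instance, that two classes never share an instance; that kind of statement needs a richer language.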

OWL: Web Ontology Language

A class declares the properties common to a set of objects: attributes, which represent the state of the objects, and methods, which represent their behavior. It acts as a mold or factory from which objects can be created; each such object is an instance of the class, that is, an object having the properties the class declares.

A number of characteristic use cases of the Semantic Web require more expressiveness than RDF and RDF Schema offer. The required extensions include:

  • Disjointness of classes
  • Boolean combinations of classes
  • Cardinality restrictions
  • Special characteristics of properties
  • Local scope of properties: rdfs:range defines the range of a property for all classes, but sometimes we may want to restrict the range depending on the class.

OWL (Web Ontology Language) is built on top of RDF/S and seeks a balance between expressive power and efficient reasoning support. Reasoning is important because it allows one to:

(a) check the consistency of an ontology and the knowledge;

(b) check for unintended relationships between classes; and

(c) automatically classify instances into classes.
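Tasks (a) and (c) can be illustrated with a toy sketch that combines a subclass hierarchy with class disjointness. This is not a description-logic reasoner, only a hand-written check under simplifying assumptions (single inheritance, no property restrictions); the class names and function names are hypothetical.

```python
# Hypothetical ontology: child class -> parent class (single inheritance only).
SUBCLASS_OF = {"ex:Cat": "ex:Animal", "ex:Dog": "ex:Animal"}

# Pairs of classes declared disjoint: no object may belong to both.
DISJOINT = {frozenset({"ex:Cat", "ex:Dog"})}


def ancestors(cls):
    """Return cls together with all of its superclasses."""
    result = {cls}
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        result.add(cls)
    return result


def all_types(asserted):
    """Classify an instance: every class it belongs to, stated or inferred."""
    types = set()
    for cls in asserted:
        types |= ancestors(cls)
    return types


def is_consistent(asserted):
    """An instance is inconsistent if it falls into two disjoint classes."""
    types = all_types(asserted)
    return not any(pair <= types for pair in DISJOINT)


print(all_types({"ex:Cat"}))
print(is_consistent({"ex:Cat", "ex:Dog"}))
```

A real OWL reasoner performs these checks for a far richer set of constructs, which is why the balance between expressiveness and efficiency matters.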

Logic

The formal foundation of the OWL language is a branch of knowledge representation and reasoning called "description logics". While this foundation is promising, there is a different approach to representation and reasoning, based on rules. Its main advantages are:

  • Rule engines exist and are quite powerful. 
  • Rules are well known and used in mainstream IT, and are easier for users to learn.

Rule systems can be seen either as an extension of OWL or as an alternative to it. The former view drives current research that attempts to integrate description logics and rules while maintaining reasonably efficient reasoning support. The latter view studies the use of RDF/S in conjunction with rules as the basis of an alternative Web ontology language.

Apart from classical rule systems, it is interesting to consider systems that can deal with contradictory conclusions. Such systems are useful for modelling default inheritance and rules with exceptions. They are also very useful for knowledge integration, where inconsistencies can naturally occur when knowledge from different sources is put together.
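A rule with an exception can be sketched as follows. The conflict between two contradictory conclusions is resolved here by an explicit numeric priority, which is one simple strategy assumed for this example and not the only one used by such systems; the Rule type, the rule set and the conclude function are hypothetical.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    premises: frozenset  # facts that must all hold for the rule to fire
    attribute: str       # what the rule draws a conclusion about
    value: bool          # the conclusion itself
    priority: int        # higher priority overrides lower priority


# Default rule: birds fly. Exception: penguins do not.
RULES = [
    Rule(frozenset({"bird"}), "flies", True, 1),
    Rule(frozenset({"penguin"}), "flies", False, 2),
]


def conclude(facts, rules=RULES):
    """Fire every applicable rule; on conflict keep the highest priority."""
    best = {}
    for rule in rules:
        if rule.premises <= facts:
            current = best.get(rule.attribute)
            if current is None or rule.priority > current.priority:
                best[rule.attribute] = rule
    return {attr: rule.value for attr, rule in best.items()}


print(conclude({"bird"}))
print(conclude({"bird", "penguin"}))
```

With both facts present, the two rules disagree, and the more specific penguin rule wins instead of the system becoming inconsistent.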

Ontology

An ontology is a conceptual model that can represent various projects in terms of a hierarchy of tasks, products, contributors, roles and rights. It makes it possible to build custom access to, and views of, the information of a project for each member involved in carrying out the project.

Having identified objects and explicit relationships allows the automatic reconciliation of information produced elsewhere by others, which facilitates the enrichment, search and processing of information.

It is therefore essential:

  • to design documents that hold information in a form accessible to machines,
  • and to keep the links created, together with their relationship values.

It is only when this semantic level exists that it becomes possible to use the computational power of the computer to help the user exploit the information beyond mere reading.