Making Information Accessible to Machines

In the coming years we will witness a revolution in the ability of machines to access, process and use information. This revolution is driven largely by three trends related to the Semantic Web: Web Data, Web Services and Web Identity. These webs are designed to make semantic knowledge about data, about services and about individuals available, accessible and usable. In this article we explore the first of these three webs, Web Data, and how making information accessible to machines will transform the way we find information.
Web Data

The amount of information and services available is increasing exponentially. Every day it becomes harder to find what we are looking for. The problem is that we must learn to tell machines what we want. Why can't a machine understand which site, which recent tweet, which photo on Flickr, which message on Facebook, or which restaurant we are looking for on the Internet?

Because it cannot. It does not understand most sources and does not have access to them. It lacks the semantic understanding and common sense needed to build bridges between pieces of information.

It is essential that the machine reach a higher level of understanding. Instead of performing a statistical analysis of the correlation between a search query and a document, a machine must literally be able to understand. Knowledge bases are therefore necessary to ensure that we are talking about the same entity. Examples of such knowledge bases are:

* An encyclopedia containing the information needed to understand the meaning and semantic context of a particular term. For example, understanding that Berlin is a city, how many people live there, and where it is located.
* The yellow pages, or a set of services for obtaining more complex information that changes regularly. For example, the driving route between Berlin and Porto, or the current temperature in Porto in degrees Celsius.
* A database of people that gives access, subject to a set of permission rules, to information about a person that could help improve personalization and recommendations.

Web Data

The idea of Web Data is derived from the Semantic Web. People were trying to solve the problem of machines' inherent inability to understand a web page. Initially, the goal of the Semantic Web was to invisibly annotate web pages with a set of metadata attributes and categories, enabling machines to interpret the text and put it in context. This approach did not work because it was too complicated for people without technical expertise to implement. Similar approaches, such as microformats, simplify the tagging process and help cope with this problem.

What these approaches have in common is an effort to improve machines' access to knowledge in web pages that were designed to be viewed by humans. However, these pages contain a lot of information that is irrelevant to machines and must be filtered out. What we need is a database meant to be searched by machines, that is to say, stripped of irrelevant information. But wait: who said that machines and humans had to share a single web?

The idea of Web Data therefore emerged to circumvent the problems caused by this limitation, and to take advantage of the existence of a colossal structured database, distributed worldwide and containing all types of information. These data are owned by companies, which are opening them up more and more. Typically, a database contains information about a particular area, such as books, music, encyclopedic data or companies. If these data were interconnected (that is to say, pointing to each other as websites do), a machine could move through the web of data “silently” and gather structured information, building semantic knowledge about any entity or domain. The result of such an approach could be a gigantic database, totally free, which could be the basis for a new generation of applications and services.
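To make the idea of interconnected data sets concrete, here is a minimal Python sketch of a machine following links from one data set to another. Everything in it is an assumption made for illustration: the two toy data sets, the `dataset:name` identifier scheme, and the `lookup` and `crawl` helpers are hypothetical and do not correspond to any real service.

```python
from collections import deque

# Hypothetical toy data sets: each maps an entity identifier to a record.
# A string value that resolves to another entry acts as a link.
DATASETS = {
    "encyclopedia": {
        "encyclopedia:Berlin": {"type": "city", "country": "encyclopedia:Germany"},
        "encyclopedia:Germany": {"type": "country"},
    },
    "books": {
        "books:berlin-guide": {"type": "book", "about": "encyclopedia:Berlin"},
    },
}


def lookup(identifier):
    """Return the record for an identifier, or None if no data set has it."""
    dataset_name = identifier.split(":", 1)[0]
    return DATASETS.get(dataset_name, {}).get(identifier)


def crawl(start):
    """Follow links breadth-first and collect every reachable record."""
    seen = {}
    queue = deque([start])
    while queue:
        identifier = queue.popleft()
        if identifier in seen:
            continue
        record = lookup(identifier)
        if record is None:
            continue  # unknown identifier: nothing to collect
        seen[identifier] = record
        for value in record.values():
            # Treat a value as a link only when it resolves to another entry.
            if isinstance(value, str) and lookup(value) is not None:
                queue.append(value)
    return seen


if __name__ == "__main__":
    for identifier, record in crawl("books:berlin-guide").items():
        print(identifier, record)
```

Starting from a single book entry, the crawl reaches the city the book is about and then the country of that city, even though those records live in a different data set. This is the sense in which a machine can gather knowledge by moving from link to link.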

Linking Open Data

The Linking Open Data (LOD) project, supported by the W3C, is a promising approach. The picture above shows all the databases participating in the project. The data sets are built to reuse existing ontologies such as WordNet, FOAF and SKOS and to interconnect them.

The data sets all offer access to their databases and point to entries in other data sets. The project follows the basic principles governing the World Wide Web: simplicity, tolerance, modular design and decentralization. The LOD project now contains over 2 billion RDF triples, which is a lot of information (a triple is a building block of information consisting of a subject, a predicate and an object, which can represent the properties of an entity or its relations with other entities). In addition, the number of data sets involved in the project is growing very fast. The data sets can be accessed through various means: for example, via a semantic web browser, or by being indexed by semantic search engines.
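The subject, predicate and object structure of a triple can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Triple` type, the `match` helper and the plain-string identifiers such as `is_a` and `located_in` are hypothetical, whereas real RDF data identifies subjects and predicates with URIs and is usually queried with a dedicated language such as SPARQL.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement: a subject, a predicate, and an object."""
    subject: str
    predicate: str
    object: str


# Hypothetical identifiers, for illustration only.
TRIPLES = [
    Triple("Berlin", "is_a", "City"),
    Triple("Berlin", "located_in", "Germany"),
    Triple("Porto", "is_a", "City"),
    Triple("Porto", "located_in", "Portugal"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.object == obj)
    ]


if __name__ == "__main__":
    # Which entities are cities?
    print([t.subject for t in match(TRIPLES, predicate="is_a", obj="City")])
    # Where is Porto located?
    print([t.object for t in match(TRIPLES, subject="Porto", predicate="located_in")])
```

Leaving one position of the pattern open is what turns a pile of statements into something a machine can query: the first call asks "what is a city?" and the second asks "where is Porto?".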

For a brief overview of Web Data, you can click the following links:

* The company Yahoo on CrunchBase
* The city of Berlin or the game Tetris on DBpedia
* The book iPhone: The Missing Manual on O’Reilly Media

With all the data on the Web of Data, knowledge ranging from the very general to the very specific is available to machines, enabling the emergence of a new generation of services. Very sophisticated queries become machine-understandable and accessible to the next generation of search engines.