Book Review: Semantic Web Information Management

Semantic Web Information Management, A Model-Based Perspective by Roberto De Virgilio, Fausto Giunchiglia and Letizia Tanca (Eds).

ISBN 978-3-642-04328-4

Be it from Twitter, blogs, newspapers, scientific articles, the Large Hadron Collider or high-throughput sequencers, we live in a world where the volume of digital data is ever-increasing. These documents and data may be available through the World Wide Web. However, interpreting, processing and integrating them are time-consuming tasks that are not easily automated. These tasks convert data into information and must deal with semantic heterogeneity, i.e. the different ways of expressing the same information. The Semantic Web is an extension of the Web designed to tackle these issues and to make the information processable by computers.

The book Semantic Web Information Management, A Model-Based Perspective, edited by De Virgilio, Giunchiglia and Tanca, explores the topics of storage, reasoning and querying, as well as applications and engineering methods for semantic web systems. The chapters are authored by specialists, with 63 contributors in total. The 3 initial chapters present an overview of the book, the main concepts of data management and the hierarchy of languages in the semantic web (XML, RDF, RDFS and OWL). The remaining chapters, out of 22 in total, are divided into 5 sections and are independent accounts of the topics they discuss. Each of them introduces the topic and its issues, presents a brief survey of related work and then moves on to the methodology proposed by the authors. The reader might find that there is no unified terminology across the chapters, but this reflects the variety of vocabularies in the research community.

The book is targeted at both researchers and practitioners of the semantic web, as some chapters focus on theoretical approaches while others look into applications. The book is useful for newcomers to the field, as well as for experienced researchers who want to review or explore recent approaches. The bibliography will serve as a source of relevant works for further investigation. However, the book seems unsuitable for the lay reader, as it includes the formal foundations of the approaches and extensive technical details.

A common theme across the chapters is the synergy between databases and the semantic web. Relational database management systems are mature software implementations, designed to store large amounts of data and to offer an efficient query interface. On the other hand, the semantic web languages have been designed to represent domain knowledge and annotations, allowing for a machine-processable and meaning-focused view of the data. These two data management approaches have complementary strengths and weaknesses, and this book promotes the interaction between the two communities in order to provide an efficient information management solution.

Information management refers to organizing the data and controlling its structure, processing and delivery. It relies on a model-based perspective in both databases and the semantic web. Good modeling is recognised as crucial to achieve efficient data representations and reasoning capability, and this book emphasises this fact from its subtitle onwards.

The section on storage presents and compares several RDF storage schemes based on relational databases. Then, an efficient model-based approach is described, independent of the representation language used. Finally, a chapter looks into the design and architectural components of a semantic search infrastructure, Sindice.com, which deals with data at the terabyte scale.
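To give a flavour of what storing RDF in a relational engine can look like, here is a minimal sketch of a single "triple table" layout using Python's built-in sqlite3 module. This is a generic, widely known layout and is not necessarily one of the schemes compared in the book; the table name, the index, the helper functions and the sample data are all assumptions made for illustration.

```python
import sqlite3


def build_triple_store(triples):
    """Load (subject, predicate, object) tuples into an in-memory table."""
    conn = sqlite3.connect(":memory:")
    # One row per RDF statement; every term is stored as plain text here.
    conn.execute(
        "CREATE TABLE triples ("
        "subject TEXT NOT NULL, predicate TEXT NOT NULL, object TEXT NOT NULL)"
    )
    # Hypothetical index choice: speeds up lookups by predicate and object.
    conn.execute("CREATE INDEX idx_pred_obj ON triples (predicate, object)")
    conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", triples)
    return conn


def titles_by_creator(conn, creator):
    """Return titles of resources whose dc:creator is `creator`.

    Matching two triple patterns that share a subject becomes a
    self-join of the triples table on the subject column.
    """
    rows = conn.execute(
        """
        SELECT t.object
        FROM triples AS c
        JOIN triples AS t ON t.subject = c.subject
        WHERE c.predicate = 'dc:creator' AND c.object = ?
          AND t.predicate = 'dc:title'
        ORDER BY t.object
        """,
        (creator,),
    ).fetchall()
    return [row[0] for row in rows]


# Invented sample data, for illustration only.
EXAMPLE_TRIPLES = [
    ("ex:book1", "dc:title", "Example Book One"),
    ("ex:book1", "dc:creator", "ex:editorA"),
    ("ex:book2", "dc:title", "Example Book Two"),
    ("ex:book2", "dc:creator", "ex:editorB"),
]

if __name__ == "__main__":
    store = build_triple_store(EXAMPLE_TRIPLES)
    print(titles_by_creator(store, "ex:editorA"))
```

The self-join hints at the trade-off such schemes have to manage: the layout is simple and schema-independent, but every additional triple pattern in a query adds another join over the same table.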

The sections on reasoning and querying are the most extensive in the book, reflecting the significance of these two subjects. The main reasoning techniques are introduced and then analysed in the context of large and distributed ontologies. Three chapters follow with applications of ontology reasoning: matching, mapping revision and a temporal extension to the OWL language. In the case of querying, the chapters examine its relationship with reasoning, and range from an analysis of query answering over databases and ontologies to a detailed description of the SPARQL query language and its formal semantics, a scheme to improve its performance, an extension with rules and quantification, and a benchmarking platform.

In the part on applications, the selection of topics cannot be comprehensive. The most important and domain-independent application included is data integration. The corresponding chapter presents a general framework for integrating data and includes the formal foundation for the design of OWL 2 QL, a subset of OWL with a good balance between expressive power and computational complexity of reasoning.

The final chapters cover design and engineering methods for semantic web systems, which are, by definition, distributed and heterogeneous. Again, this is too broad a topic to be covered comprehensively in 3 chapters. However, the selection involves a model-independent approach to interoperability, a storage solution for RDF based on a relational engine and an RDF(S)-based modeling approach. Interoperability, storage and modeling are three important issues that the semantic web helps to resolve in an elegant way.

Overall, the book is an excellent resource for researchers and practitioners, presenting and analysing the challenges and recent techniques in semantic information management.

---

Review updated: 21 March 2012.

About Alejandra Gonzalez-Beltran

Alejandra Gonzalez-Beltran is a Senior Research Associate in Computational and Systems Medicine at University College London, UK. She holds a PhD from Queen’s University Belfast, UK, and a Licentiateship (equivalent to MSc) in Computer Science from Universidad Nacional de Rosario, Argentina. Her research interests are in data and metadata management for large-scale distributed systems, including knowledge representation and federated ontology-based queries for biomedical applications.