Book Review: Multilingual Information Retrieval – From Research to Practice

Multilingual Information Retrieval – From Research to Practice by Carol Peters, Martin Braschler and Paul Clough

ISBN: 978-3-642-23007-3

The fundamental idea behind Multilingual Information Retrieval is using computers to overcome language barriers. This applies both to information on the WWW and to many other purposes, such as military intelligence and defence, international trade, inventions and international relations between countries, not to mention the most common use: human communication. In 2004 the largest number of candidate countries ever joined the EU at once. Since then, the pressing need to translate all official documents into the other languages has created a demand for new technologies for automatic search, translation and, in some cases, information summarisation.

In 2002 the American Text Retrieval Conference (TREC), an ongoing series of workshops co-sponsored by the National Institute of Standards and Technology (NIST) and the U.S. Department of Defense (at that time the only institution in the world conducting research on multilingual, or more precisely American-Chinese, information systems), handed over responsibility for the multilingual tracks to CLEF (the Cross-Language Evaluation Forum). CLEF is headed by one of the book's authors, Carol Peters, and supported by the European Commission. Thus, the Reader will find in this book a wealth of information drawn from the experience of running such large evaluation campaigns.

For example, during the SIGIR ’96 workshop on Cross-Language Information Retrieval (CLIR), a discussion took place about which technical term best describes the field. At the same time, the American Defense Advanced Research Projects Agency (DARPA) was proposing “translingual”, which has not, as yet, become the standard term in the literature [Douglas W. Oard, “Alternative Approaches for Cross-Language Text Retrieval,” in AAAI Symposium on Cross-Language Text and Speech Retrieval, pp. 131-139, Palo Alto CA, 1997]. “Multilingual”, the alternative, seemed too broad to distinguish IR systems that have a translation component from those in which the queries are retrieved in any target language.

The book comprises six chapters, each structured like a conference paper. Personally, I find Chapter 1 the most interesting. It defines CLIR and briefly presents its history, in particular the conferences and institutions that started the research. Chapter 2 is more technical, offering a step-by-step introduction to the basics of monolingual IR, from which students can learn about the indexing and matching phases. Chapter 3 introduces the most common approaches to CLIR, addressing language divergence, translation models and language ambiguity. Chapter 4 deals with human-computer interaction, in particular the design of multilingual interfaces. Chapter 5 is devoted to evaluation, seen from both the user and the system perspective. Understandably, the Authors could not resist killing two birds with one stone: they promote the multilingual CLEF tracks while sharing results from the campaigns' Working Notes. The evaluation metrics motivate developers to improve their own IR systems, and at the same time encourage them to take part in a campaign and compete with numerous other research groups from around the world. The last chapter covers two aspects without which no overview of information retrieval would be complete. The first is the retrieval of non-textual information such as images, speech and video. The second leads the Reader on to the practical implementation of multilingual systems, presenting Web search, digital libraries and the sectors that rely most heavily on multilingual systems, such as healthcare, government, law, business and commerce.

Having read the book, I agree with the Authors' statement that “The book is intended for graduate students, scholars and practitioners with a basic understanding of classical text retrieval methods.” I therefore recommend it to the academic community as a resource providing background knowledge in multilingual information retrieval.



About Jolanta Pietraszko

Jolanta Mizera-Pietraszko is a Ph.D. Fellow at the Institute of Informatics, Faculty of Computer Science and Management, Wroclaw University of Technology, Poland. Her research interests focus mainly on multilingual search engines, parallel languages, bi-text processing, bilingual question answering systems and multilingual digital libraries. She invented an innovative language- and system-independent asymmetric translation technology entitled “An Approach to Analysis of Machine Translation Precision by Using Language Pair Phenomena” (invention number P387576, registered on 23.03.2009 by the Patent Office of the Republic of Poland). She is an FP7 Expert for the European Commission in Brussels, an Expert in R&D projects for the Ministry of Science, and an Evaluator of English coursebooks for the Ministry of Education. She gave a tutorial on “Translation Component as an Impact Factor on the Retrieval Results” at the Ionian University, Corfu, Greece. She is a Board Member for Computing Reviews of the Association for Computing Machinery (ACM), US, and reviews new software releases and books for the British Computer Society, London, UK. She is a Fellow of the IEEE Technical Committee on Digital Libraries. She has been invited to serve on the International Program Committees of conferences in the UK, the Czech Republic, India and Poland. Her projects have achieved recognition from the university, the European Union, foreign scientific institutions and the Polish Ministry of Science.