(This article was co-authored with Sascha Kriewel, University of Duisburg-Essen.)
The Information Retrieval group is part of the Computer Science and Applied Cognitive Science Department in the Faculty of Engineering Sciences at the University of Duisburg-Essen. The department offers degree programmes in Applied Computer Science and in Applied Cognitive and Media Science, both at bachelor and master level. Besides 11 computer science professors, there are four professors of psychology, who teach mainly in the second programme. This unique composition also leads to a user-oriented focus in the computer science programme.
The Information Retrieval group originated at the University of Dortmund in 1991 and moved to Duisburg in 2002, when Norbert Fuhr changed positions. While our work originally focused on the system-oriented view of IR, we now work mainly on user-oriented approaches, without neglecting theoretical models and system building.
In the following, we describe four of our current research areas.
ezDL
ezDL is an IR front-end system that connects to various kinds of IR engines. Its predecessor DAFFODIL started as a user-friendly federated search engine implementing several cognitive IR concepts. ezDL is a complete redesign, aiming at three different goals:
- ezDL is an open-source interactive search tool implementing state-of-the-art concepts in cognitive IR and user interface design. It comprises functions for meta-search in various sources or digital libraries, organizing and filtering of merged results, and support for search sessions, as well as a personal library for storing different types of documents.
- ezDL is a development platform for interactive IR systems. Due to its service-oriented architecture, existing interface components can be changed or replaced, and new ones added, without changing the rest of the system.
- ezDL is an evaluation system for interactive IR, with extensive logging and eye-tracking support. Furthermore, there is a component for controlling evaluation settings, which schedules online questionnaires and tasks to be solved with ezDL.
The ezDL front-end comes in various flavours: as desktop client, browser client or smartphone app. ezDL has been used in several research projects, including projects of other research groups.
Khresmoi (Knowledge Helper for Medical and Other Information users)
The group is participating in the European Khresmoi project, which aims to develop a multilingual, multimodal search and access system for biomedical information and documents. Within the project, we focus on creating user interfaces for information retrieval that do not follow a one-size-fits-all approach but are instead targeted at specific user groups or support specific tasks, while still using the same infrastructure. While a typical layperson searching for health information might be presented with a streamlined, Google-like interface, medical specialists or general practitioners are offered a user interface with many of the functions of digital library search systems. Radiologists searching for medical images, on the other hand, are used to working with radiology viewing stations, and their interface reflects this and their typical workflow.
Quantitative Modelling of Interactive Retrieval
The Interactive Probability Ranking Principle regards interactive IR as a sequence of situations, which are modelled as lists of choices. The user sequentially evaluates the proposed choices, each requiring a certain effort and promising a specific benefit, and the first accepted choice moves her to a new situation. In order to quantify efforts and benefits, we analyse search sessions using system logs as well as eye-tracking data. The result of this analysis can be represented as Markov models. These models allow, for example, predicting the time needed to find the next relevant item in a given situation, thus suggesting the best strategy to reach this goal. Currently, we are working on extending this model in order to estimate quality with respect to time-based evaluation measures, as well as to predict the effect of certain system improvements on system performance.
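To illustrate how a Markov model can yield a time prediction, the following minimal sketch computes the expected time until a relevant item is found from each search state. The state names, transition probabilities and dwell times are hypothetical values chosen for illustration; they are not taken from our logs, and the sketch is not the model actually fitted to the session data.

```python
# Hypothetical states, transition probabilities and dwell times (seconds).
# None of these numbers come from real log or eye-tracking data.
TRANSITIONS = {
    "query": {"result_list": 1.0},
    "result_list": {"query": 0.3, "document": 0.7},
    "document": {"result_list": 0.6, "relevant": 0.4},
    "relevant": {},  # absorbing state: the next relevant item has been found
}
DWELL_SECONDS = {
    "query": 5.0,
    "result_list": 3.0,
    "document": 15.0,
    "relevant": 0.0,
}


def expected_time_to_relevant(transitions, dwell, absorbing="relevant",
                              tolerance=1e-9, max_iterations=100_000):
    """Return the expected time to reach the absorbing state, per start state.

    Solves t[s] = dwell[s] + sum_n P(s, n) * t[n], with t[absorbing] = 0,
    by repeatedly applying the equation until the values stop changing.
    """
    times = {state: 0.0 for state in transitions}
    for _ in range(max_iterations):
        largest_change = 0.0
        for state, successors in transitions.items():
            if state == absorbing:
                continue
            updated = dwell[state] + sum(
                probability * times[successor]
                for successor, probability in successors.items()
            )
            largest_change = max(largest_change, abs(updated - times[state]))
            times[state] = updated
        if largest_change < tolerance:
            return times
    raise RuntimeError("no convergence; is the absorbing state reachable?")


if __name__ == "__main__":
    for state, seconds in expected_time_to_relevant(
            TRANSITIONS, DWELL_SECONDS).items():
        print(f"{state}: {seconds:.1f} s")
```

With these illustrative numbers, the expected time from the result list works out to 15 / 0.28, roughly 53.6 seconds. Comparing such estimates across alternative situations is one way a model of this kind could suggest which strategy reaches the next relevant item fastest.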
Towards Better Document Clustering: Proving the Cluster Hypothesis
Apart from the clustering algorithms themselves, document clustering has mainly been a field of heuristic approaches in the past. The Optimum Clustering Framework (OCF) not only provides the first solid theoretical foundation for document clustering; with it, we have also proven the cluster hypothesis. The core idea is to redefine document similarity such that documents are similar when they are relevant to the same queries. Thus, clustering is always performed with respect to a set of queries. After introducing two new cluster quality metrics, we can define optimum clustering as the set of Pareto optima with respect to these two metrics, thus proving the cluster hypothesis.
Besides the set of queries, clustering methods following the OCF also need a probabilistic retrieval method (to compensate for the lack of relevance judgments) and a document similarity metric. We can show that existing document clustering methods are implicitly based on these three components, but use heuristic design decisions for most of them. Furthermore, the fusion methods group average, min-cut and k-means clustering can be shown to be greedy strategies for approaching the Pareto optima mentioned above.
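The idea of query-based document similarity can be sketched in a few lines. The example below assumes that each document is represented by its estimated probabilities of relevance to a fixed set of queries, and that similarity is taken as the scalar product of these vectors, which is one plausible reading of "relevant to the same queries". The document names, the probabilities and both function names are hypothetical and serve only as an illustration.

```python
from itertools import combinations

# Hypothetical estimates of P(relevant | query, document) for three queries,
# as a probabilistic retrieval method might produce them.
RELEVANCE = {
    "d1": [0.9, 0.1, 0.0],
    "d2": [0.8, 0.2, 0.1],
    "d3": [0.0, 0.1, 0.9],
}


def query_based_similarity(probabilities_a, probabilities_b):
    """Scalar product of two relevance-probability vectors.

    The value is high when both documents are likely to be relevant
    to the same queries.
    """
    return sum(p * q for p, q in zip(probabilities_a, probabilities_b))


def most_similar_pair(relevance):
    """Return the pair of document names with the highest similarity."""
    return max(
        combinations(sorted(relevance), 2),
        key=lambda pair: query_based_similarity(
            relevance[pair[0]], relevance[pair[1]]
        ),
    )


if __name__ == "__main__":
    for name_a, name_b in combinations(sorted(RELEVANCE), 2):
        similarity = query_based_similarity(
            RELEVANCE[name_a], RELEVANCE[name_b]
        )
        print(f"{name_a}, {name_b}: {similarity:.2f}")
    print("most similar:", most_similar_pair(RELEVANCE))
```

In this toy data, d1 and d2 are likely relevant to the same first query and therefore come out as the most similar pair, so a greedy fusion step would merge them first.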