The MindTheGap 2014 full-day workshop, held on 4 March in conjunction with the iConference at Germany’s Humboldt-Universität zu Berlin, aimed to bring together researchers from different domains (including Information Systems, Information Retrieval, Natural Language Processing and Recommender Systems) to discuss going beyond search as a “single shot”, i.e., an isolated single query, and moving towards more user-based and personalised models. The strong program committee selected a broad range of papers and keynote speakers, making for an interesting day, all whilst overlooking the impressive grounds of the University and Berlin city centre.
The first session, chaired by Udo Kruschwitz, focused on interactive IR. The first keynote speaker, Nick Belkin (Rutgers), gave the packed room an overview of IR and introduced the concept of interactive elements, where the user manipulates data objects created by the IR model. He outlined how evaluation of the “single shot” query should be extended to whole-session evaluation. Nick raised the issue of how to evaluate IR systems, going beyond the traditional use of relevance as a measure and suggesting that “usefulness to the user” may be more appropriate in some cases. This was one of the key themes discussed throughout the day. Following on from this, Orland Hoeber presented his work on visual analytics, which combines data processing and machine learning with HCI to support the visualisation of complex query result sets. Three demonstrations were briefly presented: Hotmap (web search), CIDER (image search) and VISTA (Twitter).
After vibrant discussions in the coffee break, the main paper session, chaired by Frank Hopfgartner, began with Ying-Hsang Liu presenting a proposed project on lifelogging with fitbit.com, for use in information retrieval in the medical domain and in new information seeking contexts. This raised the second big issue for discussion at the workshop: how data is shared and what needs anonymising to protect users’ privacy. Anastasia Giachanou presented her work in the domain of patent classification and search, introducing the Multilayer Collection Selection (MSC) algorithm, which showed improved performance over an existing system. Her work also highlighted that users (in this case patent classification experts) adopt different search tactics to complete their task, preferring keyword searches to classification code searches. Timo Lüddecke presented novel work on the design of search input (using a linear scale to weight keywords) and of result set display (extracting the colour scheme and keywords from the destination pages). Issues were raised about scalability and compatibility with existing search APIs, and the working demos provoked interesting discussion about how (and whether) established interfaces should be changed to improve the user experience. The final talk of the morning session was from Jon Chamberlain, who presented an analysis of user response times from the game-with-a-purpose Phrase Detectives in order to highlight three stages of human cognition in task processing. The closing discussions of this session focused on how such performance indicators could be used in an IR setting to learn more about users from log data. Nick Belkin added that he was working on a more detailed analysis of user behaviour, using eye tracking to study the cognitive process.
Unfortunately, Cathal Gurrin was not available to chair the afternoon session, so Udo stepped in and introduced the second keynote speaker, Miguel Martinez-Alvarez, a KTP Associate from the University of Essex working at Signal, a London-based start-up specialising in personalised news filtering. As one of the few attendees representing industry, Miguel provided valuable insight into how academia, which is primarily driven by research questions and organised by discipline, could work more closely with industry, which is driven more by commercial opportunities centred on solving specific problems. Issues of data and knowledge sharing were again raised during the discussion.
Following on from Miguel’s talk, the focus of the workshop changed to archive search. Steffen Hennicke reported on his work modelling patterns of user request behaviour for German archival records, with the goal of creating an Archival Knowledge Model (AKM) of search patterns and retrieved documents. In a related talk, Bryan Jurish presented work on resolving problems in pre-processing historical texts that show a high variance in graphemic forms. Both talks generated discussion about how to manage historical data and how interesting features such as accent or regionalism could be made accessible in the future.
In the final session of the day, Toine Bogers (Aalborg University) presented his keynote talk on the similarity between search and recommender systems, again raising the issue of using measures of performance other than relevance, such as usefulness and interestingness. Toine proposed that future research could bridge both areas under the term “focused recommendation”, where a search query and a recommendation are combined. To clarify, he showed examples such as a user posting a forum message like “Can someone recommend me interesting books about X?” Toine explicitly showed how both research areas can be used to support the user with this type of query, and also that such queries are not uncommon in certain domains. Toine’s talk generated enthusiastic discussion and was voted “Best presentation” by the audience by a considerable margin. The final talk, by Jing Yuan, presented detailed demographic data from surveys of IP-based TV services that use recommendations at different times of the day to match the user’s changing mood. Jing’s paper won the “Best paper” award for the workshop, based on the reviewers’ scores.
The concluding part of the workshop was a lively “fishbowl” discussion, in which different members of the audience could join a panel to discuss the recurring themes of the workshop:
- Data sharing and privacy. When going beyond the single shot query and using more user data, issues of privacy and data sharing become important. Data sharing is a significant barrier to industry working more closely with academia and was discussed often during the day. Program Committee member Birger Larsen suggested that one way to overcome this could be to store data in the cloud and allow researchers to query it without being able to download the entire dataset, so that privacy could be protected. He suggested www.visceral.eu as an example of a tool for doing this. Jaap Kamps joined the call for researchers to make data available in whatever way they can.
- Research directions. Miguel raised the issue of how best to move forward with the ideas discussed at the workshop, and there was agreement that an IR/RecSys task would be helpful, with HCIR, IIIX, COLING and ACL suggested as possible venues. Udo mentioned that the CLEF 2014 news filtering task (NEWSREEL) might also be worth considering as a way to test new ideas.
- Continuation of the workshop. It was generally agreed that the workshop was useful in bringing researchers together and that it should continue as a workshop at multidisciplinary conferences such as iConference.
After the awards were presented, the workshop attendees were treated to some German hospitality, first at the iConference drinks reception and afterwards on a tour, led by Udo, of some of Berlin’s finest bars and eateries.