Conference Review: ECIR 2012 Industry Day

Three speakers, one session: getting ready for Industry Day


The annual BCS IRSG European Conference on Information Retrieval (ECIR 2012) was held in Barcelona, Spain, this year. The 34th edition of the conference was organized by Yahoo! Research, Universitat Pompeu Fabra and Barcelona Media, with support from BCS IRSG and ACM SIGIR. The conference gave researchers a valuable opportunity to share their results, and very interesting papers were presented over its three days.

April 5th was dedicated to the ECIR Industry Day. The event was organized by Fabrizio Silvestri (ISTI-CNR, Pisa, Italy) and Ronny Lempel (Yahoo!, Haifa, Israel). The program was structured in three sessions: morning, noon, and afternoon, the last of which included concluding remarks from the PC chairs. Each session started with an invited talk and continued with up to three oral presentations. Each talk was given a 30-minute slot, including Q&A. In total, nine speakers presented their research, focusing on how they contextualize and tailor their efforts within an industry setting. About sixty people attended this year’s Industry Day, confirming it as a successful and fruitful occasion for bringing academic and industrial research efforts together.

Arjen De Vries (Delft University of Technology/Spinque, Netherlands) opened the day with his keynote “How to build the next 1000 search engines?!”. The talk showed how Spinque, a startup from the Netherlands, is developing technology that makes it easy to “instantiate” different search engines, with the aim of meeting the many different search needs that companies have. In particular, they propose a new “search by strategy” paradigm in which search over heterogeneous data sources (Web, news, patents, Twitter, personal information) is specified graphically. Demos of search engines instantiated over different search domains convinced the audience of the power of the approach.

The morning session continued with three oral presentations. The first one, “Usefulness of Sentiment Analysis” by Jussi Karlgren, introduced the vision of Gavagai, a startup company from Sweden (and subsidiary of SICS). Starting from the observation that “big data” changes everything, the mission of Gavagai is to develop automated and scalable methods for retrieving actionable intelligence from dynamic text streams. He described the possible uses of such a system, ranging from supporting customer purchase decisions to real-time security over Twitter. Rather than focussing too much on what Gavagai in particular is doing, Jussi gave a broad overview of sentiment analysis as a field that is still emerging. One of the key points was that seeing sentiment analysis as a technique that classifies items as positive, negative or neutral is far too simplistic for realistic applications.

The second presentation, by Amit Moran from Relegence/AOL (Israel), demonstrated AOL’s work on developing sentiment analysis tools for Twitter. Using two interesting examples (political analysis of the current US election campaigns and movie ratings), Amit nicely illustrated how sentiment analysis can be an appealing market analysis tool for companies that want real-time feedback on their products, and thus an important opportunity for companies offering this type of service on the Web.

Rianne Kaptein from Oxyme (a marketing company in the Netherlands) presented TuneIn, a search engine enabling powerful searches on Twitter. TuneIn provides a very broad range of information associated with a given search (e.g., the most active users on a topic, or related terms widely used for a topic). The philosophy inspiring Oxyme is to help the user with an easy and appealing visualization of the information. The Oxyme staff do this in a clever way by developing smart interfaces enriched with visual analytics tools such as bi-dimensional tag clouds (i.e., tag clouds that use word color to represent a second dimension of information, in this case freshness).

Just before the coffee break

After a short coffee break, the noon session started with the invited talk of Alexandros Karatzoglou from Telefonica R&D (Spain). In “Adding Context to Recommendations” he showed a collaborative filtering method based on tensor factorization that allows for a flexible and generic integration of contextual information by modeling the data as a User-Item-Context tensor, instead of following the traditional approach based on User-Item matrices.
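To make the User-Item-Context idea concrete, here is a minimal sketch in plain Python. It assumes a simple CP-style factorization trained with stochastic gradient descent, which is not necessarily the model presented in the talk. All names (`predict`, `train`, `RATINGS`) and the toy data are hypothetical and only serve the illustration. Each rating is indexed by a user, an item and a context, and is approximated by a three-way product of latent factors rather than the usual two-way user-item product.

```python
import random

def predict(user_vec, item_vec, context_vec):
    """CP-style score: sum over latent dimensions of u_k * v_k * c_k."""
    return sum(u * v * c for u, v, c in zip(user_vec, item_vec, context_vec))

def mse(ratings, U, V, C):
    """Mean squared error over (user, item, context, rating) tuples."""
    errors = [(r - predict(U[u], V[i], C[c])) ** 2 for u, i, c, r in ratings]
    return sum(errors) / len(errors)

def init_factors(n_rows, rank, rng):
    """Small positive random factors so the three-way product starts non-zero."""
    return [[rng.uniform(0.2, 0.4) for _ in range(rank)] for _ in range(n_rows)]

def train(ratings, U, V, C, lr=0.01, reg=0.01, epochs=500):
    """Plain SGD on squared error with L2 regularisation; updates factors in place."""
    for _ in range(epochs):
        for u, i, c, r in ratings:
            err = r - predict(U[u], V[i], C[c])
            for k in range(len(U[u])):
                uk, vk, ck = U[u][k], V[i][k], C[c][k]
                U[u][k] += lr * (err * vk * ck - reg * uk)
                V[i][k] += lr * (err * uk * ck - reg * vk)
                C[c][k] += lr * (err * uk * vk - reg * ck)

# Made-up toy data: (user, item, context, rating); context 0 = weekday, 1 = weekend.
# The same user-item pair can receive different ratings in different contexts,
# which a plain User-Item matrix cannot represent.
RATINGS = [
    (0, 0, 0, 5.0), (0, 0, 1, 2.0),
    (0, 1, 1, 4.0), (1, 0, 0, 4.0),
    (1, 1, 0, 1.0), (1, 1, 1, 5.0),
]
```

A typical use would be to create three factor tables with `init_factors`, call `train(RATINGS, U, V, C)`, and then score an unseen (user, item, context) triple with `predict(U[u], V[i], C[c])`. The context factors are what let the same user-item pair score differently on a weekday and at the weekend.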

Hugo Zaragoza from Websays (Spain) started off by giving the company’s vision: making technology so cheap that anybody can do text analytics. In his talk on “Developing the Websays Text Analytics system for Discourse and Opinion Mining” he then went on to present a platform that tracks, entirely in real time, what users say about a given product. Starting from real-time crawling of social media, blogs, news, forums, etc., the system is able to detect and monitor opinions, mentions, announcements, crises, etc. The user-friendly interface (a dashboard similar to Google Analytics) shows users trend statistics by means of indicators, topic clouds, charts, etc. Active learning and semi-supervised learning are considered some of the main methods to push this forward. An interesting point brought up by Hugo was that performance measures in academia have little to do with the real world.

Sivan Ravid from Relegence/AOL (Israel) gave the last talk of the noon session. In her talk, “Analyzing the Web: Building AOL’s Core Text Analysis Platform”, she presented the research challenges addressed in building the text analysis platform used by her company. Some interesting aspects of their approach are the use of an underlying ontology with more than 4 million entities and the combination of machine learning and rule-based approaches. Their philosophy is also that if your technology is not sure enough about something, then it’s better not to guess.

Paolo Ferragina opened the afternoon session with his invited talk on “Topic-based annotations of short texts, with applications”. He gave an engaging description of the model and techniques behind TagMe, a powerful tool that identifies meaningful short phrases on the fly in unstructured text (the TagMe project won a Google Faculty Award in 2010). He also illustrated different applications of TagMe, ranging from online clustering of Web search results to news categorization. He definitely convinced the audience of the effectiveness and efficiency of the proposed approach.

Ravi Chandra Jammalamadaka from eBay (U.S.A.) gave the last talk of the day, “Synonym Mining and Usage in an Ecommerce Search Engine”. He described the efforts of the eBay staff to develop systems for mining synonyms and to use them in their real-world ecommerce search engine. In particular, he discussed the interesting challenges that product search poses for synonym mining and query disambiguation. Having tried out a whole range of methods using both log data and titles (e.g. to identify acronyms), he concluded that Mutual Information (applied to large texts) is the best metric for this task.
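As a rough illustration of how a mutual-information score can surface synonym candidates from logs and titles, the sketch below computes pointwise mutual information (PMI), a per-pair variant of mutual information, between query terms and the terms of clicked listing titles. This is an assumption made for the sake of the example and not eBay's actual pipeline: the record format, the function name `pmi_scores` and the toy data are all hypothetical.

```python
import math
from collections import Counter

def pmi_scores(records):
    """Return PMI (in bits) for each (query_term, title_term) pair.

    `records` is a list of (query, clicked_title) strings. Probabilities are
    estimated as the fraction of records in which a term (or pair) occurs.
    """
    n = len(records)
    query_counts, title_counts, pair_counts = Counter(), Counter(), Counter()
    for query, title in records:
        q_terms = set(query.lower().split())
        t_terms = set(title.lower().split())
        query_counts.update(q_terms)
        title_counts.update(t_terms)
        # Skip identical terms: a word is trivially associated with itself.
        pair_counts.update((q, t) for q in q_terms for t in t_terms if q != t)
    scores = {}
    for (q, t), n_qt in pair_counts.items():
        p_qt = n_qt / n
        p_q = query_counts[q] / n
        p_t = title_counts[t] / n
        scores[(q, t)] = math.log2(p_qt / (p_q * p_t))
    return scores

# Made-up toy log: (query, title of the clicked listing).
RECORDS = [
    ("tv stand", "television stand oak"),
    ("tv", "samsung television 42 inch"),
    ("tv", "lg television"),
    ("phone case", "iphone case black"),
]
```

Calling `pmi_scores(RECORDS)` gives a positive score to pairs such as `("tv", "television")` that co-occur more often than chance would predict, while pairs that never co-occur are simply absent. On real data, rare pairs tend to receive inflated PMI values, so a minimum co-occurrence count would usually be applied before trusting a candidate.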

An interesting point highlighted by all speakers throughout the day concerns the difficulty of managing and processing the stream of dynamic data coming from the Web. In particular, the speakers agreed that a huge amount of editorial work is needed alongside their proposed techniques to effectively improve the overall results of their platforms.

The concluding remarks from the PC chairs condensed the day’s activities into two golden keywords: sentiment analysis and social networks. Most of the research presented during the day exploits social networking platforms (Twitter in particular) in one way or another for many different commercial ideas, as they constitute the richest, most dynamic and vast source of real-time human traces ever built.

Finally, the PC chairs closed the Industry Day by remarking on the atmosphere in the conference room. They pointed out the high quality of the work presented and the valuable opportunity that this year’s Industry Day @ ECIR offered for getting industry and academia to talk to each other. Congratulations to the PC chairs for organizing such an interesting and appealing event.

About Franco Maria Nardini

Franco Maria Nardini is currently a research fellow at ISTI-CNR in Pisa. He received his Ph.D. from the Department of Information Engineering of the University of Pisa in 2011 with a thesis titled "Query Log Mining to Enhance User Experience in Search Engines". His research interests are mainly in information retrieval and Web mining. His work focuses on developing techniques for extracting and exploiting valuable knowledge from the behavior of Web search engine users, and on using those techniques within efficient and effective solutions to improve the user experience in Web search engines.
