Fake news, misinformation and toxic content were the central theme of this year's annual one-day Search Solutions conference at the BCS headquarters in London. There were many other great presentations during the day, but none had quite the relevance of this central theme.
The misinformation topic was introduced in the session just before lunch. Perhaps it was the hunger of speakers and audience that made this session indisputably the most exciting. The conflicting views of one presenter and an employee of a company that provides a popular search engine were just one of many highlights.
Mark Harwood of Elastic kicked off this session with “Tackling toxic content with the elastic stack”. He presented methods to better identify toxic and racist music online. In his example, audience members learned how Elasticsearch could be used to identify musicians highly likely to be associated with white supremacists. The most thought-provoking part of his talk was his philosophy that platforms should not take a black-and-white approach to blocking potentially toxic content, but should instead take a “grey” approach to classification. It is also noteworthy that his presentation was voted best of the day.
Rumor detection around the topic of mental health was the focus of the talk given by Anna Kolliakou. The Pheme project, which aims to distinguish facts from rumors, was part of her work. Her talk focused more on NLP methods and how they can be used to identify potential misinformation and stigma around mental health. Three specific examples were provided, one case study being the identification of stigma towards mental health after the Germanwings crash.
The runner-up for best speaker of the day, Phil Bradley, would certainly have won “speaker to stir the pot most” if such a category existed. He rightly pointed out that Trump is not the inventor of fake news, with examples going back to the 1400s. He did not beat around the bush about other problems of misinformation, such as money being a great motivator to present alternative facts, as well as some of the failures of search engines (e.g. misinformation about cancer cures). Slides from his presentation can be found here. I would best sum up his talk with one word: FIERY!
The discussion surrounding misinformation continued over lunch and reignited in the Fishbowl panel discussion at the end of the day. The presenters from the morning session, as well as Mevan Babakar from Full Fact and Dhruv Ghulati from FACTMATA, joined together for a discussion with the audience. Even though Captain Udo brought champagne to the table about 20 minutes into the discussion, this Fishbowl session felt more intense than in previous years. It is clear that many attendees and presenters have great concerns about misinformation and are highly motivated to address it, and there is a growing sense of urgency around this important problem.
BEYOND KEYWORD MATCHING: SEARCHING WITH TAXONOMIES & KNOWLEDGE GRAPHS
During this session Edgar Meij of Bloomberg presented some excellent examples of NLP and ontology work. One example was a supply chain data graph for beer production; another demonstrated some of the search functionality of their terminal product. The anecdotal information was quite interesting, for instance that some users' Boolean searches exceed 20,000 characters in length. He also told the audience about the annual data science research program offered through Bloomberg.
Also covered in the session was the use of taxonomies at LexisNexis, presented by Mark Fea. The Christmas tree visualizations used to determine the effectiveness of taxonomies were quite intriguing. His statement that “greater topic granularity allows us to re-configure structures to more easily meet customer needs” was also thought-provoking.
FROM SEARCH TO CONVERSATION
Major players in information retrieval were represented in this session.
Filip Radlinski of Google presented an important piece of theoretical research on conversational IR from his time at Microsoft Research. His work, which was published at CHIIR 2017, sets out a framework and key properties for conversation between a search system and a user.
Fabrizio Silvestri of Facebook (from the query alteration team in London), formerly of Yahoo Research, presented some examples of the challenges of search in the social media domain. Specifically, he focused on query rewriting, spelling correction and the ranking of results.
BEYOND WEB SEARCH
The last main session of the day demonstrated research topics in IR beyond textual data.
From Microsoft and Bing, Nicola Cancedda presented methods used to help users identify which attachments should be included in an email response. TF-IDF and neural networks are both used in the solution, which addresses the problem of time constraints by letting users find attachments to send in a response more quickly. Also important in the presentation was how to build a model while adhering to privacy standards for user emails.
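To make the TF-IDF part of this idea concrete, here is a minimal sketch of how candidate attachments might be ranked against an incoming email. It is an illustration only, not the system described in the talk: the function names (`rank_attachments`, `tfidf_vectors`), the toy data and the assumption that each attachment is represented by a short text description are all hypothetical, and the neural network component is not covered.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict of term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed inverse document frequency, so no weight is ever zero or negative.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({t: (c / total) * idf[t] for t, c in counts.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_attachments(email_text, attachments):
    """Rank attachment names by TF-IDF similarity to the email text.

    `attachments` maps a file name to a text description of its content
    (a hypothetical representation chosen for this sketch).
    """
    names = list(attachments)
    vectors = tfidf_vectors([email_text] + [attachments[n] for n in names])
    query, candidates = vectors[0], vectors[1:]
    scored = [(cosine(query, vec), name) for name, vec in zip(names, candidates)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored]


# Toy data, invented for illustration.
attachments = {
    "budget.xlsx": "quarterly budget forecast spreadsheet figures",
    "holiday.jpg": "beach holiday photo sunset",
    "agenda.docx": "meeting agenda items for monday",
}
ranking = rank_attachments("Could you send me the quarterly budget figures?", attachments)
print(ranking[0])
```

In this toy example only the budget spreadsheet shares terms with the email, so it is ranked first. A production system would also have to respect the privacy constraints mentioned in the talk, which this sketch ignores.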
Mark Stanger presented the work of the DataSearch project at Elsevier. The engine allows researchers to find research data from past publications. This functionality is important for reproducibility as well as for new research. The project makes use of Lucene technology. Also noteworthy is the use of word embeddings to find similar search terms.
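As a rough illustration of how word embeddings can surface similar search terms, the sketch below looks up the nearest neighbours of a term by cosine similarity. The three-dimensional vectors, the vocabulary and the `similar_terms` function are all made up for this example; the talk did not describe how DataSearch implements this, and real embeddings would be learned from a corpus and have hundreds of dimensions.

```python
import math

# Hypothetical toy embeddings, invented for illustration only.
EMBEDDINGS = {
    "genome": [0.9, 0.1, 0.0],
    "dna": [0.8, 0.2, 0.1],
    "protein": [0.7, 0.1, 0.3],
    "climate": [0.1, 0.9, 0.2],
    "weather": [0.0, 0.8, 0.3],
}


def cosine_similarity(u, v):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similar_terms(term, embeddings, k=2):
    """Return the k terms whose vectors are closest to the given term."""
    if term not in embeddings:
        return []  # Out-of-vocabulary terms have no neighbours in this sketch.
    target = embeddings[term]
    scored = [
        (cosine_similarity(target, vector), other)
        for other, vector in embeddings.items()
        if other != term
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [other for _, other in scored[:k]]


print(similar_terms("genome", EMBEDDINGS))
print(similar_terms("climate", EMBEDDINGS, k=1))
```

A search engine could use such neighbours to suggest or expand query terms, so that a search for one term also finds data sets described with a closely related one.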
LAST BUT NOT LEAST
As mentioned, Mark Harwood won the best speaker award. Elsevier DataSearch was the winner of the best search project award, and the best startup of the year award went to Search|hub.io.
The event closed with informal chat and drinks in the BCS foyer. See you next year!