The 11th Bibliometric-enhanced Information Retrieval Workshop (BIR 2021) was held online on April 1st, 2021, as part of ECIR 2021. Due to the pandemic, it took place as a virtual-only event for the second time. As before, the aim of the interdisciplinary BIR workshop series is to bring together researchers from different communities, especially scientometrics/bibliometrics and information retrieval. BIR has a long-established tradition: it was launched at ECIR in 2014 and has been held at ECIR every year since then. All pointers to past and future workshops as well as to proceedings are hosted at https://sites.google.com/view/bir-ws/. BIR 2021 was organised by Ingo Frommholz (University of Wolverhampton, UK), Philipp Mayr (GESIS – Leibniz Institute for the Social Sciences, Cologne, Germany), Guillaume Cabanac (University of Toulouse, France) and Suzan Verberne (Leiden University, the Netherlands).
The workshop attracted around 57 participants at peak times, and more overall as participants dropped in and out. This demonstrates that BIR covers a lively and highly relevant area, not least because of the current pandemic and its impact on publishing, for instance the need to discover high-quality research results quickly, which is a classical IR problem. Most of the talks were recorded; the recordings are available in the BIR 2021 YouTube playlist.
This year, five papers were accepted as full papers and four as short papers. Both full and short papers were presented during the workshop and are included in the CEUR-WS proceedings. In addition, the workshop featured three keynote talks:
Ludo Waltman (CWTS, the Netherlands) addressed openness, transparency, and inclusivity in science and, in particular, the question of what they mean for information retrieval. This tied in nicely with the ECIR panel on open access held the day before. Ludo discussed the ways in which research is moving towards increased openness, transparency, and inclusivity, and the new possibilities this offers for scholarly literature search. Calls for increased transparency and inclusivity raise complex questions about the responsibilities of those who manage search systems for scholarly literature, and about the benefits as well as the risks of new AI-based approaches to scholarly literature search. While acknowledging that there are no easy answers, Ludo shared his thoughts on the various issues that the BIR community may need to reflect on, including open metadata, references, and abstracts.
Lucy Lu Wang (Allen Institute for AI, USA) discussed text mining insights from the COVID-19 pandemic. She described the emergence of novel information retrieval and NLP tasks with the potential to change the way information from the scientific literature is communicated to healthcare providers and public health researchers. Lucy discussed some of the ways the computing community came together to tackle this challenge, with the release of open data resources like CORD-19 and the introduction of various shared tasks for evaluation. She also presented her work on scientific fact-checking, a novel NLP task that aims to address scientific misinformation, and its practical uses in managing conflicting information arising from publishing during the COVID-19 pandemic.
Jimmy Lin (University of Waterloo, Canada) presented approaches to domain adaptation for scientific texts and discussed the limits of scale. He argued that a fundamental assumption behind bibliometric-enhanced information retrieval is that ranking models need to be adapted to handle scientific texts, which are very different from the typical corpora (Wikipedia, books, web crawls, etc.) used to pretrain large-scale transformers. One common approach is to take a large “general-domain” model and then apply domain adaptation techniques to “customize” it for a specific (scientific) domain. However, it appears that the far less satisfying approach of “just throwing more data at the problem”, using ever larger pretrained transformers, is more effective. In fact, over the last year, Jimmy’s group has “won” multiple community-wide shared evaluations focused on texts related to the novel coronavirus SARS-CoV-2 using exactly this approach: document ranking (TREC-COVID, TREC Health Misinformation), question answering (EPIC-QA), and fact verification (SciFact). Jimmy shared his group’s efforts to grapple with the question of why “smarter” is not better than “larger” and opened up the discussion to the audience.
The following research papers were presented in three sessions:
- Shintaro Yamamoto, Anne Lauscher, Simone Paolo Ponzetto, Goran Glavaš and Shigeo Morishima: Self-Supervised Learning for Visual Summary Identification in Scientific Publications
- Pablo Accuosto, Mariana Neves and Horacio Saggion: Argumentation mining in scientific literature: From computational linguistics to biomedicine
- Frederique Bordignon, Liana Ermakova and Marianne Noel: Preprint abstracts in times of crisis: a comparative study with the pre-pandemic period
- Hiran H. Lathabai, Abhirup Nandy and Vivek Kumar Singh: Expertise based institutional recommendation in different thematic areas
- Ahmed Abura’Ed and Horacio Saggion: A select and rewrite approach to the generation of related work reports
- Jacqueline Sachse: Bibliometric Indicators and Relevance Criteria – An Online Experiment
- Ken Voskuil and Suzan Verberne: Improving reference mining in patents with BERT
- Manajit Chakraborty, David Zimmermann and Fabio Crestani: PatentQuest: A User-Oriented Tool for Integrated Patent Search
- Daria Alexander and Arjen P. de Vries: “This research is funded by…”: Named Entity Recognition of financial information in research papers
All in all, BIR 2021 was a great success and once again showed the relevance of the interdisciplinary BIR workshop series to the communities involved. We hope to hold BIR again in 2022 and to meet the members of our communities in person.