CLEF 2020 – some student perspectives

Editor – I’m delighted to be able to publish reports on the CLEF2020 conference, which took place in September 2020, from some of the students who attended. I gave them a set of four headings and a word length. I feel it is important to capture the views of students on events, as they see things from a different perspective to those of us who have spent much of our working lives inside a conference venue.

Boshko Koloski

My research

I am currently working as an undergraduate researcher at the Jožef Štefan Institute in Ljubljana. My primary interest is in NLP, and I also have interests in graphs and optimization. My main research is focused on enriching document representations with knowledge graphs and on fake news detection. I have worked on several publications and industry projects. Currently, I am researching how enriching fake news representations with knowledge bases affects the results. The results so far are promising and show improvements on all the tasks tested.

My expectations for the conference

I wanted to attend CLEF2020 because I am interested in discovering advances made by researchers in various fields. Since several workshops of the conference are based on text processing and on information and feature extraction from texts, I expected to see various approaches to different problems and a solid ground for further networking. The most interesting workshops at the conference for me are PAN and eHealth. I took part in two tasks of the PAN workshop: the fake-news spreaders profiling task and the celebrity profiling task. Our models took third and second place respectively. I am interested in reviewing the approaches of the other participants in these tasks.

My conference experience

I liked how the conference organizers used the COVID situation to invite more researchers to present their work. As a result, I was able to sign up for more workshops and watch the presentations that authors had pre-recorded. The organization and the expertise of the participants at the conference were at a high level. The Q&A sessions provided a deeper insight into the mechanisms and the architectures of the models produced by various authors. However, I only managed to attend the PAN sessions of the conference. My suggestion for the organizers would be to record, if possible, the whole session, including both the presentation and the Q&A.

My next steps

I learned about the importance of stylometric features in profiling authors. I want to explore further how our approach, which is based on constructing only a latent space by matrix factorization, can be enriched to improve the model’s performance. I learned how sentiment affects author profiling tasks and how to incorporate it effectively into a model’s architecture. Additionally, I saw how network-based approaches, currently the state-of-the-art models, coped with solving some of the tasks. My department’s webpage is

Sandaru Seneviratne

My research

Comprehension of medical text can be a tough task for people with little or no medical background because of the amount of medical jargon it contains, including abbreviations, acronyms and other complex terminology. In my research I focus on how natural language processing and machine learning techniques can be used to improve the readability and understandability of medical content in the diabetes domain. I mainly focus on text simplification, both lexical and syntactic, to ensure that people with diabetes can obtain consumer-friendly health information and avoid any miscomprehension or misunderstanding.

My expectations for the conference

This was my first experience of CLEF and of an information retrieval task. My main goal in participating in the task was to try out baseline information retrieval models on the data provided, in order to gain an understanding of the techniques. Since I did not have prior experience of handling large data collections, this was a great learning opportunity that threw up many challenges and taught me technical as well as other skills, such as planning, goal setting, delegation and time management.

My conference experience

This was my first conference experience as a workshop participant and a presenter. My main interests were in the CLEF eHealth lab series related to information extraction and retrieval. The conference sessions were well organised, and the keynote presentations and discussions were very insightful, showing new avenues in the research field. The organisers and the research community were supportive during the sessions, encouraging the participants and guiding them. Presenting in front of senior researchers was a valuable opportunity, and the conference experience showed me, as a first-time presenter, my strengths as well as where I could improve.

My next steps

The CLEF eHealth task is related to my current research. It gave me insights into how medical information retrieval techniques and improvements in the area can be aligned with my research, and into the importance of consumer-friendly health information. The conference experience made me aware of the value of soft skills, from planning to presenting work effectively, and showed me the challenges that I will encounter over the years in the research field.

These lessons, both in research and in personal development, will have a huge impact on my research life.

Twitter handle: @inu_sen

Louise Bloch

My research

I started my PhD project in the middle of this year. The project deals with the use of machine learning for the early detection of Alzheimer’s disease. In particular, the aim is to use machine learning methods to train interpretable, transferable and reproducible models that predict Alzheimer’s disease at the early stages. In the beginning, I mainly used classical machine learning methods for this purpose. To learn how to use deep learning and to get to know its typical problems and solution methods, I participated in SnakeCLEF with a group of students.

My expectations for the conference

I expected to learn about typical deep learning problems and possible solutions at the CLEF conference. Thus, I was particularly interested in the alternative solution strategies of the other participants. Besides, it was of great interest to me how the lab organizers chose their datasets, as I hoped to gain knowledge about what to look for when building a meaningful and representative Alzheimer’s disease dataset. In this context, I was interested in those datasets that also deal with problems in the health sector.

My conference experience

By participating in the SnakeCLEF challenge, I was able to familiarize myself with the use of deep learning through an exciting task. The CLEF conference was a great experience where I was able to expand my knowledge about deep learning and the techniques other researchers used to overcome the problems of small and unbalanced datasets. I particularly liked the presentations of the other lab participants, not only those of the SnakeCLEF challenge but also those of all the LifeCLEF challenges, the ImageCLEF challenges and the eHealth challenge. Due to scheduling conflicts, I was not able to attend many of the plenary conference meetings, but I did like the presentation of the paper “Query or Document Translation for Academic Search – What’s the Real Difference?” by Petras et al., because it presented some views on data augmentation which might be important in some parts of my future work.

My next steps

As I have already mentioned, I learned a lot about the typical problems that occur when training deep learning models and some possible solutions to these problems. I will use the practical experience I have gained with deep learning methods to investigate the early detection of Alzheimer’s disease in more detail. I was also able to gather some inspiration for my future research, primarily relating to the compilation of datasets, the combination of metadata and image data, data augmentation and transfer learning.

My department link:


Silvia Corbara

My research

I am a PhD student in Data Science, and I am currently doing research in the field of Authorship Identification, which comprises methods and techniques for solving tasks such as identifying the most probable author of a certain document within a set of candidates, determining whether the same author wrote two different documents, and so on. In particular, I am focusing on cultural heritage problems (that is, problems concerning texts of cultural and historical value), where issues such as the lack of large datasets and processing tools are often particularly relevant, especially when tackling ancient languages such as Latin.

My expectations for the conference

Given my field of study, many labs within CLEF are of great importance to me. Firstly, the PAN shared tasks have always been a great opportunity to see what other teams and researchers around the world have been working on in Authorship Identification. Secondly, other labs, such as eRisk and CheckThat!, offer easy access to projects that are different from, but still close to, my studies, allowing me to broaden my perspectives. Moreover, scrolling through the participants list, many very well-known names emerge, making the conference an unmissable chance to meet and talk with them ‘in person’.

My conference experience

Sadly, due to the mandatory online format, conferences risk becoming a continuous passing of the microphone between rushed and impersonal 10-minute paper presentations. This was not the case with CLEF: many highly interesting exchanges were sparked throughout the sessions I attended, drawing valuable hints from current events (e.g. COVID-19) and critically reflecting on the nature of the labs themselves. In addition, the lab committees offered comprehensive explanations of the organizational details. This not only helps participants to better understand the various tasks, the evaluation metrics and so on, but also shows the steps required for such a competition, along with the issues one may have to deal with while planning one, something I feel is always worth being aware of.

My next steps

During this year’s PAN, it was interesting to see that researchers seem to be focusing more and more on (deep) neural approaches, often obtaining truly astonishing results in terms of accuracy and efficiency. Furthermore, attending labs that lie slightly outside my research field gave me some invaluable suggestions for techniques that I would like to try to tweak and adapt to my work. Finally, as I mentioned earlier, the discussions among the attendees helped me to reflect on the nature of and the reasons for a task, rather than aiming only at its execution.

My department link:

About Martin White

Martin is an information scientist and the author of Making Search Work and Enterprise Search. He has been involved with optimising search applications since the mid-1970s and has worked on search projects in both Europe and North America. Since 2002 he has been a Visiting Professor at the Information School, University of Sheffield and is currently working on developing new approaches to search evaluation.
