In the April issue

To start with IRSG affairs, our Chairman Udo Kruschwitz comments on some aspects of the very successful ECIR 2023 conference, and our Secretary, Steven Zimmerman, has an update on Committee membership following the 2022 AGM, held in November after the Search Solutions 2022 event. There are excellent reviews of the Conference itself and of the Industry Day, which is a popular feature of ECIR. ECIR is also the venue for the presentation of the Karen Spärck Jones Award, and I am grateful to William Wang for allowing me to interview him (sadly remotely!) about the way that his career has developed. Clearly a very worthy recipient of the Award. The role of managing the Awards process for the KSJ Award 2023 has now passed from Professor Jochen Leidner to Dr Haiming Liu.

Another Award that IRSG is associated with is the Tony Kent Strix Award, managed by the UKeiG. In 2022 there were, exceptionally, two winners, Professor Iadh Ounis and Dr Ryen White, and Graham McDonald has written a report on their lectures.

On the subject of conferences, IRSG has published an invitation to run one-day events for the Group, details of which can be found on the IRSG web site.

Andy MacFarlane provides a list of forthcoming events (with my thanks for meeting a difficult deadline). Among them are Search Solutions 2023 and (though not an IRSG event) CIKM 2023 in Birmingham in October.

The last few months have seen a torrent of announcements and pontifications about the evolution of LLMs and associated applications. I’ve included a report of an Alan Turing Institute conference I attended in late February as a way of demonstrating the rate of evolution of these technologies. In my tail-end slot I offer some reflections on the speed of development of AIGC applications against the speed of research and research publication. Editors always have an in-tray and I’ve included a few items from mine that you might find of interest.

As a demonstration of ChatGPT in operation, I had invited Steve Zimmerman to write about his transition from academic to practitioner and in this issue you will find both Steve’s original text and the ChatGPT summary version.

Another perspective on the work of search practitioners is a fascinating account by Christoffer Stjernlöf of IR software development for e-commerce search, where high relevance is absolutely essential and yet very challenging to achieve. To complement a feature on e-commerce search I have written a review of the first English edition of Understanding Search Engines by Professor Dirk Lewandowski, amazingly the first book on this topic to be published, 28 years after the launch of AltaVista. But well worth waiting for!

In the October issue…

The next issue of Informer will close for copy contributions on 13 October and be published towards the end of the month, with full details of the Search Solutions 2023 programme.

It will also definitely and absolutely be my last issue!  So if you have always looked at Informer and thought you could do a better job than me (very likely!) this is your opportunity to demonstrate your skills. As regular readers are aware, my take on IR life is very much from a practitioner perspective, so perhaps it might be a good time to pass the baton to someone with an alternative perspective.

We had hoped to introduce a new web platform for Informer this year but for a range of reasons this has not happened. It would be ideal to have the 2024 Editor in place by early September at the latest so that we could jointly work on the options for the future based on how the October issue is put together even if there is not time to make a change for the October issue itself.   If you would like to chat about what the Editorship entails then contact me on martin.white@intranetfocus.com

The Editor’s in-tray

A small miscellany of search-related items that have recently arrived in my in-tray and that you might be interested in.

Microsoft Search Hero Mastermind Group

This is a new and very enterprising training course for Microsoft search managers, with a mix of tuition and mentoring spread over a three-month period. The course has been developed by Agnes Molnar of Search Explained.

CIKM 2023 Birmingham 21-25 October 2023

The 32nd ACM International Conference on Information and Knowledge Management (CIKM) will be held in Birmingham on 21-25 October. The last time this conference was held in Europe was 2018! It is not a BCS event but many members of IRSG are involved in it.

The General Chairs are

Ingo Frommholz, University of Wolverhampton, UK

Frank Hopfgartner, University of Koblenz, Germany

Mark Lee, University of Birmingham, UK

Michael Oakes, Independent Researcher, UK

The Informer Editor is acting as Chair of the Sponsorship Committee. Rumours that I get a percentage from the income I generate are sadly a hallucination.

Search Solutions 2023 London, 21/22 November

A mark-your-diary item for the annual Search Solutions 2023 conference. It will be held at the BCS London office in Moorgate and will be on-site only. Tuesday 21 will be Tutorials Day and Wednesday 22 will be Conference Day. There will be a call for papers and tutorials in early May, which will be posted on the BCS IRSG web site. In association with the Conference there will also be the Search Industry Awards.

Karen Spärck Jones (KSJ) Award – 2023 timeline for nominations

Professor Jochen Leidner, the current chair of the Karen Spärck Jones (KSJ) Award, presented the trophy to the 2022 KSJ Award winner, Professor William Wang, who delivered his keynote presentation at ECIR 2023.

Since Jochen has chaired three years of the KSJ Award, and 2022 is the last year of his term, I am now the Chair of the Awards Committee for the 2023-2025 period. You can contact me at h.liu@soton.ac.uk

The call for nominations of KSJ Award 2023 will be out soon. Detailed information about the award and the nomination can also be found on the BCS IRSG KSJ Award page: https://www.bcs.org/membership-and-registrations/member-communities/information-retrieval-specialist-group/awards/karen-spaerck-jones-award/

IRSG 2022-2023 AGM and Committee Elections

The IRSG Annual General Meeting (AGM) took place immediately after Search Solutions 2022, where the committee election results were announced.  The draft 2021 minutes can be found here, with 2021 AGM confirmed minutes found here.  The full list of current committee members is now available on the IRSG governance page.  All committee positions were filled unopposed.

Newly appointed ordinary committee members include Monica Paramita (Sheffield University), Sean MacAvaney (Glasgow University), and David Rau (University of Amsterdam).  Reappointed committee members are Udo Kruschwitz (Chair), Ingo Frommholz (Treasurer), and Haiming Liu (Membership Secretary).  We wish to thank outgoing committee member Krisztian Balog (ECIR 2022 Committee Member) for their service.  Full details of the BCS IRSG committee are provided on our governance page.

A reminder that our elections take place every Autumn, so please watch our governance page and the IR listserv, to which you can subscribe, for future election announcements. We are very keen to have new members on our committee.

Steve Zimmerman (IRSG Secretary)

Academia and the Enterprise – Steve Zimmerman

Academia and the Enterprise

It is an honour to be asked by a highly respected contributor to the enterprise search community to share my journey from academia into the enterprise.   Admittedly, it has been an unusual journey, so perhaps it’s best to say a bit about where things are at the exact moment before diving into the details.  

Now

Currently, I am a Senior Data Scientist in the NLP team at a large multinational, and there has never been a more interesting time to work in search and NLP.   This is a strong statement given my journey into search and NLP, which began 10 years ago,  has always been fascinating.   So what makes this journey even more fascinating now?    Probably not surprising to you, the latest generation of large language models (LLMs) is what has made the work even more interesting.  A former colleague told me about ChatGPT on December 1st and said it will be as big as Google.   

Alan Turing Institute Conference on LLMs February 2023 – a historical perspective?

This symposium, organized by the Alan Turing Institute,  was held at the IET, Savoy Place on 23 February and attracted around 350 delegates, including what seemed to be the entire UK machine learning research community. There were seven presentations and a panel session that I was not able to stay for.

(Note from the Editor: two months seems to be a lifetime in the LLM world and I thought twice about including this! However, it does highlight the very proactive role that the Alan Turing Institute is taking on behalf of the UK AI community.)

I came away with pages of notes made at the symposium but as I have worked through them I have decided not to report on a paper-by-paper basis but instead to synthesize what to me were some (certainly not all!) of the take-aways of the day.

ECIR 2023 Industry Day

The European Conference on Information Retrieval (ECIR) seems to have emerged from the pandemic stronger than ever. This year, in Dublin, saw the highest number of attendees ever, I’m told, at over 380 in-person and virtual attendees. I’m not a computing academic, so I don’t sit through the academic presentations, but the last day of the conference is termed the Industry Day, and is designed to straddle the divide between the academy and the real world. This year, there were around 60 stalwarts who stayed the course for a very full day of no fewer than 13 presentations. What makes the industry day exceptional is the range and number of questions asked by the audience: there is an informal air to the event that, I think, encourages discussion. This was a conference where the questions were not to display the questioner’s knowledge, but to provide a vital reality check. Have you tried this with users? What do you do about fake news? Is there a feedback loop once it goes live?

Call for proposals for IRSG one-day events in 2023

The Information Retrieval Specialist Group (IRSG) of the BCS invites proposals for the organisation of one-day events supported by BCS. Proposals will be evaluated on their organisational and financial plans and on their benefits to the Information Retrieval community.

Important dates

* Submission deadline for this round: 19-May-2023

* Notification: 02-Jun-2023

Relevance under uncertainty – the commercial realities of IR development

Relevance Under Uncertainty – How Loop54 does software engineering to advance relevance

Loop54 (on the market under the name FactFinder Infinity) is a technology that integrates with e-commerce stores and determines based on visitor interactions, in real time, which the most relevant products are for each individual user at every moment. It attempts to perform the function a really good salesperson would if you step into a brick-and-mortar store: figure out as quickly as possible exactly what you are interested in and guide you directly to that. Just as with a really good salesperson, the visitor is not meant to notice that anything out of the ordinary happened. This is not the business of definitive rights and wrongs, but ever so many shades of roughly correct.

John Carmack put it fairly well when he said about neural networks that “It is interesting that things still train even when various parts are pretty wrong — as long as the sign is right most of the time, progress is often made.”

Tony Kent Strix Award 2022 lectures

The 2022 annual memorial lecture for the International Tony Kent Strix Award, hosted by the UK electronic information Group (UKeiG) in partnership with the International Society for Knowledge Organisation UK (ISKO UK), the Royal Society of Chemistry Chemical Information and Computer Applications Group (RSC CICAG) and the British Computer Society Information Retrieval Specialist Group (BCS IRSG), was held on Thursday 23 February 2023, with Dion Lindsay as host.

The award is given in recognition of outstanding practical innovation or achievements in the field of information retrieval. This year, in a break from tradition and accounting for the fact that there was no award given in 2021, there were two recipients of the award: Professor Iadh Ounis (Professor of Information Retrieval at the University of Glasgow) and Dr Ryen White (General Manager and Partner Research Director at Microsoft Research), who is also an alumnus of the University of Glasgow Information Retrieval Group. The online event was very well attended and regarded as a great success by the organisers.

Professor Ounis’ talk, titled ‘Perspectives on Experimentation and Reproducibility in Information Retrieval: Then and Now’, discussed the challenges of (and the real need for) reproducing experimental findings in the modern neural information retrieval era. The talk provided great insights into the complex information retrieval pipelines and the long dependency chains that exist between artifacts of modern information retrieval systems. Professor Ounis particularly noted the need, in this modern age, for more granular reproduction methods that can fully replicate all of the ingredients that contribute to the core advancements in information retrieval systems. The talk also provided an overview of how the PyTerrier information retrieval platform, developed at the University of Glasgow, can simplify the construction and replication of modern neural retrieval architectures by letting researchers build complex IR pipelines and combine modular system components using standard Python operators and expressions.
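
As a rough illustration of the operator idea described above (this is not PyTerrier's own code or API), the following toy Python sketch shows how overloading `>>`, `+` and `%` lets retrieval stages be composed as ordinary expressions. The `Stage` class, the three-document `DOCS` collection and the scoring functions are all hypothetical, chosen only to keep the example self-contained.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

Ranking = Dict[str, float]  # document id -> score

# Hypothetical three-document collection, used only for this illustration.
DOCS = {
    "d1": "neural ranking models for web search",
    "d2": "reproducible retrieval experiments",
    "d3": "neural retrieval reproducibility",
}


@dataclass
class Stage:
    """Toy pipeline stage: maps (query, ranking so far) to a new ranking."""
    fn: Callable[[str, Ranking], Ranking]

    def __call__(self, query: str, ranking: Optional[Ranking] = None) -> Ranking:
        return self.fn(query, ranking or {})

    def __rshift__(self, other: "Stage") -> "Stage":
        # a >> b: feed the output ranking of a into b
        return Stage(lambda q, r: other(q, self(q, r)))

    def __add__(self, other: "Stage") -> "Stage":
        # a + b: sum the scores the two stages assign to each document
        def combined(q: str, r: Ranking) -> Ranking:
            left, right = self(q, r), other(q, r)
            return {d: left.get(d, 0.0) + right.get(d, 0.0)
                    for d in sorted(set(left) | set(right))}
        return Stage(combined)

    def __mod__(self, k: int) -> "Stage":
        # a % k: keep only the k highest-scoring documents
        def cutoff(q: str, r: Ranking) -> Ranking:
            ranked = sorted(self(q, r).items(), key=lambda kv: (-kv[1], kv[0]))
            return dict(ranked[:k])
        return Stage(cutoff)


def _overlap(query: str, ranking: Ranking) -> Ranking:
    # Score each document by the number of query terms it contains.
    terms = set(query.split())
    return {d: float(len(terms & set(text.split()))) for d, text in DOCS.items()}


term_overlap = Stage(_overlap)
# Favour shorter documents (an arbitrary second signal for the example).
brevity = Stage(lambda q, r: {d: 1.0 / len(text.split()) for d, text in DOCS.items()})
# Rescale the incoming ranking so that the top score is 1.0.
normalise = Stage(lambda q, r: {d: s / max(r.values()) for d, s in r.items()})

# Combine two scorers, keep the top two documents, then normalise the scores.
pipeline = ((term_overlap + brevity) % 2) >> normalise
result = pipeline("neural retrieval")
```

The whole experiment is then a single expression that can be shared and rerun, which is the property the talk linked to reproducibility.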

Dr White’s talk, titled ‘Intelligent Futures in Task Assistance’, provided an overview of Dr White’s contributions to understanding user interactions in search systems and his goal of providing a better experience for search engine users. In particular, Dr White discussed the importance of productivity assistance and task-driven information retrieval in today’s modern era of digital assistants. Dr White provided insights into the need for systems to decompose complex tasks, automatically identify and prioritise microtasks, and schedule activities through task duration estimation. The talk discussed the main lessons that have been learned from research on task intelligence and provided numerous insights into the potential future directions of artificially intelligent digital assistants.

Both of the talks were packed with many insights and interesting reflections on the developments that have led to today’s intelligent information age, and what is in store for the future. The slides from the talks are available from Professor Ounis’ and Dr White’s websites (links above) and the video recording of the talks will be made available from UKeiG. I highly recommend checking them out.

[Graham McDonald is a lecturer in Information Retrieval at the School of Computing Science, University of Glasgow. His research interests include responsible and fair information retrieval, sensitivity-aware search, and active-learning strategies in decision support systems for document review tasks.]

Editor – The UK e-information Group (UKeiG) has announced that it plans to launch a call for nominations for the 2023 Tony Kent Strix and Jason Farradane Awards at its 29 June Zoom Members’ Afternoon. There will also be a special announcement about a third international award.

Understanding Search Engines – Dirk Lewandowski

It is remarkable that this book is unique in its coverage of the development of the technology, business and impact of web search. The web search engines play such an important role in our lives and our business activities that we take them for granted. It’s not just a lack of books but a lack of research papers as well, other than those that look at elements of the search process. Last year I published a history of enterprise search, and trying to confirm dates, vendors and technical developments was a far from easy exercise. The development of online retrieval services was expertly documented by Charles Bourne and Trudi Bellardo Hahn in 2002, and Stephen Arnold’s interviews with the pioneers of online and web search give a sense of the pace of change from 2008-2012, but the focus is primarily on enterprise search.

The arrival of this English edition of a book originally published in German by an author with an outstanding reputation in the science and the use of web search is very timely given the launch of so many AI-Generated Content (AIGC) applications, all of them offering search-by-prompt rather than search-by-query. Dirk Lewandowski is a Professor in the Department of Information, Hamburg University of Applied Sciences, and from 2013-2020 was Editor-in-Chief of the Aslib Journal of Information Management.

Events Spring 2023

One Day Events 

Search Solutions 2023: The group’s annual industry-focused event, which includes a tutorial day. 21-22 November 2023.

ECIR 2023 Dublin – conference report

Last week I attended the 45th edition of the European Conference on Information Retrieval (ECIR), which was hosted in Dublin, Ireland. Given that this was only the second in-person conference I have ever attended, it was great to bump into several familiar faces throughout the week. One reason for that could be that this year’s ECIR was the biggest ECIR of all time, with a record-breaking number of more than 400 registered attendees. The high level of interest was evident on-site, as more than 300 people attended the keynote talks and receptions.

Karen Spärck Jones Award 2023 – Timeline for nominations

 

Professor Jochen Leidner has been the Chair of the KSJ Awards panel since 2019. As his term of office has now come to a conclusion, Dr Haiming Liu (University of Southampton) is taking on the role for the next three years. The Award now alternates between ECIR and the Annual Conference of the European Chapter of the Association for Computational Linguistics (EACL) to promote integration between the IR and NLP communities. Karen Spärck Jones was an active member of both communities.

The call for nominations of KSJ Award 2023 will be out soon. Detailed information about the award and the nomination can also be found on the BCS IRSG KSJ Award page. The 2023 Award lecture will take place at EACL 2024, the location of which has not yet been confirmed.

Timeline for the 2023 Award:

  • 1 September 2023 — closing date for nominations;
  • 8 September 2023 — deadline for support letters;
  • 8 December 2023 — notification of the prize recipient;
  • April 2024 — recipient presents keynote at EACL 2024.

IRSG would like to thank Microsoft Research Cambridge, the generous sponsors of the Award.

For further information please contact Dr Haiming Liu (KSJ Award Chair 2023-2025), h.liu@soton.ac.uk

An interview with William Wang – KSJ Award Winner 2022

William Wang - Winner of the Karen Spärck Jones Award 2022

A very worthy winner: William Wang answering questions following his Karen Spärck Jones Award keynote talk

I asked William Wang, who at ECIR 2023 was presented with the Karen Spärck Jones Award for 2022, if he could respond to a series of questions about his background and career. I am most grateful to William for the care he put into his replies.

What were your aspirations at high school?

I have been interested in computers since my father bought me an Intel-586 desktop in elementary school. During my junior high and high school years, I was passionate about writing HTML, PHP, and ASP for building websites that provide knowledge for online games. Creating online resources for gamers was a great way to share knowledge and expertise, and it can help fellow gamers improve their skills and enjoy the game even more. It requires a lot of hard work and dedication. Since then, I have become interested in building better technology to provide people with better access to knowledge

From Udo Kruschwitz – BCS IRSG Chair

ECIR 2023: bigger (and heavier) than ever

Welcome to our Spring 2023 edition of Informer! Did we see you in Dublin? Not inconceivable, given it was the biggest ECIR ever in many respects, including number of attendees (more than 400 registrations!). It felt as if everybody was there … Well, in case you missed the conference there is plenty of reading material in this newsletter, including a conference report by Gregor Donabauer as well as a review of Industry Day by Michael Upshall, not to forget the interview with William Wang, the winner of the Karen Spärck Jones Award 2022 and a keynote speaker at this year’s ECIR.

In fact, Martin has done all his magic once again to compile a new edition of Informer that feels bigger and more varied than ever. I will let you explore it rather than pointing out each contribution I particularly like (of which there are many).

Let me just highlight one item of interest to our group in particular. The CORE conference rankings are currently being updated and it is very reassuring that the compilation of material to support the case for ECIR to remain an A-ranked conference is in the safe hands of Sean MacAvaney, Lecturer at the University of Glasgow and newly elected IRSG committee member. This task is not an easy one and involves much manual work such as the collection of statistics ranging from citation counts to identifying key members of the IR community and comparison with competitor conferences.

The last month has however also been a very sad one as the wider AI / NLP / IR community lost three key members, Chris Cieri, Dragomir Radev and Yorick Wilks. Their contributions and good spirit will not be forgotten but in each individual case it feels like the end of an era.

Search Solutions 2022 Conference report

Search Solutions is managed by the Information Retrieval Specialist Group of the British Computer Society and is the only broad-spectrum search event outside of the USA. The conference was held at the BCS London office on 23 November, preceded by a day of Tutorials.

This was the first on-site Search Solutions event since 2019! This is a brief summary of the presentations, with a link to the author and also links to research papers and web sites mentioned by the authors in the course of their presentation. Heavy note taking!

The conference opened with a presentation by Natasha den Dekker on the approach being taken by LexisNexis to understand the expectations of users and the extent to which the search applications meet them. In the process Natasha gave a very good introduction to user research, describing the differences between behavioural and attitudinal techniques, with an emphasis on the benefits and challenges of A/B testing. She also highlighted the importance of diary studies, which take a lot of effort to set up and execute but bring substantial rewards in understanding the day-by-day use of a search application. (See also https://www.nngroup.com/articles/guide-ux-research-methods/)

The next paper was presented by Amy Walduck over a Zoom link from Brisbane, Australia. Amy started with a moving acknowledgement of the debt that Queensland owed to its antecedents. Amy described a topographical approach to understanding large-scale user logs of over 8 million searches a year on the Library Catalogue, all based on open source software and open data that had been redacted to remove any personal information. Amy remarked that there had been a steady trend over the last few years of queries being framed as questions, in particular ‘How’ and ‘What’ question formats.

After a break Brammert Ottens (Spotify) outlined the search strategy that had been adopted by the company, supporting both text and voice search. He framed his presentation around Mindsets (Focused, Open and Exploratory) and Intents (Listen, Organise, Share and Fact Check). Spotify are fortunate in being able to follow the history of a search as it has data on what the user then listened to and for how long, making it easier (but still very challenging at scale) to optimize the search experience. (See also https://dl.acm.org/doi/10.1145/3290605.3300529)

Another large scale search implementation was described by Mohamed Yahya from Bloomberg. He focused on recent efforts to develop question answering functionality, with the criterion that the outcome has to be correct at the time of presentation and explainable. The target was high precision rather than high recall. The system took a view on whether the question was answerable, given the scope of the repository, and if there was not adequate confidence the response was presented as a display of results rather than a narrative text response.
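
The fallback behaviour described above can be sketched roughly as follows. This is a toy illustration rather than Bloomberg's implementation: the `Candidate` type, the `respond` function and the 0.9 threshold are assumptions made purely for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    """Hypothetical candidate answer produced by a question answering component."""
    answer: str
    confidence: float  # assumed to be a score in [0, 1]


def respond(candidate: Optional[Candidate], results: List[str],
            threshold: float = 0.9) -> dict:
    """Return a narrative answer only when confidence is high enough.

    Otherwise fall back to an ordinary list of search results, favouring
    precision over recall.
    """
    if candidate is not None and candidate.confidence >= threshold:
        return {"type": "answer", "text": candidate.answer}
    return {"type": "results", "items": results}


hits = ["Document A", "Document B"]
confident = respond(Candidate("An illustrative answer", 0.97), hits)
unsure = respond(Candidate("A doubtful answer", 0.40), hits)
unanswerable = respond(None, hits)
```

Setting the threshold high trades coverage for correctness, which matches the stated preference for high precision over high recall.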

Of course, when it comes to scale Google takes the accolade. Filip Radlinski talked about the increasingly blurred boundary between search and recommendation, focusing on the challenges of searching for film information based on soft attributes, such as scary, uplifting and boring. This comes down to the issue of subjectivity, which Filip discussed in terms of degree, semantic and compositional. Filip reflected on a number of overarching issues in his paper, including transparency (data, model and algorithm) and the lack of an adequate range of corpora to work on natural language search. (See also https://arxiv.org/abs/2205.09403)

After lunch Farhad Shokraneh gave a quite impassioned paper about the problems that systematic searching gives rise to, entitled ‘Futures of Systematic Searching’, in which the plural was not a spelling mistake! Farhad started out by describing the process of setting up a systematic review and the challenges of coping with a situation where the review process was in effect invalidated because one or more research papers had been published since the original scope of the review had been finalized. He emphasized that it was not just a matter of rerunning the search, as more recent research might require the scope and strategy to be reconsidered. Another issue he mentioned arose when a machine learning routine decided to downgrade the relevance of papers that did not have an abstract. Farhad concluded by presenting four versions of the future of systematic reviews. (See also https://www.sciencedirect.com/science/article/pii/S266730532200031X)

Gavin Moore (University Hospitals Coventry & Warwickshire NHS Trust) continued the healthcare theme with an application that he and Andrew Doyle had developed to store and search clinical guidelines. I know from a project I carried out a few years ago for a major hospital that this is far from a trivial challenge, as there are both Trust and NHS-wide guidelines, which up until March 2022 were maintained by NICE. The solution was based on the Google app and was an excellent example of how a very effective search solution could be developed with very limited resources.

The final session of the day was on enterprise search, which started out with Cedric Ulmer and Julien Massiera giving a demonstration of integrating spaCy into the Datafari open source application to give an enhanced semantic search capability, including entity extraction and refinement. (See also https://irsg.bcs.org/informer/2022/11/the-evolution-of-datafari-a-european-open-source-enterprise-search-application-cedric-ulmer-ceo/)

This was followed by Phil Lewis describing a project that he and his colleagues at Pureinsights were working on at the Publications Office of the European Commission. Currently this is working in just two languages (English and French) but in time will be expanded to most, if not all, of the official EU languages. What was notable about this implementation was the use of a knowledge graph developed out of the Oracle RDF repository, together with a quite complex content processing stack to deliver a very high-quality search experience. Both this presentation and the previous one from Datafari highlighted the move towards hybrid search applications built on a stack of individual components.

The conference concluded with a number of lightning presentations, each lasting five minutes, from Andy Neill and Richard Giazzi (the Thomson Reuters HighQ deal support application), René Kriegler (OpenSource Connections) on the effective management of e-commerce search and Sean MacAvaney (University of Glasgow) on rethinking reranking. Cedric Ulmer reminded everyone of the four freedoms of open source software, namely the freedom to use, the freedom to distribute, the freedom to modify and the freedom to understand (exemplary documentation).

Next up were the Search Industry Awards, managed by Tony Russell-Rose.

The winners were

Best Search User Experience – Reza Rawassizadeh and Yi Rong working on ODSearch at Boston University https://paperswithcode.com/paper/odsearch-a-fast-and-resource-efficient-on

Most Promising Startup – Giotto, Matteo Caorsi (Chief Technology Officer) https://compliance.giotto.ai/

Search Professional of the Year – Adam Tocock, The Hillingdon Hospitals Library Services

Best paper at Search Solutions 2022 (voted by the audience) – Filip Radlinski, with Farhad Shokraneh and Phil Lewis tied for second place

ChatGPT take on Academia and Enterprise

The following is the version authored by ChatGPT of Steven Zimmerman’s article titled ‘Academia and the Enterprise’. This version is notably shorter than the original article, and the facts that remain are all correct. However, ChatGPT has removed many anecdotes, including the mention of SkyNet (should we be concerned?). Which version do you prefer?

Academia and the Enterprise

It is an honour to be asked by a highly respected contributor to the enterprise search community to share my journey from academia to the enterprise. Admittedly, my journey has been unusual, so perhaps it’s best to provide some context before diving into the details.

Now

I am currently a Senior Data Scientist in the NLP team at a large multinational, and I can confidently say that there has never been a more interesting time to work in search and NLP. My journey into this field began 10 years ago, and it has always been fascinating, but the latest generation of large language models (LLMs) has made the work even more interesting.

A former colleague introduced me to ChatGPT on December 1st and claimed that it would be as big as Google. Now, just over four months later, I tend to agree with this assessment. The initial impact of ChatGPT is so significant that even South Park recently aired an entire episode about its powers and related dangers, co-written with ChatGPT nonetheless. It’s noteworthy that there is yet to be an episode devoted to the release of Google.

Admittedly, there is nothing that new with respect to ChatGPT as it builds upon an existing body of research in the space of generative AI. While there has been buzz around models like DALLE and deep fakes in recent years, ChatGPT is the first generative LLM that has garnered mass attention and permitted easy interaction.

Personally, I was blown away by ChatGPT as it was the first AI-based interactive dialogue system that had a feeling of being “real”. However, I quickly realised that there were big holes in many of the legitimate-sounding responses it gave, which those in the business of AI and NLP refer to as “hallucination”. This raises questions about whether we should place so much belief in a capability whose designers caution us that it will “hallucinate” from time to time.

For me, this question ties directly back to my academic research in the recently emerged field of interactive information retrieval (IIR), which focused on risk mitigation of harms on the Web. Due to this latest technology, there has never been a greater potential for harm, and paradoxically there has never been a greater potential for benefit. It turns out that there has never been a more important time for IIR to play a role in the development of methods and evaluation approaches for the safe usage of this capability. ChatGPT opens up many new research avenues to explore, and the research possibilities on the Web and in the Enterprise are not only massive but also highly important.

Before Now

It may interest some of you to know that I come from a family of computer scientists who have worked for large tech companies. However, I was initially hesitant to follow in their footsteps due to their gruelling work hours. Nevertheless, I found myself working in computing after finishing undergrad when job opportunities were scarce. While working as a contractor in various menial jobs, I took a few computing courses at Northeastern in Boston and soon found myself working full-time as a programmer at a large financial company.

After five years in technology, I took a break to explore the possibility of pursuing graduate studies in atmospheric physics at Cornell. After a couple of years of studying the fundamentals of atmospheric science, I realised that I was more interested in the computing aspects and less interested in deriving the fluid dynamics of the atmosphere. Though I developed my abilities to solve difficult problems independently while at Cornell, I no longer felt excited about an academic career in atmospheric sciences.

Around 2013, I first heard about NLP and the emerging field of data science through a well-known article on the topic, which sparked a flame in me. A well-timed life event led me to relocate to England, and I had the opportunity to join a newly created MSc programme that focused on NLP and search. At the London Text Analytics meetup, which was co-run by Udo Kruschwitz and Tony Russell-Rose, I connected with many companies that were hiring, including the small startup in a garage in Belsize Park where I interned between the first and second years of my MSc. That startup has now grown into a much larger company called Signal AI.

After completing my MSc, I found full-time work in the data science team of a large newspaper, where I developed document classification pipelines and prototype recommender engines. Timing played an important role here too; Udo Kruschwitz contacted me about an ESRC-funded research grant that looked at human rights in the digital age, which aligned with my concerns about online misinformation campaigns. Specifically, I was very concerned about the false claims surrounding the Brexit referendum. This led me to focus my PhD research on harm mitigation on the Web, initially on hate speech mitigation but then pivoting towards the consideration of the human in the system.

Around the time I submitted my paper on this topic to LREC for review, I attended the Autumn School for Information Retrieval and Foraging (ASIRF) at Dagstuhl and read Daniel Kahneman’s “Thinking, Fast and Slow”, lent to me by a fellow PhD student in the Psychology department who researched judgement and decision making in medicine. Attendance at ASIRF introduced me to many great researchers, most notably David Elsweiler, who lectured on the fundamentals of IIR studies. The book and the Autumn School were the foundation for a rapid update to my PhD research plan to include the consideration of the human in the system. This shift in research led to co-authored papers with David Elsweiler and the aforementioned PhD student (Alistair Thorpe).

Concurrently with my PhD research, my advisor encouraged me to explore avenues in the private sector. He connected me with an enterprise search expert at a large energy company in London, which led to an internship that took place during my PhD. This internship transitioned to my current full-time role as a search and NLP researcher in the private sector. My research is predominantly in the private sector and heavily focused on enterprise search. Applications of NLP and search have interested me from the first day I set foot in the field.

I close with some key learnings from my experience.

When considering an advanced degree in Search/NLP

  • It’s beneficial to take an interdisciplinary approach to your research. While my core research was in computer science, it also considered a broad set of fields. In today’s world, we can’t afford to take a narrow view of the problems we face.
  • Pursuing a PhD is a massive commitment, and I strongly advise against self-funding.
  • While ideology can be a great motivator for research, it’s important to be prepared to let it go. My experiences with hate speech research taught me a lot about this matter.

For those pursuing or recently enrolled in a PhD program, here are some helpful tips:

  • Dive into hands-on work early on in your PhD. Start building experiments and aim to publish your findings as soon as possible.
  • Consider applying for a doctoral consortium, such as the one offered by SIGIR. This is a fantastic opportunity to connect with other researchers in your field and gain valuable experience.
  • Attend summer schools to expand your knowledge base and build connections with potential co-authors. For example, both the ASIRF and the summer school for Bounded Rationality at the Max Planck Institute for Human Development are great options.
  • Consider doing an internship or placement at a company to get a sense of whether academia, the private sector, or a combination of the two is the right fit for you.

When it comes to choosing between academia and industry, it’s important to understand that it’s a spectrum, and you need to find what’s right for you after your PhD. There are several considerations and possibilities to keep in mind:

  • Evaluation is much more straightforward in academia than in the private sector. Academia offers greater experimental control, while industry has many moving parts and people to work with.
  • Pure industry jobs tend to pay more, but pure academia offers more freedom (although this freedom has eroded in recent years).
  • Industry also offers the opportunity to investigate interesting research problems in search and NLP, but the problem is typically business-driven, making it easier to define.
  • Some private sector companies offer research positions that allocate some time for academic work outside of the company.
  • It’s common for individuals with full-time academic appointments to do side research in the private sector.
  • It’s possible to work in the private sector and still maintain an academic affiliation to conduct research on the side.
  • If you’re interested in a full-time academic appointment, it’s important to talk to people in that field and fully understand the responsibilities involved, which are quite different from those of a PhD or post-doc. You’ll also have to create course syllabuses and teaching slides, grade assignments, and do administrative work.

And finally – from the Editor

I’ve included a conference report in this issue of Informer on the Alan Turing Institute conference on LLMs (held in London on 23 February) to give an indication of the speed of development of LLMs and related applications (ChatGPT, Microsoft Copilot and oh so many more!) over the last few months.

The opportunities for research into the performance and possibilities of LLMs (I’m using this in a very generic way) are both colossal and essential if we are to get the best from this technology and avoid the worst it has to offer. It has struck me that the publication of this research is not keeping up with the speed of development. Even in journals that pride themselves on early publication, the papers have a historical perspective which is interesting but of questionable long-term value. There is also the challenge of finding peer reviewers who have an appropriate level of expertise in the topics.

In the Autumn issue

Let me start with IRSG business. Udo has written his last post as Chair, as a new Chair will take over after the AGM on 23 November. There have been no candidates to take on the Editorship of Informer, so next year we are planning to publish just two issues (in April and November) in what we hope will be an interim situation just for the coming year.

The AGM will take place at the end of the Search Solutions Conference, and you will find the programmes for both the Conference and for the Tutorials on 22 November. Details are also on the IRSG web site.

The two feature articles in this issue report on some aspects of the research underway in the DoSSIER project and give an insight into enterprise search developments from a vendor perspective.

I suspect that few readers will be familiar with the work of Carlos Cuadra, who died in August. I knew him quite well and have written a short obituary and listed some of his major achievements in the development of commercial information retrieval services.

Moving on to books, you will find a review of a book published in 2006 that still remains one of the best introductions to the technology and business impact of enterprise search. I thought it had vanished but came across it recently as an open access download. Also on open access is my attempt to write a history of enterprise search from 1938 (that is not a misprint!) to 2022. There are of course many books on various aspects of search from both a practitioner and research perspective and I have listed out what I hope is a representative selection of books published since 2010.

Andy MacFarlane provides his usual list of IR-related conferences around the world, many of which you can probably observe from the comfort of your home office.

And finally some thoughts from me about what I regard as a rather substantial gap in IR research, probably because (at least in my opinion) search is a poorly understood wicked problem.

Martin White

From the BCS IRSG Chair

Welcome to this autumn issue of Informer from your Chair. Hopefully you had a spectacular summer (or winter if you happen to be in Australia) with plenty of sun (or snow) and perhaps some conference trips?

A warm welcome to the AGM

Well, if your answer to the last point is ‘unfortunately not’, then maybe you should consider joining us at Search Solutions 2022 in London later this month. We are looking forward to an in-person event and the speakers (what a line-up!) are equally excited to be back in the room with people like you asking questions and discussing the latest in search technology and deployment.

But wait, wasn’t there some other key date in the diary that month … you are right, our AGM will be co-located with Search Solutions and that is your chance to make your voice heard. At the AGM we will also welcome new committee members and officers. As I write this we are still open for nominations, including for the role of Chair of the committee (has it really been two years since I took on the role?).

Looking ahead, we also have ECIR 2023 in Dublin nicely shaping up with some record numbers of submissions. Right now the various review processes are well underway, but one thing we know already: workshops have been announced and it is … a record number of accepted submissions.

So who will host ECIR 2024? You will find out at the AGM …

Let’s get back to Informer. I am very pleased that Martin has worked his magic yet again to compile a comprehensive list of contributions (I am amazed that every time I check out the new issue I come across some surprising stories not covered elsewhere, such as David Hawking’s new book reviewed in the summer edition). Just to pick out something from the current issue, we have two contributions for the newly established Graduate Corner. Wojciech Kusa and Georgios Peikos are both PhD students in the DoSSIER project. And if that were not enough, you even get a report on the DoSSIER summer school that was held recently in Greece at the birthplace of Aristotle …

I hope you enjoy this issue of Informer, and hope even more that I see you at one of the forthcoming events we are involved in. Sunny greetings from Bavaria … (easy to say as it’s always sunny here) …

Udo Kruschwitz

IRSG AGM and Elections

The BCS IRSG Annual General Meeting (AGM) is scheduled to take place on Wednesday November 23rd at 6 pm. The AGM will take place immediately following the close of Search Solutions 2022, which is being held at the BCS London office at 25 Copthall Avenue, London EC2R 7BP. As with the Search Solutions conference, the AGM will be run in a hybrid format.

During the AGM, updates will be provided, including announcement of the ECIR 2024 location and election results for the new committee members. If you are interested in becoming a committee member please see the election page here. The deadline is officially November 3rd; however, due to a low response from candidates, the window is being extended.