In the Autumn 2021 issue

I’ve given a lot of prominence to the forthcoming Search Solutions 2021 event on 23/24 November, with Tutorials on the Tuesday and the Conference on the Wednesday. Apart from one tutorial it will (sadly) be a virtual event. The IRSG AGM will take place at the end of the Conference, and a call for nominations to the Committee is included in this issue. Over the last few months the IRSG web site has been cleansed and migrated into the BCS Group template. We are working on a new template for Informer, but that may not be visible until early 2022. ECIR 2022 is now gathering momentum. It will be held on-site in Stavanger on 10-14 April but there will also be support for virtually-attending delegates.

The two feature articles this month are on the newly-announced IR Anthology, with some 60,000 research papers on information retrieval, and on the impact on IT budgets of moving search from on-premise to cloud platforms. There is a brief note on the publication of a history of the Institute of Information Scientists (1958-2002) and a review of an excellent book by Susan Walsh on classifying and fixing dirty data. And finally an alert to a profile I am writing on the life and achievements of G. Malcolm Dyson (1902-1978), a brilliant British chemist who transformed the fortunes of Chemical Abstracts Service whilst acting as Director of Research from 1959 to 1962. The issue closes with a comprehensive list of forthcoming conferences.

Much of this issue has been authored by me and that should not be the case. I’d be delighted to have contributions from members of IRSG about (just as examples)

  • Research projects you are working on
  • Conferences you have participated in
  • Departments that you are proud of
  • Visions of the future you would like to test out
  • People who have inspired you to take IR seriously
  • Books you have enjoyed (or perhaps not enjoyed!) reading
  • Applications you have developed
  • Problems you are facing and would welcome solutions
  • Problems you have solved which may have a wider application

Search Solutions 2021 23/24 November

We had hoped to run the Search Solutions 2021 Conference and Tutorials on-site at the BCS London HQ but constraints on the number of delegates that could be accommodated because of Covid concerns meant that last month we made the decision to go virtual with the Conference (which worked well last year) and with one of the tutorials. As you will see the one-day tutorial will be held at the BCS London HQ.

Details of both events follow but this is the Eventbrite registration link.

The Tutorials 23 November

Tutorial 1 Overview of Natural Language Processing

Michael Oakes

BCS, Ground Floor, 25 Copthall Avenue, London, EC2R 7BP

This tutorial will give an overview of Natural Language Processing, which is the computer processing of human-produced speech and text. The textbook “Speech and Language Processing” by Daniel Jurafsky and James H Martin will be used as a basis for the tutorial. The levels we will cover are morphology (shapes of subword units), phonology (pronunciation of subword units), spelling checkers, automatic assignment of grammatical classes to words, relations among words, parsing with context-free grammars, meaning representations, word sense disambiguation, pragmatics (language above the sentence level) and a brief introduction to machine translation.

We expect attendees to gain an overview of the field of Natural Language Processing. Lecture-style presentations will be interspersed with practical exercises where we carry out the actions of the computer with pen and paper.


10:00am: Overview of Natural Language Processing

10:30am: Regular Expressions and Finite State Automata

11:00am:  Speech Processing

12:00noon: Dealing with Spelling Errors

12:30pm: Automatic Part-of-Speech Tagging

2:00pm: Syntax: A Context-Free Grammar for English

2:30pm: Semantic Representations

3:00pm: Discourse Analysis

4:00pm: Machine Translation

4:30pm: Questions and Answers

5:00pm: End

Tutorial 2 – Practitioners’ Evaluation Roundtable – A virtual tutorial

Ingo Frommholz and Jochen Leidner

Information systems that are deployed in production settings and used operationally by hundreds or thousands of users are typically more complex than systems developed in academic research, which makes them much harder to evaluate. However, not evaluating a system is not a viable option, as it corresponds to “flying blindly” – the positive or negative impact of any change would remain unknown. As a consequence, many practitioners come up with their own protocols for assessing system quality in terms of the relevance of rankings given a query. In the academic world, several initiatives such as TREC, MediaEval or CLEF are striving to provide benchmarks and datasets to make different solutions and algorithms comparable to each other for some specified task. A further example is Kaggle. While BCS Search Solutions has in the past been successful in transferring knowledge among practitioners on the one hand, and between academics and practitioners on the other, we think evaluation is a topic that requires more attention. While we think there is no “one size fits all” solution, we also believe that there should be an exchange of ideas, solutions and experiences when it comes to evaluating information and search systems in an enterprise environment.

Instead of a full tutorial, we think the topic of evaluation needs to be driven by the participants. Hence we will conduct a round-table discussion (in lieu of a tutorial) at the upcoming BCS Search Solutions. Our aim is to provide an open forum where practitioners can share methods, metrics, challenges, and tricks of the trade with their peers. After a short introductory presentation that emphasises the importance of IR evaluation and sketches its history to set the scene and align participants, the format is one of free discussion without moderation. A human recorder will take notes, which may be published in a suitable venue (e.g. SIGIR Forum or BCS Informer) if findings emerge that are worthy to be preserved.

3.00pm: A brief history and introduction of IR systems evaluation – Ingo Frommholz & Jochen Leidner

3.45pm: Discussion & Lightning talks: Methods, metrics, challenges — how do practitioners evaluate their systems so far? – All participants

4.45pm: Discussion/Breakout Groups: Evaluation in “real-world” environments – all participants

5.30pm: Discussion of results/wrap up – all participants

6.00pm: Closing

The Conference 24 November

This year the format of the conference is based around paired papers (with a couple of exceptions) on specific themes, so that attendees can get two different perspectives on the themes. There will then be a Q&A session for both the speakers.

Incorporated into the agenda will be the presentation of the  BCS Search Industry Awards (organized by Tony Russell-Rose), one of which will be the SS 2021 Best Paper award which (for obvious reasons!) comes right at the end of the conference.

There will be two panel sessions at the end of the conference. The first of these will be a panel of some speakers and session chairs reflecting on what they have heard and learned during the conference. The second will be a panel of invited speakers who will be asked to suggest what the themes for the SS2022 conference should be.

Once the final session is completed the AGM will take place. This will be open to all attendees but voting is of course only open to members of the Information Retrieval Specialist Group.

Inevitably attendees will come in and out of the event during the day, which is why each session starts on the hour so there is no excuse for missing a session that is of particular interest. We hope to make recordings of the presentations available but that may not be the case for every presentation, so please do not assume that you can miss a presentation and catch up later!

09.00 Formulating and treating information needs at work

Professor Katriina Byström, Department of Archivistics, Library and Information Science, Oslo Metropolitan University

10.00 Training for IR and data science

Professor Paul Clough, Information School, University of Sheffield and Peak Indicators

Olivia Foulds, Department of Computer Science, University of Strathclyde

11.00 Identifying and addressing misinformation

Dr. Andy MacFarlane, City, University of London

Dr. David Corney, NLP Engineer, FullFact

12.00 Searching the enterprise

Steve Sale, Search and Taxonomy Architect, AstraZeneca

John Western, Regional VP, Yext

13.00 Break

14.00 Systematic searching

Drs. Ing Rene Spijker, Academic Medical Centre, University of Amsterdam

BCS Search Industry Awards

15.00 Digital asset management

Tim Gollins, Head of Preservation and Information Management, National Records of Scotland

Theresa Regli, Consultant

16.00 Panel sessions

What have we learned today?

What are the priorities for 2022?

Search Solutions 2021 Best Paper Award


Call for nominations to the BCS IRSG Committee

The BCS Information Retrieval Specialist Group invites nominations for the following positions:

– Vice-Chair

– Secretary

– Inclusion Officer

– Six ordinary members of the committee

IRSG web site revisions

The new BCS IRSG web site has been up and running for a couple of months now. The web team at the BCS HQ were a pleasure to work with, and I’d like to thank Simon Curd and Fiona James for their patience and expertise in converting my suggestions into the BCS Group template. There are a few pages which need a polish but overall it seems to be working well.

I’d like to draw your attention to the Resources page, which has been completely revised.

ECIR 2022 Stavanger 10-14 April 2022 


The 44th European Conference on Information Retrieval will take place on site in Stavanger, Norway.  Sessions will also be streamed for delegates who are not able to travel to Norway. There has been a very good response to the invitations to all the sections of this conference. The conference team has set up an excellent web site and you can track developments via Twitter https://twitter.com/ecir2022

IR and ACL Anthologies

The US equivalent of IRSG is SIGIR, which publishes its Forum newsletter every six months. This is always a very good read and you do not have to be a member of SIGIR to read it. One of the feature articles in the June issue (which only came online in October!) is an introduction to the IR Anthology, a recently released structured collection of almost 60,000 research papers. The well-established ACL Anthology currently hosts 71,505 papers on the study of computational linguistics and natural language processing.

A description of the genesis and structure of the ACL Anthology is now somewhat out-of-date but remains an excellent introduction to the concept of an anthology of research papers. The IR Anthology has been established by Martin Potthast (Leipzig University), Benno Stein (Bauhaus-Universität, Weimar) and Matthias Hagen (Martin-Luther-Universität Halle-Wittenberg).

Big Information and big budgets

The concept of Big Data has been around for some time. John Mashey at Silicon Graphics is usually credited with inventing the term in a presentation he gave in 1998. Without doubt big data is very difficult to manage and the demand for people with data science skills never seems to slow down. However much less attention has been paid to Big Information and the equivalent need for information scientists, a term invented in 1958 by Jason Farradane.

In early October a group of investigative journalists released the Pandora Papers. The revelations came from a very large tranche of documents: 2.94 terabytes of data in all, comprising 11.9 million records and documents dating back to the 1970s. I would recommend a very good article in Wired UK which provides a substantial amount of information on how the information in these documents was surfaced and analyzed.

History of the Institute of Information Scientists 1958-2002

Over the last two years I have been working with Dr. Sandra Ward and Professor Charles Oppenheim on writing a history of the Institute of Information Scientists. The IIS was founded in 1958, largely due to the vision and commitment of Jason Farradane and the support of G. Malcolm Dyson. The IIS merged with (or was taken over by) the Library Association in 2002. The archive of the IIS, including the minutes of Council and Committees and the minutes of the AGM, has vanished, so we had to compile the history by reading through back issues of Inform, the newsletter of the IIS, and related publications.

The IIS played a very important role in supporting the early promotion of the technology and applications of text retrieval, launching a very well-attended series of Text Retrieval conferences. If you would like to get a sense of the software applications that were available pre-Google and Microsoft there is a good 1994 review article in the Journal of Information Science, which was the journal of the IIS.

The 60,000 word history can be downloaded from a dedicated web site. This history is very much work-in-progress. We encourage comments about errors and omissions and will then revise the document ahead of publication in 2022, to mark 20 years since the IIS disappeared.

Book Review ‘Between the spreadsheets’ Susan Walsh

The full title of this book is Between the Spreadsheets – Classifying and Fixing Dirty Data. Hooray – at last a book that focuses on content quality and does so in a very practical way. Susan Walsh (aka The Classification Guru)  is an information entrepreneur who somewhat accidentally fell into the business of sorting out messy data. At the heart of Susan’s methodology is COAT, which focuses on data Consistency, Organisation, Accuracy and Trustworthiness. Having spent much of this year working on an e-commerce search project I can confirm that even market leading e-commerce companies are at the mercy of poor quality data generated by suppliers. The company had to depend on suppliers paying attention to data quality and yet in search after search rogue products were presented purely as a result of inconsistent and often incoherent codes being applied to products.

The chapter headings define the scope as The Dangers of Dirty Data, Supplier Normalisation, Taxonomies, Spend Data Classification, Basic Data Cleansing, a Dirty Data Maturity Model and Data Horror Stories.

Events Autumn 2021

Note: Due to the COVID-19 crisis some events have been cancelled, postponed or will be run virtually. We have provided information on each of the events with the current status at the time of writing. Please check the URL of the event for further details.

And finally….

I suspect that the name G. Malcolm Dyson in the History of the IIS item above will be unfamiliar to anyone who has not been in chemical information retrieval for quite a number of decades. Dyson developed a linear notation for organic chemical compounds in 1946, initially with a view to supporting the use of punched cards to retrieve information. In 1959 Dyson was Research Director at Chemical Abstracts Service and had started working with H.P. Luhn (IBM) on using a computer to handle the searching process, even if the 1401 computer only had 8k of core memory. The cheminformatics research at the Information School, University of Sheffield, can trace its origins back to Emeritus Professor Michael Lynch, who worked at CAS (initially with Dyson) from 1961 to 1965.

In the Summer 2021 issue

I have often wondered if Editors of newspapers ever worry about whether there will be enough news to fill the next edition. I have a basic structure for Informer issues that runs several issues ahead because it is largely shaped by conference announcements and then reports of those conferences. Those items alone would make for a rather boring Informer so I am always pleased when I receive unexpected contributed articles.

I will start this issue by guiding you to just such a contribution: a report from a SIGIR workshop on IR for Children 2000-2020: Where Are We Now? This is a most important topic if we are to ensure that the next generation of search users can not only gain the skills of searching but also understand how to assess the information they find. I was very fortunate that my grandfather was the part-time librarian of the village library in Hampshire and that gave me an introduction to the concept of reading and finding at the age of 5.

As I write this introduction the good news is that in the UK Covid infections are decreasing quite rapidly. This opens up the opportunities to return to an office (albeit in hybrid mode) and to run on-site conferences. The next IRSG event will be Search Solutions 2021 on 23/24 November. We hope to be able to host this at the London HQ of the BCS but also have a Plan B! Then next comes a very northerly ECIR 2022 in Stavanger and you should note the dates of the event and the deadlines for submissions. If you want to start using up the travel budget held over from 2020 and 2021 then Andy MacFarlane has a long list of conferences for you to consider.

The Spring issue was full of ECIR 2021 reports so I held over a summary of the Industry Day presentations until this issue.

A sub-theme of this issue is ‘awards’ and you will find details of the nomination process for the Karen Spärck Jones Award, the Strix Award (IRSG participates in the judging) and the Search Industry Awards.

I would also highlight the BCS/CPHC Distinguished Dissertation award for 2020, which has been awarded to Dr. David Maxwell, at the time undertaking post-graduate research at the University of Glasgow. The way in which the thesis is presented is quite outstanding, and the science and art of good information presentation leads me to a review of a new book on this topic by the American statistician Edward Tufte.

Web design also requires careful attention to good practice in presentation and in late August we plan to release a new version of the IRSG web site curated by the web team at the BCS under the direction of Simon Curd.

The web site included a list of journals that publish papers on information retrieval. I am planning to revise and expand this list for the new site. Can you take a look at the draft list and let me know if there are any missing? Thank you.

We are also considering an upgrade to the Informer template but work on this is still in progress and the first issue with the new template will probably be the Winter 2022 issue.

There are also a few items of IR news for you. Professor Emily Bender (University of Washington) gives an excellent presentation on stochastic parrots in the context of language models, there is now an IR Anthology (over 53,000 papers!) that complements the well-established ACL Anthology and there are some signs that paid-for/ad+bias web search may be coming to your desktop in the very near future.

And finally, reflections from me on the topic of inverted file indexes and how they might present a barrier to the implementation of IR innovations captured in the IR Anthology. The concept of an ‘inverted file’ probably dates back to 1947 and the era of punched cards.
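For readers who have not met the term, an inverted file maps each term to the list of documents that contain it, so a query is answered by looking up and combining those lists rather than by scanning every document. The following is only a minimal sketch in Python: the three sample documents, the whitespace tokenisation and the function names are all illustrative assumptions, and production search engines add compression, positional data and ranking on top of this basic structure.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive tokenisation: lower-case and split on whitespace (assumption).
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search_all(index, query):
    """Return ids of documents containing every query term (Boolean AND)."""
    postings = [set(index.get(term, ())) for term in query.lower().split()]
    return sorted(set.intersection(*postings)) if postings else []


# Hypothetical sample collection, for illustration only.
docs = {
    1: "punched cards for chemical retrieval",
    2: "inverted file indexes for text retrieval",
    3: "chemical notation and punched cards",
}
index = build_inverted_index(docs)
```

Calling `search_all(index, "punched cards")` intersects the posting lists for the two terms. Because the index is keyed on exact terms, any retrieval approach that does not reduce to term lookups has to work around this structure.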

Search Solutions 2021 24 November – call for presentations and tutorials

Since 2007 the BCS IRSG Search Solutions conference has been an event where search good practice and academic IR research come together to share experience, user and business requirements and visions for the future of search. This year the conference will take place on 24 November, and we would welcome proposals for 30-minute presentations on any topic related to improving search performance and the user experience. We are especially keen to have papers from managers of B2B and B2C e-commerce, enterprise and intranet search and professional search applications.

ECIR 2022 10-14 April 2022 Stavanger, Norway

The European Conference on Information Retrieval (ECIR) is the premier European forum for the presentation of new research results in the broadly conceived area of Information Retrieval.  ECIR features full-paper and poster presentations, system demonstrations, tutorials, workshops, an industry-oriented event, and traditionally has a strong focus on the active participation of early-career researchers.  The 44th edition of ECIR is planned to be held as a physical conference (with support for remote attendance) at the northernmost location in the history of the conference, in Stavanger, Norway between April 10 and 14, 2022.

The list of the conference organisers can be found here.

The very well designed conference web site is now live

The important dates are

Workshops Submission: September 9 Notification: October 7

Full Papers Submission: October 7 Notification: November 18

Reproducibility Papers Submission: October 14 Notification: November 25

Short Papers Submission: October 21 Notification: November 25

Tutorials & Doctoral Consortium Submission: November 11 Notification: December 16

Industry Day Submission: January 14 Notification: February 10

ECIR 2021 Industry Day highlights

The Spring issue of Informer contained a number of articles on the very successful ECIR 2021 event, so I decided to carry over some reflections on the Industry Day to this issue. My first visit to ECIR was the 2011 Dublin event where I presented my SearchCheck methodology, which was eventually launched in 2020. But that’s another story! The Industry Day has always been important to me as a search practitioner as it gives me a glimpse into how search has a direct impact on business and society. There were ten speakers at the Industry Day in ECIR 2021 and several were of considerable direct interest to me. I’m going to focus on these as they are such good examples of search in the real world. The full programme for the day (1 April) can be found on the ECIR 2021 Program pages.

The Information Retrieval Anthology 2021: inaugural status report and challenges ahead

To quote from a paper published in the June 2021 issue of SIGIR Forum, “The Information Retrieval Anthology, IR Anthology for short, is an endeavor to create a comprehensive collection of metadata and full texts of IR-related publications. We report on its first release, the use cases it can serve, as well as the challenges lying ahead to develop it towards a resource that serves the IR community for years to come. The IR Anthology’s metadata browser and full text search engine are available at IR.webis.de.”  The anthology is a quite amazing piece of research that takes the approach of the ACL Anthology and extends it to Information Retrieval, indexing and categorising 53,673 research papers on information retrieval from journals and conference proceedings.  The papers are presented in a year-by-year chronology. 

At the time of writing this item the June 2021 issue of SIGIR Forum exists on the ACM Digital Library (where access is restricted to ACM members) but the open access version has not been added to the SIGIR Forum web site. The Anthology listing is available but this site gives no background information on the project. The list of proceedings and journals is very useful, though I am surprised to see that the Journal of Information Science is not included.

In passing I should note that there is a very good overview of the ECIR 2021 conference in this issue of Forum.

It is disappointing that there is no indication (as of 3 August) as to why the OA issue is not available on the SIGIR web site. However there is an open-access summary version of the paper presented at SIGIR 2021

Search Industry Awards 2021 – call for nominations

We are delighted to announce this year’s Search Industry Awards, celebrating the best search innovations of 2021. Presented by the Information Retrieval Specialist Group of the BCS, these awards recognize people, projects, and organizations that have excelled in the design of search and information retrieval products and services. If you know of any people, projects, or products that deserve recognition, let us know by submitting a nomination. Alternatively, if you’re involved with something special yourself, you can submit an application today. Nominations will remain open until 1st November.  Winners will receive a framed certificate and a public listing on the IRSG Awards site.

Karen Spärck Jones Award 2021 – Second Call for Nominations

A pioneer of information retrieval, the computer science sub-discipline that also underpins the technology of modern Web search engines, Karen Spärck Jones was a British professor of Computers and Information at the University of Cambridge. Her contributions to the fields of Natural Language Processing (NLP) and Information Retrieval (IR), especially with regard to experimentation, have been outstanding, highly influential and lasting, and include the introduction of Inverse Document Frequency for relevance ranking.
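As a reminder of the idea behind Inverse Document Frequency, a term that appears in few documents is treated as more informative than one that appears in most of them. The sketch below uses a common textbook form, log(N / n), where N is the number of documents and n is the number containing the term. The sample documents and the function name are illustrative assumptions, and this form may differ in detail from the original formulation and from the variants used in production ranking functions.

```python
import math


def inverse_document_frequency(docs):
    """Return log(N / n_t) for every term t in a list of document strings."""
    n_docs = len(docs)
    doc_freq = {}
    for text in docs:
        # Count each term once per document, however often it occurs there.
        for term in set(text.lower().split()):
            doc_freq[term] = doc_freq.get(term, 0) + 1
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


# Hypothetical sample collection, for illustration only.
docs = ["search engine ranking", "search index", "search query ranking"]
idf = inverse_document_frequency(docs)
```

Here "search" occurs in every document and so receives a weight of zero, while "index", which occurs in only one, receives the highest weight.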

To learn more about Karen and her work, see:

Strix Award 2021 – call for nominations

The UK e-information Group (UKeiG) is delighted to announce the call for nominations for the prestigious Tony Kent Strix Award 2021. Nominations should be received by 6 pm GMT on Thursday 30th September 2021.

The Tony Kent Strix Award was inaugurated in 1998 by the Institute of Information Scientists. It is now presented by UKeiG in partnership with the International Society for Knowledge Organisation UK (ISKO UK), the Royal Society of Chemistry Chemical Information and Computer Applications Group (RSC CICAG) and the British Computer Society Information Retrieval Specialist Group (BCS IRSG).

The Award is given in recognition of an outstanding practical innovation or achievement in the field of information retrieval in its widest sense. This could take the form of an application or service, or an overall appreciation of past achievements that have led to significant advances. The award is open to individuals or groups from anywhere in the world.

Nominations must be for a major, sustained or influential achievement that meets one or more of the following criteria:

Report on the SIGIR 2021 Workshop “IR for Children 2000-2020: Where Are We Now?”


(Monica Landoni, Theo Huibers, Emiliana Murgia, and Sole Pera)

This year, researchers and practitioners gathered during a workshop co-located with the 44th edition of the renowned ACM SIGIR conference to discuss the current status of information retrieval (IR) research targeting children.

The idea of hosting a workshop at ACM SIGIR first emerged from discussions among us organizers. It became apparent that, more than 20 years after researchers and practitioners first heard of Yahooligans (a commercial search engine targeting children) and PuppyIR (a research project focused specifically on IR technology for children), research in this important area has not seen the steady growth that other areas of IR targeting mainstream users have experienced. With that in mind, the call for contributions for the IR for Children 2000-2020: Where Are We Now? Workshop specifically asked for vision papers reflecting on what could be the cause for the lack of consistent research outcomes in this area. It also asked about topics that should be considered in the future, if as a community we are to continue to advance knowledge in this area.

Journals publishing information retrieval research – are there any missing?

The IRSG web site (see above) has carried a list of journals that cover information retrieval research for many years. However there are a number of broken links and some rather strange omissions, such as the Journal of Information Science. The table below is a draft of the revised version, which I have expanded a little into information and knowledge management, both of which have a significant dependency on high-quality search. Could you take a look to see if any titles are missing? If there are, could you please email the titles to me at martin.white@intranetfocus.com. No particular deadline as we can update the page after launch, but it would be good to have as comprehensive a list as possible ready for the launch at the end of August. I have matched these against the journals included in the scope of the IR Anthology.

David Maxwell wins the 2020 BCS/CPHC Distinguished Dissertation award

The BCS in collaboration with the Council of Professors and Heads of Computing (CPHC) gives an annual award for the best thesis in computer science. David (at that time at the University of Glasgow) was recognised for his thesis: ‘Modelling search and stopping in interactive information retrieval.’ David is now undertaking post-graduate research at the University of Delft.

(The notes below are taken verbatim from the BCS Press Release. Somewhat strangely there was no link to the thesis, which can be downloaded from here)

Book Review – Seeing with Fresh Eyes:  Meaning Space Data Truth by Edward Tufte

I am concerned at the lack of interest in the IR search community in the design of search results pages and the design of the individual results snippets. You could argue that these topics are out of scope, and yet recent work on perceptual speed suggests that we should be taking more care about information design if only because no one else seems to be concerned with it.

I will admit to having a long running and very considerable interest in information design, looking at (for example) the colour contrast and legibility of road signs, a topic where the UK has long been at the peak of good practice. This interest was stimulated by reading The Visual Display of Quantitative Information by the American statistician Edward Tufte in 1980. Since that ground-breaking book Tufte has gone on to write more books on the subject of information presentation. Seeing with Fresh Eyes was published in 2020 and I am very grateful to Pam Mozier at Graphics Press, Connecticut, for making a review copy available.

Is there now a business in paid-for web search?

I would commend a blog post by Stephen Arnold on the future of paid-for web search, prompted by the release of Neeva which comes free for three months and then you pay $4.95 a month. (That fee is in very very small print on the home page!) Over the years several companies have tried to take on Google, notably Exalead, and of course Microsoft. A lot of potential here for the research community to carry out a host of A/B tests on Neeva vs Google. Incidentally if you have not come across Stephen before he is a very well informed (and usually highly sceptical) watcher and analyst of the search business. Some years ago now we co-authored a guide (actually quite a large report) about the successful management of enterprise search but it failed to make the best sellers list. Indeed it failed to sell at all!