And finally!

As a way of keeping in touch with information retrieval research during the lockdown, I started to look at some sections of arXiv on a regular basis. After a few months I homed in on the following sections as being the most fruitful.

Artificial Intelligence authors/titles recent submissions (arxiv.org)

Computation and Language authors/titles recent submissions (arxiv.org)

Computers and Society authors/titles recent submissions (arxiv.org)

Digital Libraries authors/titles recent submissions (arxiv.org)

Human-Computer Interaction authors/titles recent submissions (arxiv.org)

Information Retrieval authors/titles recent submissions (arxiv.org)

Social and Information Networks authors/titles recent submissions (arxiv.org)

Over the last six months in particular, the volume of pre-prints seems to have increased substantially, with Artificial Intelligence, Information Retrieval and Computation and Language often exceeding 100 new papers a day. Scanning them has been an interesting exercise because it replicates the challenges of scanning search results, especially when Microsoft 365 decides that life is easier with no snippets.

The first step in the process is the initial read-through to note items that have at least some indication of relevance. This comes down to perceptual speed, and I’m also conscious of the extent to which initial capitalization is helpful in this process. Another factor is the extent to which I can comprehend the title.

Zero-shot Slot Filling with DPR and RAG

BigGreen at SemEval-2021 Task 1: Lexical Complexity Prediction with Assembly Models

These are just two examples of many where the title is only intelligible to a small number of research teams working in that area. Is that a good idea if you have a genuine interest in achieving a high impact with your research? As a result, on many occasions I have to read through the abstract, which for some reason best known to arXiv is presented in a 50-word line length. The sentences below are just a single line in arXiv.

Bayesian optimization is a popular algorithm for sequential optimization of a latent objective function when sampling from the objective is costly. The search path of the algorithm is governed by the acquisition function, which defines the agent’s search strategy. Conceptually, the acquisition function characterizes how the optimizer balances exploration and exploitation when

Scanning these abstracts with such a long line length and minimal inter-line spacing is quite challenging. Even more challenging is that authors often forget that the purpose of the abstract in arXiv is to entice you to click on the link and then read the full paper. Invariably the abstract on arXiv is the same as in the paper, but to me the objectives of the pre-print abstract (read me!) and of the abstract in the paper itself (in case you get lost!) are different.

In breaking news on pre-print servers, there is at last significant progress from Springer Nature towards linking the published (and often somewhat different!) paper with the pre-print.

 

About Martin White

Martin is an information scientist and the author of Making Search Work and Enterprise Search. He has been involved with optimising search applications since the mid-1970s and has worked on search projects in both Europe and North America. Since 2002 he has been a Visiting Professor at the Information School, University of Sheffield and is currently working on developing new approaches to search evaluation.
