Informer
Newsletter of the BCS Information Retrieval Specialist Group

Search strategies considered harmful

By Tony Russell-Rose on 29th October 2015

Over the last few months I’ve been looking in detail at the process of search strategy formulation, i.e. the various ways in which professionals go about resolving complex information needs.

Some professions (e.g. recruitment) address sourcing needs with complex search queries such as this:

(“business analyst” or “systems analyst” or “system analyst”
or “data analyst” or “requirements analyst” or “functional
analyst”) and crystal and report* and analy* and data near
analy* and not inventory and not retail and not (ecommerce
or “e-commerce” or b2b or b2c)

This particular query is designed to retrieve candidates who match a typical client brief. As you can see, it’s essentially a complex Boolean expression, and the challenge of creating and optimising such expressions is the subject of a number of social media forums.

Other professions adopt a different approach. Healthcare professionals, particularly those who are involved in the creation of systematic (literature) reviews, tend to adopt a line-by-line approach such as this (the published Medline strategy for Oral protein calorie supplementation for children with chronic disease):

  1. randomized controlled trial.pt.
  2. controlled clinical trial.pt.
  3. randomized.ab.
  4. placebo.ab.
  5. clinical trials as topic.sh.
  6. randomly.ab.
  7. trial.ti.
  8. 1 or 2 or 3 or 4 or 5 or 6 or 7
  9. (animals not (humans and animals)).sh.
  10. 8 not 9
  11. exp Child/
  12. ADOLESCENT/
  13. exp infant/
  14. child hospitalized/
  15. adolescent hospitalized/
  16. (child$ or infant$ or toddler$ or adolescen$ or teenage$).tw.
  17. or/11-16
  18. Child Nutrition Sciences/
  19. exp Dietary Proteins/
  20. Dietary Supplements/
  21. Dietetics/
  22. or/18-21
  23. exp Infant, Newborn/
  24. exp Overweight/
  25. exp Eating Disorders/
  26. Athletes/
  27. exp Sports/
  28. exp Pregnancy/
  29. exp Viruses/
  30. (newborn$ or obes$ or “eating disorder$” or pregnan$ or childbirth or virus$ or influenza).tw.
  31. or/23-30
  32. 10 and 17 and 22
  33. 32 not 31

In this type of formalism, the search strategy is built up incrementally, as a set of discrete expressions which are referred to by line number and combined using various operators. This type of procedural approach has the advantage that strategies can be built up using techniques such as successive fractions, building blocks, and so on. It also allows the searcher to review the number of results returned at each step, and refine the expression accordingly.
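
To make the mechanics concrete, here is a minimal Python sketch of how the line references might be resolved into a single nested expression. It is illustrative only: the `expand` function and the two regular expressions are my own assumptions, not how Ovid or any other search platform actually works, and it handles only the two kinds of combining line that appear in the example above (`8 not 9` and `or/11-16`).

```python
import re

# Lines 1-17 of the Medline strategy above, keyed by line number.
STRATEGY = {
    1: "randomized controlled trial.pt.",
    2: "controlled clinical trial.pt.",
    3: "randomized.ab.",
    4: "placebo.ab.",
    5: "clinical trials as topic.sh.",
    6: "randomly.ab.",
    7: "trial.ti.",
    8: "1 or 2 or 3 or 4 or 5 or 6 or 7",
    9: "(animals not (humans and animals)).sh.",
    10: "8 not 9",
    11: "exp Child/",
    12: "ADOLESCENT/",
    13: "exp infant/",
    14: "child hospitalized/",
    15: "adolescent hospitalized/",
    16: "(child$ or infant$ or toddler$ or adolescen$ or teenage$).tw.",
    17: "or/11-16",
}

# Hypothetical patterns for the two kinds of combining line seen above.
RANGE = re.compile(r"^(and|or)/(\d+)-(\d+)$")         # e.g. "or/11-16"
COMBO = re.compile(r"^\d+(?: (?:and|or|not) \d+)+$")  # e.g. "8 not 9"


def expand(strategy, n):
    """Return line n with every line-number reference replaced by its text."""
    text = strategy[n]
    m = RANGE.match(text)
    if m:
        op, first, last = m.group(1), int(m.group(2)), int(m.group(3))
        parts = [expand(strategy, i) for i in range(first, last + 1)]
        return "(" + f" {op} ".join(parts) + ")"
    if COMBO.match(text):
        tokens = [expand(strategy, int(t)) if t.isdigit() else t
                  for t in text.split()]
        return "(" + " ".join(tokens) + ")"
    return text  # a leaf: an actual search term, not a reference


if __name__ == "__main__":
    print(expand(STRATEGY, 10))  # the trial filter, minus animal-only studies
    print(expand(STRATEGY, 17))  # the population block
```

Expanding line 10 or line 17 in this way yields exactly the kind of single, deeply nested Boolean expression shown in the recruitment example, which suggests the two styles differ in presentation rather than in expressive power.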

Over the last few months I’ve got used to seeing some quite complex search strategies, often extending over a hundred lines or more. However, a few things about the formalism still strike me as being a bit odd.

Firstly, the use of logical statements connected via numbered lines, as above, does rather remind me of first-generation BASIC. I’m not saying that the language didn’t have its place, but several decades on we’d like to think we now have recourse to rather more structured approaches. But more to the point, what’s happening with all those line numbers – are they really the best way to organize a collection of logical expressions? Just when we most need a principled mechanism for structuring our approach, it seems we are forced to rely on something as arbitrary as a line number. As any undergraduate computer scientist will tell you, the liberal use of ‘goto’ statements, which depend on exactly this kind of reference, is indeed considered harmful.

Secondly, and continuing the programming language metaphor, I wonder just how much support there is for constructing expressions that are syntactically correct and semantically transparent. A well-designed (programming) language, for example, should support concepts such as:

  • Encapsulation: the concept whereby data and functions are packed into a single component. To a degree, this is true of the line-by-line approach above, but it is compromised by the lack of any facility for naming and invoking discrete elements of computation (other than by an arbitrary number).
  • Abstraction: the ability to generalize from a set of behaviours, e.g. the use of a template which can be populated for a given instance. In the example above we can see that lines 11 to 17 are probably intended to express the population element of the PICO process. So why not abstract this component out? That way, when we need to (re)use it, it could be instantiated on a case-by-case basis, e.g. male adults in strategy X, female infants in strategy Y, and so on. (OK, I know that some people equate abstraction with hiding implementation details, but I think the generalization sense is more pertinent here).
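
As a rough illustration of what these two ideas might look like in practice, here is a short Python sketch in which blocks are referred to by name rather than by number, and the population element becomes a template. Everything in it is hypothetical: the `Block` class, the `population` and `combine` helpers, and the particular headings and stems are my own inventions for the sake of the example, not part of any existing search tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """A named group of search terms joined by a single operator."""
    name: str
    terms: tuple
    op: str = "or"

    def render(self):
        return "(" + f" {self.op} ".join(self.terms) + ")"


def population(name, headings, stems):
    """Hypothetical template for the population element of a strategy.

    headings: subject headings, rendered as exploded terms ("exp X/").
    stems: free-text stems, truncated with "$" and searched in .tw.
    """
    heading_terms = tuple(f"exp {h}/" for h in headings)
    free_text = "(" + " or ".join(f"{s}$" for s in stems) + ").tw."
    return Block(name, heading_terms + (free_text,))


def combine(name, op, *blocks):
    """Join existing blocks, so a strategy reads as names, not numbers."""
    return Block(name, tuple(b.render() for b in blocks), op)


# The same template instantiated for two different (illustrative) populations.
children = population("children", ["Child", "Infant"],
                      ["child", "infant", "toddler"])
adults = population("adults", ["Adult"], ["adult"])

# An intervention block, reused unchanged in both strategies.
supplements = Block("supplements",
                    ("exp Dietary Proteins/", "Dietary Supplements/"))

strategy_x = combine("strategy_x", "and", children, supplements)
strategy_y = combine("strategy_y", "and", adults, supplements)

if __name__ == "__main__":
    print(strategy_x.render())
    print(strategy_y.render())
```

The point is not the particular syntax but that `children` and `supplements` say what they are for, and that changing the population means changing one argument rather than renumbering a list.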

Likewise, I can imagine cases where we would want our search strategy to encompass other concepts such as inheritance, modularity, etc. So I am left thinking: why has the design of search strategies apparently changed so little when programming languages have changed so rapidly?

Of course, if you’re writing the control software for an Airbus A320 you might argue that you need tools and approaches that deal with a few orders of magnitude more complexity than your ‘average’ search strategy. But both endeavours are trying to find elegant and parsimonious ways to express complex logical constructs, both are concerned with syntactic correctness, and both need semantic transparency and pragmatic effectiveness. I wonder: is this formalism a bit like the QWERTY keyboard, a flawed and outdated design that remains ubiquitous through little more than convention?

About Tony Russell-Rose

Tony Russell-Rose is founder of 2Dsearch (https://www.2dsearch.com), a start-up applying artificial intelligence, natural language processing and data visualisation to create the next generation of professional search tools. He is also director of UXLabs, a research and design studio specialising in complex search and information access applications. He has served as vice-chair of the BCS Information Retrieval group and chair of the CIEHF Human-Computer Interaction group. Previously Tony has led R&D teams at Canon, Reuters, Oracle, HP Labs and BT Labs. He is author of "Designing the Search Experience" (Elsevier, 2013) and publishes widely on IR, HCI and NLP.

Copyright © 2022 Informer.
