[Note from the Editor – this is the third of the three ECIR 2020 keynotes. Summaries of the keynotes given by Chirag Shah and Jamie Callan can be found in the Spring 2020 issue of Informer.]
Joana Gonçalves de Sá is an Associate Professor at Nova School of Business and Economics, Universidade Nova de Lisboa, and the leader of the Data Science and Policy research group. Her current research uses data analytics and machine learning to study complex problems at the interface between Biomedicine, Computation, Policy, Social Sciences, and Mathematics. These include epidemiology, critical thinking, network dynamics, political discourse, and their applications to human behaviour, with a strong ethical and societal focus.
Joana’s keynote could not have been more topical. The title of her talk was Focusing the Macroscope – How We Use Data to Understand Behaviour. The "macroscope" of the title reflects her aim of showing how the information created and used by networks of people can be analysed to understand behaviour. There has been a great deal of research in this area, especially on using Twitter to identify changes in behaviour.
Joana started out with a look at some aspects of industrial revolutions, highlighting how long it has often taken to respond to major societal changes. For example, the first power loom was built in 1784 and the weaving industry in the UK made very heavy use of child labour, but the issues that this created were not addressed until the 1833 Factory Act.
Moving on to health care, Joana reviewed in detail the efforts that have been made to collate weak signals from sources such as Google search logs in order to predict the onset and scale of flu epidemics. The first lesson she drew from this work is that using just one or two signals is not adequate, as there is a tendency to significantly under-estimate or over-estimate both the onset and the scale. The second is that correlation is not the same as causation: there are many different reasons why people may search on topics around influenza, so a simple linear extrapolation from search volume can be quite misleading. A solution is to find a factor that is a good indicator of the onset of a pandemic, as this at least provides a degree of early warning.
Joana then went on to discuss the wide range of search queries that need to be taken into account, and the benefits of clustering them. A core element of this work was tracking the onset of anxiety about a flu pandemic. Anxiety is assessed through searches that indicate how concerned people are about the likely impact of influenza. Another parameter is perceived risk, that is, how likely people think they are to contract the disease. One solution has been to distinguish between searches for information about flu infections and searches for flu-related news. Joana referred to a recent paper that used the dynamic behaviour of words in Tweets (https://arxiv.org/pdf/2004.03516.pdf). Its authors noted that their results suggest that surveillance of change in the usage of epidemiology-related words on social media may be useful in forecasting later change in disease case numbers, while emphasising that their current findings are not causal or necessarily predictive.
The three lessons drawn from this research are:
- Online information seeking can have different underlying reasons
- In this context it is possible to distinguish between different possible motivations
- This analysis has implications for disease tracking and for understanding human behaviour
Joana concluded that there is a delicate balance that needs to be monitored and maintained between the rights of the individual and the potential benefits to society.