Patterns in Media Content

Big Data Analysis of News and Social Media Content

Ilias Flaounas, Saatviga Sudhahar, Thomas Lansdall-Welfare, Elena Hensinger, Nello Cristianini (*)

Intelligent Systems Laboratory, University of Bristol

(*) corresponding author

The analysis of media content has long been central to the social sciences, due to the key role that the media play in shaping public opinion. This kind of analysis typically relies on the preliminary coding of the text being examined, a step that involves reading and annotating it, and that limits the size of the corpora that can be analysed. Modern technologies from Artificial Intelligence allow researchers to automate the process of applying different codes to the same text. Computational technologies also enable the automation of data collection, preparation, management and visualisation. This provides opportunities for massive-scale investigations, real-time monitoring, and system-level modelling of the global media system. The present article reviews the work performed in this direction by the Intelligent Systems Laboratory at the University of Bristol. We describe how the analysis of Twitter content can reveal mood changes in entire populations, how the political relations among US leaders can be extracted from large corpora, how we can determine what news people really want to read, how gender bias and writing style in articles vary among outlets, and what EU news outlets can tell us about cultural similarities in Europe. Most importantly, this survey aims to demonstrate some of the steps that can be automated, giving researchers access to macroscopic patterns that would otherwise be out of reach.

Download the survey article with pointers to the peer-reviewed publications

On the Current Paradigm in Artificial Intelligence

Nello Cristianini – (Draft of article prepared for AIComm issue on History of AI)

The field of Artificial Intelligence (AI) has undergone many transformations, most recently the emergence of data-driven approaches centred on machine learning technology. The present article examines that paradigm shift by using the conceptual tools developed by Thomas Kuhn, and by analysing the contents of the longest-running conference series in the field. A paradigm shift occurs when a new set of assumptions and values replaces the previous one within a given scientific community. These are often conveyed implicitly, through the choice of success stories that exemplify and define what a given field of research is about, demonstrating what kinds of questions and answers are appropriate. The replacement of these exemplar stories corresponds to a shift in goals, methods, and expectations. We discuss the most recent such transition in the field of Artificial Intelligence, as well as commenting on some earlier ones.

Using Twitter to Monitor Collective Mood

Large-scale analysis of social media content allows for the real-time discovery of macro-scale patterns in public opinion and sentiment. In this paper we analyse a collection of 484 million tweets generated by more than 9.8 million users from the United Kingdom over the past 31 months, a period marked by economic downturn and some social tensions. Our findings, besides corroborating our choice of method for the detection of public mood, also reveal intriguing patterns that can be explained in terms of events and social changes. On the one hand, the time series we obtain show that periodic events such as Christmas and Halloween evoke similar mood patterns every year. On the other hand, we see that a significant increase in negative mood indicators coincides with the announcement of the cuts to public spending by the government, and that this effect is still lasting. We also detect events such as the riots of summer 2011, as well as a possible calming effect coinciding with the run-up to the royal wedding.


REFERENCE: Thomas Lansdall-Welfare, Vasileios Lampos and Nello Cristianini: Effects of the Recession on Public Mood in the UK. Accepted for publication in the International Workshop on Social Media Applications in News and Entertainment (SMANE), 2012.
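The mood-tracking approach described above rests on counting, day by day, the fraction of tweets that contain words from predefined emotion word lists. A minimal sketch of that idea follows; the word lists here are our own toy examples, standing in for the much larger emotion lexicons used in the actual study:

```python
from collections import defaultdict

# Tiny illustrative word lists; the real study uses large emotion
# lexicons, so treat these particular words as placeholders.
MOOD_WORDS = {
    "joy":   {"happy", "merry", "joy", "glad"},
    "anger": {"angry", "furious", "riot", "rage"},
}

def daily_mood_scores(tweets):
    """tweets: iterable of (date, text) pairs.
    Returns {mood: {date: fraction of that day's tweets containing
    at least one word from the mood's word list}}."""
    per_day = defaultdict(list)
    for date, text in tweets:
        per_day[date].append(set(text.lower().split()))
    scores = {}
    for mood, words in MOOD_WORDS.items():
        scores[mood] = {
            d: sum(1 for bag in bags if bag & words) / len(bags)
            for d, bags in per_day.items()
        }
    return scores

sample = [
    ("2010-12-25", "merry christmas everyone"),
    ("2010-12-25", "so happy today"),
    ("2011-08-08", "another riot in london"),
]
print(daily_mood_scores(sample)["joy"]["2010-12-25"])  # → 1.0
```

Using per-day fractions rather than raw counts normalises for the daily tweet volume, which itself varies strongly over time.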

Monitoring Social Media to Detect Possible Hazards

Note that an improved version of this article has been published in Natural Hazards Observer, Volume XXXVI, Number 4, pp. 7-9, March 2012.

Vasileios Lampos and Nello Cristianini
Intelligent Systems Laboratory
University of Bristol

Abstract. Real-time monitoring of environmental and social conditions is an important part of developing early warning of natural hazards such as epidemics and floods. Rather than relying on dedicated infrastructure, such as sensor networks, it is possible to gather valuable information by monitoring public communications from people on the ground. A rich source of raw data is provided by social media, such as blogs, Twitter or Facebook. In this study we describe two experiments based on the use of Twitter content in the UK, showing that it is possible to detect a flu epidemic, and to assess the levels of rainfall, by analysing text data. These measurements can in turn be used as inputs to more complex systems, for example for the prediction of floods or of disease propagation.


The fast expansion of the social web that is currently under way means that large numbers of people can publish their thoughts at no cost. Current estimates put the number of Facebook users at 800 million and of active Twitter users at 100 million [1, 2]. The result is a massive stream of digital text that has attracted the attention of marketers [3], politicians [4] and social scientists [5]. By analysing this stream of communications in an unmediated way, without relying on questionnaires or interviews, many scientists have, for the first time, direct access to people’s opinions and observations. Perhaps equally importantly, they have access, although indirectly, to situations on the ground that affect the web users, such as extreme weather conditions, as long as these are mentioned in the messages being published.

The analysis of social media content is a statistical game, as there is no guarantee that a specific user will describe the weather in her current location when we need it. By gathering a large number of messages from a given location, however, and by monitoring the right keywords and expressions, it is possible to obtain indirect statistical evidence in favour of a given weather state. In this article we describe two experiments that we have conducted using Twitter content in the United Kingdom, showing that it can be used to infer the levels of rainfall or of influenza-like illness (ILI) in a given location with significant accuracy. The enabling technology behind this study is Statistical Learning Theory, a branch of Artificial Intelligence concerned with the automatic detection of statistical patterns in data.
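The statistical game just described can be illustrated with a toy version of the pipeline: measure how often candidate marker terms appear in a day's tweets, then fit a model relating that frequency to a ground-truth rate. The sketch below uses a single hand-picked term list and an ordinary least-squares fit on one feature; the actual experiments instead learn which terms are informative from the data, across many more features. All term lists, tweets and numbers here are illustrative, not from the study:

```python
import re

# Hypothetical flu marker terms; the real systems select informative
# terms automatically from a large candidate vocabulary.
FLU_TERMS = {"flu", "fever", "cough", "sore", "headache"}

def term_frequency(tweets, terms):
    """Fraction of tweets mentioning at least one marker term."""
    hits = sum(1 for t in tweets
               if terms & set(re.findall(r"[a-z']+", t.lower())))
    return hits / len(tweets)

def fit_line(xs, ys):
    """Ordinary least-squares fit y = a*x + b for one feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

# Daily tweet samples, each paired with an official ILI rate
# (cases per 100,000 people) for that day -- invented values.
days = [
    (["feeling fine today", "lovely weather"], 2.0),
    (["got a fever and a cough", "off sick with flu", "all good here"], 9.0),
    (["my headache is awful", "nothing to report"], 5.5),
]
xs = [term_frequency(tweets, FLU_TERMS) for tweets, _ in days]
ys = [rate for _, rate in days]
a, b = fit_line(xs, ys)

# Predict the rate for a new day from its tweets alone.
x_new = term_frequency(["flu everywhere", "cough cough"], FLU_TERMS)
print(round(a * x_new + b, 1))  # → 11.4
```

The point of the toy is the shape of the inference, not the numbers: text frequencies act as a noisy sensor, and the regression calibrates that sensor against official measurements so it can be read out on days when no official figure is available.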

The use of Twitter data is particularly convenient because its users can only exchange very short messages, which are often geo-located, and because this data is freely available via an API [6]. Furthermore, the use of this data does not raise the serious privacy concerns that would be raised by the analysis of, say, email or SMS messages, as it is all data that the users have willingly made public.

We believe that the kind of signal that we can extract from that textual stream can be of interest in its own right, and be a valuable input to more complex modelling software, aimed at the prediction of epidemics or floods, as well as other hazards.

What is Intelligence? Modelling And Designing Cognitive Behaviour

The lecture is available at:

While the question in the title has remained unanswered for thousands of years, it is perhaps easier to address the apparently similar question: “What is intelligence for?” We take a pragmatic approach to intelligent behaviour, examining systems that can pursue goals in their environment, using information gathered from it in order to make useful decisions, autonomously and robustly. We review the fundamental aspects of their behaviour, methods to model it and architectures to realise it. The discussion covers both natural and artificial systems, ranging from single cells to software agents.