MST Leveson Coverage Analysis – Data and Q&A

Newspaper Coverage of the Leveson Inquiry – the Raw Data

The three Excel spreadsheets attached, along with the corresponding Google Docs versions linked below, contain a dataset we have collected over the past few months on the coverage of the Leveson Inquiry by the national daily and Sunday newspapers between July 2011 and November 2012.

We are publishing this raw data, and the methodology behind its collection, before publishing our report and analysis, for three reasons:

1. To identify mistakes in the data

2. To answer questions about the methodology

3. To enable other people to do analyses using the raw data

This is also in line with our commitment to open data.

Below, we have published a Q&A that explains how we have conducted our data analysis, and the way in which we are evaluating the data.

Over the next week or two, once we have taken account of responses to the raw data and methodology, we will also be publishing:

  • an interactive timeline on press coverage of the Leveson Inquiry;
  • analysis of the data;
  • a commentary on our data;
  • a report on the coverage.

Access the raw data here:


Google Docs:

MST Leveson Coverage Full Dataset

MST Leveson Coverage List of Stories by Date

MST Leveson Coverage Wordcount by Date


Excel Versions:

MST Leveson Coverage Wordcount by Date

MST Leveson Coverage Stories by Date

MST Leveson Coverage Full Dataset


If you have any problems accessing the data, or would like to contact us about the data or methodology, please email Gordon Neil Ramsay at or call 020 7727 5252.



Q: What is this data?

A: This is a database of all the articles published on phone hacking and the Leveson Inquiry from July 14th 2011 until the eve of the publication of the Leveson Report on November 28th 2012. It is split into three datasets.


Q: What else are you publishing?

A: There are several parts of this project, and we will be publishing different parts at several stages over the next week or two. We’re beginning by putting the raw data up, with a question and answer about how we collected the data and the methodology used. Over the next week or two we will also be publishing: an interactive timeline on press coverage of the Leveson Inquiry; a commentary on our data; and – once we have incorporated responses to our data and methodology from users – a full report.


Q: Why have you collected and published this data?

A: The Leveson Inquiry was the first review of the British press played out almost entirely in the public eye. The public nature of the hearings, streamed on the Inquiry website and transcribed for the public to access, has shone a light on the operation of the British press and its relations with politicians and the police.

This project builds on this openness, by collecting and making the data on the coverage of the Inquiry and the run-up to the report public. By publishing the raw data we aim to give people access to do their own analyses, to come to their own conclusions, and to highlight any errors if they find them.

We will be publishing our own analysis of this data, but we hope that this data will generate other analyses, and provide a foundation for future research.

The Leveson Inquiry provided a unique opportunity to evaluate press coverage of a significant ongoing event and area of public policy in which the interests of the press itself, on whom we relied for coverage, were linked to possible outcomes. The strength of a plural press is its ability to provide multiple viewpoints and explanations on any area of public life, whether specific policy areas, or moral or cultural issues – a reflection of the free circulation of ideas that underpins our democracy. In addition, the public benefits from a press that can and does provide this multiplicity of viewpoints.

In the case of the Leveson Inquiry, however, since there was always a realistic chance that the outcome of the Inquiry would lead to substantial changes to the way newspapers would be regulated, there is significant value in recording the extent to which a variety of viewpoints was expressed in newspaper coverage.


Q: How did you select the stories?

A: We tried to gather all stories relevant to the Leveson Inquiry published by every national daily and Sunday newspaper, from the announcement of the Inquiry in July 2011, up to the publication of the Report in November 2012. This meant coming up with two benchmarks: first, how to specify which stories were “relevant”; and second, how to gather all of those stories for study.

For the first, we used keyword searches in the Factiva database (the same database used by British Library Newspapers), selecting any stories that contained one or more of the terms “Leveson”, “Inquiry”, or “Hacking”. This cast a very wide net, and although it is possible that this technique did not capture every single Leveson Inquiry-related story, it is hard to see how such a story could avoid all three of these terms.
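The initial keyword filter described above can be sketched as follows. This is illustrative only, not the project’s actual tooling; the function and variable names are ours, and it assumes story texts have already been retrieved from Factiva:

```python
# Hypothetical sketch of the keyword-inclusion step: keep any story
# containing at least one of the three search terms (case-insensitive).
KEYWORDS = ("leveson", "inquiry", "hacking")

def matches_keywords(story_text: str) -> bool:
    """Return True if the story contains at least one search term."""
    text = story_text.lower()
    return any(term in text for term in KEYWORDS)

stories = [
    "The Leveson Inquiry resumed hearings today.",
    "Football results: City beat United 2-0.",
]
# Keep only stories that pass the keyword filter.
candidates = [s for s in stories if matches_keywords(s)]
```

Stories passing this filter were then screened against the relevance criteria below, a step that requires human judgement rather than automation.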

Stories were then excluded unless they met at least one of three criteria:

  • The story was specifically about the Leveson Inquiry, press regulation in relation to the Inquiry, or speculation about the Inquiry or its outcomes
  • The story made multiple references to the Inquiry throughout
  • The story contained an evaluative judgement about the Inquiry, either by the author or by a source solicited to comment on the story

We believe that this was a sufficient and realistic approach to gathering all Leveson-related coverage in the national press.


Q: How and why was positive and negative coverage of the Leveson Inquiry measured?

A: As well as showing how much coverage there was of the Leveson Inquiry, the project looked at how the balance of positive and negative coverage was spread across all newspapers. Given the size of the dataset, this was done using quantitative methods, measuring the occurrence of pro- and anti-Leveson viewpoints in newspaper coverage.

The purpose of this is to establish the extent to which Leveson was represented in the British press in a positive or negative frame. To do this, we recorded the presence of certain types of statement related to the Inquiry, which can be divided into ‘positive’ or ‘negative’ categories, which allowed us to measure two things:

  • The number of instances of the different evaluative statements across all stories
  • The ‘tone’ of each story, based on whether it contained only ‘positive’ statements, only ‘negative’ statements, both types of statements, or none (neutral).

The statements – or ‘frames’ as they are referred to in the report – were devised following a pilot study, and as the following question shows, were tested by other researchers to ensure that they were valid for this project.
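The story-level ‘tone’ measure described above reduces to a simple rule over the counts of positive and negative statements in a story. The sketch below is our own rendering of that rule, with hypothetical names, not the project’s actual code:

```python
# Illustrative tone classification: a story is 'mixed' if it contains
# both positive and negative statements, 'positive' or 'negative' if it
# contains only one type, and 'neutral' if it contains neither.
def story_tone(num_positive: int, num_negative: int) -> str:
    if num_positive > 0 and num_negative > 0:
        return "mixed"
    if num_positive > 0:
        return "positive"
    if num_negative > 0:
        return "negative"
    return "neutral"
```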


Q: How reliable are the methods and the results?

A: The methods employed for the project were used for three reasons:

  • They are practical, given the size of the dataset and the time involved
  • They are straightforward to use and to understand, and can be used to generate coherent and robust results (they do not make claims based on probabilities)
  • They are based on methods used in previous quantitative studies of media content.

To ensure that conclusions are reliable (and that the study could be replicated in other circumstances), we worked with other trained researchers to evaluate certain variables. The level of agreement was sufficiently high in all cases to justify using them in the study.
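The document does not specify which agreement statistic was used for the inter-coder checks; purely as an illustration, the simplest such measure, percentage agreement between two coders on the same set of stories, can be computed like this:

```python
# Illustrative only: simple percentage agreement between two coders.
# (More conservative statistics, such as Cohen's kappa, also correct
# for chance agreement; we do not know which measure the project used.)
def percent_agreement(coder_a: list, coder_b: list) -> float:
    """Share of items on which two coders assigned the same value."""
    assert len(coder_a) == len(coder_b), "coders must rate the same items"
    matches = sum(1 for a, b in zip(coder_a, coder_b) if a == b)
    return matches / len(coder_a)

# e.g. two coders rating five stories as positive/negative/neutral,
# agreeing on four of the five:
a = ["pos", "neg", "neu", "pos", "neg"]
b = ["pos", "neg", "pos", "pos", "neg"]
# percent_agreement(a, b) -> 0.8
```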


Q: What if you have made any mistakes?

A: We took great care to eliminate mistakes, but the size of the dataset (2,016 stories x 27 variables = 54,432 individual pieces of data) means that it is possible there may be some input errors. To help protect against this, the measures related to “relevance to the inquiry” and the positive/negative framing variables were given special attention during data entry, given their significance for the project.

Beyond errors in the dataset, there may also be disagreement among users of this research concerning whether particular stories should have been included, or concerning stories that may have been missed by the gathering techniques. By making the data – and the methods by which it was evaluated – available to the public, we are inviting scrutiny of any stories that have been missed, or whose coding may invite disagreement. While we are confident in the methods we have employed, and while inter-coder testing has confirmed to us that they are reliable, we are also ready to amend any errors or correct any articles that have been mis-coded.

The summaries included in this web resource and in the attached documents containing data are therefore subject to revision, prior to publication of a comprehensive report.


Q: What next?

A: The methods employed in this study were suited to a situation in which the outcome of the Leveson Inquiry remained in doubt. Once the report was published on 29th November 2012, discussion shifted to the concrete recommendations made in the report, rather than speculation about the likely outcomes of the Inquiry or coverage of the Inquiry itself, the public component of which had ended several weeks earlier. The methodology for assessing coverage after the report’s publication should therefore be slightly different.

Yet the nature of coverage of press regulation post-Leveson is just as relevant for the British public. It is envisaged that the present project will be the first part of a continuing analysis of the coverage of events surrounding the Inquiry and its outcomes. Some excellent work on newspaper stories following the release of the Leveson report has been conducted by the Media Policy Project at the London School of Economics (including here and here), and we believe that it is important that analysis of this type continues to illuminate the coverage of a significant public policy area.