Crowdsourcing, verification and ‘alpha users’

On Thursday 3rd March, the Media Standards Trust, BBC College of Journalism and science training for journalists programme at the Royal Statistical Society hosted an afternoon of workshops on data and news sourcing. One of the first of these was on ‘crowdsourcing and other innovations in news sourcing’.

The opening session of a day of workshops offered plenty of insight on crowdsourcing – and if you’d like to make use of the wisdom of crowds, Joanna Geary (The Times) and Joel Gunter have more.

Paul Lewis, special projects editor at The Guardian, talked about using twitter in finding out how two people had died: Ian Tomlinson at the G20 protests, and Jimmy Mubenga on BA Flight 77 to Angola. Lewis had only signed up to twitter two days before the G20 protests. Twenty people came to The Guardian to contradict the police’s version of events – only one of them via ‘conventional means’ – with video footage eventually being uncovered. Those coming forward via twitter could be plotted on a map by time and location. On the Mubenga death, Lewis had sent out a tweet asking if anyone knew what had happened. The tweet made it to a passenger who had been sitting three seats away, now working on an Angolan oilfield, who provided testimony of what had occurred. Lewis identified two problems with crowdsourcing: you can’t control where the crowd is going, so you should try to identify the information you want to find; and people often rely on snippets from individuals which can’t be corroborated and can be wrong (such as the tweets incorrectly announcing the death of Gabrielle Giffords). Without safeguards, we are susceptible to being misled.

Paul Bradshaw, visiting professor at City University, talked about the Help Me Investigate website which he had founded. Success rates on stories were higher than expected, with successful investigations including the overspend by Birmingham City Council on their website, and claims by the publishers of The London Weekly which turned out to be false. Bradshaw identified five qualities common to successful investigations:

1.    Alpha users – extremely active users who can drive the investigation
2.    Momentum – feedback loops and visible results
3.    Modularisation – breaking the story down into smaller, more manageable parts
4.    Publicness – making the story visible and making the public aware
5.    A certain amount of expertise – brought by the diversity of the group involved.

Bradshaw distinguished between two types of crowdsourcing: mechanical (as in The Guardian’s MPs’ expenses coverage) and the Wisdom of Crowds (the diversity of a group leading to positive results). He also mentioned some of the problems (legal issues meant the whole of Help Me Investigate couldn’t be made public, and there was a need for editorial drive) but also some opportunities: now that the code for the site had been released, there had been interest in some global versions.

Turi Munthe, CEO and founder of Demotix, agreed with much of what the previous speakers had said. He liked the idea of ‘alpha users’, a better name for what he called ‘nodes’. Either way, the idea showed that crowds function according to pyramidal hierarchical structures, and avoided any distinction between professional journalists and other human beings on the ground. He used the recent Egyptian uprising as an example of the node structure: Google executive Wael Ghonim’s twitter account had gone quiet (he was incarcerated for 12 days by the authorities) before a tweet on 7th February. While this was retweeted thousands of times, the more cautious Andy Carvin at NPR was interested in verifying the tweet and was able to find a contact in the Middle East, who could confirm away from twitter that Ghonim was free.

Verification was a continual issue, with Demotix stronger on the ground in some areas than others. In the Iran protests of 2009, most information was coming from twitter (much of it recycled) from a particular, though critical, demographic, which led to a particular narrative developing in the Western media. Asked by chair George Brock what Demotix would do differently if the protests happened again, Munthe said that for Wikipedia to rival the Encyclopaedia Britannica, it needed the whole demographic involved – it was the same for Demotix, needing real stories to represent a ‘360 view’. To know what was going on we would need twitter, Facebook, and reporters on the ground in (e.g.) south Tehran. Egypt was different, since there was a greater demographic depth given previous protests (e.g. on flickr in 2006). It was fascinating to see how the crowd had switched from email to twitter. Again, networks were key – an old friend (previously a wire journalist) in Paris had heard from people saying that while Tahrir Square was safe, the entrances (off-camera) were not, which led to a shift in focus.

Questions from the floor led to some disagreement about whether certain issues (such as disability) which were ignored by the mainstream media would receive more or less coverage because of crowdsourcing. Paul Lewis thought that crowdsourced journalism could ignore swathes of stories – if the media were seldom interested in certain topics, the crowd might never be. For Lewis, a central tenet of journalism was that it should be story-led, not methodology-led – certain types of stories (e.g. those with secret documents) didn’t lend themselves to crowdsourcing. He had a ‘passive but willing to become involved’ approach to crowdsourcing: journalists shouldn’t start out by asking what the method might lead us to, but by asking what is important; obsession with innovation might lead to a loss of passion about what really mattered. Turi Munthe disagreed: the web had finally allowed people, and communities, to come together. The mainstream media might not be interested, but these groups now had the power to make stories. Responding to a question about vocal minorities, Munthe noted that the impact wasn’t just on the media, but also on politics: while there were lots of people tweeting, blogging and protesting, they weren’t players from a political perspective – it was still difficult for a mass movement to emerge within a relatively flat political structure.

Discussion then focused on verification. Turi Munthe said Demotix tended to rely on the top of the pyramid – it was all about networks, and Demotix were lucky to know a lot of Egyptian bloggers. Paul Lewis thought people wanting to disseminate false information would become sophisticated and embed themselves in networks. With the Mubenga source, he spoke on the phone to the source and to others on the flight, and received (via email) a scan of the source’s boarding pass. Other organisations might not have taken the time to check. Instantaneity meant some old rules had been forgotten. In the old media world, facts had been black and white – now there was lots of grey matter.

Mark Henderson of The Times asked whether crowdsourcing should be treated as a starting point, a preliminary before digging. Lewis answered that it depended on the story you’re dealing with, and the nature of the crowd. On other occasions, just being on there could take you to places you wouldn’t otherwise get to. Turi Munthe added that an important spin-off from interacting with crowds was a new form of journalism – the live-blog. Trust and authenticity are critical to its success.

Bella Hurrell, editor of the specials team at the BBC News website, then gave a presentation on her team’s experience of crowdsourcing. In her view, there were two kinds: the ‘tell us what’s happening where you are’ type, and the ‘we’ve got lots of information and could do with your help’ type. There was little passion for the former, according to Hurrell, but the latter was more successful. For MPs’ expenses, the BBC created a page for each MP and asked readers to email if they saw something, producing some really intelligent questions about the data. When dealing with user-generated content (UGC), verification was clearly a big problem. Hurrell wasn’t sure if the BBC would consider UGC ‘crowdsourcing’ as such.

After some discussion on The Guardian’s MPs’ expenses project, and the impact of the story at national and local level, Turi Munthe ended on an optimistic note – while crowdsourcing had started as a gimmick, news organisations were now genuinely using the crowd in their journalism.