
I published an update to our dataset of crowdsourced annotations on controversy aspects, as part of the ControCurator project.

Experimental Setup

We evaluated the controversy aspects through a crowdsourcing experiment on the CrowdFlower platform. The annotations collected in this experiment were evaluated using the CrowdTruth methodology for measuring the quality of the annotations, the annotators, and the annotated articles. The relevance of each aspect was collected by asking the annotators whether it applied to the main topic of a given newspaper article. For this, we used a collection of 5 048 articles from The Guardian, retrieved through the Guardian news API. To save cost and focus on the main topic of an article, only the first two paragraphs of each article were used. In an initial pilot we used 100 articles to compare five-point Likert-scale answers with “yes/no/I don’t know” answers, and additionally to test whether showing five comments would help annotators identify whether the topic of an article is controversial. In a second pilot we evaluated, on the same dataset, whether rephrasing the aspects and adding time-persistence would make the identification clearer.


The results of the first pilot showed that, for both answer settings, showing the article comments significantly reduced the number of annotators who selected the “I don’t know” option (p-value = 0.003). Additionally, we found that the “yes/no/I don’t know” setup always finished faster. Although this difference is not significant (p-value = 0.0519), it may indicate that annotators were more willing to perform this task. Based on this we concluded that the variant with comments and “yes/no/I don’t know” answers gave the best performance in terms of speed and annotation quality. The results of the second pilot showed that rephrasing the questions improved the identification: the share of annotators who selected the “I don’t know” option dropped from 15% to 3% (p-value = 0.0001).

In the main experiment 5 048 articles were annotated by 1 659 annotators, resulting in 31 888 annotations. The evaluation of the controversy aspects was two-fold. First, the Pearson correlation coefficients were measured in order to identify how strongly each aspect correlated with controversy in each judgment. Second, linear regression was applied to learn the regression coefficients between all of the aspects combined and the controversy score for a judgment. Each coefficient indicates the weight of an aspect with respect to the other aspects. The emotion aspect of an article was found to be the strongest indicator of controversy using both measures, while the multitude of actors was the weakest. Openness was the aspect most often marked as present, in 70.9% of the annotations. It was annotated with a majority in 73% of the articles, and was found to be the most clearly represented aspect.
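The two-fold evaluation described above can be sketched roughly as follows. This is a minimal illustration using NumPy, not the code used in the project: the toy judgments are invented purely for demonstration and do not come from the dataset, the short aspect labels are shorthand for aspects named in the text, and ordinary least squares with an intercept is an assumption about how the regression was set up.

```python
import numpy as np

# Hypothetical toy judgments (NOT the real data): one row per judgment,
# one 0/1 column per aspect, plus a 0/1 controversy answer per judgment.
aspects = ["emotion", "openness", "time_persistence", "actors"]
X = np.array([
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [0, 1, 0, 1],
    [0, 0, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 1, 0],
], dtype=float)
y = np.array([1, 1, 0, 0, 1, 1, 1, 0], dtype=float)

# Step 1: Pearson correlation of each aspect with the controversy answer.
correlations = {
    name: float(np.corrcoef(X[:, j], y)[0, 1])
    for j, name in enumerate(aspects)
}

# Step 2: linear regression on all aspects combined (assumed here to be
# ordinary least squares with an intercept column); the fitted coefficient
# of an aspect is its weight relative to the other aspects.
design = np.column_stack([np.ones(len(y)), X])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
weights = dict(zip(aspects, (float(c) for c in coef[1:])))

for name in aspects:
    print(f"{name:17s} r={correlations[name]:+.2f}  weight={weights[name]:+.2f}")
```

The toy values were chosen so that the emotion column has the highest correlation and the actors column the lowest, mirroring the ordering reported above. The numbers themselves carry no meaning beyond the illustration.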


This dataset is built and used for the following papers. Please cite them if you decide to use our work.

Benjamin Timmermans, Lora Aroyo, Tobias Kuhn, Kaspar Beelen, Evangelos Kanoulas, Bob van de Velde, Gerben van Eerten: ControCurator: Understanding Controversy Using Collective Intelligence. Collective Intelligence Conference 2017

  title={ControCurator: Understanding Controversy Using Collective Intelligence},
  author={Timmermans, Benjamin and Aroyo, Lora and Kuhn, Tobias and Beelen, Kaspar and Kanoulas, Evangelos and van de Velde, Bob and van Eerten, Gerben},
  journal={Collective Intelligence Conference},

Benjamin Timmermans, Kaspar Beelen, Lora Aroyo, Evangelos Kanoulas, Tobias Kuhn, Bob van de Velde, Gerben van Eerten: ControCurator: Human-Machine Framework For Identifying Controversy. ICT Open 2017

  title={ControCurator: Human-Machine Framework For Identifying Controversy},
  author={Timmermans, Benjamin and Beelen, Kaspar and Aroyo, Lora and Kanoulas, Evangelos and Kuhn, Tobias and van de Velde, Bob and van Eerten, Gerben},
  journal={ICT Open},

On June 15-16 the Collective Intelligence conference took place at New York University. The CrowdTruth team was present with Lora Aroyo, Chris Welty and Benjamin Timmermans. Together with Anca Dumitrache and Oana Inel we published a total of six papers at the conference.


The first keynote was presented by Geoff Mulgan, CEO of NESTA. He set the context of the conference by stating that there is a problem with technological development, namely that it only takes knowledge out of society and does not put it back in. He also made it clear that many of the tools we see today, like Google Maps, are actually nothing more than companies that were bought and merged together. This combination of things is what creates the power. He then defined the biggest trends in collective intelligence: observation (e.g. citizen-generated data on floods), predictive models (e.g. fighting fires with data), memory (e.g. what works centers on crime reduction), and judgement (e.g. adaptive learning tools for schools). There are, however, a few open issues with collective intelligence: Who pays for all of this? What skills are needed for CI? What are the design principles of CI? What are the centers of expertise? None of these are clear yet. What is clear is that a new field is emerging through combining AI with CI: Intelligence Design. We used to think systems resolve this intelligence, but actually we need to steer and design it.

In a plenary session there was an interesting talk on public innovation by Thomas Kalil. He defined the value of concreteness as things that happen when particular people or organisations take some action in pursuit of a goal. These actions are more likely to effect change if you can articulate who needs to do what. He said he would like to identify the current barriers to prediction markets and the areas where governments could be a user and funder of collective intelligence. This can be achieved by connecting people who are working to solve similar problems locally, e.g. in local education. Change can then be driven realistically, by making clear who needs to do what. It was also noted, though, that people need to be willing and able for change to work.

Parallel Sessions

There were several interesting talks during the parallel sessions. Thomas Malone spoke about using contest webs to address the problem of global climate change. He claimed that funding science can be both straightforward and challenging: for instance, government policy does not always correctly address the needs of a domain, and conflicts of interest may even exist. Also, it can be tough to convince the general public of the use of fundamental research, as it is not sexy. Digital entrepreneurship is furthermore something that is often overlooked. There are hard problems, and there are new ways of solving them. It is now essential to split the problems up into parts, solve each of them with AI, and combine them back together.

Chris Welty presented our work on Crowdsourcing Ambiguity Aware Ground Truth at Collective Intelligence 2017.

Mark Whiting also presented his work on Daemo, a new crowdsourcing platform that has a self-governing marketplace. He stressed the fact that crowdsourcing platforms are notoriously disconnected from user interests. His new platform has a user-driven design, in order to get rid of the flaws that exist in, for instance, Amazon Mechanical Turk.

Plenary Talks

Daniel Weld from the University of Washington presented his work on argumentation support in crowdsourcing. Their work uses argumentation support in crowd tasks to allow workers to reconsider their answers based on the argumentation of others. They found this to significantly increase the annotation quality of the crowd. He also claimed that humans will always need to stay in the loop of machine intelligence, for instance to define what the crowd should work on. Through this, hybrid human-machine systems are predicted to become very powerful.

Hila Lifshitz-Assaf of NYU Stern School of Business gave an interesting talk on changing innovation processes. The process of innovation has changed from a lone inventor, to labs, to collaborative networks, and now into open innovation platforms. The main issue with this is that the best practices of innovation fail in the new environment. In standard research and development there is a clearly defined and selectively permeable boundary, whereas with open innovation platforms this is not the case. Experts can participate from inside and outside the organisation. Open innovation means managing undefined and constantly changing knowledge, in which anyone can participate. For this to work, you have to change from being a problem solver to a solution seeker. It is a shift from thinking “the lab is my world” to “the world is my lab”. Still, problem formulation is key, as you need to define the problems in ways that cross boundaries. The question always remains: what is really the problem?

Poster Sessions

In the poster sessions several interesting works were presented, for instance work on real-time synchronous crowdsourcing using “human swarms” by Louis Rosenberg. This work allows people to change their answers under the influence of the rest of the swarm. Another interesting poster was by Jie Ren of Fordham University, who presented a method for comparing the divergent thinking and creative performance of crowds with those of experts. We ourselves had a total of five posters across both poster sessions, which were well received by the audience.