Today I gave a talk in our Weekly AI meeting on the topic of ControCurator. This is a project that I am currently working on, which aims to enable the discovery and understanding of controversial issues and events by combining human-machine active learning workflows.
In the talk I explained the issue of defining the space of a controversy, and how this relates to, for instance, wicked problems. You can see the slides below.
An important aspect of the semantic web is that systems have an understanding of the content and context of text, images, sounds and videos. Although research in these fields has progressed over the last years, there is still a semantic gap between the available multimedia data and the metadata annotated by humans to describe its content. This research investigates how the complete interpretation space of humans about the content and context of this data can be captured. The methodology consists of using open-ended crowdsourcing tasks that optimize the capturing of multiple interpretations, combined with disagreement-based metrics for evaluating the results. These descriptions can be used meaningfully to improve information retrieval and recommendation of multimedia, to train and evaluate machine learning components, and to train and assess experts.
As part of the Watson Innovation course at the Vrije Universiteit Amsterdam, I presented a lecture on crowdsourcing ground truth data through human computation. In the lecture I explained the need of cognitive systems for large amounts of annotated data, and how the wisdom of the crowd should be used to gather this data with the CrowdTruth methodology.
Today Lora Aroyo presented the first lecture of the Watson Innovation course at the Vrije Universiteit. The topic of the lecture was Cognitive Computing, IBM Watson and looking inside the mind of Watson. There was a high attendance of motivated bachelor and master students with various backgrounds, such as artificial intelligence, computer science, business administration, business analytics and information sciences. We are looking forward to seeing them develop their ideas with Watson.
On Thursday 9th of October, the Netherlands eScience symposium took place in the Amsterdam Arena. This yearly event attracts scientists and researchers from many different disciplines. In the digital humanities track, Oana Inel of the CrowdTruth team gave a talk on the DIVE+ project. This is a digital cultural heritage project that provides innovative access to online collections, with the purpose of supporting digital humanities scholars and online exploration for the general public. This project is supported by the Netherlands eScience Center, and uses CrowdTruth for the crowdsourcing of events in historical data. The talk titled “Towards New Cultural Commons with DIVE+” can be seen below.
On Friday 11th of September I pitched the medical relation extraction work of my CrowdTruth colleague Anca Dumitrache at the third Amsterdam Data Science: Coffee and Data event. The purpose of this was to get in touch with researchers who have medical datasets that are, for instance, incomplete or contain errors. With our research, we want to investigate how we can improve the quality of this data. Several other interesting presentations on data science in the medical domain were given at this event, which was hosted on the top floor of the VU University Amsterdam. Together with Merel van Empel, we also presented our latest work on gamification of crowdsourcing for advancing biology using BioCrowd. Feel free to try out the game and provide us with feedback.
On Monday 31st of August I presented the preliminary results of my work on sound representations during the weekly Artificial Intelligence meeting at the VU University Amsterdam. In this collaboration with Emiel van Miltenburg, a sound corpus is built with annotations on how people perceive these sounds. Sounds can often be interpreted in multiple ways, but tags in sound corpora do not directly relate to the acoustic features of sounds. Because of this limited representation of what can be heard in a sound, the ranking of search results is not optimal. In this research, we use crowdsourcing to build an annotated corpus of sounds from freesound.org with meaningful representations that are perceptually grounded. The presented slides can be seen below or on SlideShare.