We present our latest work on the CrowdTruth framework, titled “Human Computing for the Real World”, at the ICT Open 2017 conference on the 21st and 22nd of March 2017. I made a new video that demonstrates the different aspects of the framework for dealing with ambiguity in data, crowdsourcing of human interpretations, and evaluating disagreement between annotations.
Our demo of ControCurator titled “ControCurator: Human-Machine Framework For Identifying Controversy” will be shown at ICT Open 2017. In this demo the ControCurator human-machine framework for identifying controversy in multimodal data is shown. The goal of ControCurator is to enable modern information access systems to discover and understand controversial topics and events by bringing together crowds and machines in a joint active learning workflow for the creation of adequate training data. This active learning workflow allows a user to identify and understand controversy in ongoing issues, regardless of whether there is existing knowledge on the topic.
The course is run by Lora Aroyo, Anca Dumitrache, Benjamin Timmermans and Oana Inel from the VU, and Robert-Jan Sips and Zoltan Szlavik from IBM. In the course, Amsterdam Marketing challenged the students to tackle the increasing overcrowding by tourists in the city center of Amsterdam. The city is culturally rich with many places to visit, yet most visitors cluster around a limited set of popular locations. The students came up with ideas to motivate visitors to spread out across the city and to provide them with relevant information for their visit.
Brainstem tumors are a rare form of childhood cancer for which there is currently no cure. The Semmy Foundation aims to increase the survival of children with this type of cancer by supporting scientific research. The Center for Advanced Studies at IBM Netherlands is supporting this research by developing a cognitive system that allows doctors and researchers to analyse MRI scans more quickly and to better detect anomalies in the brainstem.
In order to gather training data, a crowdsourcing event was held at Lowlands, a 3-day music festival that took place from 19 to 21 August 2016 and welcomed 55,000 visitors. At the science fair, IBM had a booth that hosted both this research and a showcase of the weather stations of the Tahmo project with TU Delft.
In the crowdsourcing task, the participants were asked to draw the shape of the brainstem and tumor in an MRI scan. Gathering data on whether a particular layer of a scan contains the brainstem and determining its size should allow a classifier to recognize the tumors. Furthermore, annotator quality can be measured with the CrowdTruth methodology by relating the precision of the drawn edges to the data we collected on the participants' alcohol and drug use. The hypothesis is that people under the influence can still make valuable contributions, but that these are of lower quality than those of sober people. This may clarify the reliability of online crowd workers, since it is unknown under what conditions they make their annotations.
The initial results in the heatmap of drawn pixels give an indication of the overall location of the brainstem, but further analysis of the individual scans will follow in order to measure worker quality and generate 3D models.
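To make the idea of a heatmap of drawn pixels concrete, here is a minimal sketch of how such a heatmap could be computed and used for a rough per-worker score. It assumes each drawing has already been rasterized into a binary mask of the same size. The function names, the 0.5 threshold and the intersection-over-union score are illustrative assumptions, not the CrowdTruth metrics or the analysis actually used for the Lowlands data.

```python
from typing import List

# Assumption: every drawing is a binary mask, 1 where the annotator
# marked a pixel as brainstem and 0 elsewhere.
Mask = List[List[int]]


def pixel_heatmap(masks: List[Mask]) -> List[List[float]]:
    """Fraction of annotators that marked each pixel."""
    n = len(masks)
    rows, cols = len(masks[0]), len(masks[0][0])
    return [
        [sum(m[r][c] for m in masks) / n for c in range(cols)]
        for r in range(rows)
    ]


def overlap_with_majority(
    mask: Mask, heatmap: List[List[float]], threshold: float = 0.5
) -> float:
    """Intersection-over-union between one drawing and the majority region.

    A pixel belongs to the majority region when at least `threshold`
    of the annotators marked it (hypothetical cut-off).
    """
    inter = union = 0
    for mask_row, heat_row in zip(mask, heatmap):
        for drawn, heat in zip(mask_row, heat_row):
            majority = heat >= threshold
            inter += bool(drawn) and majority
            union += bool(drawn) or majority
    return inter / union if union else 1.0


# Three made-up drawings on a tiny 2x3 "scan".
drawings = [
    [[0, 1, 1], [0, 1, 0]],
    [[0, 1, 1], [0, 0, 0]],
    [[1, 1, 0], [0, 0, 0]],
]
heat = pixel_heatmap(drawings)
scores = [overlap_with_majority(d, heat) for d in drawings]
```

Note that in this sketch the heatmap includes the worker's own drawing, which flatters every score slightly; leaving the worker out before comparing would be a stricter variant.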
From the 2nd to the 16th of July we organized the Big Data in Society Summerschool at the Vrije Universiteit Amsterdam. As part of our Collaborative Innovation Center with IBM, we presented an introduction to the technical and theoretical underpinnings of IBM Watson and discussed the use of big data and its implications for society. We looked at examples of how the original Watson system can be adapted to new domains and tasks, and presented the CrowdTruth approach for gathering training and evaluation data in this context. The participating students, who ranged from bachelor to PhD level, said they learned a lot from the lectures and found the practical hands-on sessions very useful.
An important aspect of the semantic web is that systems have an understanding of the content and context of text, images, sounds and videos. Although research in these fields has progressed over the last years, there is still a semantic gap between the available multimedia data and the metadata with which humans describe its content. This research investigates how the complete interpretation space of humans about the content and context of this data can be captured. The methodology consists of open-ended crowdsourcing tasks that are optimized for capturing multiple interpretations, combined with disagreement-based metrics for evaluating the results. These descriptions can be used to improve information retrieval and recommendation of multimedia, to train and evaluate machine learning components, and to train and assess experts.
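As an illustration of what a disagreement-based metric can look like, the sketch below scores open-ended labels for a single media item with cosine similarity. It is a simplified stand-in in the spirit of such metrics, not the CrowdTruth implementation; the function names and the example labels are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List

# Assumption: for one media item, each worker submits a free list of labels.
Annotations = Dict[str, List[str]]


def unit_vector(annotations: Annotations) -> Counter:
    """Number of workers that gave each label (each worker counts once)."""
    total: Counter = Counter()
    for labels in annotations.values():
        total.update(set(labels))
    return total


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def label_scores(annotations: Annotations) -> Dict[str, float]:
    """How clearly each label is expressed by the crowd for this item."""
    unit = unit_vector(annotations)
    return {label: cosine(unit, Counter([label])) for label in unit}


def worker_agreement(worker: str, annotations: Annotations) -> float:
    """Similarity between one worker and the rest of the crowd."""
    own = Counter(set(annotations[worker]))
    rest = unit_vector({w: l for w, l in annotations.items() if w != worker})
    return cosine(own, rest)


# Made-up labels for one image.
example = {"w1": ["dog", "park"], "w2": ["dog"], "w3": ["dog", "ball"]}
scores = label_scores(example)            # "dog" scores highest
agreement = worker_agreement("w2", example)
```

The point of this design is that minority labels such as "park" are kept with a low score rather than discarded, so the full range of interpretations stays available for later use.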
On the 22nd of March we presented our latest work on CrowdTruth at the ICT.OPEN 2016 conference. We are happy to announce that our poster received the best poster award in the Human and the Machine track. Furthermore, Anca Dumitrache gave a presentation and pitched our poster, which resulted in the 2nd prize for best poster of the conference. It is a good sign that, out of almost 200 posters, the importance of the CrowdTruth initiative was recognized.
Today we released version 2.0 of the CrowdTruth framework. In the update the data model of the platform is changed, so that data and crowdsourcing results can be managed and reused more easily. This allows for several new features that have been integrated, such as project management and permissions. Users can create projects and share their crowdsourcing jobs within these projects. The media search page has been updated to accommodate any type of data, where you can search through the media in the platform. Another improvement to the platform is the automatic setup of new installations. This makes it easier for new users to get started straight away. You can find a list of the changes in the change log. Try out the platform and get started!
Recently the CrowdTruth team got a paper accepted at ICT Open 2016. As part of this upcoming conference, I visited a masterclass on scientific poster design at NWO. The class was given by two professional designers.
The most important thing in your poster is having a clear message. This can be achieved by creating a visual focus. This means that you should not give all images the same size, but guide the reader visually with placement and size of text and images. You have to be able to read the main message from far away and can include the fine details smaller for when the reader is up close. In order to achieve this, there should only be one main focus point to start from.
Once there is a starting point, there should be a clear hierarchy throughout the poster. The number of levels of information should be reduced as much as possible, to four or five at most. Most of the content from your paper is not suitable for the poster. Use only the most suitable parts, and optionally include more text with details in a small font size at the bottom. Organize the message systematically by using a grid, so that all elements are aligned along this grid.
Typography is another very important but often forgotten aspect of poster design. Choose one typeface that is easy to read and offers enough variation in size and style. Still, try to minimize the differences in font size, so that they match the hierarchy of the content. Write sentences that are easy to read, and make sure the lines are neither too short nor too long, to improve readability.
The colors of the poster are also an important aspect. Do not place text on top of a picture or image with varying colors, because this usually makes the text too difficult to read. Applying a drop shadow is not a good solution to this. Try to never use shadows. Instead, focus on having a high contrast between the text and background color.
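One way to put a number on "high contrast" is the contrast ratio defined in the Web Content Accessibility Guidelines (WCAG). The masterclass did not mention this measure; it is brought in here as an assumption, and the function names below are hypothetical. The sketch computes the ratio for two sRGB colors, where 1 means identical colors and 21 is the maximum, reached by black on white.

```python
from typing import Tuple

RGB = Tuple[int, int, int]  # 8-bit sRGB channels, 0-255


def _linear(channel: int) -> float:
    """Convert one 8-bit sRGB channel to linear light (WCAG 2.0 formula)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: RGB) -> float:
    r, g, b = (_linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground: RGB, background: RGB) -> float:
    """WCAG contrast ratio; the lighter color always goes in the numerator."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


black_on_white = contrast_ratio((0, 0, 0), (255, 255, 255))
yellow_on_white = contrast_ratio((255, 255, 0), (255, 255, 255))
```

WCAG asks for at least 4.5:1 for normal-sized text on screens. A poster read from a distance is a different setting, so treat that threshold as a rough guide rather than a rule.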
For using images and graphics, apply the same rules as for text color. Choose the most important image and decide if it communicates with your audience. It is better to choose one powerful image than a lot of random images. The chronological order of the poster can be changed by positioning the main thing in an unusual position, but then this focus point and the continuing hierarchy must be very clear.
Finally, it is best with scientific posters to just put all logos in a clear line at the bottom in a color bar. They could also be placed vertically, although this is less common and tends to take up more space. When in doubt, just put something big in the poster to get the attention of the audience. Make the poster stand out from the 200 other ones in the same room.