Popular Scientific Summary of StaViCTA

The StaViCTA project focused on the analysis of stance in written English in online social media, such as blogs or Twitter posts. Stance-taking is the expression of attitudes, judgments, doubts, trust, or certainty about a specific topic, and its analysis is crucial for application fields like crisis management, financial analytics, or business intelligence. Our research aimed to identify the language resources used for such expressions in social media and to understand how they interact over time. In addition, we wanted to investigate the technologies needed to achieve these goals. Consequently, the members of StaViCTA came from different disciplines (linguistics, computational linguistics, and information visualization) in order to exploit synergies. Together, the aim was to enable people to make sense of large, dynamic text data and to explore, control, and finally evaluate the analysis processes and results.

To increase the chances of finding enough stance expressions in social media, we concentrated on political blogs, for example blogs about Brexit. We chose ten stance categories: AGREEMENT/DISAGREEMENT, CERTAINTY, CONTRARIETY, HYPOTHETICALITY, NECESSITY, PREDICTION, TACT/RUDENESS, SOURCE OF KNOWLEDGE, UNCERTAINTY, and VOLITION. From this material, we compiled a gold standard stance corpus, on which we then carried out further analyses, for instance of which stance categories co-occur and which do not. This corpus has been made publicly available. We were then able to confirm that our notional approach was successful in identifying stance-taking in discourse.
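To illustrate what a co-occurrence analysis of stance categories can look like, the following minimal Python sketch counts how often two categories are assigned to the same utterance. The function name `cooccurrence_counts` and the example annotations are invented for illustration; they are not taken from the StaViCTA corpus or its tools.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(annotations):
    """Count how often each pair of stance categories labels the same utterance."""
    counts = Counter()
    for labels in annotations:
        # Sort the labels so that each pair is always stored in the same order.
        for pair in combinations(sorted(set(labels)), 2):
            counts[pair] += 1
    return counts


# Hypothetical annotations: one set of stance categories per utterance.
annotations = [
    {"UNCERTAINTY", "HYPOTHETICALITY"},
    {"CERTAINTY", "PREDICTION"},
    {"UNCERTAINTY", "HYPOTHETICALITY", "PREDICTION"},
    {"NECESSITY"},
]

counts = cooccurrence_counts(annotations)
for pair, n in counts.most_common():
    print(pair, n)
```

Pairs that never appear together, such as CERTAINTY and UNCERTAINTY in this invented example, simply have a count of zero.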

On the computational side, we developed machine learning classifiers that are specialized for political texts and able to identify the above-mentioned categories in social media texts. We applied a machine learning technique called active learning, which automatically selects useful training samples and then interactively asks a person to provide the correct classification manually. Here, we demonstrated the usefulness of a number of methods that optimize resource efficiency when collecting training data, implemented them, and made them freely available via SND.
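The selection step of active learning can be sketched in a few lines of Python. The example below uses uncertainty sampling, one common selection strategy, and is not necessarily the method used in the project. The function names, sentence ids, and probabilities are all hypothetical: given a classifier's predicted probabilities for some unlabeled sentences, the sketch picks the sentences the classifier is least sure about, so that a person can label those first.

```python
import math


def entropy(probs):
    """Shannon entropy of a probability distribution (higher means less certain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_queries(predictions, k=1):
    """Return the ids of the k unlabeled texts with the most uncertain predictions.

    predictions: dict mapping a text id to a list of class probabilities.
    """
    ranked = sorted(predictions, key=lambda tid: entropy(predictions[tid]), reverse=True)
    return ranked[:k]


# Hypothetical classifier output for three unlabeled sentences
# (probabilities for CERTAINTY, UNCERTAINTY, and "no stance").
predictions = {
    "s1": [0.90, 0.05, 0.05],
    "s2": [0.40, 0.35, 0.25],
    "s3": [0.60, 0.30, 0.10],
}

# The two sentences the classifier is least sure about go to the annotator.
queries = select_queries(predictions, k=2)
print(queries)
```

In a full active learning loop, the annotator's answers would be added to the training data, the classifier would be retrained, and the selection step would be repeated. This concentrates the manual effort on the examples that are expected to help the classifier most.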

Interactive visualization helps to bring all these concepts together. It provides users with a tool to access the textual online data effectively and to apply and interpret the classifiers and their results, and it also makes the process of building training data for the classifiers more efficient and easier to analyze. Thus, we developed a number of novel, web-based visualization approaches for investigating lexical features of stance phenomena in social media and for supporting text data annotation and classifier training through active learning. Finally, we implemented visualization tools that build on our project results for specific application areas, such as digital humanities.