Talks and presentations

Creating an army of hacker-scholars

July 21, 2022

International conference, peer-reviewed, Digital Humanities Annual Conference, Tokyo, Japan

International similarities and differences in what students struggle with on programming courses aimed at non-STEM students.

SELF & FEIL Emotion Lexicons for Finnish

March 15, 2022

International conference, peer-reviewed, Digital Humanities in the Nordic and Baltic Countries 2022, Uppsala, Sweden

I introduce a Sentiment and Emotion Lexicon for Finnish (SELF) and a Finnish Emotion Intensity Lexicon (FEIL). Sentiment analysis and emotion detection require annotated data regardless of the chosen approach, but most existing resources are for the English language. To overcome this, the SELF and FEIL lexicons use projected annotations from existing resources with carefully edited translations and domain adaptations. In this paper the creation process and translation issues are explained in detail to allow others to create similar lexicons for other languages. The usefulness of SELF and FEIL is demonstrated via several interdisciplinary affect-related projects. To the best of our knowledge, this is the first comprehensive sentiment and emotion lexicon for Finnish.

The Validity of Lexicon-based Emotion Analysis in Interdisciplinary Research

December 20, 2021

International conference, peer-reviewed, NLP4DH@ICON'21, Zoom

Lexicon-based sentiment and emotion analysis methods are widely used, particularly in applied Natural Language Processing (NLP) projects in fields such as computational social science and digital humanities. These lexicon-based methods have often been criticized for their lack of validation and accuracy, sometimes fairly. However, in this paper, we argue that lexicon-based methods work well, particularly when moving up in granularity, and show how useful lexicon-based methods can be for projects where neither qualitative analysis nor a machine learning-based approach is possible. Indeed, we argue that the measure of a lexicon's accuracy should be grounded in its usefulness.

Japanese Beauty Marketing on Social Media: Critical Discourse Analysis Meets NLP

December 19, 2021

International conference, peer-reviewed, NLP4DH@ICON'21, Zoom

This project is a pilot study intending to combine traditional corpus linguistics, Natural Language Processing, critical discourse analysis, and digital humanities to gain an up-to-date understanding of how beauty is being marketed on social media, specifically Instagram, to followers. We use topic modeling combined with critical discourse analysis and NLP tools for insights into the "Japanese Beauty Myth" and show an overview of the dataset that we make publicly available.

Breaking into Academia

October 01, 2021

Invited panelist, Future Digileaders, Stockholm, Sweden

For this panel, I, together with two more senior academics, discussed the dos and don'ts of breaking into academia as a career.

Coding for Digital Humanities

September 01, 2021

Invited panelist, EADH conference, Krasnoyarsk, Russia

For this panel, I, together with the other winners of the EADH small grants 2020, presented our projects.

Concepts of Beauty on Japanese Social Media

July 21, 2021

National conference, peer-reviewed, Japanese Association for Digital Humanities, Tokyo, Japan

How is beauty discussed on social media and what kind of images are used to convey beauty to consumers?

XED: Multilabel Emotion Dataset

December 10, 2020

International conference, peer-reviewed, COLING, Barcelona, Spain

We introduce XED, a multilingual fine-grained emotion dataset. The dataset consists of human-annotated Finnish (25k) and English sentences (30k), as well as projected annotations for 30 additional languages, providing new resources for many low-resource languages. We use Plutchik's core emotions to annotate the dataset with the addition of neutral to create a multilabel multiclass dataset. The dataset is carefully evaluated using language-specific BERT models and SVMs to show that XED performs on par with other similar datasets and is therefore a useful tool for sentiment analysis and emotion detection.

Emotion Annotation

October 01, 2020

International conference, peer-reviewed, Digital Humanities in the Nordic Countries, Riga, Latvia

With the prevalence of machine learning in natural language processing and other fields, an increasing number of crowd-sourced data sets are created and published. However, very little has been written about the annotation process from the point of view of the annotators. This pilot study aims to help fill the gap and provide insights into how to maximize the quality of the annotation output of crowd-sourced annotations, with a focus on fine-grained sentence-level sentiment and emotion annotation from the annotators' point of view.

Biased Algorithms

February 01, 2019

International conference, invited speaker, AI in Education, Helsinki, Finland

How do biased data affect algorithms in education?

Computational Bias

October 10, 2018

International conference, invited speaker, Stanford University: AI in Education, Stanford, CA, USA

Mitigating and acknowledging biased data and how to ethically deal with biases in data.

Creating a Dataset for Multilingual Fine-grained Emotion-detection Using Gamification-based Annotation

October 01, 2018

International conference, peer-reviewed, WASSA: EMNLP 2018, Brussels, Belgium

This paper introduces a gamified framework for fine-grained sentiment analysis and emotion detection. We present a flexible tool, Sentimentator, that can be used for efficient annotation based on crowdsourcing and a self-perpetuating gold standard. We also present a novel dataset with multi-dimensional annotations of emotions and sentiments in movie subtitles that enables research on sentiment preservation across languages and the creation of robust multilingual emotion detection tools. The tools and datasets are public and open-source and can easily be extended and applied for various purposes.

Sentimentator: A Sentiment and Emotion Annotation Platform

March 20, 2018

International conference, peer-reviewed, Digital Humanities in the Nordic Countries, Helsinki, Finland

We introduce Sentimentator, a publicly available, gamified, web-based annotation platform for fine-grained sentiment annotation at the sentence level. Sentimentator is unique in that it moves beyond binary classification. We use a ten-dimensional model which allows for the annotation of 51 unique sentiments and emotions. The platform is gamified with a complex scoring system designed to reward users for high-quality annotations. Sentimentator introduces several unique features that have previously not been available, or at best very limited, for sentiment annotation. In particular, it provides streamlined multi-dimensional annotation optimized for sentence-level annotation of movie subtitles. Because the platform is publicly available, it will benefit anyone interested in fine-grained sentiment analysis and emotion detection, as well as annotation of other datasets.

Lexicon-based Sentiment Analysis

February 01, 2018

International conference, Building and Using Language Technology (BAULT), Helsinki, Finland

Using lexicons as a quick-and-dirty tool to analyze emotions in multilingual data.

The Challenges of Multi-dimensional Sentiment Analysis Across Languages

December 10, 2016

International conference, peer-reviewed, PEOPLES: COLING, Osaka, Japan

We outline a pilot study on multi-dimensional and multilingual sentiment analysis of social media content. We use parallel corpora of movie subtitles as a proxy for colloquial language in social media channels and a multilingual emotion lexicon for fine-grained sentiment analyses. Parallel data sets make it possible to study the preservation of sentiments and emotions in translation, and our assessment reveals that the lexical approach shows great inter-language agreement. However, our manual evaluation also suggests that the use of purely lexical methods is limited and that further studies are necessary to pinpoint the cross-lingual differences and to develop better sentiment classifiers.