Posts by Collection

publications

European intercultural workplace: Sverige

Published in Göteborgs Universitet, 2007

Use Google Scholar for full citation

Recommended citation: Jens Allwood, Natalia Lindström, Margreth Börjesson, Charlotte Edebäck, Randi Myhre, Kaarlo Voionmaa, Emily Öhman. European intercultural workplace: Sverige. Göteborgs Universitet, 2007.

Recent Changes in Indefinite Pronouns with Human Reference: A diachronic corpus study of 200 years of -one/-body and -man indefinite pronoun variation in Late Modern and Present-day English

Published in Linnaeus University, 2015

Use Google Scholar for full citation

Recommended citation: Emily Öhman. Recent Changes in Indefinite Pronouns with Human Reference: A diachronic corpus study of 200 years of -one/-body and -man indefinite pronoun variation in Late Modern and Present-day English. Linnaeus University, 2015.

Language Change Database: A new online resource

Published in ICAME journal, 2016

Use Google Scholar for full citation

Recommended citation: Terttu Nevalainen, Turo Vartiainen, Tanja Säily, Joonas Kesäniemi, Agata Dominowska, Emily Öhman. Language Change Database: A new online resource. ICAME journal, 2016.

The challenges of multi-dimensional sentiment analysis across languages

Published in Proceedings of the Workshop on Computational Modeling of People’s Opinions, Personality, and Emotions in Social Media (PEOPLES) at COLING 2016, 2016

Use Google Scholar for full citation

Recommended citation: Emily Öhman, Timo Honkela, Jörg Tiedemann. The challenges of multi-dimensional sentiment analysis across languages. In Proceedings of the Workshop on Computational Modeling of People’s Opinions, Personality, and Emotions in Social Media (PEOPLES) at COLING 2016, 2016.

Sentimentator: Gamifying Fine-grained Sentiment Annotation

Published in the proceedings of Digital Humanities in the Nordic Countries 2018, 2018

Use Google Scholar for full citation

Recommended citation: Emily Öhman, Kaisla Kajava. Sentimentator: Gamifying Fine-grained Sentiment Annotation. In the proceedings of Digital Humanities in the Nordic Countries 2018, 2018.

Creating a dataset for multilingual fine-grained emotion-detection using gamification-based annotation

Published in Proceedings of the 9th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis at EMNLP 2018, 2018

Use Google Scholar for full citation

Recommended citation: Emily Öhman, Kaisla Kajava, Jörg Tiedemann, Timo Honkela. Creating a dataset for multilingual fine-grained emotion-detection using gamification-based annotation. In Proceedings of the 9th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis at EMNLP 2018, 2018.

Emotion Preservation in Translation: Evaluating Datasets for Annotation Projection

Published in Proceedings of the Digital Humanities in the Nordic Countries 5th Conference, 2020

Use Google Scholar for full citation

Recommended citation: Kaisla Kajava, Emily Öhman, Hui Piao, Jörg Tiedemann. Emotion Preservation in Translation: Evaluating Datasets for Annotation Projection. In Proceedings of the Digital Humanities in the Nordic Countries 5th Conference, 2020.

LT@Helsinki at SemEval-2020 Task 12: Multilingual or language-specific BERT?

Published in the proceedings of SemEval-2020: International Workshop on Semantic Evaluation, at the 28th International Conference on Computational Linguistics (COLING), Barcelona, Spain, 2020

Use Google Scholar for full citation

Recommended citation: Marc Pàmies, Emily Öhman, Kaisla Kajava, Jörg Tiedemann. LT@Helsinki at SemEval-2020 Task 12: Multilingual or language-specific BERT? In Proceedings of SemEval-2020: International Workshop on Semantic Evaluation, at the 28th International Conference on Computational Linguistics (COLING), Barcelona, Spain, 2020.

XED: A Multilingual Dataset for Sentiment Analysis and Emotion Detection

Published in Proceedings of the 28th International Conference on Computational Linguistics, 2020

Use Google Scholar for full citation

Recommended citation: Emily Öhman, Marc Pàmies, Kaisla Kajava, Jörg Tiedemann. XED: A Multilingual Dataset for Sentiment Analysis and Emotion Detection. In Proceedings of the 28th International Conference on Computational Linguistics, 2020.

SELF & FEIL: Emotion Lexicons for Finnish

Published in the proceedings of the 6th Digital Humanities in the Nordic and Baltic Countries Conference (DHNB 2022), 2022

Use Google Scholar for full citation

Recommended citation: Emily Öhman. SELF & FEIL: Emotion Lexicons for Finnish. In Proceedings of the 6th Digital Humanities in the Nordic and Baltic Countries Conference (DHNB 2022), 2022.

Introduction to Text Analytics

Published in Sage Publishing, 2024

Coming in November 2024.

Recommended citation: Öhman, E., 2024. Introduction to Text Analytics. Edward Elgar.

research

resources

talks

The Challenges of Multi-dimensional Sentiment Analysis Across Languages

We outline a pilot study on multi-dimensional and multilingual sentiment analysis of social media content. We use parallel corpora of movie subtitles as a proxy for colloquial language in social media channels and a multilingual emotion lexicon for fine-grained sentiment analyses. Parallel datasets make it possible to study the preservation of sentiments and emotions in translation, and our assessment reveals that the lexical approach shows high inter-language agreement. However, our manual evaluation also suggests that purely lexical methods are limited and that further studies are necessary to pinpoint the cross-lingual differences and to develop better sentiment classifiers.

Sentimentator: A Sentiment and Emotion Annotation Platform

We introduce Sentimentator, a publicly available, gamified, web-based annotation platform for fine-grained sentiment annotation at the sentence level. Sentimentator is unique in that it moves beyond binary classification. We use a ten-dimensional model that allows for the annotation of 51 unique sentiments and emotions. The platform is gamified with a complex scoring system designed to reward users for high-quality annotations. Sentimentator introduces several features that were previously unavailable, or at best very limited, for sentiment annotation. In particular, it provides streamlined multi-dimensional annotation optimized for sentence-level annotation of movie subtitles. Because the platform is publicly available, it can benefit anyone interested in fine-grained sentiment analysis and emotion detection, as well as in the annotation of other datasets.

Creating a Dataset for Multilingual Fine-grained Emotion-detection Using Gamification-based Annotation

This paper introduces a gamified framework for fine-grained sentiment analysis and emotion detection. We present a flexible tool, Sentimentator, that can be used for efficient annotation based on crowdsourcing and a self-perpetuating gold standard. We also present a novel dataset with multi-dimensional annotations of emotions and sentiments in movie subtitles that enables research on sentiment preservation across languages and the creation of robust multilingual emotion detection tools. The tools and datasets are public and open-source and can easily be extended and applied for various purposes.

Computational Bias

How to acknowledge and mitigate biased data, and how to deal ethically with biases in data.

Biased Algorithms

How do biased data affect algorithms in education?

Emotion Annotation

With the prevalence of machine learning in natural language processing and other fields, an increasing number of crowdsourced datasets are being created and published. However, very little has been written about the annotation process from the point of view of the annotators. This pilot study aims to help fill this gap and to provide insights into how to maximize the quality of crowdsourced annotation output, with a focus on fine-grained sentence-level sentiment and emotion annotation from the annotators' point of view.

XED: Multilabel Emotion Dataset

We introduce XED, a multilingual fine-grained emotion dataset. The dataset consists of human-annotated Finnish (25k) and English sentences (30k), as well as projected annotations for 30 additional languages, providing new resources for many low-resource languages. We use Plutchik's core emotions to annotate the dataset with the addition of neutral to create a multilabel multiclass dataset. The dataset is carefully evaluated using language-specific BERT models and SVMs to show that XED performs on par with other similar datasets and is therefore a useful tool for sentiment analysis and emotion detection.

Coding for Digital Humanities

For this panel, the other winners of the EADH small grants 2020 and I presented our projects.

Breaking into Academia

For this panel, two more senior academics and I discussed the dos and don'ts of breaking into academia as a career.

Japanese Beauty Marketing on Social Media: Critical Discourse Analysis Meets NLP

This project is a pilot study that combines traditional corpus linguistics, Natural Language Processing, critical discourse analysis, and digital humanities to gain an up-to-date understanding of how beauty is marketed to followers on social media, specifically Instagram. We use topic modeling combined with critical discourse analysis and NLP tools to gain insights into the "Japanese Beauty Myth" and present an overview of the dataset, which we make publicly available.

The Validity of Lexicon-based Emotion Analysis in Interdisciplinary Research

Lexicon-based sentiment and emotion analysis methods are widely used, particularly in applied Natural Language Processing (NLP) projects in fields such as computational social science and digital humanities. These methods have often been criticized, sometimes fairly, for their lack of validation and accuracy. However, in this paper we argue that lexicon-based methods work well, particularly when moving up in granularity, and we show how useful they can be for projects where neither qualitative analysis nor a machine learning-based approach is possible. Indeed, we argue that the measure of a lexicon's accuracy should be grounded in its usefulness.
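Lexicon-based emotion analysis, in its simplest form, looks each word up in a word-to-emotion lexicon and aggregates the hits over a text. The sketch below is only an illustration of that general idea, not the method or lexicon used in the paper: the five-entry `TOY_LEXICON` and the `emotion_profile` function are hypothetical, and normalizing by token count is one common choice among several.

```python
from collections import Counter

# Hypothetical toy lexicon mapping words to Plutchik-style emotion labels.
# Illustrative only: real lexicons are far larger and carefully curated.
TOY_LEXICON: dict[str, set[str]] = {
    "happy": {"joy"},
    "love": {"joy", "trust"},
    "afraid": {"fear"},
    "angry": {"anger"},
    "surprise": {"surprise"},
}


def emotion_profile(text: str, lexicon: dict[str, set[str]]) -> dict[str, float]:
    """Return, per emotion, the number of lexicon hits divided by the token count.

    Tokenization is deliberately naive: whitespace split, lowercasing,
    and stripping of surrounding punctuation.
    """
    tokens = [t.strip(".,!?;:\"'").lower() for t in text.split()]
    tokens = [t for t in tokens if t]
    # A word may map to several emotions; each label it carries counts once.
    counts = Counter(label for t in tokens for label in lexicon.get(t, ()))
    n = len(tokens) or 1  # avoid division by zero on empty input
    return {label: count / n for label, count in sorted(counts.items())}


if __name__ == "__main__":
    print(emotion_profile("I love you, but I am afraid.", TOY_LEXICON))
```

Scoring a whole text rather than a single word is one way to read "moving up in granularity": an individual misfiring lexicon entry carries less weight in a document-level profile than in a word- or sentence-level decision.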

SELF & FEIL Emotion Lexicons for Finnish

I introduce a Sentiment and Emotion Lexicon for Finnish (SELF) and a Finnish Emotion Intensity Lexicon (FEIL). Sentiment analysis and emotion detection require annotated data regardless of the chosen approach, but most existing resources are for the English language. To overcome this, the SELF and FEIL lexicons use projected annotations from existing resources with carefully edited translations and domain adaptations. In this paper, the creation process and translation issues are explained in detail to allow others to create similar lexicons for other languages. The usefulness of SELF and FEIL is demonstrated via several interdisciplinary affect-related projects. To the best of my knowledge, this is the first comprehensive sentiment and emotion lexicon for Finnish.

Creating an army of hacker-scholars

International similarities and differences in what students struggle with on programming courses aimed at non-STEM students.

teaching

Project Course in English Linguistics

Graduate course, University of Helsinki, Department of Modern Languages, English Philology, 2015

For graduate students of English philology and the English teaching program. The course focused on practical tasks for the Language Change Database and on teaching students to digest information from academic papers on diachronic corpus linguistics.

Introduction to Language Technology

Undergraduate course, University of Helsinki, Department of Digital Humanities, 2015

This was a compulsory course for all students of languages. It focused on the applications of language technology and on how language and computers are intertwined, and it also introduced many practical tools. These tools included:

Methods for Digital Humanities

Graduate course, University of Helsinki, Department of Digital Humanities, 2016

This is a graduate-level course teaching digital humanities methods to students of various humanities and social science disciplines.

Citizen Science: Crowd-sourcing as a Tool for Collecting Quantitative and Qualitative Data

Graduate course, University of Helsinki, Department of Digital Humanities, 2021

This is a graduate-level UNAEuropa course teaching students about crowdsourcing and citizen science for various digital humanities projects. It was co-taught with Suzie Thomas, and its focus was split between cultural heritage studies and crowdsourcing annotations for language technology projects.

Digital Intimacy: Media, habits, and affect.

Graduate course, Tampere University, Faculty of Information Technology and Communication Sciences, 2021

This course is a graduate level course on digital intimacy. I was a guest lecturer and talked about computational approaches to measuring intimacy and emotion in online sources.

Introductory Statistics

Undergraduate course, compulsory course, Waseda University, School of International Liberal Studies, 2022

A compulsory course for all SILS students.

Introduction to Digital Humanities

Undergraduate course, Introductory course, Waseda University, School of International Liberal Studies (open to all of Waseda), 2022

This undergraduate introductory course, taught since 2021, teaches digital humanities methods to students of the liberal arts.

Data and Social Media Analysis

Undergraduate course, Advanced course, Waseda University, School of International Liberal Studies (open to all of Waseda), 2022

This undergraduate advanced course, taught since 2021, teaches intermediate practical programming skills with a focus on data analysis and social media analysis.

Python Programming for Digital Humanities

Undergraduate course, Intermediate course, Waseda University, School of International Liberal Studies (open to all of Waseda), 2022

This undergraduate intermediate course, taught since 2021, teaches practical programming skills to beginners.

zemi