EmoLex‑Multilingual: A Comprehensive Emotional Lexicon in 50 Languages for NLP and Psychology Research

Abstract
We introduce EmoLex‑Multilingual, an open‑access resource mapping over 14,000 words to eight basic emotions (anger, fear, anticipation, trust, surprise, sadness, joy, disgust) and two sentiments (negative, positive) across 50 typologically diverse languages. Built upon the NRC Word–Emotion Association Lexicon and enhanced via machine translation and cross‑lingual alignment, this lexicon facilitates emotion detection, cross‑cultural sentiment analysis, and psychological studies in under‑resourced languages (saifmohammad.com).


1. Introduction

Emotion lexicons are foundational for affective computing and psycholinguistic research. The original NRC Emotion Lexicon (“EmoLex”) provides manually curated English associations but leaves a gap for multilingual applications. Prior efforts have extended EmoLex to 40+ languages via Google Translate (saifmohammad.com), and recent work has demonstrated methodologies for generating lexicons in 91 languages with embedding‑based induction (arxiv.org). EmoLex‑Multilingual unifies and standardizes these approaches into a single dataset covering 50 high‑impact languages with quality‑controlled translations and alignment.


2. Data Sources and Coverage

  1. NRC Emotion Lexicon (English): 14,182 unigrams with binary emotion associations, crowd‑sourced via Amazon Mechanical Turk (saifmohammad.com).
  2. Machine‑Translated Versions: Initial translations into 108 languages using Google Translate (August 2022), narrowed to 50 target languages based on typological diversity and research demand (saifmohammad.com).
  3. Embedding‑Aligned Expansions: For each language, fastText word embeddings are aligned to the English vector space to validate and augment translations, following cross‑lingual induction frameworks (arxiv.org).
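
As a rough illustration of how source 2 builds on source 1, the sketch below copies English emotion labels onto translated words. The toy entries, the translation table, and the function name `project_labels` are all hypothetical, and the rule of merging labels when translations collide is an assumption rather than a documented part of the pipeline.

```python
from collections import defaultdict

# Hypothetical toy data, for illustration only; real entries would come from
# the English lexicon and a machine-translation table.
english_lexicon = {
    "happy": {"joy", "positive", "trust"},
    "afraid": {"fear", "negative"},
}
translations = {  # English word -> French translation (illustrative)
    "happy": "heureux",
    "afraid": "effrayé",
}


def project_labels(lexicon, translation_table):
    """Copy each English word's emotion set onto its translation.

    If several English words share one translation, their label sets are
    merged with a union (an assumed rule, not taken from the source).
    """
    projected = defaultdict(set)
    for word, emotions in lexicon.items():
        target = translation_table.get(word)
        if target:  # skip words that have no translation
            projected[target] |= emotions
    return dict(projected)


french_lexicon = project_labels(english_lexicon, translations)
```

Here `french_lexicon["heureux"]` carries the same labels as the English entry for "happy", which is exactly why translation errors propagate into the labels and need later validation.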

3. Methodology

  1. Translation Refinement:
    • Automatic translations filtered to remove non‑lexical items and disambiguated via part‑of‑speech tagging.
  2. Embedding Alignment:
    • Monolingual fastText embeddings aligned with MUSE to map words back to English counterparts, verifying emotion associations and recovering missing terms (arxiv.org).
  3. Quality Control:
    • Native speaker spot‑checks for high‑frequency terms in each language.
    • Frequency filtering: retaining only translations that appear among the 100,000 most frequent corpus items, to ensure relevance.
  4. Harmonization:
    • Uniform CSV schema: word, language_code, emotion, association (0/1).
    • ISO 639‑1 codes for languages (e.g., en, fr, zh, ar).
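
The embedding alignment step above can be sketched with orthogonal Procrustes, the closed-form mapping that MUSE applies to seed translation pairs. This is a plain NumPy illustration, not the MUSE API: the two-dimensional vectors, the word list, and the helper names are hypothetical, and the "foreign" space is simply a rotated copy of the English one so that the result is easy to check.

```python
import numpy as np


def procrustes_align(src, tgt):
    """Return the orthogonal matrix W minimizing ||src @ W - tgt||_F.

    src, tgt: (n_pairs, dim) embeddings of seed translation pairs.
    """
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt


def nearest_english(vec, english_matrix, english_words):
    """Return the English word with the highest cosine similarity to vec."""
    norms = np.linalg.norm(english_matrix, axis=1) * np.linalg.norm(vec)
    sims = (english_matrix @ vec) / norms
    return english_words[int(np.argmax(sims))]


# Toy setup (assumption): the foreign space is the English space rotated by 30 degrees.
theta = np.pi / 6
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta), np.cos(theta)]])
english_words = ["sun", "storm", "friend"]
english_vecs = np.array([[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]])
foreign_vecs = english_vecs @ rotation

# Learn the mapping from the seed pairs, then map a held-out foreign vector
# back into the English space and look up its nearest English neighbour.
W = procrustes_align(foreign_vecs, english_vecs)
held_out = np.array([0.1, 0.99]) @ rotation
match = nearest_english(held_out @ W, english_vecs, english_words)
```

A translation whose mapped vector lands far from its supposed English counterpart would be a candidate for review; where to set that threshold is left open here.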

4. Dataset Description

Language Count | Terms per Language | Total Entries | Emotions Covered
50             | ~14,000            | ~700,000      | anger, anticipation, disgust, fear, joy, sadness, surprise, trust, positive, negative

Table 1. Overview of the EmoLex‑Multilingual dataset structure.

Each entry indicates whether a given word (lemma) is associated (1) or not (0) with each emotion or sentiment.
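
A minimal sketch of reading the schema from Section 3 (word, language_code, emotion, association) using only the Python standard library. The sample rows and the function name `load_lexicon` are made up for illustration and are not taken from the dataset.

```python
import csv
import io
from collections import defaultdict

# Illustrative rows only; a real file would follow the same four-column header.
SAMPLE_CSV = """word,language_code,emotion,association
heureux,fr,joy,1
heureux,fr,fear,0
heureux,fr,positive,1
peur,fr,fear,1
peur,fr,joy,0
"""


def load_lexicon(handle):
    """Return {(language_code, word): set of associated emotions}."""
    lexicon = defaultdict(set)
    for row in csv.DictReader(handle):
        # Touch the key first so words with only 0-rows still get an entry.
        emotions = lexicon[(row["language_code"], row["word"])]
        if row["association"] == "1":
            emotions.add(row["emotion"])
    return dict(lexicon)


lexicon = load_lexicon(io.StringIO(SAMPLE_CSV))
```

Keying on the (language_code, word) pair keeps homographs in different languages apart, which a word-only key would silently merge.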


5. Data Access


6. Applications

  • Natural Language Processing: Multilingual emotion detection, sentiment analysis, and affective text classification.
  • Cross‑Cultural Psychology: Comparative studies of emotion word usage and cultural affect norms.
  • Digital Humanities: Analysis of emotional content in historical corpora across languages.
  • Low‑Resource NLP: Bootstrapping models in languages lacking annotated resources.
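
For the emotion detection use case, one simple baseline is to count lexicon hits per emotion in a tokenized text. The sketch below assumes whitespace tokenization and a hypothetical toy lexicon; real applications would typically add lemmatization and negation handling.

```python
from collections import Counter

# Hypothetical toy lexicon, for illustration only.
toy_lexicon = {
    "happy": {"joy", "positive"},
    "afraid": {"fear", "negative"},
    "storm": {"fear", "negative"},
}


def emotion_counts(tokens, lexicon):
    """Count, per emotion, how many tokens are associated with it."""
    counts = Counter()
    for token in tokens:
        # Tokens missing from the lexicon contribute nothing.
        counts.update(lexicon.get(token.lower(), ()))
    return counts


tokens = "Happy children were afraid of the storm".split()
counts = emotion_counts(tokens, toy_lexicon)
```

Dividing each count by the number of tokens gives a length-normalized profile that is easier to compare across documents.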

7. Discussion

By combining manual annotations, machine translation, and embedding‑based validation, EmoLex‑Multilingual aims for broad language coverage while keeping the precision of its labels high. Automatic translation introduces noise; embedding alignment is used to catch mistranslations and to recover culturally appropriate synonyms. Future work will expand to additional low‑resource languages and incorporate continuous human‑in‑the‑loop corrections.


8. Conclusion

EmoLex‑Multilingual fills a critical gap in emotion analysis, providing researchers with a standardized, high‑coverage lexicon across 50 languages. Its open‑access nature and flexible schema support a wide range of interdisciplinary studies, from computational linguistics to global psychology.

References

  • Mohammad S.M. & Turney P.D., “Crowdsourcing a Word‑Emotion Association Lexicon,” Computational Intelligence, 29(3):436–465, 2013. (saifmohammad.com)
  • Buechel S., Rücker S. & Hahn U., “Learning and Evaluating Emotion Lexicons for 91 Languages,” arXiv, May 2020. (arxiv.org)
  • NRC Emotion Lexicon, “NRC Word‑Emotion Association Lexicon (EmoLex),” National Research Council Canada, Version 0.92, July 2011. (saifmohammad.com)
  • NRC Emotion Lexicon Translations, “EmoLex in 40+ Languages via Google Translate,” Saif Mohammad, August 2022. (saifmohammad.com)