Automatically annotating a five-billion-word corpus of Japanese blogs for sentiment and affect analysis

This item is licensed under: Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International

Files in This Item:
Automatically annotating a five-billion-word corpus of japanese blogs for affect and sentiment analysis.pdf (1.51 MB, PDF)
Please use this identifier to cite or link to this item: http://hdl.handle.net/2115/63969

Title: Automatically annotating a five-billion-word corpus of Japanese blogs for sentiment and affect analysis
Authors: Ptaszynski, Michal
Rzepka, Rafal
Araki, Kenji
Momouchi, Yoshio
Issue Date: Jan-2014
Publisher: Elsevier
Journal Title: Computer Speech & Language
Volume: 28
Issue: 1
Start Page: 38
End Page: 55
Publisher DOI: 10.1016/j.csl.2013.04.010
Abstract: This paper presents our research on automatic annotation of a five-billion-word corpus of Japanese blogs with information on affect and sentiment. We first perform a study in emotion blog corpora to discover that there has been no large scale emotion corpus available for the Japanese language. We choose the largest blog corpus for the language and annotate it with the use of two systems for affect analysis: ML-Ask for word- and sentence-level affect analysis and CAO for detailed analysis of emoticons. The annotated information includes affective features like sentence subjectivity (emotive/non-emotive) or emotion classes (joy, sadness, etc.), useful in affect analysis. The annotations are also generalized on a 2-dimensional model of affect to obtain information on sentence valence/polarity (positive/negative) useful in sentiment analysis. The annotations are evaluated in several ways. Firstly, on a test set of a thousand sentences extracted randomly and evaluated by over forty respondents. Secondly, the statistics of annotations are compared to other existing emotion blog corpora. Finally, the corpus is applied in several tasks, such as generation of emotion object ontology or retrieval of emotional and moral consequences of actions.
Rights: © 2014, Elsevier. This manuscript version is made available under the CC-BY-NC-ND 4.0 license http://creativecommons.org/licenses/by-nc-nd/4.0/
Type: article (author version)
URI: http://hdl.handle.net/2115/63969
Appears in Collections: Graduate School of Information Science and Technology / Faculty of Information Science and Technology > Peer-reviewed Journal Articles, etc

Submitter: RZEPKA Rafal
