A Study on Machine Learning-based Approaches for Personality Identification and Translation

RADISAVLJEVIC, Dusan


Files in This Item:	Dusan_Radisavljevic.pdf (PDF, 2.07 MB)

Please use this identifier to cite or link to this item: https://doi.org/10.14943/doctoral.k15665

Related Items in HUSCAP:	A Study on Machine Learning-based Approaches for Personality Identification and Translation [an abstract of dissertation and a summary of dissertation review]

Title:	A Study on Machine Learning-based Approaches for Personality Identification and Translation
Other Titles:	人格特性の識別と翻訳のための機械学習アプローチに関する研究
Authors:	RADISAVLJEVIC, Dusan
Issue Date:	25-Sep-2023
Publisher:	Hokkaido University
Abstract:	The popularity of social media services has shifted a growing share of our interactions into online spaces. Together with the rapid development of chat applications, this has made text messages the driver of most of our communication. Correctly interpreting the intentions and feelings that interlocutors try to convey through text has therefore become increasingly important. While context and familiarity between people play a large part in interpreting their communication, the specific patterns individuals exhibit remain consistent over time. Personality psychology attributes these patterns, and the differences between people at the individual level, to a concept called personality. Personality is the sum of individual differences in behavioural, emotional, and cognitive patterns, and it remains relatively consistent over time and across contexts. It follows that personality plays an essential part in communication, as it describes both individuality and consistency. Correctly understanding and interpreting an individual’s personality has been the focus of many researchers. Recent decades saw efforts to leverage technological advancements and new computational algorithms for this task, leading to the formation of the personality computing research field. While personality computing is still relatively new, interest in it has been growing. However, because the field is young, its evaluation criteria have not been standardised, which makes comparing works difficult. A further problem is the lack of readily available data labelled with personality-relevant information. The main contributing factors are often preferences for different personality measurements and concerns over data privacy.
These two issues, the obstacles most frequently cited in personality computing research, have been my primary motivation for investigating whether different personality assessment methods can be connected. If a connection between different personality assessments could be leveraged successfully, it would effectively increase the readily available data for all the assessment methods involved. Additionally, an approach centred on understanding personality and its reflection in communication can contribute towards a standardised evaluation framework, within which differences in performance across research works could be replicated and interpreted. With this objective in mind, the study described in this thesis starts with the speaker identification task, seeking to establish whether interlocutors can be identified using only text transcripts of their communication. The novel transformer-based approach proposed in this part of the study predicts to whom an utterance belongs more reliably than the baseline approach, scoring above 70% on the F1 metric. In addition, the experiments have produced a large dataset of over 70,000 utterances, based on textual transcripts of the dialogues from a commercial video game. To the best of my knowledge, while previous efforts have examined the prospect of using fantasy texts for dialogue-related tasks, this is the first time data from a commercial video game has been collected and published for the purpose of such a task. Building on the finding that textual communication reflects personal differences, the study further examines the reasons behind these differences by looking into the relationship between text-based features and two different personality assessment models.
The two models in question, the Big Five and the Myers-Briggs Type Indicator, both showed a correlation with certain linguistic features in text from the social media platform Reddit. While this finding confirms the linguistic properties often attributed to the Big Five model due to its lexical background, it also offers a novel insight: similar properties may be reflected in the Myers-Briggs Type Indicator, a less-researched personality model. These findings were then employed later in the study to convert data from the more easily obtainable Myers-Briggs Type Indicator and another personality assessment model, the Enneagram, into Big Five personality assessments. The detailed approach taken during this part of the experiment ensures the study’s reproducibility and comparability. The result is a simple approach that increases correlation strength by up to 13.2% on a per-measurement basis, as evaluated with the Pearson r correlation coefficient. To better adapt the evaluation criteria to the properties of the domain and the data, the best-performing approach for translating Myers-Briggs Type Indicator and Enneagram assessments into Big Five ones was then re-evaluated using Spearman’s rank correlation coefficient and root mean squared error. These re-evaluations helped confirm the original findings and further substantiate the claim regarding which features and algorithm choice seem most effective for the task.
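Abstract (translated from Japanese):	The spread of social media services is moving more and more of our interactions into online spaces. Because of this phenomenon and the rapid development of various chat applications, most of our communication now takes place through text messages. It has therefore become increasingly important to correctly interpret the intentions and feelings that interlocutors try to convey through text. Context and familiarity strongly influence how people's communication is properly interpreted, but the specific patterns people exhibit are consistent. Personality psychology attributes these patterns, and individual-level differences between people, to a concept called personality. Personality is the sum of the individual differences present in patterns of behaviour, emotion, and cognition, and it remains relatively consistent regardless of time and context. Seen this way, personality, which expresses both individuality and consistency, can be inferred to play an essential role in communication. The significance of correctly understanding and interpreting an individual's personality has attracted the attention of many researchers. Over the last few decades, technological progress and new computational algorithms have been put to use, leading to the formation of the research field of personality computing. Personality computing is still a relatively new field, but interest in it keeps growing. However, because of the field's novelty, there is no research standard for evaluation criteria, which makes it difficult to compare works. In addition, data labelled with personality-related information is not readily available. The main causes are often preferences for different personality measurements and concerns about data privacy. These two problems are most frequently cited as the main obstacles in many studies on personality computing, and they are my greatest motivation for studying the possibility of connecting different personality assessment methods. If different personality assessment methods could be connected successfully, the available data for all the methods involved would effectively increase. Furthermore, taking an approach centred on understanding personality and its reflection in communication contributes to developing a standardised evaluation framework, within which differences in performance between research works can be reproduced and interpreted. With this objective in mind, the research described in this thesis begins with the speaker identification task and aims to establish the possibility of identifying interlocutors using only text transcripts of their communication. The new transformer-based approach proposed in this research scored above 70% on the F1 metric and was shown to predict to whom an utterance belongs with a certainty exceeding the baseline approach. The experiments also yielded a large dataset of more than 70,000 utterances, based on text transcripts of dialogues from a commercial video game. To the best of my knowledge, previous efforts have examined the prospect of using fantasy texts for dialogue-related tasks, but this is the first time data from a commercial video game has been used for such a task. Building on the answer to whether individual differences appear in textual communication, this research goes on to examine the exact reasons behind those differences by investigating the relationship between text-based features and two different personality assessment models. The two models, the Big Five and the Myers-Briggs Type Indicator, both showed correlations with specific linguistic features when text from the social media platform Reddit was analysed. This finding confirms the linguistic properties of the Big Five that stem from its lexical background, and it also offers the new insight that similar properties may be reflected in the less-researched Myers-Briggs Type Indicator. These findings are used in the latter part of the research to convert data from the more easily obtainable Myers-Briggs Type Indicator, and from another personality assessment model, the Enneagram, into Big Five assessments. The detailed approach taken in this part of the experiment ensures the reproducibility and comparability of the research. As a result, although the approach is simple, correlation strength per measurement increased by up to 13.2% on the Pearson r correlation coefficient evaluation metric. In addition, to fit the evaluation criteria to the characteristics of the domain and the data, the best approach for converting the Myers-Briggs Type Indicator and the Enneagram into Big Five assessments was re-evaluated using Spearman's rank correlation coefficient and mean squared error. These re-evaluations confirmed the original findings and further substantiated the claims about the choice of features and algorithm that appear most effective for the task.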
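Illustrative note:	The abstract names three evaluation metrics for comparing predicted Big Five scores against reference scores: the Pearson r correlation coefficient, Spearman's rank correlation coefficient, and root mean squared error. The following is a minimal sketch of how such metrics can be computed, not the thesis's own evaluation code. The function names and the sample scores are hypothetical and chosen only for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rank correlation: Pearson r computed on the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


def rmse(predicted, reference):
    """Root mean squared error between predictions and reference values."""
    squared = sum((p - r) ** 2 for p, r in zip(predicted, reference))
    return math.sqrt(squared / len(predicted))


# Hypothetical scores for one Big Five trait (made-up numbers, not thesis data).
reference = [10.0, 20.0, 30.0, 40.0, 50.0]
predicted = [12.0, 18.0, 33.0, 41.0, 46.0]

print("Pearson r:", pearson_r(reference, predicted))
print("Spearman rho:", spearman_rho(reference, predicted))
print("RMSE:", rmse(predicted, reference))
```

Pearson r measures linear agreement, Spearman's coefficient only checks whether the ordering of individuals is preserved, and RMSE reports the size of the error in the units of the score itself, so the three can disagree on the same predictions.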
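Conferring University:	Hokkaido University
Degree Report Number:	甲第15665号 (Kō No. 15665)
Degree Level:	Doctorate
Degree Discipline:	Information Science
Examination Committee Members:	(Chief examiner) Specially Appointed Professor 荒木健治 (Kenji Araki), Specially Appointed Professor 坂本雄児, Professor 長谷山美紀, Professor 土橋宜典, Associate Professor 伊藤敏彦
Degree Affiliation:	Graduate School of Information Science and Technology (Information Science major)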
(Relation)haspart:	Dušan Radisavljević, Bojan Batalo, Rafal Rzepka, and Kenji Araki. Text-based speaker identification for video game dialogues. In Intelligent Systems and Applications: Proceedings of the 2021 Intelligent Systems Conference (IntelliSys) Volume 3, pages 44–54. Springer, 2022.
	Dušan Radisavljević, Rafal Rzepka, and Kenji Araki. Personality types and traits—examining and leveraging the relationship between different personality models for mutual prediction. Applied Sciences, 13(7):4506, 2023.
	Dušan Radisavljević, Bojan Batalo, Rafal Rzepka, and Kenji Araki. Myers-Briggs Type Indicator and the Big Five model - how our personality affects language use. In 2022 IEEE Asia-Pacific Conference on Computer Science and Data Engineering (CSDE), pages 1–6. IEEE, 2022.
Type:	theses (doctoral)
URI:	http://hdl.handle.net/2115/90853
Appears in Collections:	Doctorate by way of Advanced Course > Graduate School of Information Science and Technology, Theses > Doctorate (Information Science)
