Separate or joint? Estimation of multiple labels from crowdsourced annotations

Duan, Lei; Oyama, Satoshi; Sato, Haruhiko; Kurihara, Masahito

doi:10.1016/j.eswa.2014.03.048


Files in This Item:	ESWA_Duan.pdf (PDF, 733.23 kB)

Please use this identifier to cite or link to this item: http://hdl.handle.net/2115/57537

Title:	Separate or joint? Estimation of multiple labels from crowdsourced annotations
Authors:	Duan, Lei
	Oyama, Satoshi
	Sato, Haruhiko
	Kurihara, Masahito
Keywords:	Multi-label estimation
	Crowdsourced annotation
	Label dependency
	Quality control
	Human computation
Issue Date:	1-Oct-2014
Publisher:	Elsevier
Journal Title:	Expert Systems with Applications
Volume:	41
Issue:	13
Start Page:	5723
End Page:	5732
Publisher DOI:	10.1016/j.eswa.2014.03.048
Abstract:	Artificial intelligence techniques aimed at more naturally simulating human comprehension fit the paradigm of multi-label classification. Generally, an enormous amount of high-quality multi-label data is needed to form a multi-label classifier. The creation of such datasets is usually expensive and time-consuming. A lower-cost way to obtain multi-label datasets for use with such comprehension simulation techniques is to use noisy crowdsourced annotations. We propose incorporating label dependency into the label-generation process to estimate the multiple true labels for each instance given crowdsourced multi-label annotations. Three statistical quality control models based on the work of Dawid and Skene are proposed. The label-dependent DS (D-DS) model simply incorporates dependency relationships among all labels. The label pairwise DS (P-DS) model groups labels into pairs to prevent interference from uncorrelated labels. The Bayesian network label-dependent DS (ND-DS) model compactly represents label dependency using conditional independence properties to overcome the data sparsity problem. Results of two experiments, "affect annotation for lines in story" and "intention annotation for tweets", show that (1) the ND-DS model most effectively handles the multi-label estimation problem with annotations provided by only about five workers per instance and that (2) the P-DS model is best if there are pairwise comparison relationships among the labels. To sum up, flexibly using label dependency to obtain multi-label datasets is a promising way to reduce the cost of data collection for future applications with minimal degradation in the quality of the results.
Type:	article (author version)
URI:	http://hdl.handle.net/2115/57537
Appears in Collections:	Graduate School of Information Science and Technology / Faculty of Information Science and Technology > Peer-reviewed Journal Articles, etc

Submitter: Kurihara, Masahito
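The abstract names the work of Dawid and Skene as the basis for the three proposed models. As a rough illustration only, the sketch below implements the classic single-label Dawid-Skene EM estimator, which infers a posterior over each instance's true label together with a confusion matrix per worker. It is not the paper's D-DS, P-DS, or ND-DS model and does not model label dependency. The function name `dawid_skene`, the `smoothing` parameter, and the toy votes are assumptions made for this example.

```python
import math
from collections import defaultdict


def dawid_skene(annotations, n_classes, n_iter=50, smoothing=0.01):
    """Classic single-label Dawid-Skene EM (illustrative sketch).

    annotations: list of (item, worker, label), label in range(n_classes).
    Returns {item: list of posterior probabilities over the true class}.
    """
    items = sorted({i for i, _, _ in annotations})
    workers = sorted({w for _, w, _ in annotations})
    by_item = defaultdict(list)
    for i, w, l in annotations:
        by_item[i].append((w, l))

    # Initialise posteriors with per-item vote fractions (soft majority vote).
    post = {}
    for i in items:
        counts = [0.0] * n_classes
        for _, l in by_item[i]:
            counts[l] += 1.0
        total = sum(counts)
        post[i] = [c / total for c in counts]

    for _ in range(n_iter):
        # M-step: class priors and smoothed per-worker confusion matrices,
        # where conf[w][k][l] estimates P(worker w says l | true class k).
        prior = [sum(post[i][k] for i in items) / len(items)
                 for k in range(n_classes)]
        conf = {w: [[smoothing] * n_classes for _ in range(n_classes)]
                for w in workers}
        for i in items:
            for w, l in by_item[i]:
                for k in range(n_classes):
                    conf[w][k][l] += post[i][k]
        for w in workers:
            for k in range(n_classes):
                row_sum = sum(conf[w][k])
                conf[w][k] = [c / row_sum for c in conf[w][k]]

        # E-step: recompute each item's posterior in log space.
        for i in items:
            logp = [math.log(prior[k] + 1e-12) for k in range(n_classes)]
            for w, l in by_item[i]:
                for k in range(n_classes):
                    logp[k] += math.log(conf[w][k][l])
            top = max(logp)
            unnorm = [math.exp(x - top) for x in logp]
            z = sum(unnorm)
            post[i] = [u / z for u in unnorm]
    return post


# Toy votes (hypothetical): workers "a" and "b" always agree, "c" is noisy.
votes = [
    ("i0", "a", 0), ("i0", "b", 0), ("i0", "c", 0),
    ("i1", "a", 1), ("i1", "b", 1), ("i1", "c", 0),
    ("i2", "a", 0), ("i2", "b", 0), ("i2", "c", 1),
    ("i3", "a", 1), ("i3", "b", 1), ("i3", "c", 1),
]
posteriors = dawid_skene(votes, n_classes=2)
estimated = {i: max(range(2), key=lambda k: p[k]) for i, p in posteriors.items()}
```

A multi-label task could be handled "separately" by running such an estimator once per label with `n_classes=2`; the joint models described in the abstract instead share information across labels, which this sketch does not attempt.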
