
Hokkaido University Collection of Scholarly and Academic Papers (HUSCAP)

Crowdsourcing chart digitizer : task design and quality control for making legacy open data machine-readable

Files in This Item:
jdsa-1.pdf (1.47 MB, PDF)
Please use this identifier to cite or link to this item: http://hdl.handle.net/2115/67755

Title: Crowdsourcing chart digitizer : task design and quality control for making legacy open data machine-readable
Authors: Oyama, Satoshi
Baba, Yukino
Ohmukai, Ikki
Dokoshi, Hiroaki
Kashima, Hisashi
Keywords: Crowdsourcing
Open data
Statistical chart
Data extraction
Spreadsheet
Issue Date: 1-Dec-2016
Publisher: Springer
Journal Title: International Journal of Data Science and Analytics
Volume: 2
Issue: 1-2
Start Page: 45
End Page: 60
Publisher DOI: 10.1007/s41060-016-0025-y
Abstract: Despite recent open data initiatives in many countries, a significant percentage of the data provided is in non-machine-readable formats, such as images, rather than in machine-readable electronic formats, which restricts its usability. Various types of software for digitizing data chart images have been developed. However, such software is designed for manual use and requires human intervention, making it unsuitable for automatically extracting data from a large number of chart images. This paper describes the first unified framework for converting legacy open data in chart images into a machine-readable and reusable format by using crowdsourcing. Crowd workers are asked not only to extract data from an image of a chart but also to reproduce the chart objects in a spreadsheet. The properties of the reproduced chart objects give their data structures, including series names and values, which are useful for automatic processing of the data by computer. Since results produced by crowdsourcing inherently contain errors, a quality control mechanism was developed that improves accuracy by aggregating tables created by different workers for the same chart image and by utilizing the data structures obtained from the reproduced chart objects. Experimental results demonstrated that the proposed framework and mechanism are effective. The proposed framework is not intended to compete with chart digitizing software, and workers may use such software if they find it useful for extracting data from charts. Experiments in which workers were encouraged to use such software showed that even when workers used it, the extracted data still contained errors. This indicates that quality control is necessary even if workers use software to extract data from chart images.
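The abstract mentions aggregating tables created by different workers for the same chart image, but this page does not describe the exact rule the paper uses. As a rough illustration only, the sketch below assumes each worker's table has already been read into a dict keyed by (series name, category label), mirroring the series names recoverable from the reproduced chart objects, and merges the tables cell by cell with a median. The function name `aggregate_tables`, the median rule, and the `min_votes` threshold are hypothetical choices made for this example, not the authors' method.

```python
from collections import defaultdict
from statistics import median


def aggregate_tables(tables, min_votes=2):
    """Hypothetical merge of worker tables: per-cell median over workers.

    Each table maps (series_name, category_label) -> numeric value.
    Cells reported by fewer than `min_votes` workers (e.g. a misspelled
    series name entered by a single worker) are dropped.
    """
    cells = defaultdict(list)
    for table in tables:
        for key, value in table.items():
            cells[key].append(value)
    # The median is robust to a single outlier, such as a slipped decimal point.
    return {key: median(vals) for key, vals in cells.items() if len(vals) >= min_votes}


# Three workers transcribe the same (hypothetical) chart; worker 2 mistypes a value
# and worker 3 adds a stray, misspelled series.
workers = [
    {("Population", "2010"): 120.0, ("Population", "2015"): 135.0},
    {("Population", "2010"): 1200.0, ("Population", "2015"): 134.0},
    {("Population", "2010"): 121.0, ("Population", "2015"): 135.0, ("Popu", "2015"): 135.0},
]
print(aggregate_tables(workers))
# {('Population', '2010'): 121.0, ('Population', '2015'): 135.0}
```

Keying cells by series name rather than by spreadsheet position is what lets tables from different workers be aligned even when they lay out rows and columns differently; a real system would also need to reconcile near-duplicate series names rather than simply discarding them.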
Rights: The final publication is available at Springer via http://dx.doi.org/10.1007/s41060-016-0025-y
Type: article (author version)
URI: http://hdl.handle.net/2115/67755
Appears in Collections: Graduate School of Information Science and Technology / Faculty of Information Science and Technology > Peer-reviewed Journal Articles, etc.

Submitter: Oyama, Satoshi
