Framework for automatic information extraction from research papers on nanocrystal devices

Dieb, Thaer M.; Yoshioka, Masaharu; Hara, Shinjiro; Newton, Marcus C.

doi:10.3762/bjnano.6.190


This item is licensed under: Creative Commons Attribution 2.0 Generic

Files in This Item:

2190-4286-6-190.pdf (PDF, 907.89 kB)
Please use this identifier to cite or link to this item: http://hdl.handle.net/2115/60145

Title:	Framework for automatic information extraction from research papers on nanocrystal devices
Authors:	Dieb, Thaer M.
	Yoshioka, Masaharu
	Hara, Shinjiro
	Newton, Marcus C.
Keywords:	annotated corpus
	automatic information extraction
	nanocrystal device development
	nanoinformatics
	text mining
Issue Date:	8-Sep-2015
Publisher:	Beilstein-Institut
Journal Title:	Beilstein journal of nanotechnology
Volume:	6
Start Page:	1872
End Page:	1882
Publisher DOI:	10.3762/bjnano.6.190
Abstract:	To support nanocrystal device development, we have been working on a computational framework to utilize information in research papers on nanocrystal devices. We developed an annotated corpus called "NaDev" (Nanocrystal Device Development) for this purpose. We also proposed an automatic information extraction system called "NaDevEx" (Nanocrystal Device Automatic Information Extraction Framework). NaDevEx aims at extracting information from research papers on nanocrystal devices using the NaDev corpus and machine-learning techniques. However, the characteristics of NaDevEx were not examined in detail. In this paper, we conduct system evaluation experiments for NaDevEx using the NaDev corpus. We discuss three main issues: system performance compared with human annotators; the effect of paper type (synthesis or characterization) on system performance; and the effects of domain knowledge features (e.g., a chemical named entity recognition system and a list of names of physical quantities) on system performance. We found that overall system performance was 89% in precision and 69% in recall. Under loose agreement, where a term that intersects with a correct term of the same information category counts as a correct identification (in many cases, appropriate head nouns such as temperature or pressure loosely match between the two terms), the overall performance is 95% in precision and 74% in recall. The system performance is almost comparable with the results of human annotators for information categories with rich domain knowledge information (source material). For the other information categories, given the relatively large number of terms that exist in only one paper, recall of individual information categories is not high (39-73%), although precision is better (75-97%). The average performance for synthesis papers is better than that for characterization papers because of the lack of training examples for characterization papers. Based on these results, we discuss future research plans for improving the performance of the system.
Rights:	http://creativecommons.org/licenses/by/2.0/
Type:	article
URI:	http://hdl.handle.net/2115/60145
Appears in Collections:	Graduate School of Information Science and Technology / Faculty of Information Science and Technology > Peer-reviewed Journal Articles, etc

Submitter: Yoshioka, Masaharu (吉岡真治)
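
The abstract reports precision and recall under both exact and loose agreement. The following is a minimal sketch of how such scores could be computed, not the authors' evaluation code. It assumes each term is represented as a hypothetical `(start, end, category)` tuple of character offsets, and that loose agreement means two terms of the same category overlap by at least one character. The function name `precision_recall`, the category labels, and the sample offsets are all illustrative assumptions.

```python
from typing import List, Tuple

# Hypothetical term representation: (start offset, end offset exclusive, category).
Term = Tuple[int, int, str]


def overlaps(a: Term, b: Term) -> bool:
    """True when both terms share a category and their spans intersect."""
    return a[2] == b[2] and a[0] < b[1] and b[0] < a[1]


def precision_recall(predicted: List[Term], gold: List[Term],
                     loose: bool = False) -> Tuple[float, float]:
    """Score predicted terms against gold terms.

    Exact agreement requires identical span and category.
    Loose agreement (assumed definition) accepts any overlap within a category.
    """
    def match(p: Term, g: Term) -> bool:
        return overlaps(p, g) if loose else p == g

    # Precision counts predicted terms that match some gold term.
    pred_hits = sum(any(match(p, g) for g in gold) for p in predicted)
    # Recall counts gold terms that are matched by some predicted term.
    gold_hits = sum(any(match(p, g) for p in predicted) for g in gold)

    precision = pred_hits / len(predicted) if predicted else 0.0
    recall = gold_hits / len(gold) if gold else 0.0
    return precision, recall


# Illustrative data only: category names and offsets are made up.
gold = [(0, 18, "Parameter"), (25, 30, "Material"), (40, 48, "Value")]
predicted = [(7, 18, "Parameter"), (25, 30, "Material")]

strict = precision_recall(predicted, gold)
relaxed = precision_recall(predicted, gold, loose=True)
print("exact agreement:", strict)
print("loose agreement:", relaxed)
```

In this made-up example the predicted term at offsets 7 to 18 covers only part of the gold term at 0 to 18 (as when only the head noun is found), so it counts as an error under exact agreement and as a hit under loose agreement. This mirrors how the loose scores in the abstract exceed the exact ones.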
