An Approach for Chinese-Japanese Named Entity Equivalents Extraction Using Inductive Learning and Hanzi-Kanji Mapping Table

Xu, JinAn; Chen, Yufeng; Ru, Kuang; Zhang, Yujie; Araki, Kenji

doi:10.1587/transinf.2016EDP7425



File:	E100.D_2016EDP7425.pdf (PDF, 2.68 MB)

Please use this identifier to cite or link to this item: http://hdl.handle.net/2115/67136

Title:	An Approach for Chinese-Japanese Named Entity Equivalents Extraction Using Inductive Learning and Hanzi-Kanji Mapping Table
Authors:	Xu, JinAn
	Chen, Yufeng
	Ru, Kuang
	Zhang, Yujie
	Araki, Kenji
Keywords:	named entity translation equivalents acquisition
	Chinese Hanzi and Japanese Kanji mapping table
	inductive learning
	monolingual corpora
Issue Date:	Aug-2017
Publisher:	The Institute of Electronics, Information and Communication Engineers (電子情報通信学会)
Journal Title:	IEICE transactions on information and systems
Volume:	E100D
Issue:	8
Start Page:	1882
End Page:	1892
Publisher DOI:	10.1587/transinf.2016EDP7425
Abstract:	Extraction of named entity (NE) translation equivalents plays a critical role in machine translation (MT) and cross-language information retrieval (CLIR). Traditional methods are often based on large-scale parallel or comparable corpora. However, their applicability is constrained, mainly because parallel corpora of the required scale are scarce, especially for the Chinese-Japanese language pair. In this paper, we propose a method that exploits the characteristics of Chinese and Japanese to automatically extract Chinese-Japanese NE translation equivalents from monolingual corpora using inductive learning (IL). The method adopts the Chinese Hanzi and Japanese Kanji Mapping Table (HKMT) to calculate the similarity between Japanese and Chinese NE instances. We then use IL to obtain partial translation rules for NEs by extracting the differing parts of high-similarity NE instances in Chinese and Japanese. Finally, feedback processing updates the Chinese-Japanese NE similarities and the rule sets. Experimental results show that our method is simple and efficient, and that it overcomes a shortcoming of traditional methods, which depend heavily on bilingual resources. Compared with other methods, our method combines the language features of Chinese and Japanese with IL to extract NE pairs automatically. Because it uses weakly correlated bilingual text sets and minimal additional knowledge to extract NE pairs, it effectively reduces the cost of building the corpus and the need for additional knowledge. Our method may help to build a large-scale Chinese-Japanese NE translation dictionary from monolingual corpora.
Rights:	Copyright ©2017 The Institute of Electronics, Information and Communication Engineers
Relation:	https://search.ieice.org/
Type:	article
URI:	http://hdl.handle.net/2115/67136
Appears in Collections:	Graduate School of Information Science and Technology / Faculty of Information Science and Technology > Peer-reviewed Journal Articles, etc
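
The two core steps named in the abstract (HKMT-based similarity, then extraction of the differing parts of similar NE pairs) can be illustrated with a minimal sketch. This is not the authors' implementation: the tiny `HKMT` dictionary, the function names, and the use of `difflib.SequenceMatcher` as the similarity measure are all assumptions made for illustration, and the feedback step is omitted.

```python
from difflib import SequenceMatcher

# Hypothetical toy Hanzi -> Kanji mapping table (assumption); the real HKMT
# described in the paper is far larger and is not reproduced here.
HKMT = {"东": "東", "际": "際", "机": "機", "场": "場"}


def to_kanji(zh: str) -> str:
    """Map each Hanzi to its Kanji counterpart; unmapped characters pass through."""
    return "".join(HKMT.get(ch, ch) for ch in zh)


def similarity(zh: str, ja: str) -> float:
    """Character-level similarity after mapping (stand-in measure, an assumption)."""
    return SequenceMatcher(None, to_kanji(zh), ja).ratio()


def differing_parts(zh: str, ja: str) -> tuple[str, str]:
    """Strip the shared prefix and suffix; what remains is a candidate partial rule."""
    mapped = to_kanji(zh)  # one-to-one mapping, so indices align with zh
    limit = min(len(mapped), len(ja))
    prefix = 0
    while prefix < limit and mapped[prefix] == ja[prefix]:
        prefix += 1
    suffix = 0
    while suffix < limit - prefix and mapped[-1 - suffix] == ja[-1 - suffix]:
        suffix += 1
    return zh[prefix:len(zh) - suffix], ja[prefix:len(ja) - suffix]


if __name__ == "__main__":
    zh_ne, ja_ne = "东京国际机场", "東京国際空港"
    print(similarity(zh_ne, ja_ne))
    print(differing_parts(zh_ne, ja_ne))
```

In this toy pair the shared part 東京国際 is matched through the mapping table, and the remainder (机场, 空港) is the kind of differing segment from which a partial translation rule could be induced.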

Submitter: Araki, Kenji (荒木健治)
