Data filtering in humor generation: comparative analysis of hit rate and co-occurrence rankings as a method to choose usable pun candidates

Dybala, Pawel; Rzepka, Rafal; Araki, Kenji; Sayama, Kohichi

doi:10.1145/2396761.2398698




Please use this identifier to cite or link to this item: http://hdl.handle.net/2115/63987

Title:	Data filtering in humor generation: comparative analysis of hit rate and co-occurrence rankings as a method to choose usable pun candidates
Authors:	Dybala, Pawel
	Rzepka, Rafal
	Araki, Kenji
	Sayama, Kohichi
Keywords:	humor processing
	web-based data extraction
	AI
	NLP
	HCI
Issue Date:	29-Oct-2012
Publisher:	ACM
Journal Title:	CIKM '12 Proceedings of the 21st ACM international conference on Information and knowledge management
Start Page:	2587
End Page:	2590
Publisher DOI:	10.1145/2396761.2398698
Abstract:	In this paper we propose a method for filtering excessive amounts of textual data acquired from the Internet. In our research on pun generation in Japanese, we encountered excessively long processing times caused by the number of phonetic candidates (i.e., phrases that can be used to generate actual puns) produced by our system. A simple, naive approach that considers only the phrases with the highest occurrence on the Internet can result in the deletion of candidates that are actually usable. We therefore propose a data filtering method in which we compare two Internet-based rankings, a co-occurrence ranking and a hit rate ranking, and select only candidates that occupy the same or similar positions in both. In this work we analyze the effects of such data reduction in 5 cases: when the candidates occupy exactly the same position in both rankings, and when their positions differ by 1, 2, 3 and 4. The analysis is conducted on data acquired by comparing pun candidates generated by the system (and filtered with our method) with phrases that were actually used in puns created by humans. The results show that the proposed method can be used to filter excessive amounts of textual data acquired from the Internet.
Conference Name:	ACM international conference on Information and knowledge management
Conference Sequence:	21
Conference Place:	Maui, Hawaii
Rights:	© 2012 ACM. This is the author’s version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in CIKM '12 Proceedings of the 21st ACM international conference on Information and knowledge management, pages 2587-2590, 2012-10-29, http://doi.acm.org/10.1145/2396761.2398698
Type:	proceedings (author version)
URI:	http://hdl.handle.net/2115/63987
Appears in Collections:	Graduate School of Information Science and Technology / Faculty of Information Science and Technology > Peer-reviewed Journal Articles, etc
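
The filtering idea described in the abstract (keep only candidates whose positions in the hit rate ranking and the co-occurrence ranking agree, or differ by at most 1 to 4) could be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the record does not say how the rankings are built, so the sketch assumes that higher scores rank first, that ties are broken alphabetically, and that only candidates present in both score tables are ranked. The function names, the placeholder candidates and their scores are all hypothetical.

```python
from typing import Dict, Iterable, List


def rank_positions(scores: Dict[str, float], candidates: Iterable[str]) -> Dict[str, int]:
    """Map each candidate to its 1-based position, highest score first."""
    # Assumption: ties are broken alphabetically so the ranking is deterministic.
    ordered = sorted(candidates, key=lambda c: (-scores[c], c))
    return {cand: pos for pos, cand in enumerate(ordered, start=1)}


def filter_by_rank_agreement(
    hit_rate: Dict[str, float],
    cooccurrence: Dict[str, float],
    max_diff: int = 0,
) -> List[str]:
    """Keep candidates whose two rank positions differ by at most max_diff."""
    # Assumption: only candidates scored in both tables are ranked and compared.
    shared = hit_rate.keys() & cooccurrence.keys()
    hit_rank = rank_positions(hit_rate, shared)
    co_rank = rank_positions(cooccurrence, shared)
    kept = [c for c in shared if abs(hit_rank[c] - co_rank[c]) <= max_diff]
    # Return the survivors in hit rate order for readability.
    return sorted(kept, key=lambda c: hit_rank[c])


# Hypothetical scores for four placeholder candidates (not data from the paper).
example_hit_rate = {"cand_a": 900, "cand_b": 500, "cand_c": 300, "cand_d": 100}
example_cooccurrence = {"cand_a": 40, "cand_b": 5, "cand_c": 12, "cand_d": 30}

# max_diff=0 is the "exactly the same position" case; 1 to 4 relax the match.
same_position = filter_by_rank_agreement(example_hit_rate, example_cooccurrence, 0)
within_two = filter_by_rank_agreement(example_hit_rate, example_cooccurrence, 2)
```

With these placeholder scores, only `cand_a` and `cand_c` hold the same position in both rankings, while allowing a difference of 2 keeps all four candidates. Raising `max_diff` therefore trades a smaller candidate set against the risk of discarding usable phrases, which is the trade-off the abstract says the paper analyzes.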

Submitter: RZEPKA Rafal
