Response date: 2024-03-28T22:03:41Z
Request: https://eprints.lib.hokudai.ac.jp/dspace-oai/request
Identifier: oai:eprints.lib.hokudai.ac.jp:2115/63987
Datestamp: 2022-11-17T02:08:08Z
Sets: hdl_2115_20053; hdl_2115_145

Title: Data filtering in humor generation: comparative analysis of hit rate and co-occurrence rankings as a method to choose usable pun candidates
Creators: Dybala, Pawel; Rzepka, Rafal (1000080396316); Araki, Kenji (1000050202742); Sayama, Kohichi (1000090271733)
Access: open access
Rights: © 2012 ACM. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in CIKM '12 Proceedings of the 21st ACM international conference on Information and knowledge management, pages 2587-2590, 2012-10-29, http://doi.acm.org/10.1145/2396761.2398698
Subjects: humor processing; web-based data extraction; AI; NLP; HCI; 007

Abstract: In this paper we propose a method of filtering excessive amounts of textual data acquired from the Internet. In our research on pun generation in Japanese, we experienced problems with excessively long data processing times, caused by the number of phonetic candidates (i.e. phrases that can be used to generate actual puns) generated by our system. A simple, naive approach, in which we take into consideration only the phrases with the highest occurrence on the Internet, can result in the deletion of candidates that are actually usable. Thus, we propose a data filtering method in which we compare two Internet-based rankings, a co-occurrence ranking and a hit rate ranking, and select only candidates that occupy the same or similar positions in these rankings. In this work we analyze the effects of such data reduction, considering five cases: when the candidates are in exactly the same positions in both rankings, and when their positions differ by 1, 2, 3 and 4. The analysis is conducted on data acquired by comparing pun candidates generated by the system (and filtered with our method) with phrases that were actually used in puns created by humans. The results show that the proposed method can be used to filter excessive amounts of textual data acquired from the Internet.

Publisher: ACM
Date: 2012-10-29
Language: eng
Type: conference paper
Version: AM
Handle: http://hdl.handle.net/2115/63987
DOI: https://doi.org/10.1145/2396761.2398698
Source: CIKM '12 Proceedings of the 21st ACM international conference on Information and knowledge management
Pages: 2587-2590
Conference: ACM international conference on Information and knowledge management (21), Maui, Hawaii, USA
File: https://eprints.lib.hokudai.ac.jp/dspace/bitstream/2115/63987/1/Data%20filtering%20in%20humor%20generation-%20comparative%20analysis%20of%20hit%20rate%20and%20co-occurrence%20rankings%20as%20a%20method%20to%20choose%20usable%20pun%20candidates..pdf (application/pdf, 1.5 MB, 2012-10-29)
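
The rank-comparison filter described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the dictionary inputs, the descending sort, the tie-breaking by input order, and the reading of "positions differ by k" as "differ by at most k" are all assumptions, and the candidate names and counts are invented placeholders.

```python
from typing import Dict, List


def rank_positions(scores: Dict[str, float]) -> Dict[str, int]:
    """Map each candidate to its 1-based rank, highest score first.

    Ties keep their input order (an assumption; the abstract does not
    say how ties are handled).
    """
    ordered = sorted(scores, key=lambda cand: scores[cand], reverse=True)
    return {cand: pos for pos, cand in enumerate(ordered, start=1)}


def filter_by_rank_agreement(
    cooccurrence: Dict[str, float],
    hit_rate: Dict[str, float],
    max_diff: int = 0,
) -> List[str]:
    """Keep candidates whose positions in the two rankings differ by
    at most max_diff (hypothetical helper, assumed interpretation)."""
    cooc_rank = rank_positions(cooccurrence)
    hit_rank = rank_positions(hit_rate)
    return [
        cand
        for cand in cooc_rank
        if cand in hit_rank and abs(cooc_rank[cand] - hit_rank[cand]) <= max_diff
    ]


# Invented placeholder data: candidate -> count.
cooccurrence = {"cand_a": 50, "cand_b": 40, "cand_c": 30, "cand_d": 20, "cand_e": 10}
hit_rate = {"cand_a": 900, "cand_b": 100, "cand_c": 500, "cand_d": 50, "cand_e": 700}

# The five cases from the abstract: identical positions, then
# differences of 1, 2, 3 and 4.
for max_diff in range(5):
    kept = filter_by_rank_agreement(cooccurrence, hit_rate, max_diff)
    print(max_diff, kept)
```

A larger `max_diff` keeps more candidates, so the threshold trades processing time against the risk of discarding usable phrases.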