Hokkaido University Collection of Scholarly and Academic Papers (HUSCAP)

Unsupervised Spam Detection by Document Probability Estimation with Maximal Overlap Method

Files in This Item: 26_297-1.pdf (PDF, 555.17 kB)
Please use this identifier to cite or link to this item: http://hdl.handle.net/2115/47125

Authors: Uemura, Takashi
Ikeda, Daisuke
Kida, Takuya
Arimura, Hiroki
Keywords: unsupervised spam detection
document complexity
suffix tree
maximal overlap method
word salad
Issue Date: 2011
Publisher: The Japanese Society for Artificial Intelligence (人工知能学会)
Journal Title: Transactions of the Japanese Society for Artificial Intelligence
Volume: 26
Issue: 1
Start Page: 297
End Page: 306
Publisher DOI: 10.1527/tjsai.26.297
Abstract: In this paper, we study content-based detection of spam generated by copying a seed document and applying random perturbations to it. We propose an unsupervised detection algorithm based on an entropy-like measure called document complexity, which reflects how many similar documents exist in the input collection. However, because document complexity, like Kolmogorov complexity, is an idealized measure that cannot be computed directly, we substitute an estimated occurrence probability for each document's complexity. We also present an efficient algorithm that estimates the probabilities of all documents in the collection in time linear in the total length of the collection. Experimental results showed that our algorithm works especially well for word-salad spam, which is believed to be difficult to detect automatically.
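The abstract describes scoring each document by an estimated occurrence probability instead of its true complexity. As a rough illustration only, the Python sketch below scores each document in bits per character under a naive longest-matching-context model built from the other documents in the collection. It is not the authors' maximal overlap or suffix-tree algorithm: it rescans the collection for every context (far slower than linear time), and the names `doc_complexity`, `count_occ`, `max_context`, the bits-per-character normalization, and the 1/alphabet_size fallback for unseen characters are all hypothetical choices. A lower score means the document is well predicted by the rest of the collection, which is the property copied spam is expected to have.

```python
import math


def count_occ(text: str, pat: str, need_next: bool = False) -> int:
    """Count possibly overlapping occurrences of pat in text.

    If need_next is True, only occurrences followed by at least one more
    character are counted (used for the context denominators below).
    """
    limit = len(text) - len(pat) - (1 if need_next else 0)
    count, start = 0, 0
    while True:
        pos = text.find(pat, start)
        if pos == -1 or pos > limit:
            break
        count += 1
        start = pos + 1
    return count


def doc_complexity(docs: list[str], j: int,
                   alphabet_size: int = 256, max_context: int = 16) -> float:
    """Hypothetical score: bits per character of docs[j] under a naive
    longest-context model built from all *other* documents.
    (Illustrative only; not the paper's maximal overlap estimator.)"""
    others = [d for i, d in enumerate(docs) if i != j]
    doc = docs[j]
    total_bits = 0.0
    for i, ch in enumerate(doc):
        prob = None
        # Try the longest preceding context first and shrink it until
        # context + ch has been seen in some other document.
        for k in range(min(i, max_context), -1, -1):
            ctx = doc[i - k:i]
            numer = sum(count_occ(o, ctx + ch) for o in others)
            if numer > 0:
                denom = sum(count_occ(o, ctx, need_next=True) for o in others)
                prob = numer / denom  # denom >= numer, so prob <= 1
                break
        if prob is None:
            # Character never occurs in other documents: uniform fallback.
            prob = 1.0 / alphabet_size
        total_bits -= math.log2(prob)
    return total_bits / max(len(doc), 1)


if __name__ == "__main__":
    docs = [
        "buy cheap watches now buy cheap watches",
        "buy cheap watches now buy cheap watches today",
        "the committee met on tuesday to review the budget",
    ]
    for j, d in enumerate(docs):
        print(f"{doc_complexity(docs, j):.3f}  {d}")
```

In this toy collection the two near-copies score lower than the unrelated sentence; on real data, the threshold separating spam from normal documents would have to be chosen empirically.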
Type: article
URI: http://hdl.handle.net/2115/47125
Appears in Collections: Graduate School of Information Science and Technology / Faculty of Information Science and Technology > Peer-reviewed Journal Articles, etc

Submitter: Kida, Takuya
