
Similarity Joins on Item Set Collections Using Zero-Suppressed Binary Decision Diagrams

File: similarity-join-zdd-v7-DASFAA.pdf (PDF, 275.01 kB)

Title: Similarity Joins on Item Set Collections Using Zero-Suppressed Binary Decision Diagrams
Authors: Shirai, Yasuyuki
Takashima, Hiroyuki
Tsuruma, Koji
Oyama, Satoshi
Keywords: similarity joins
error-tolerant matching
zero-suppressed binary decision diagram
Issue date: 2013
Publisher: Springer
Citation: Database Systems for Advanced Applications, part of the Lecture Notes in Computer Science book series (LNCS, volume 7825), ISBN: 978-3-642-37486-9
Series title: Lecture Notes in Computer Science
Volume: 7825
Start page: 56
End page: 70
Publisher DOI: 10.1007/978-3-642-37487-6_7
Abstract: Similarity joins between two collections of item sets have recently been investigated and have attracted significant attention, especially for linguistic applications such as spelling error correction and data cleaning. In this paper, we propose a new approach to similarity joins for general item set collections, such as purchase history data and research keyword data. The main objective of our research is to efficiently find similar records between two data collections under constraints on the number of added and deleted items. Efficient matching algorithms are urgently needed for similarity joins because of the combinatorial explosion in the number of pairs between two data collections. To overcome this difficulty and make the matching process more efficient, we developed a matching algorithm based on Zero-suppressed Binary Decision Diagrams (ZDDs). ZDDs are a special type of Binary Decision Diagram (BDD) and are suitable for implicitly handling large-scale combinatorial item set data. In this paper, we present algorithms for similarity joins between two data collections represented as ZDDs, together with pruning techniques. We also present experimental results comparing their performance with that of other systems, as well as results on real, very large data collections that demonstrate their efficiency in actual applications.
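To make the matching condition in the abstract concrete, here is a minimal sketch under one assumed reading of "constraints on the number of added and deleted items": a record x from one collection matches a record y from the other if y can be obtained from x by adding at most `max_add` items and deleting at most `max_del` items, i.e. |y \ x| <= max_add and |x \ y| <= max_del. This is a plain nested-loop baseline, not the authors' ZDD-based algorithm; the function names `within_edit_bounds` and `naive_similarity_join` and the sample purchase data are hypothetical.

```python
from typing import FrozenSet, List, Tuple

ItemSet = FrozenSet[str]


def within_edit_bounds(x: ItemSet, y: ItemSet, max_add: int, max_del: int) -> bool:
    # Assumed matching predicate (hypothetical, not taken from the paper):
    # y - x = items that must be added to x, x - y = items that must be deleted from x.
    return len(y - x) <= max_add and len(x - y) <= max_del


def naive_similarity_join(
    r: List[ItemSet], s: List[ItemSet], max_add: int, max_del: int
) -> List[Tuple[int, int]]:
    # Brute-force baseline: compares every pair, i.e. |r| * |s| set comparisons.
    return [
        (i, j)
        for i, x in enumerate(r)
        for j, y in enumerate(s)
        if within_edit_bounds(x, y, max_add, max_del)
    ]


if __name__ == "__main__":
    # Hypothetical purchase-history records for two collections.
    purchases_a = [frozenset({"milk", "bread"}), frozenset({"beer", "chips", "salsa"})]
    purchases_b = [frozenset({"milk", "bread", "eggs"}), frozenset({"beer", "chips"})]
    print(naive_similarity_join(purchases_a, purchases_b, max_add=1, max_del=1))
    # prints [(0, 0), (1, 1)]
```

Because this baseline checks every pair of records, its cost grows with the product of the two collection sizes. That pairwise blow-up is the difficulty the abstract says the ZDD representation and pruning techniques are meant to address.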
Rights: The final publication is available at Springer via
Type: article (author version)
Appears in collections: Peer-reviewed Journal Articles, etc.

Submitted by: 小山 聡

