Similarity Joins on Item Set Collections Using Zero-Suppressed Binary Decision Diagrams

Files in This Item:
similarity-join-zdd-v7-DASFAA.pdf (275.01 kB, PDF)
Please use this identifier to cite or link to this item: http://hdl.handle.net/2115/65255

Title: Similarity Joins on Item Set Collections Using Zero-Suppressed Binary Decision Diagrams
Authors: Shirai, Yasuyuki
Takashima, Hiroyuki
Tsuruma, Koji
Oyama, Satoshi
Keywords: similarity joins
error-tolerant matching
recommendation
zero-suppressed binary decision diagram
Issue Date: 2013
Publisher: Springer
Citation: Database Systems for Advanced Applications, Part of the Lecture Notes in Computer Science book series (LNCS, volume 7825), ISBN: 978-3-642-37486-9
Journal Title: Lecture Notes in Computer Science
Volume: 7825
Start Page: 56
End Page: 70
Publisher DOI: 10.1007/978-3-642-37487-6_7
Abstract: Similarity joins between two collections of item sets have recently attracted significant attention, especially for linguistic applications such as spelling error correction and data cleaning. In this paper, we propose a new approach to similarity joins for general item set collections, such as purchase history data and research keyword data. The main objective of our research is to efficiently find similar records between two data collections under constraints on the number of added and deleted items. Efficient matching algorithms are urgently needed in similarity joins because of the combinatorial explosion in the number of candidate pairs between two data collections. We developed a matching algorithm based on Zero-suppressed Binary Decision Diagrams (ZDDs) to overcome this difficulty and make the matching process more efficient. ZDDs are a special type of Binary Decision Diagram (BDD) and are suitable for implicitly handling large-scale combinatorial item set data. We present algorithms for similarity joins between two data collections represented as ZDDs, together with pruning techniques. We also present experimental results that compare their performance with other systems, as well as results obtained on very large real-world data collections, to demonstrate their efficiency in actual applications.
Rights: The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-642-37487-6_7
Type: article (author version)
URI: http://hdl.handle.net/2115/65255
Appears in Collections: Graduate School of Information Science and Technology / Faculty of Information Science and Technology > Peer-reviewed Journal Articles, etc

Submitter: Oyama, Satoshi (小山 聡)
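
The abstract describes the join condition only informally, as a bound on the number of added and deleted items between two records. The sketch below is one possible reading of that condition, written as a naive pairwise baseline in Python. It is not the ZDD-based algorithm of the paper, and the function name, parameter names, and sample data are all hypothetical. It compares every pair of records, which is exactly the combinatorial cost the abstract says the ZDD approach is meant to avoid.

```python
from itertools import product


def similarity_join(left, right, max_added, max_deleted):
    """Naive baseline (assumed reading of the abstract, not the paper's method).

    Returns every pair (r, s) with r from `left` and s from `right` such that
    s can be obtained from r by adding at most `max_added` items and deleting
    at most `max_deleted` items.
    """
    matches = []
    for r, s in product(left, right):
        added = s - r      # items present in s but missing from r
        deleted = r - s    # items present in r but missing from s
        if len(added) <= max_added and len(deleted) <= max_deleted:
            matches.append((r, s))
    return matches


# Hypothetical sample data: two small collections of item sets.
purchases = [
    frozenset({"milk", "bread"}),
    frozenset({"tea", "sugar", "lemon"}),
]
catalog = [
    frozenset({"milk", "bread", "butter"}),
    frozenset({"tea", "honey"}),
]

pairs = similarity_join(purchases, catalog, max_added=1, max_deleted=1)
for r, s in pairs:
    print(sorted(r), "~", sorted(s))
```

With these bounds only the first purchase record matches the first catalog record (one added item, none deleted). Raising max_deleted to 2 would also admit the pair of tea records, since that pair needs one addition and two deletions.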
