Response date: 2024-03-28T20:23:09Z
Request: https://eprints.lib.hokudai.ac.jp/dspace-oai/request
Identifier: oai:eprints.lib.hokudai.ac.jp:2115/65255
Datestamp: 2022-11-17T02:08:08Z
Sets: hdl_2115_20053, hdl_2115_145
Title: Similarity Joins on Item Set Collections Using Zero-Suppressed Binary Decision Diagrams
Authors: Shirai, Yasuyuki; Takashima, Hiroyuki; Tsuruma, Koji; Oyama, Satoshi (name identifier 1000030346100)
Access: open access
Rights: The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-642-37487-6_7
Keywords: similarity joins; error-tolerant matching; recommendation; zero-suppressed binary decision diagram
Classification: 007
Abstract: Similarity joins between two collections of item sets have recently attracted significant attention, especially for linguistic applications such as spelling error correction and data cleaning. In this paper, we propose a new approach to similarity joins for general item set collections, such as purchase history data and research keyword data. The main objective of our research is to efficiently find similar records between two data collections under constraints on the number of added and deleted items. Efficient matching algorithms are urgently needed for similarity joins because of the combinatorial explosion in the number of record pairs between two data collections. We developed a matching algorithm based on Zero-suppressed Binary Decision Diagrams (ZDDs) to overcome this difficulty and make the matching process more efficient. ZDDs are a special type of Binary Decision Diagram (BDD) and are suitable for implicitly handling large-scale combinatorial item set data. We present algorithms for similarity joins between two data collections represented as ZDDs, together with pruning techniques. We also present experimental results that compare the performance of our algorithms with that of other systems, as well as results on very large real-world data collections that demonstrate their efficiency in actual applications.
Publisher: Springer
Date: 2013
Language: eng
Type: journal article
Version: AM
Handle: http://hdl.handle.net/2115/65255
Source: Database Systems for Advanced Applications, Part of the Lecture Notes in Computer Science book series (LNCS, volume 7825), ISBN: 978-3-642-37486-9
DOI: https://doi.org/10.1007/978-3-642-37487-6_7
ISSN: 0302-9743
Series: Lecture Notes in Computer Science
Volume: 7825
Pages: 56-70
File: https://eprints.lib.hokudai.ac.jp/dspace/bitstream/2115/65255/1/similarity-join-zdd-v7-DASFAA.pdf (application/pdf, 275.01 KB)
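
The join condition described in the abstract (matching records across two collections under limits on added and deleted items) can be illustrated with a naive pairwise sketch. This is not the ZDD-based algorithm of the paper; it is a brute-force baseline that makes the combinatorial cost visible. The function name `similarity_join`, the parameters `max_added` and `max_deleted`, and the exact reading of the constraint (items present only in the right record count as added, items present only in the left record count as deleted) are assumptions made for illustration.

```python
from typing import FrozenSet, Iterable, List, Tuple

ItemSet = FrozenSet[str]


def similarity_join(
    left: Iterable[ItemSet],
    right: Iterable[ItemSet],
    max_added: int,
    max_deleted: int,
) -> List[Tuple[ItemSet, ItemSet]]:
    """Brute-force join: compare every left record with every right record.

    Assumed matching rule (hypothetical, for illustration only):
    a pair (r, s) matches when turning r into s needs at most
    `max_added` added items and at most `max_deleted` deleted items.
    """
    right_records = list(right)  # materialise so it can be scanned repeatedly
    matches = []
    for r in left:
        for s in right_records:
            added = len(s - r)    # items in s that are missing from r
            deleted = len(r - s)  # items in r that are missing from s
            if added <= max_added and deleted <= max_deleted:
                matches.append((r, s))
    return matches


# Toy collections standing in for, e.g., purchase histories.
left = [frozenset({"a", "b", "c"}), frozenset({"x", "y"})]
right = [
    frozenset({"a", "b", "d"}),
    frozenset({"a", "b", "c", "e"}),
    frozenset({"z"}),
]

pairs = similarity_join(left, right, max_added=1, max_deleted=1)
```

The nested loop performs one comparison per pair of records, so its cost grows with the product of the two collection sizes. That growth is the combinatorial explosion the abstract refers to, and it is what a shared, implicit representation such as a ZDD is meant to avoid.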