Efficient Algorithms for Extracting Frequent Episodes from Event Sequences

Katoh, Takashi



Please use this identifier to cite or link to this item: https://doi.org/10.14943/doctoral.k9967

Title:	Efficient Algorithms for Extracting Frequent Episodes from Event Sequences
Other Titles:	イベント列からの頻出エピソード抽出に対する効率良いアルゴリズム (Japanese title; English: Efficient algorithms for frequent episode extraction from event sequences)
Authors:	Katoh, Takashi
Authors(alt):	河東, 孝 (Katoh, Takashi)
Issue Date:	24-Mar-2011
Publisher:	Hokkaido University
Abstract:	Episode mining is a data mining method for time-related data, introduced by Mannila et al. in 1997. Its purpose is to extract all frequent episodes from an input event sequence. Here, an episode is formulated as an acyclic labeled digraph in which labels correspond to events and edges represent temporal precedence relations in an event sequence. Hence, an episode gives a richer representation of temporal relationships than a subsequence, which represents just a linearly ordered relation in sequential pattern mining. Several researchers have developed mining algorithms for episodes and for subclasses of episodes. As such subclasses, Katoh et al. have introduced sectorial episodes, diamond episodes and elliptic episodes. These episodes are simpler than general episodes but are useful for representing real-world information that subsequences cannot represent. The algorithms designed by Katoh et al. are level-wise. They first find the occurrence information of the serial episodes in an input event sequence by scanning it just once. Then, regarding the serial episodes as items, they construct the frequent episodes by using a frequent itemset mining algorithm that performs breadth-first search over the space of candidate patterns. While the level-wise algorithms are sufficient to find frequent episodes efficiently in practice, it is difficult to give them a theoretical guarantee of efficiency from the viewpoint of enumeration. Moreover, since the memory required by level-wise algorithms can grow exponentially, they may fail to run on a physical machine for large inputs. In this thesis, as space-efficient episode mining algorithms, we newly design episode-growth algorithms that enumerate frequent diamond episodes and more general episodes in polynomial time per output and in polynomial space.
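To make the level-wise idea concrete, here is a minimal Python sketch restricted to sectorial episodes $C \mapsto r$. It assumes an event sequence given as a list of event-type sets, one per time step. It also counts only windows of a fixed width that lie entirely inside the sequence, whereas Mannila et al. also count windows overhanging the ends. Each window becomes a transaction of the two-event serial episodes (x, y) occurring in it. A sectorial episode occurs in a window exactly when every pair (c, r) with c in C belongs to that window's transaction, so an Apriori-style breadth-first search over C applies. The function names are hypothetical. Rebuilding each window from scratch is a simplification, not the single-scan procedure of the thesis.

```python
from itertools import combinations
from typing import FrozenSet, List, Sequence, Set, Tuple

EventSeq = Sequence[Set[str]]  # seq[t] is the set of event types at time t (assumed representation)


def serial_pairs(window: EventSeq) -> Set[Tuple[str, str]]:
    """All serial episodes x -> y (x strictly before y) occurring in the window."""
    first, last = {}, {}
    for t, events in enumerate(window):
        for e in events:
            first.setdefault(e, t)
            last[e] = t
    # x -> y occurs iff the earliest x comes before the latest y.
    return {(x, y) for x in first for y in last if first[x] < last[y]}


def frequent_sectorial(seq: EventSeq, r: str, width: int,
                       minfreq: int) -> List[FrozenSet[str]]:
    """Level-wise (Apriori) search for the sets C such that C -> r is frequent."""
    # One transaction of serial episodes per window lying fully inside seq (assumption).
    transactions = [serial_pairs(seq[i:i + width])
                    for i in range(len(seq) - width + 1)]

    def support(C: FrozenSet[str]) -> int:
        needed = {(c, r) for c in C}
        return sum(needed <= tx for tx in transactions)

    labels = sorted({e for events in seq for e in events})
    level = [frozenset([c]) for c in labels if support(frozenset([c])) >= minfreq]
    result = list(level)
    while level:
        size, previous = len(level[0]), set(level)
        candidates = set()
        for X, Y in combinations(level, 2):
            U = X | Y
            # Keep U only if every subset one element smaller is frequent.
            if len(U) == size + 1 and all(U - {x} in previous for x in U):
                candidates.add(U)
        level = [C for C in candidates if support(C) >= minfreq]
        result.extend(level)
    return result
```

For example, with `seq = [{"a"}, {"c", "d"}, {"b"}, {"a"}, {"c"}, {"b"}]`, `r = "b"`, width 3 and minimum frequency 2, the frequent sets C are {a}, {c}, {d}, {a, c} and {c, d}. The breadth-first loop keeps an entire level of candidates in memory at once, which is the memory concern raised above.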
The algorithms adopt depth-first search instead of level-wise search.

In Chapter 3, we study the problem of efficiently mining frequent diamond episodes from an input event sequence using a sliding window. Here, a diamond episode is of the form $a \mapsto E \mapsto b$, which means that every event of $E$ follows an event $a$ and is followed by an event $b$. We then design a polynomial-delay and polynomial-space algorithm PolyFreqDmd that finds all of the frequent diamond episodes without duplicates from an event sequence in $O(|\Sigma|^2 n)$ time per episode and in $O(|\Sigma| + n)$ space, where $\Sigma$ and $n$ are an alphabet and the length of the event sequence, respectively. Finally, we give experimental results on artificial and real-world event sequences, varying several mining parameters, to evaluate the efficiency of the algorithm.

In Chapter 4, we first introduce a bipartite episode of the form $A \mapsto B$ for two sets $A$ and $B$ of events, which means that every event of $A$ is followed by every event of $B$. We then present an algorithm that finds all frequent bipartite episodes from an input sequence $S$ without duplication in $O(|\Sigma| N)$ time per episode and in $O(|\Sigma|^2 n)$ space, where $\Sigma$ is an alphabet, $N$ is the total input size of $S$, and $n$ is the length of $S$. Finally, we give experimental results on artificial and real sequences to evaluate the efficiency of the algorithm.

In Chapter 5, we introduce the class of $k$-partite episodes, which are time-series patterns of the form $\langle A_1, \ldots, A_k \rangle$ for sets $A_i$ ($1 \le i \le k$) of events, meaning that, in an input event sequence, every event of $A_i$ is followed by every event of $A_{i+1}$ for every $1 \le i < k$. We then present a backtracking algorithm Kpar and its modification Kpar2 that find all of the frequent $k$-partite episodes from an input event sequence without duplication. By theoretical analysis, we show that these two algorithms run in polynomial time per output and in polynomial space in the total input size.
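The occurrence semantics shared by the three classes above can be illustrated with a small Python sketch. A diamond episode $a \mapsto E \mapsto b$ can be viewed as the $k$-partite episode $\langle \{a\}, E, \{b\} \rangle$ with $k = 3$, and a bipartite episode $A \mapsto B$ as the case $k = 2$. The sketch assumes a hypothetical representation (a list of event-type sets and fixed-width windows lying fully inside the sequence). Its enumerator fixes $a$ and $b$, grows $E$ depth-first in label order, and prunes as soon as $E$ becomes infrequent. It is a naive illustration of backtracking enumeration in the spirit of the episode-growth approach. It is not PolyFreqDmd, Kpar or Kpar2, and it does not attain their stated bounds.

```python
from typing import List, Sequence, Set, Tuple

EventSeq = Sequence[Set[str]]  # seq[t] is the set of event types at time t (assumed representation)


def occurs(window: EventSeq, blocks: Sequence[Set[str]]) -> bool:
    """Does the k-partite episode <A_1, ..., A_k> occur in the window?

    Greedy: place each event of A_1 at its earliest time, then each event of
    A_{i+1} at its earliest time strictly after the latest event of A_i.
    Earliest placement never hurts later blocks, so the test is exact.
    """
    boundary = -1
    for block in blocks:
        latest = boundary
        for e in block:
            t = next((t for t in range(boundary + 1, len(window))
                      if e in window[t]), None)
            if t is None:
                return False
            latest = max(latest, t)
        boundary = latest
    return True


def frequency(seq: EventSeq, blocks: Sequence[Set[str]], width: int) -> int:
    """Number of width-`width` windows lying fully inside seq that contain the episode."""
    return sum(occurs(seq[i:i + width], blocks)
               for i in range(len(seq) - width + 1))


def frequent_diamonds(seq: EventSeq, a: str, b: str, width: int,
                      minfreq: int) -> List[Tuple[str, ...]]:
    """Depth-first enumeration of the frequent diamond episodes a -> E -> b.

    E grows one event at a time in increasing label order, so each set is
    visited once. A branch is cut when E is infrequent, because enlarging E
    can only lose occurrences. Only the current path is kept in memory.
    """
    labels = sorted({e for events in seq for e in events})
    out: List[Tuple[str, ...]] = []

    def grow(E: Tuple[str, ...]) -> None:
        for x in labels:
            if E and x <= E[-1]:
                continue
            F = E + (x,)
            if frequency(seq, [{a}, set(F), {b}], width) >= minfreq:
                out.append(F)
                grow(F)

    grow(())
    return out
```

On `seq = [{"a"}, {"c", "d"}, {"b"}, {"a"}, {"c"}, {"b"}]` with width 3, the diamond episode $a \mapsto \{c, d\} \mapsto b$ occurs in one window, and $a \mapsto \{c\} \mapsto b$ occurs in two. The recursion depth is at most $|\Sigma|$, which is why depth-first search needs far less memory than a level-wise search.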
In Chapter 6, we give a simple characterization of the episodes that are constructible from just the occurrence information of serial episodes, called serially constructible episodes. First, we formulate an episode as an acyclic transitive labeled digraph whose labels are event types. Next, we introduce parallel-free episodes, which have an arc between every pair of vertices with the same label. We also formulate a serially constructible episode as an episode that is embedded into every parallel-free episode containing all of the serial episodes occurring in it. Then, we show that an episode is parallel-free if and only if it is serially constructible.

In Chapter 7, we apply episode mining algorithms to bacterial culture data and extract episodes as time-related rules representing replacements of bacteria and changes in drug resistance as factors of hospital-acquired infection. A sectorial episode is of the form $C \mapsto r$, where $C$ is a set of events and $r$ is an event; it means that every event type in $C$ precedes the event type $r$. A sequential episode, the simplest form of serial episode, is of the form $A \to B$; it means that the event type $A$ precedes the event type $B$. An aligned bipartite episode between genera of bacteria is of the form $A \to B$, where $A$ and $B$ are sets of genera of bacteria, such that every genus in $A$ has the same earliest occurrence time, every genus in $B$ has the same earliest occurrence time, and the former precedes the latter. In this chapter, we extract sectorial episodes, sequential episodes, and aligned bipartite episodes representing changes in drug resistance and replacements of bacteria from bacterial culture data provided by Osaka Prefecture General Medical Center.

Finally, we conclude this thesis and discuss future research in Chapter 8.
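The parallel-free condition from Chapter 6 can be tested mechanically. The following Python sketch assumes an episode given as a list of vertex labels plus a list of arcs (u, v) forming a DAG; these names and this encoding are hypothetical. It first takes the transitive closure, because episodes are formulated as transitive digraphs. It then checks that every two vertices with the same label are comparable. This only tests the definition. It does not implement the serial-constructibility characterization proved in the thesis.

```python
from itertools import combinations
from typing import List, Sequence, Set, Tuple


def reachability(n: int, arcs: Sequence[Tuple[int, int]]) -> List[Set[int]]:
    """reach[u] = vertices reachable from u by a nonempty path (transitive closure)."""
    succ: List[Set[int]] = [set() for _ in range(n)]
    for u, v in arcs:
        succ[u].add(v)
    reach: List[Set[int]] = []
    for s in range(n):
        # Iterative depth-first search from the successors of s.
        seen: Set[int] = set()
        stack = list(succ[s])
        while stack:
            v = stack.pop()
            if v not in seen:
                seen.add(v)
                stack.extend(succ[v])
        reach.append(seen)
    return reach


def is_parallel_free(labels: Sequence[str], arcs: Sequence[Tuple[int, int]]) -> bool:
    """True iff every two distinct vertices with equal labels are joined by an arc
    of the transitive closure, in one direction or the other."""
    reach = reachability(len(labels), arcs)
    return all(v in reach[u] or u in reach[v]
               for u, v in combinations(range(len(labels)), 2)
               if labels[u] == labels[v])
```

For instance, the serial episode a → b → a (labels `["a", "b", "a"]`, arcs `(0, 1)` and `(1, 2)`) is parallel-free because its closure orders the two a-vertices. Two unordered a-vertices that both precede b are not parallel-free.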
Conferring University:	北海道大学 (Hokkaido University)
Degree Report Number:	甲第9967号 (Kō No. 9967)
Degree Level:	博士 (Doctorate)
Degree Discipline:	情報科学 (Information Science)
Type:	theses (doctoral)
URI:	http://hdl.handle.net/2115/45078
Appears in Collections:	学位論文 (Theses) > 博士（情報科学） (Doctor of Information Science)

Submitter: 河東孝 (Takashi Katoh)

