Repetition-Aware Lossless Compression

古谷, 勇



Files in This Item:	Isamu_Furuya.pdf (458.62 kB, PDF)

Please use this identifier to cite or link to this item: https://doi.org/10.14943/doctoral.k14281

Related Items in HUSCAP:

Repetition-Aware Lossless Compression [an abstract of dissertation and a summary of dissertation review]

Title:	Repetition-Aware Lossless Compression
Other Titles:	反復構造のための可逆圧縮 (Lossless Compression for Repetitive Structures)
Authors:	古谷, 勇
Issue Date:	25-Sep-2020
Publisher:	Hokkaido University
Abstract:	This thesis studies lossless compression techniques for repetitive data. Lossless compression is a type of data compression that allows the original information to be restored completely from the compressed data. Today's ever-growing information technology industries generate enormous amounts of data, so efficient methods for managing large data are desired. These large data are in many cases highly repetitive; that is, most of their fragments can be obtained, with a few modifications, from fragments occurring at other positions in the data. Managing large repetitive data efficiently is attracting attention in many fields, and demand for good compression methods for such data is increasing. A repetition-aware compression technique makes it possible to manage these large data more efficiently, and this study contributes to that technique. The term repetition-aware means highly effective on repetitive data. Our approach to repetition-aware compression is the grammar compression scheme, which constructs a formal grammar that generates a language consisting only of the input data. Grammar compression has been preferred over other lossless compression techniques because of several useful properties, including high practical compression performance on repetitive data. The heart of this study is the development of grammar compression methods that aim to construct a small formal grammar from the input data. We discuss three grammar compression frameworks, which differ in the formal grammar used to describe the compressed data. We consider a context-free grammar (CFG), a run-length context-free grammar (RLCFG), and a functional program described by a λ-term in Chapters 3, 4, and 5, respectively. In Chapter 3, we approach the problem of repetition-aware compression through CFG-based grammar compression. We analyze a well-known algorithm, RePair, and on the basis of that analysis design a novel variant of RePair called MR-RePair. We implement MR-RePair and experimentally confirm its effectiveness, especially for highly repetitive texts. In Chapter 4, we address further improvement of compression performance within the framework of RLCFG-based grammar compression. In that chapter, we design a compression algorithm using an RLCFG, called RL-MR-RePair. Furthermore, we propose an encoding scheme for MR-RePair and RL-MR-RePair. The experimental results demonstrate the high compression performance of RL-MR-RePair and the proposed encoding scheme. In Chapter 5, we study the framework of higher-order compression, which is grammar compression using a λ-term as the formal grammar. We present a method to obtain a compact λ-term representing a natural number. Obtaining a compact representation of natural numbers can improve the compression effectiveness for repetition, the most fundamental repetitive structure. For a given natural number $n$, we prove that the size of the obtained λ-term is $O(\mathrm{slog}_2 n)$ in the best case and $O((\mathrm{slog}_2 n)^{\log n / \log \log n})$ in the worst case.
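As a rough illustration of the CFG-based grammar compression mentioned in the abstract, the following is a minimal sketch of the classic RePair idea: repeatedly replace the most frequent pair of adjacent symbols with a fresh nonterminal until no pair occurs twice. It is not MR-RePair or RL-MR-RePair from the thesis, it is a naive quadratic-time version rather than the linear-time algorithm, and the function names (`count_pairs`, `replace_pair`, `repair`, `decompress`) and the choice of integers as nonterminals are assumptions made for illustration only.

```python
from collections import Counter


def count_pairs(seq):
    """Count non-overlapping occurrences of each adjacent symbol pair."""
    counts = Counter()
    prev_pair, skip = None, False
    for i in range(len(seq) - 1):
        pair = (seq[i], seq[i + 1])
        if skip and pair == prev_pair:
            # Overlaps the occurrence counted one step earlier (e.g. "aaa").
            skip = False
            continue
        counts[pair] += 1
        prev_pair = pair
        skip = pair[0] == pair[1]
    return counts


def replace_pair(seq, pair, nonterminal):
    """Replace occurrences of `pair`, scanning left to right."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(nonterminal)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def repair(text):
    """Return (start sequence, rules); rules map int -> (left, right)."""
    seq = list(text)   # terminals are single-character strings
    rules = {}         # nonterminals are integers 0, 1, 2, ...
    while True:
        counts = count_pairs(seq)
        if not counts:
            break
        pair, freq = max(counts.items(), key=lambda kv: kv[1])
        if freq < 2:   # no pair repeats, so a new rule would not help
            break
        nonterminal = len(rules)
        rules[nonterminal] = pair
        seq = replace_pair(seq, pair, nonterminal)
    return seq, rules


def expand(symbol, rules):
    """Expand a symbol back into the string it derives."""
    if symbol not in rules:
        return symbol
    left, right = rules[symbol]
    return expand(left, rules) + expand(right, rules)


def decompress(seq, rules):
    """Restore the original text from the grammar."""
    return "".join(expand(s, rules) for s in seq)
```

Calling `repair("abababab")` and passing the result to `decompress` restores the input exactly, which is the lossless property the abstract describes; the more repetitive the input, the fewer symbols remain in the start sequence and rules.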
Conferring University:	Hokkaido University
Degree Report Number:	甲第14281号
Degree Level:	Doctor
Degree Discipline:	Information Science
Examination Committee Members:	(Chief examiner) Prof. 有村博紀, Prof. 吉岡真治, Prof. 堀山貴史
Degree Affiliation:	Graduate School of Information Science and Technology (Information Science program, 情報科学専攻)
Type:	theses (doctoral)
URI:	http://hdl.handle.net/2115/79532
Appears in Collections:	Doctorate by way of Advanced Course > Graduate School of Information Science and Technology, Theses > Doctor (Information Science)
