Resampling Nucleotide Sequences with Closest-Neighbor Trimming and Its Comparison to Other Methods

Yonezawa, Kouki; Igarashi, Manabu; Ueno, Keisuke; Takada, Ayato; Ito, Kimihito

doi:10.1371/journal.pone.0057684


This item is licensed under: Creative Commons Attribution 3.0 Unported

Files in This Item: journal.pone.0057684.pdf (PDF, 646.93 kB)

Please use this identifier to cite or link to this item: http://hdl.handle.net/2115/52671

Title:	Resampling Nucleotide Sequences with Closest-Neighbor Trimming and Its Comparison to Other Methods
Authors:	Yonezawa, Kouki
	Igarashi, Manabu
	Ueno, Keisuke
	Takada, Ayato
	Ito, Kimihito
Issue Date:	27-Feb-2013
Publisher:	Public Library of Science
Journal Title:	PLoS One
Volume:	8
Issue:	2
Start Page:	e57684
Publisher DOI:	10.1371/journal.pone.0057684
Abstract:	A large number of nucleotide sequences of various pathogens are available in public databases, and the growth of these datasets has caused an enormous increase in computational costs. Moreover, because surveillance activities differ, the number of sequences found in databases varies from one country to another and from year to year. It is therefore important to study resampling methods that reduce this sampling bias. We propose a novel algorithm, called the closest-neighbor trimming method, that resamples a given number of sequences from a large nucleotide sequence dataset. Using the nucleotide sequences of human H3N2 influenza viruses, we compared the performance of the closest-neighbor trimming method with that of the naive hierarchical clustering algorithm and the k-medoids clustering algorithm. Genetic information accumulated in public databases contains sampling bias, and the closest-neighbor trimming method can thin out densely sampled sequences from a given dataset. Since nucleotide sequences are among the most widely used materials in the life sciences, we anticipate that applying our algorithm to various datasets will help reduce sampling bias.
Rights:	http://creativecommons.org/licenses/by/3.0/
Type:	article
URI:	http://hdl.handle.net/2115/52671
Appears in Collections:	International Institute for Zoonosis Control > Peer-reviewed Journal Articles, etc.
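The abstract states what the closest-neighbor trimming method does (thin out densely sampled sequences until a given number remain) but not how it does it. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: sequences are assumed to be pre-aligned to equal length, the distance is assumed to be the simple p-distance (proportion of differing sites), and at each step one member of the closest remaining pair is assumed to be removed until n sequences remain. The function names `p_distance` and `closest_neighbor_trim`, and the rule for deciding which member of the pair to drop, are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned, equal-length sequences.

    Assumption: an illustrative distance measure, not necessarily the paper's.
    """
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def closest_neighbor_trim(seqs: dict[str, str], n: int) -> list[str]:
    """Hypothetical sketch: keep n sequence IDs by repeatedly trimming the closest pair."""
    if not 0 < n <= len(seqs):
        raise ValueError("n must be between 1 and the number of sequences")

    # Precompute all pairwise distances once as a symmetric lookup table.
    dist: dict[tuple[str, str], float] = {}
    for i, j in combinations(seqs, 2):
        d = p_distance(seqs[i], seqs[j])
        dist[(i, j)] = dist[(j, i)] = d

    remaining = set(seqs)

    def nearest_other(x: str, partner: str) -> float:
        # Distance from x to its nearest remaining neighbor other than its partner.
        others = [dist[(x, y)] for y in remaining if y not in (x, partner)]
        return min(others, default=float("inf"))

    while len(remaining) > n:
        # Closest remaining pair; sorting the IDs makes ties deterministic.
        i, j = min(combinations(sorted(remaining), 2), key=lambda pair: dist[pair])
        # Assumed rule: drop the member that lies in the denser neighborhood.
        drop = i if nearest_other(i, j) <= nearest_other(j, i) else j
        remaining.remove(drop)

    return sorted(remaining)


# Toy usage: B and C sit close together, so trimming to 2 keeps the outliers.
seqs = {"A": "AAAA", "B": "AAAT", "C": "AATT", "D": "TTTT"}
print(closest_neighbor_trim(seqs, 2))  # ['A', 'D']
```

Precomputing the full distance table keeps the sketch short but costs memory quadratic in the number of sequences; a practical tool for large influenza datasets would likely need a more economical data structure.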

Submitter: Ito, Kimihito (伊藤公人)
