
Why Does Large Batch Training Result in Poor Generalization? A Comprehensive Explanation and a Better Strategy from the Viewpoint of Stochastic Optimization

Files in This Item:
Tomoumi Takase.pdf (2.41 MB, PDF)
Please use this identifier to cite or link to this item: http://hdl.handle.net/2115/71558

Title: Why Does Large Batch Training Result in Poor Generalization? A Comprehensive Explanation and a Better Strategy from the Viewpoint of Stochastic Optimization
Authors: Takase, Tomoumi
Oyama, Satoshi
Kurihara, Masahito
Keywords: Non-convex optimization
Gradient descent
Neural network
Batch training
Randomized algorithm
Issue Date: Jul-2018
Publisher: MIT Press
Journal Title: Neural Computation
Volume: 30
Issue: 7
Start Page: 2005
End Page: 2023
Publisher DOI: 10.1162/neco_a_01089
PMID: 29652590
Abstract: We present a comprehensive framework of search methods, such as simulated annealing and batch training, for solving non-convex optimization problems. These methods search a wider range of the parameter space by gradually decreasing the randomness added to the standard gradient descent method. The formulation we define on the basis of this framework can be applied directly to neural network training, yielding an effective approach that gradually increases the batch size during training. We also explain why large batch training degrades generalization performance, a question left open by previous studies.
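
The strategy the abstract outlines, growing the batch size over the course of training the way simulated annealing lowers its temperature, can be illustrated with a minimal sketch. This is not the authors' published code: the toy least-squares task, the linear growth schedule, and names such as b_min and b_max are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

# Toy least-squares problem: recover w_true from noisy linear measurements.
n, d = 1024, 10
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

def minibatch_grad(w, Xb, yb):
    # Gradient of the mean squared error on one minibatch.
    return 2.0 * Xb.T @ (Xb @ w - yb) / len(yb)

w = np.zeros(d)
lr = 0.05
epochs = 30
b_min, b_max = 8, 256  # small batches early (noisy, exploratory),
                       # large batches late (stable, fine-grained)

for epoch in range(epochs):
    # Grow the batch size linearly over training; the linear schedule
    # and these endpoints are illustrative assumptions, not the paper's.
    b = int(b_min + (b_max - b_min) * epoch / (epochs - 1))
    perm = rng.permutation(n)
    for start in range(0, n, b):
        idx = perm[start:start + b]
        w -= lr * minibatch_grad(w, X[idx], y[idx])
    loss = float(np.mean((X @ w - y) ** 2))
    print(f"epoch {epoch:2d}  batch size {b:3d}  train MSE {loss:.5f}")

Small batches early in training inject large gradient noise and therefore explore broadly; enlarging the batch late in training reduces that noise and allows fine convergence, which parallels the annealing correspondence the abstract describes.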
Rights: © 2018 Massachusetts Institute of Technology
Relation: https://www.mitpressjournals.org/loi/neco
Type: article
URI: http://hdl.handle.net/2115/71558
Appears in Collections:情報科学院・情報科学研究院 (Graduate School of Information Science and Technology / Faculty of Information Science and Technology) > 雑誌発表論文等 (Peer-reviewed Journal Articles, etc)

Submitter: 高瀬 朝海 (Takase, Tomoumi)
