Proposing Multimodal Integration Model Using LSTM and Autoencoder

Noguchi, Wataru; Iizuka, Hiroyuki; Yamamoto, Masahito

doi:10.4108/eai.3-12-2015.2262505


This item is licensed under: Creative Commons Attribution 3.0 Unported

Files in This Item:

eai.3-12-2015.2262505.pdf (PDF, 320.71 kB)

Please use this identifier to cite or link to this item: http://hdl.handle.net/2115/64503

Title:	Proposing Multimodal Integration Model Using LSTM and Autoencoder
Authors:	Noguchi, Wataru
	Iizuka, Hiroyuki
	Yamamoto, Masahito
Keywords:	multimodal integration
	deep learning
	autoencoder
	Long Short Term Memory
Issue Date:	28-Dec-2016
Publisher:	ACM
Journal Title:	EAI Endorsed Transactions on Security and Safety
Volume:	16
Issue:	10
Start Page:	e1
Publisher DOI:	10.4108/eai.3-12-2015.2262505
Abstract:	We propose a neural network architecture that learns and integrates sequential multimodal information using Long Short Term Memory (LSTM). The model consists of encoder and decoder LSTMs and a multimodal autoencoder. To integrate sequential multimodal information, the encoder LSTM first encodes the sequential input of each modality into a fixed range feature vector. The multimodal autoencoder then integrates the feature vectors from all modalities and generates a fused feature vector that contains the sequential multimodal information in a mixed form. The multimodal autoencoder regenerates the original feature vector of each modality from the fused feature vector, and the decoder LSTM decodes the sequential inputs from the regenerated feature vectors. The model is trained on visual and motion sequences of humans and tested on recall tasks. The experimental results show that the model can learn and remember the sequential multimodal inputs, and that using integrated multimodal information decreases the ambiguity generated at the learning stage of the LSTMs. The model can also recall visual sequences from motion sequences alone, and vice versa.
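
The data flow described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the weights are random and untrained, all dimensions (`D_VIS`, `D_MOT`, `H`, `FUSED`, `T`) and names are hypothetical, the autoencoder is reduced to a single fusing layer and a single regenerating layer, and replacing an absent modality with a zero feature vector is an assumption made here to show cross-modal recall. Only the order of the stages (encoder LSTM per modality, multimodal autoencoder, decoder LSTM per modality) follows the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTM:
    """Single-layer LSTM with random, untrained weights (illustration only)."""

    def __init__(self, n_in, n_hidden):
        self.n_hidden = n_hidden
        # One stacked weight matrix for the input, forget, output and candidate gates.
        self.W = rng.normal(0.0, 0.1, (4 * n_hidden, n_in + n_hidden))
        self.b = np.zeros(4 * n_hidden)

    def step(self, x, h, c):
        z = self.W @ np.concatenate([x, h]) + self.b
        i, f, o, g = np.split(z, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        return h, c

    def encode(self, seq):
        """Consume a (T, n_in) sequence and return the final hidden state."""
        h, c = np.zeros(self.n_hidden), np.zeros(self.n_hidden)
        for x in seq:
            h, c = self.step(x, h, c)
        return h

    def decode(self, feature, n_steps):
        """Unroll for n_steps, feeding the same feature vector at every step."""
        h, c = np.zeros(self.n_hidden), np.zeros(self.n_hidden)
        states = []
        for _ in range(n_steps):
            h, c = self.step(feature, h, c)
            states.append(h)
        return np.stack(states)  # shape (n_steps, n_hidden)

# Hypothetical sizes: sequence length, input dims, feature size, fused size.
T, D_VIS, D_MOT, H, FUSED = 5, 16, 6, 8, 10
MODALITIES = {"visual": D_VIS, "motion": D_MOT}

encoders = {m: LSTM(d, H) for m, d in MODALITIES.items()}
decoders = {m: LSTM(H, H) for m in MODALITIES}
readouts = {m: rng.normal(0.0, 0.1, (H, d)) for m, d in MODALITIES.items()}
W_fuse = rng.normal(0.0, 0.1, (FUSED, 2 * H))    # autoencoder: features -> fused
W_regen = rng.normal(0.0, 0.1, (2 * H, FUSED))   # autoencoder: fused -> features

def forward(visual=None, motion=None, n_steps=T):
    inputs = {"visual": visual, "motion": motion}
    # 1. Encoder LSTMs: one feature vector per modality.
    #    Assumption: an absent modality is represented by a zero vector.
    feats = [
        encoders[m].encode(inputs[m]) if inputs[m] is not None else np.zeros(H)
        for m in MODALITIES
    ]
    # 2. Multimodal autoencoder: fuse, then regenerate per-modality features.
    fused = np.tanh(W_fuse @ np.concatenate(feats))
    regenerated = np.split(np.tanh(W_regen @ fused), 2)
    # 3. Decoder LSTMs: reconstruct each sequence from its regenerated feature.
    outputs = {
        m: decoders[m].decode(f, n_steps) @ readouts[m]
        for m, f in zip(MODALITIES, regenerated)
    }
    return fused, outputs

visual_seq = rng.normal(size=(T, D_VIS))
motion_seq = rng.normal(size=(T, D_MOT))

fused, recon = forward(visual_seq, motion_seq)   # both modalities present
_, recalled = forward(motion=motion_seq)         # recall visual from motion only
print(fused.shape, recon["visual"].shape, recalled["visual"].shape)
```

Because nothing is trained here, the outputs are meaningless as reconstructions; the sketch only shows how a sequence of one modality can still reach the decoder of the other modality through the shared fused vector.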
Rights:	Copyright © 2015 W. Noguchi et al., licensed to EAI. This is an open access article distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/3.0/), which permits unlimited use, distribution and reproduction in any medium so long as the original work is properly cited.
Rights:	http://creativecommons.org/licenses/by/3.0/
Type:	article
URI:	http://hdl.handle.net/2115/64503
Appears in Collections:	Graduate School of Information Science and Technology / Faculty of Information Science and Technology > Peer-reviewed Journal Articles, etc

Submitter: Yamamoto, Masahito (山本雅人)
