Multimodal Emotion Recognition

Authors

  • Tejashwini N, Information Science and Engineering, Sri Krishna Institute of Technology, Bangalore, India
  • Kaveri A V, Information Science and Engineering, Sri Krishna Institute of Technology, Bangalore, India
  • Keerthana P, Information Science and Engineering, Sri Krishna Institute of Technology, Bangalore, India
  • Rajneesh Kumar, Information Science and Engineering, Sri Krishna Institute of Technology, Bangalore, India
  • Kavya C M, Information Science and Engineering, Sri Krishna Institute of Technology, Bangalore, India

DOI:

https://doi.org/10.5281/zenodo.4419690

Keywords:

LSTM (Long Short-Term Memory), CNN (Convolutional Neural Network), Feature Extraction, Data Preprocessing

Abstract

Recognizing human emotions automatically has been an active research problem for over a decade. Interaction between people and computers becomes more natural when computers can perceive and respond to human non-verbal communication such as emotion. Although several approaches recognize emotion from facial expressions, speech, or text alone, comparatively little work has combined these three modalities to improve the capabilities of an emotion recognition system. This paper describes the strengths and limitations of systems based on facial expressions, acoustic information, and semantic and emotional word vectors. Using facial markers, detailed facial motions are captured with motion capture, together with synchronized speech recordings and text inputs. The primary challenges of emotion recognition are choosing the emotion recognition corpus (speech database), identifying the features relevant to speech, and making an appropriate choice of classifier. Feature extraction for video data uses geometric and appearance-based features, prosodic and spectral features are used for speech data, and emotional and semantic word vectors are used for text data. The collected data is then preprocessed (data preprocessing). A CNN is used to capture emotion-specific information from video and speech, while an LSTM is used for emotion-specific information in text. The basic aim of this model is to explore the capability of text, facial, and speech features to provide emotion-specific information.
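The abstract outlines a late-fusion arrangement: a CNN extracts emotion-specific information from speech and video features, an LSTM does the same for semantic and emotional word vectors, and the two representations are combined for classification. The sketch below illustrates that arrangement in Keras; the layer sizes, input shapes, and six-class output are illustrative assumptions, not details taken from the paper.

    # Minimal sketch of the CNN + LSTM fusion described in the abstract.
    # All shapes, layer sizes, and the number of emotion classes are assumed
    # values for illustration, not figures reported in the paper.
    import numpy as np
    from tensorflow.keras import layers, models

    NUM_CLASSES = 6             # assumed number of emotion categories
    FRAMES, FEATS = 100, 40     # assumed spectral/geometric feature sequence shape
    SEQ_LEN, EMB_DIM = 50, 300  # assumed text length and word-vector dimension

    # CNN branch: emotion-specific information from speech/video feature maps.
    av_in = layers.Input(shape=(FRAMES, FEATS, 1), name="audio_visual")
    x = layers.Conv2D(32, (3, 3), activation="relu")(av_in)
    x = layers.MaxPooling2D((2, 2))(x)
    x = layers.Conv2D(64, (3, 3), activation="relu")(x)
    x = layers.GlobalAveragePooling2D()(x)

    # LSTM branch: emotion-specific information from semantic/emotional word vectors.
    txt_in = layers.Input(shape=(SEQ_LEN, EMB_DIM), name="word_vectors")
    t = layers.LSTM(128)(txt_in)

    # Late fusion of the two modality-specific representations.
    fused = layers.concatenate([x, t])
    fused = layers.Dense(64, activation="relu")(fused)
    out = layers.Dense(NUM_CLASSES, activation="softmax")(fused)

    model = models.Model(inputs=[av_in, txt_in], outputs=out)
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

    # Dummy forward pass to confirm the two branches wire together.
    dummy_av = np.zeros((2, FRAMES, FEATS, 1), dtype="float32")
    dummy_txt = np.zeros((2, SEQ_LEN, EMB_DIM), dtype="float32")
    print(model([dummy_av, dummy_txt]).shape)  # (2, NUM_CLASSES)

Concatenating the branch outputs before a shared classifier is only one possible fusion choice; decision-level fusion of per-modality predictions would fit the same description.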




Published

2020-12-05

How to Cite

[1] T. N, K. A. V, K. P, R. Kumar, and K. C. M, “Multimodal Emotion Recognition”, pices, vol. 4, no. 8, pp. 194-198, Dec. 2020.
