View Proposal


Proposer
Neamat El Gayar
Title
Emotion Recognition Using Multimodal Learning
Goal
Explore recent models that integrate multiple modalities for affect recognition in any AI application
Description
In natural language processing, transformer models such as BERT and T5 have produced strong results. These models build on self-supervised learning: they are first pre-trained on large amounts of unlabelled data and then fine-tuned with supervised learning on a small amount of labelled data. Self-supervised methods have addressed many of the problems of working with unlabelled data, and their use in fields such as computer vision and natural language processing has shown strong results. The recent success of transformers in the language domain has motivated adapting them to multimodal settings (images, audio, video).
- Multimodal transformer survey: https://arxiv.org/pdf/2206.06488.pdf
- Vision transformer survey: https://arxiv.org/pdf/2012.12556.pdf

Possible applications:
- Emotion prediction (text, audio, video) in an educational setting or for monitoring medical patients
- Building on recent papers: fallacy detection in political debates (https://aclanthology.org/2024.eacl-short.16.pdf) and affect prediction in video conversations (https://dl.acm.org/doi/10.1145/3689092.3689409)

Other applications are also possible. For applications combining vision and language, see this article: https://theaisummer.com/vision-language-models/
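As a rough illustration of what "integrating different modalities" can mean in practice, the sketch below shows late fusion, one of the simplest approaches: each modality produces its own class scores, and the resulting probabilities are averaged. This is only an assumed starting point, not a method prescribed by the proposal or taken from the cited papers. The label set, the function names (`softmax`, `late_fusion`, `predict`) and the example scores are all hypothetical; in a real project the scores would come from fine-tuned text, audio and video models.

```python
import math

# Hypothetical emotion label set, chosen only for illustration.
EMOTIONS = ["angry", "happy", "neutral", "sad"]


def softmax(logits):
    """Convert raw scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(modality_logits, weights=None):
    """Weighted average of per-modality class probabilities.

    modality_logits: dict mapping a modality name to a list of raw scores,
                     one score per entry in EMOTIONS.
    weights:         optional dict of non-negative per-modality weights;
                     defaults to equal weighting (an assumption).
    """
    names = list(modality_logits)
    if weights is None:
        weights = {name: 1.0 for name in names}
    total_weight = sum(weights[name] for name in names)
    fused = [0.0] * len(EMOTIONS)
    for name in names:
        probs = softmax(modality_logits[name])
        for i, p in enumerate(probs):
            fused[i] += weights[name] * p / total_weight
    return fused


def predict(modality_logits, weights=None):
    """Return the most probable emotion and the fused distribution."""
    fused = late_fusion(modality_logits, weights)
    best = max(range(len(fused)), key=fused.__getitem__)
    return EMOTIONS[best], fused


# Made-up scores standing in for the outputs of three unimodal models.
example = {
    "text": [0.2, 2.0, 0.5, 0.1],
    "audio": [0.1, 1.5, 0.3, 0.2],
    "video": [1.0, 0.8, 0.9, 0.4],
}
label, probs = predict(example)
print(label, [round(p, 3) for p in probs])
```

Late fusion keeps the unimodal models independent, which makes it easy to test each one separately; the multimodal transformers discussed in the surveys above instead let modalities interact inside the model, for example through attention across modalities.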
Resources
Background
Url
Difficulty Level
Easy
Ethical Approval
None
Number Of Students
0
Supervisor
Neamat El Gayar
Keywords
Degrees
Bachelor of Science in Computer Science
Master of Science in Artificial Intelligence
Master of Science in Data Science