View Proposal
-
Proposer
-
Neamat El Gayar
-
Title
-
Emotion Recognition Using Multimodal Learning
-
Goal
-
Explore recent models that integrate different modalities for AI applications in affect recognition
-
Description
- In natural language processing, transformer models such as BERT and T5 have produced strong results. These models build on self-supervised learning: they are pre-trained on a large amount of unlabeled data and then fine-tuned with supervision on a small amount of labeled data.
Self-supervised learning methods address many of the problems caused by scarce labeled data, and their use in fields such as computer vision and natural language processing has produced strong results.
The recent success of Transformers in the language domain has motivated adapting them to multimodal settings (images, audio, video).
- Multimodal transformer survey: https://arxiv.org/pdf/2206.06488.pdf
- Vision transformer survey: https://arxiv.org/pdf/2012.12556.pdf
Possible applications:
- Emotion prediction (text, audio, video) in educational settings or for monitoring medical patients
- Building on recent papers: fallacy detection in political debates https://aclanthology.org/2024.eacl-short.16.pdf
and affect prediction in video conversations https://dl.acm.org/doi/10.1145/3689092.3689409
Other applications are also possible.
See this article for applications combining vision and language:
https://theaisummer.com/vision-language-models/
- Resources
-
-
Background
-
-
Url
-
-
Difficulty Level
-
Easy
-
Ethical Approval
-
None
-
Number Of Students
-
0
-
Supervisor
-
Neamat El Gayar
-
Keywords
-
-
Degrees
-
Bachelor of Science in Computer Science
Master of Science in Artificial Intelligence
Master of Science in Data Science