Proposer
Simona Frenda
Title
Models aware of diverse cultures, beliefs, perspectives in NLP
Goal
Use retrieval-augmented strategies to build models that are aware of different perspectives in highly subjective tasks, such as hate speech, emotion, misinformation, and irony detection, on multilingual textual data.
Description
The need for multi-perspective-aware models arises from the relevance of AI models in society. Not all countries and societies have equal access to technologies and their development: on the one hand, models learn stereotypes about nationalities and cultures from their training data; on the other hand, the opinions and perspectives of those same countries and societies are not considered by AI models [1,2]. By exploiting the multiple annotations of “perspectivist” corpora and the metadata about their annotators [4,5], we can design models that not only align with users' perspectives but also understand them, because the models are informed about users' cultural background and/or values. To this end, this project will explore knowledge-based strategies, such as Retrieval-Augmented Generation [6], to make models aware of diverse perspectives.
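As a rough illustration of the retrieval step described above, the following minimal Python sketch selects annotated examples whose annotator metadata matches a target profile and prepends them to a prompt. It is only a sketch under stated assumptions, not the project's method: the `AnnotatedExample` structure, the `country` metadata key, the word-overlap (Jaccard) similarity, and the toy examples are all hypothetical and are not taken from the cited corpora.

```python
from dataclasses import dataclass
import re


@dataclass(frozen=True)
class AnnotatedExample:
    text: str
    label: str        # label given by one annotator, e.g. "ironic"
    annotator: dict   # hypothetical annotator metadata, e.g. {"country": "IT"}


def tokens(text):
    """Lowercased set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a, b):
    """Word-overlap similarity between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(query, profile, store, k=2):
    """Return the k examples most similar to `query`, restricted to
    annotators whose metadata matches every key/value in `profile`."""
    candidates = [
        ex for ex in store
        if all(ex.annotator.get(key) == value for key, value in profile.items())
    ]
    q = tokens(query)
    ranked = sorted(candidates, key=lambda ex: jaccard(q, tokens(ex.text)), reverse=True)
    return ranked[:k]


def build_prompt(query, profile, store, k=2):
    """Assemble a few-shot prompt from the retrieved, perspective-matched examples."""
    parts = [f"Annotator profile: {profile}"]
    for ex in retrieve(query, profile, store, k):
        parts.append(f"Text: {ex.text}\nLabel: {ex.label}")
    parts.append(f"Text: {query}\nLabel:")
    return "\n\n".join(parts)


# Invented toy store: the same text labelled differently by two annotator groups.
STORE = [
    AnnotatedExample("Great, another Monday meeting", "ironic", {"country": "IT"}),
    AnnotatedExample("Great, another Monday meeting", "not ironic", {"country": "IE"}),
    AnnotatedExample("I love the sunny weather today", "not ironic", {"country": "IT"}),
]

if __name__ == "__main__":
    print(build_prompt("Great, another meeting", {"country": "IT"}, STORE, k=1))
```

The resulting prompt could then be passed to any language model. In a realistic setting, the word-overlap score would likely be replaced by a dense retriever and the store by a perspectivist corpus with real annotator metadata.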
Resources
[1] Dignum, V. (2022). Responsible Artificial Intelligence - From Principles to Practice: A Keynote at TheWebConf 2022.
[2] Narayanan Venkit, P., Gautam, S., Panchanadikar, R., Huang, T., and Wilson, S. (2023). Unmasking Nationality Bias: A Study of Human Perception of Nationalities in AI-Generated Articles. In AIES '23.
[3] Frenda, S., Abercrombie, G., Basile, V., Pedrani, A., Panizzon, R., Cignarella, A.T., Marco, C., and Bernardi, D. (2024). Perspectivist approaches to natural language processing: a survey. Language Resources and Evaluation.
[4] Sachdeva, P., Barreto, R., Bacon, G., Sahn, A., Von Vacano, C., and Kennedy, C. (2022). The measuring hate speech corpus: Leveraging Rasch measurement theory for data perspectivism. In Proceedings of the 1st Workshop on Perspectivist Approaches to NLP @ LREC 2022.
[5] Casola, S., Frenda, S., Lo, S. M., Sezerer, E., Uva, A., Basile, V., Bosco, C., Pedrani, A., Rubagotti, C., Patti, V., and Bernardi, D. (2024). MultiPICo: Multilingual perspectivist irony corpus. In Proceedings of Association for Computational Linguistics (ACL).
[6] Gao, Y., Xiong, Y., Gao, X., Jia, K., Pan, J., Bi, Y., Dai, Y., Sun, J., Wang, H., and Wang, H. (2023). Retrieval-augmented generation for large language models: A survey. arXiv preprint arXiv:2312.10997, 2(1).
Background
https://pdai.info/
https://le-wi-di.github.io/
Url
Difficulty Level
Moderate
Ethical Approval
None
Number Of Students
0
Supervisor
Simona Frenda
Keywords
knowledge-based strategies, perspectivist models, multiply annotated corpora, multilingual datasets
Degrees
Master of Science in Artificial Intelligence
Master of Science in Artificial Intelligence with SMI
Master of Science in Data Science