View Proposal
-
Proposer
-
Simona Frenda
-
Title
-
Models aware of diverse cultures, beliefs, perspectives in NLP
-
Goal
-
Use retrieval-augmented strategies to build models that are aware of different perspectives on highly subjective phenomena, such as hate speech, emotion, misinformation, and irony, in multilingual textual data.
-
Description
-
The need for multi-perspective-aware models arises from the growing role of AI models in society. Not all countries and societies have equal access to these technologies and their development. On the one hand, models learn stereotypes about nationalities and cultures from their training data; on the other hand, the opinions and perspectives of those countries and societies are not taken into account by AI models [1,2].
By exploiting the multiple annotations in “perspectivist” corpora and the metadata about their annotators [4,5], we can design models that not only align with users' perspectives but also understand them, because the models are informed about users' cultural background and/or values. To this end, this project will explore knowledge-based strategies such as Retrieval-Augmented Generation [6] to make models aware of diverse perspectives.
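As a rough illustration of the idea, the sketch below retrieves previously annotated texts from annotators who share a given background and places them in a prompt, so that a downstream model could condition on that group's perspective. It is a minimal sketch under stated assumptions, not the project's method: the `Annotation` record, the `annotator_group` field, the example texts and labels, and the word-overlap (Jaccard) retriever are all hypothetical, and no language model is actually called.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One label given to one text by an annotator (hypothetical schema)."""
    text: str
    label: str            # e.g. "ironic" / "not ironic"
    annotator_group: str  # assumed metadata field, e.g. a country code


def tokenize(text: str) -> set[str]:
    # Naive whitespace tokenizer; a real system would use a proper one.
    return set(text.lower().split())


def jaccard(a: set[str], b: set[str]) -> float:
    # Word-overlap similarity, a stand-in for a dense or BM25 retriever.
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(query: str, store: list[Annotation], group: str, k: int = 2) -> list[Annotation]:
    """Return the k annotations made by `group` that are most similar to `query`."""
    q = tokenize(query)
    candidates = [a for a in store if a.annotator_group == group]
    ranked = sorted(candidates, key=lambda a: jaccard(q, tokenize(a.text)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, store: list[Annotation], group: str, k: int = 2) -> str:
    """Assemble a prompt that exposes the retrieved group-specific labels."""
    lines = [f"Annotators from group '{group}' labelled these texts:"]
    lines += [f'- "{a.text}" -> {a.label}' for a in retrieve(query, store, group, k)]
    lines.append(f'How would this group label: "{query}"?')
    return "\n".join(lines)


# Invented toy data: the same text can receive different labels per group.
STORE = [
    Annotation("great weather for a picnic", "ironic", "IE"),
    Annotation("great weather for a picnic", "not ironic", "AU"),
    Annotation("the train is late again", "not ironic", "IE"),
    Annotation("lovely, the train is late again", "ironic", "AU"),
]

if __name__ == "__main__":
    print(build_prompt("great weather again", STORE, group="IE"))
```

Changing `group` changes which annotations are retrieved, and therefore which perspective the prompt conveys for the same input text.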
-
Resources
-
[1] Dignum, V. (2022). Responsible Artificial Intelligence - From Principles to Practice: A Keynote at TheWebConf 2022.
[2] Narayanan Venkit, P., Gautam, S., Panchanadikar, R., Huang, T., and Wilson, S. (2023). Unmasking Nationality Bias: A Study of Human Perception of Nationalities in AI-Generated Articles. In AIES '23.
[3] Frenda, S., Abercrombie, G., Basile, V., Pedrani, A., Panizzon, R., Cignarella, A.T., Marco, C., and Bernardi, D. (2024). Perspectivist approaches to natural language processing: a survey. Language Resources and Evaluation.
[4] Sachdeva, P., Barreto, R., Bacon, G., Sahn, A., Von Vacano, C., and Kennedy, C. (2022). The Measuring Hate Speech corpus: Leveraging Rasch measurement theory for data perspectivism. In Proceedings of the 1st Workshop on Perspectivist Approaches to NLP @ LREC2022.
[5] Casola, S., Frenda, S., Lo, S. M., Sezerer, E., Uva, A., Basile, V., Bosco, C., Pedrani, A., Rubagotti, C., Patti, V., and Bernardi, D. (2024). MultiPICo: Multilingual perspectivist irony corpus. In Proceedings of the Association for Computational Linguistics (ACL).
[6] Gao, Y., Xiong, Y., Gao, X., Jia, K., Pan, J., Bi, Y., Dai, Y., Sun, J., Wang, H., and Wang, H. (2023). Retrieval-augmented generation for large language models: A survey. arXiv preprint arXiv:2312.10997.
-
Background
-
https://pdai.info/
https://le-wi-di.github.io/
-
Url
-
-
Difficulty Level
-
Moderate
-
Ethical Approval
-
None
-
Number Of Students
-
0
-
Supervisor
-
Simona Frenda
-
Keywords
-
knowledge-based strategies, perspectivist models, multiply annotated corpora, multilingual dataset
-
Degrees
-
Master of Science in Artificial Intelligence
Master of Science in Artificial Intelligence with SMI
Master of Science in Data Science