Proposer
Simona Frenda
Title
Catching cultural biases in LLMs
Goal
Design a framework for calibrating LLM biases using statistical techniques such as conformal prediction.
Description
Conformal prediction is a statistical framework used to quantify the reliability of model predictions by assessing how well individual predictions align with a reference set of labels (e.g., labels produced by a group of annotators belonging to the same culture) [1]. The scientific literature has revealed important biases stemming from data collected and annotated by specific segments of the population, leading to non-neutral models [2] and to the reinforcement of social stereotypes [3]. To investigate the societal biases of LLMs, we can rely on conformal prediction to evaluate the uncertainty of models against the decisions of specific groups of annotators who share backgrounds, cultures, or beliefs; a sketch of this procedure follows. We will use multiple annotated datasets addressing highly subjective phenomena, such as the detection of misinformation, irony, and hate speech.
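As a minimal sketch of the idea, following the split conformal procedure in [1]: calibrate a score threshold on the labels of one annotator group, then build prediction sets on held-out items; differences in set size or coverage across groups would signal group-specific miscalibration. The probabilities and annotations below are random stand-ins, and the function names are ours for illustration, not from any particular library.

```python
import numpy as np

def conformal_quantile(cal_probs, cal_labels, alpha=0.1):
    """Split conformal prediction: compute the score threshold q_hat
    from a calibration set labelled by one annotator group.

    cal_probs : (n, K) array of model class probabilities
    cal_labels: (n,) array of the group's labels
    """
    n = len(cal_labels)
    # Nonconformity score: 1 minus the probability the model
    # assigns to the label this group chose.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample-corrected quantile level (requires NumPy >= 1.22
    # for the `method` keyword).
    q_level = np.ceil((n + 1) * (1 - alpha)) / n
    return np.quantile(scores, q_level, method="higher")

def prediction_sets(test_probs, q_hat):
    """All labels whose nonconformity score stays below q_hat."""
    return test_probs >= 1.0 - q_hat  # (m, K) boolean mask

# Hypothetical usage with random stand-in data for one group.
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(3), size=200)    # stand-in model outputs
labels_group_a = rng.integers(0, 3, size=200)  # stand-in annotations
q_a = conformal_quantile(probs[:100], labels_group_a[:100])
sets_a = prediction_sets(probs[100:], q_a)
print("group A: avg prediction-set size =", sets_a.sum(axis=1).mean())
```

Repeating the calibration with labels from a second group on the same test items and comparing average set sizes (or empirical coverage) would then indicate how unevenly the model's uncertainty is calibrated across those groups.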
Resources
[1] Angelopoulos, A. N., & Bates, S. (2023). Conformal prediction: A gentle introduction. Foundations and Trends® in Machine Learning, 16(4), 494-591.
[2] Santurkar, S., Durmus, E., Ladhak, F., Lee, C., Liang, P., & Hashimoto, T. (2023). Whose opinions do language models reflect? In International Conference on Machine Learning (pp. 29971-30004). PMLR.
[3] Caliskan, A., Bryson, J. J., & Narayanan, A. (2017). Semantics derived automatically from language corpora contain human-like biases. Science, 356(6334), 183-186.
Background
https://www.nowpublishers.com/article/Details/MAL-101
Url
Difficulty Level
Challenging
Ethical Approval
None
Number Of Students
0
Supervisor
Simona Frenda
Keywords
bias detection, calibration of models, multiple annotated corpora
Degrees
Master of Science in Artificial Intelligence
Master of Science in Artificial Intelligence with SMI
Master of Science in Data Science
Bachelor of Science in Statistical Data Science