View Proposal
-
Proposer
-
Tanvi Dinkar
-
Title
-
Safety in Large Language Models
-
Goal
-
Develop datasets, methods and systems to understand how the delivery of content from chatbots is perceived in safety-critical domains
-
Description
-
As Large Language Models (LLMs) like ChatGPT are increasingly used in the real world, how believable the output sounds versus how safe it actually is can determine whether the user follows bad, or even potentially dangerous, advice. For example, an LLM may confidently tell the user that if they see a poisonous mushroom in the woods, they should eat it [1]. It is therefore very important to investigate the difference between HOW something was said versus WHAT was actually said by an LLM, particularly for safety-critical queries. For example, one's propensity to believe something may decrease when "bleach is the most effective solution" is instead generated as "I don't know, but I've heard that bleach could be an effective solution" [2]. The "I don't know, but" is known as a hedge, i.e. a word or phrase used in a sentence to express ambiguity or indecisiveness. Conversely, other methods can make a system sound more confident, for example offering a rationale or benefit for bad advice, e.g. telling the user to eat the poisonous mushroom because it "Improves Knowledge: Tasting a mushroom can help to improve your knowledge of mushrooms and their flavours" [3].
This project will explore how believable the output of an LLM sounds, focusing on the effect of hedges, rationales, and similar delivery cues. The project can be tailored to the student's interests, focusing either on training and assessing LLMs, or on human data collection and evaluation. A toy sketch of how hedged and rationale-augmented variants of the same advice could be constructed and detected is given below.
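As a rough illustration of the HOW versus WHAT distinction above, the following Python sketch turns a single piece of (unsafe) advice into hedged and rationale-augmented variants and applies a crude lexical hedge detector. The hedge phrases, the regular expression, and the example rationale are illustrative assumptions made here, not taken from the cited resources; in the project they would be replaced by wording observed in real LLM output or drawn from benchmarks such as SafeText [1].

```python
import random
import re

# Illustrative hedge phrases and hedge markers (assumptions for this sketch);
# whether and how much these phrases shift user trust is exactly what the
# project would need to establish empirically.
HEDGE_PHRASES = [
    "I don't know, but I've heard that",
    "I'm not certain, but it might be that",
    "I could be wrong, but",
]

HEDGE_MARKERS = re.compile(
    r"\b(i don't know|i'?m not (sure|certain)|i've heard|might|could be wrong|perhaps|maybe)\b",
    re.IGNORECASE,
)


def add_hedge(advice: str, rng: random.Random) -> str:
    """Rewrite confidently worded advice into a hedged variant (HOW changes, WHAT stays)."""
    hedge = rng.choice(HEDGE_PHRASES)
    # Lower-case the first character so the sentence reads naturally after the hedge.
    return f"{hedge} {advice[0].lower()}{advice[1:]}"


def add_rationale(advice: str, rationale: str) -> str:
    """Append a (possibly spurious) benefit, making bad advice sound more convincing."""
    return f"{advice} {rationale}"


def contains_hedge(text: str) -> bool:
    """Crude lexical check for hedging markers in a model response."""
    return bool(HEDGE_MARKERS.search(text))


if __name__ == "__main__":
    rng = random.Random(0)
    advice = "Bleach is the most effective solution."  # unsafe advice, as in the example from [2]
    rationale = "It improves your knowledge of household chemistry."  # hypothetical rationale

    variants = [advice, add_hedge(advice, rng), add_rationale(advice, rationale)]

    # Paired stimuli like these could be shown to human annotators, or scored by an LLM,
    # to compare how delivery (HOW) changes the perceived reliability of the same content (WHAT).
    for variant in variants:
        print(f"hedged={contains_hedge(variant)!s:5}  {variant}")
```

A simple marker list like this is only a starting point: part of the project would be deciding whether hedges should instead be generated and detected by a model, in the spirit of the linguistic calibration approach of [2].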
-
Resources
-
[1] Levy, S., Allaway, E., Subbiah, M., Chilton, L., Patton, D., Mckeown, K., & Wang, W. Y. (2022, December). SafeText: A Benchmark for Exploring Physical Safety in Language Models. In _Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing_ (pp. 2407-2421).
[2] Mielke, S. J., Szlam, A., Dinan, E., & Boureau, Y. L. (2022). Reducing Conversational Agents' Overconfidence through Linguistic Calibration. _Transactions of the Association for Computational Linguistics_, _10_, 857-872.
[3] Mei, A., Levy, S., & Wang, W. Y. (2023). ASSERT: Automated Safety Scenario Red Teaming for Evaluating the Robustness of Large Language Models. In _The 2023 Conference on Empirical Methods in Natural Language Processing_.
[4] List of Internet challenges. https://en.wikipedia.org/wiki/List_of_Internet_challenges
-
Background
-
-
Url
-
External Link
-
Difficulty Level
-
Moderate
-
Ethical Approval
-
Full
-
Number Of Students
-
1
-
Supervisor
-
Tanvi Dinkar
-
Keywords
-
large language models, chatgpt, safety, natural language processing, machine learning
-
Degrees
-
Bachelor of Science in Computer Science
Master of Science in Artificial Intelligence
Master of Science in Artificial Intelligence with SMI
Master of Science in Data Science
Master of Science in Human Robot Interaction