Proposer
-
Ioannis Konstas
-
Title
-
NLP - Self-correcting Language Modelling using Large Language Models
-
Goal
-
To develop robust self-correcting mechanisms for various NLP tasks with an LLM
-
Description
-
You have most likely already used ChatGPT a few times (or a lot!).
Have you ever wondered what it actually takes to build a system based on a Large Language Model (LLM) and evaluate it on a real-world task?
In this series of projects (check the rest as well!) we will explore the task of self-critique: providing feedback in natural language to improve performance on a downstream task. Take the following example (from [1]):
User: I am interested in playing Table tennis.
Response: I'm sure it's a great way to socialize, stay active
Feedback:
Engaging: Provides no information about table tennis or how to play it.
User understanding: Lacks understanding of user's needs and state of mind.
Response (refined): That's great to hear (...) ! It's a fun sport requiring quick reflexes and good hand-eye coordination. Have you played before, or are you looking to learn?
[1] Madaan et al. SELF-REFINE: Iterative Refinement with Self-Feedback. 2023. arXiv.
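The generate → feedback → refine loop in the example above can be sketched as below. This is a minimal illustration only: the `llm` function is a hypothetical stub standing in for a real LLM call (e.g., an API or a local checkpoint), and the stopping criterion is a simplifying assumption, not part of the Self-Refine method itself.

```python
# Minimal sketch of the Self-Refine loop [1]: generate an initial response,
# ask the same model for feedback on it, then ask it to refine the response.
# `llm` is a hypothetical placeholder; a real implementation would query an LLM.

def llm(prompt: str) -> str:
    # Stubbed responses for illustration, keyed on the prompt's final instruction.
    if "Refine the response" in prompt:
        return "That's great to hear! Table tennis rewards quick reflexes."
    if "Feedback:" in prompt:
        return "Engaging: provides no information about the topic."
    return "I'm sure it's a great way to socialize."

def self_refine(user_msg: str, max_iters: int = 2) -> str:
    response = llm(f"User: {user_msg}\nResponse:")
    for _ in range(max_iters):
        feedback = llm(f"User: {user_msg}\nResponse: {response}\nFeedback:")
        if "no issues" in feedback.lower():  # simple assumed stopping criterion
            break
        response = llm(
            f"User: {user_msg}\nResponse: {response}\n"
            f"Feedback: {feedback}\nRefine the response:"
        )
    return response

print(self_refine("I am interested in playing table tennis."))
```

In a real system the same prompts would be sent to one model playing all three roles (generator, critic, refiner), which is what makes the approach "self"-refining.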
This notion of using language to update a model has received a lot of interest recently, which opens up many interesting avenues to pursue:
- Come up with a generic style of self-feedback that works across many different tasks (e.g., dialogue response generation, code generation, question answering, or error correction).
- Focus on one task and create/collect a dataset of feedback with desired, measurable properties. In other words, we will attempt to evaluate the feedback itself rather than just the refined responses.
- (More challenging) Use self-feedback to train a reward model with reinforcement learning (https://github.com/huggingface/trl).
Once we decide on the particular flavour of QA task we are interested in, we can explore several popular techniques for fine-tuning an open-source LLM (e.g., Llama 2), starting from the simpler ones (prompt engineering) all the way up to Parameter-Efficient Fine-Tuning (PEFT). We will use standard benchmark datasets and SOTA frameworks to evaluate (and potentially train) our models.
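To make the PEFT idea concrete, here is a toy sketch of low-rank adaptation (LoRA), one popular PEFT technique: the frozen weight matrix W is augmented with a trainable low-rank product B·A, so only a small fraction of parameters is updated. The tiny dimensions and plain-Python matrices are illustrative assumptions; a real project would use a library such as Hugging Face PEFT on an actual model.

```python
# Toy illustration of LoRA-style parameter-efficient fine-tuning:
# a frozen weight matrix W (d_out x d_in) is augmented with a trainable
# low-rank update B @ A, where B is d_out x r and A is r x d_in with
# r << min(d_out, d_in). Only A and B are trained; W stays frozen.

def matmul(X, Y):
    # Naive matrix multiply over nested lists (illustration only).
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

d_out, d_in, r = 4, 4, 1  # rank-1 adapter, tiny dims for illustration

W = [[1.0 if i == j else 0.0 for j in range(d_in)] for i in range(d_out)]  # frozen
B = [[0.5] for _ in range(d_out)]   # trainable, d_out x r
A = [[0.1, 0.2, 0.3, 0.4]]          # trainable, r x d_in

delta = matmul(B, A)                # the low-rank update B @ A
W_adapted = [[W[i][j] + delta[i][j] for j in range(d_in)] for i in range(d_out)]

full = d_out * d_in                 # parameters a full fine-tune would update
lora = d_out * r + r * d_in         # parameters LoRA actually trains
print(f"trainable params: {lora} / {full}")
```

At realistic transformer dimensions the savings are dramatic: for a 4096×4096 projection with r = 8, LoRA trains 2·4096·8 parameters, roughly 0.4% of the full matrix.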
We can co-develop the project to place greater emphasis on the style of feedback training, data annotation, or human evaluation.
-
Resources
-
Large Language Models, GPU
-
Background
-
Machine Learning, NLP (desired: F29AI), software development, F21NL (MSc only), F21CA (MSc only)
-
Url
-
-
Difficulty Level
-
High
-
Ethical Approval
-
None
-
Number Of Students
-
1
-
Supervisor
-
Ioannis Konstas
-
Keywords
-
machine learning, neural networks, large language models, natural language processing (nlp)
-
Degrees
-
Bachelor of Science in Computer Science
Master of Science in Artificial Intelligence
Master of Science in Artificial Intelligence with SMI