Proposer
Ioannis Konstas
Title
NLP - Self-correcting Language Modelling using Large Language Models
Goal
To develop robust self-correcting mechanisms for various NLP tasks with an LLM
Description
You have most likely already used ChatGPT a few times (or a lot!). Have you ever wondered what it actually takes to build a system based on a Large Language Model (LLM) and evaluate it on a real-world task? In this series of projects (check the rest as well!) we will explore the task of self-critique: providing feedback in natural language to improve performance on a downstream task. Take the following example (from [1]):

User: I am interested in playing Table tennis.
Response: I'm sure it's a great way to socialize, stay active
Feedback: Engaging: Provides no information about table tennis or how to play it. User understanding: Lacks understanding of user's needs and state of mind.
Response (refined): That's great to hear (...)! It's a fun sport requiring quick reflexes and good hand-eye coordination. Have you played before, or are you looking to learn?

[1] Madaan et al. SELF-REFINE: Iterative Refinement with Self-Feedback. 2023. arXiv.

This notion of using language to update a model has received a lot of interest recently, which opens up many interesting avenues to pursue:
- Come up with a generic style of self-feedback that works across many different tasks (e.g., dialogue response generation, code generation, question answering, error correction, etc.).
- Focus on one task and create/collect a dataset of feedback with desired, measurable properties. In other words, we will attempt to evaluate the feedback itself rather than just the refined responses.
- (More challenging) Use self-feedback to train a reward model with reinforcement learning (https://github.com/huggingface/trl).

Once we decide on the particular flavour of task we are interested in, we can explore several popular techniques for fine-tuning an open-source LLM (e.g., Llama 2), starting from the simpler ones (prompt engineering) all the way up to Parameter-Efficient Fine-Tuning (PEFT).
We will use standard benchmark datasets and SOTA frameworks to evaluate (and potentially train) our models. We can co-develop the project to place more emphasis on the style of feedback, training, data annotation, or human evaluation.
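To make the generate-critique-refine idea concrete, here is a minimal sketch of a SELF-REFINE-style loop. Note that `generate()` below is a hypothetical stub standing in for a real LLM call (e.g., Llama 2 served via Hugging Face), and the prompt wording, canned replies, and the "no issues" stopping criterion are illustrative assumptions, not the method's exact prompts:

```python
# Minimal sketch of a SELF-REFINE-style loop (after Madaan et al., 2023).
# generate() is a stub in place of a real LLM call; its canned replies
# are keyed on the prompt shape purely for illustration.

def generate(prompt: str) -> str:
    """Stub LLM: return canned strings depending on the prompt shape."""
    if "Refine the response" in prompt:
        return ("That's great to hear! It's a fun sport requiring quick "
                "reflexes and good hand-eye coordination. Have you played before?")
    if "Feedback:" in prompt:
        return ("Engaging: provides no information about table tennis. "
                "User understanding: lacks understanding of the user's needs.")
    return "I'm sure it's a great way to socialize, stay active."


def self_refine(user_msg: str, max_iters: int = 2) -> str:
    """Generate an initial response, then iteratively critique and refine it."""
    response = generate(user_msg)
    for _ in range(max_iters):
        # Ask the model to critique its own response.
        feedback = generate(f"{user_msg}\nResponse: {response}\nFeedback:")
        if "no issues" in feedback.lower():  # assumed stopping criterion
            break
        # Ask the model to rewrite the response conditioned on the feedback.
        response = generate(
            f"{user_msg}\nResponse: {response}\n"
            f"Feedback: {feedback}\nRefine the response:"
        )
    return response


print(self_refine("I am interested in playing Table tennis."))
```

In a real system the same loop structure applies; only `generate()` changes, and the feedback/refinement prompts become the object of study (their style, measurability, and quality).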
Resources
Large Language Models, GPU
Background
Machine Learning, NLP (desired: F29AI), software development, F21NL (MSc only), F21CA (MSc only)
Url
Difficulty Level
High
Ethical Approval
None
Number Of Students
1
Supervisor
Ioannis Konstas
Keywords
machine learning, neural networks, large language models, natural language processing (nlp)
Degrees
Bachelor of Science in Computer Science
Master of Science in Artificial Intelligence
Master of Science in Artificial Intelligence with SMI