Proposer
Ioannis Konstas
Title
NLP - Self-correcting Language Modelling using Large Language Models
Goal
To develop robust self-correcting mechanisms for various NLP tasks with an LLM
Description
You have most likely already used ChatGPT a few times (or a lot!). Have you ever wondered what it actually takes to build a system based on a Large Language Model (LLM) and evaluate it on a real-world task? In this series of projects (check the rest as well!) we will explore the task of self-critique: providing feedback in natural language to improve performance on a downstream task. Take the following example (from [1]):

User: I am interested in playing Table tennis.
Response: I'm sure it's a great way to socialize, stay active
Feedback: Engaging: Provides no information about table tennis or how to play it. User understanding: Lacks understanding of user's needs and state of mind.
Response (refined): That's great to hear (...)! It's a fun sport requiring quick reflexes and good hand-eye coordination. Have you played before, or are you looking to learn?

[1] Madaan et al. SELF-REFINE: Iterative Refinement with Self-Feedback. 2023. arXiv.

This notion of using language to update a model has received a lot of interest recently, which opens up many interesting avenues to pursue:
- Come up with a generic style of self-feedback that works across many different tasks (e.g., dialogue response generation, code generation, question answering, error correction, etc.).
- Focus on one task and create/collect a dataset of feedback with desired, measurable properties. In other words, we will attempt to evaluate the feedback itself rather than just the refined responses.
- (More challenging) Use self-feedback to train a reward model with reinforcement learning (https://github.com/huggingface/trl).

Once we decide on the particular flavour of task we are interested in, we can explore several popular techniques for fine-tuning an open-source LLM (e.g., Llama 2), starting from the simpler ones (prompt engineering) all the way up to Parameter-Efficient Fine-Tuning (PEFT).
We will use standard benchmark datasets and SOTA frameworks to evaluate (and potentially train) our models. We can co-develop the project to place more emphasis on the style of feedback, training, data annotation, or human evaluation.
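To make the generate-critique-refine idea concrete, here is a minimal sketch of a SELF-REFINE-style loop. Note that `generate()` below is a hypothetical stub standing in for a real LLM call (e.g., Llama 2 served via Hugging Face), and the prompt wording, canned replies, and the "no issues" stopping criterion are illustrative assumptions, not the method's exact prompts:

```python
# Minimal sketch of a SELF-REFINE-style loop (after Madaan et al., 2023).
# generate() is a stub in place of a real LLM call; its canned replies
# are keyed on the prompt shape purely for illustration.

def generate(prompt: str) -> str:
    """Stub LLM: return canned strings depending on the prompt shape."""
    if "Refine the response" in prompt:
        return ("That's great to hear! It's a fun sport requiring quick "
                "reflexes and good hand-eye coordination. Have you played before?")
    if "Feedback:" in prompt:
        return ("Engaging: provides no information about table tennis. "
                "User understanding: lacks understanding of the user's needs.")
    return "I'm sure it's a great way to socialize, stay active."


def self_refine(user_msg: str, max_iters: int = 2) -> str:
    """Generate an initial response, then iteratively critique and refine it."""
    response = generate(user_msg)
    for _ in range(max_iters):
        # Ask the model to critique its own response.
        feedback = generate(f"{user_msg}\nResponse: {response}\nFeedback:")
        if "no issues" in feedback.lower():  # assumed stopping criterion
            break
        # Ask the model to rewrite the response conditioned on the feedback.
        response = generate(
            f"{user_msg}\nResponse: {response}\n"
            f"Feedback: {feedback}\nRefine the response:"
        )
    return response


print(self_refine("I am interested in playing Table tennis."))
```

In a real system the same loop structure applies; only `generate()` changes, and the feedback/refinement prompts become the object of study (their style, measurability, and quality).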
Resources
Large Language Models, GPU
Background
Machine Learning, NLP (desired: F29AI), software development, F21NL (MSc only), F21CA (MSc only)
Url
Difficulty Level
High
Ethical Approval
None
Number Of Students
1
Supervisor
Ioannis Konstas
Keywords
machine learning, neural networks, large language models, natural language processing (nlp)
Degrees
Bachelor of Science in Computer Science
Master of Science in Artificial Intelligence
Master of Science in Artificial Intelligence with SMI