Proposer
Ioannis Konstas
Title
NLP - Question Answering using Large Language Models
Goal
To develop a faithful question answering system with an LLM
Description
You have most likely already used ChatGPT a few times (or a lot!). Have you ever wondered what it actually takes to build a system based on a Large Language Model (LLM) and evaluate it on a real-world task? In this series of projects (check the rest as well!) we will explore the task of question answering (QA). QA involves automatically answering a user query, usually "grounded" in one or more passages (what we refer to as open-book QA), or relying only on the acquired knowledge of the model (closed-book QA). The tricky part (especially with the recent advances in LLMs) is to ensure that the output answer is faithful to the user query (and the input passage, if one exists) and does not hallucinate extra pieces of irrelevant or wrong information.

There are many interesting aspects of Question Answering that we could potentially explore:
- Choose between open-book and closed-book QA. The former involves a component for retrieving the relevant information, either via a search engine (e.g., a Google search query) or a retrieval engine. The choice usually depends on the breadth of the domain we focus on (generic knowledge QA vs. closed-domain questions based, for example, on a scrape of a single website).
- Experiment with different output styles: provide answers that are more or less verbose, or that exhibit particular linguistic or rhetorical phenomena (such as repeating part of the question, providing a definition first, giving an explanation, etc.).
- Explore more challenging queries that require some form of decomposition or multi-step reasoning, e.g., complex questions such as "How old was Linus Torvalds when Linux 1.0 was released?" or "What is the difference between jam and marmalade?".
- Experiment with multi-turn rather than single-shot questions, resembling a natural conversation.

Once we decide on the particular flavour of QA task we are interested in, we can explore several popular techniques for adapting an open-source LLM (e.g., Llama 2), starting from the simpler ones (prompt engineering) all the way up to Parameter-Efficient Fine-Tuning (PEFT); see the sketches below. We will use standard benchmark datasets and state-of-the-art frameworks to evaluate (and potentially train) our models. We can co-develop the project to place more emphasis on the features (faithfulness, output style, complex questions, conversational QA), training, data annotation, or human evaluation.
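To give a flavour of the prompt-engineering end of the spectrum, here is a minimal open-book QA sketch using Hugging Face transformers. The model name, the hard-coded passage, and the generation settings are assumptions for illustration only; in the project the passage would come from a search or retrieval engine, and the prompt would be tuned to the chosen style of output.

```python
# Minimal open-book QA sketch: ground the answer in a retrieved passage
# and instruct the model to stay faithful to it.
# Assumption: access to the gated meta-llama/Llama-2-7b-chat-hf checkpoint;
# any instruction-tuned causal LM would work the same way.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "meta-llama/Llama-2-7b-chat-hf"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map="auto")  # needs `accelerate`

# In a real system this passage would come from a search or retrieval engine.
passage = (
    "Linux 1.0 was released on 14 March 1994. "
    "Linus Torvalds was born on 28 December 1969."
)
question = "How old was Linus Torvalds when Linux 1.0 was released?"

# Prompt engineering: constrain the model to the passage to reduce hallucination.
prompt = (
    "Answer the question using only the passage below. "
    "If the passage does not contain the answer, say so.\n\n"
    f"Passage: {passage}\n\nQuestion: {question}\nAnswer:"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
# Decode only the newly generated tokens, not the echoed prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```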
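At the other end of the spectrum, PEFT replaces full fine-tuning with small trainable adapters. Below is a minimal LoRA sketch using the Hugging Face peft library; the rank, scaling factor, and target modules are illustrative assumptions, not tuned values.

```python
# Sketch of Parameter-Efficient Fine-Tuning (PEFT) with LoRA adapters.
# Only the low-rank adapter weights are trained; the base model stays frozen.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf")

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,                                  # rank of the low-rank update matrices (assumed)
    lora_alpha=32,                         # scaling factor (assumed)
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections in Llama-style models
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # typically well under 1% of the base model
```

The wrapped model can then be trained on a QA dataset with the standard transformers Trainer, and the adapters saved separately with model.save_pretrained(...), which keeps checkpoints small enough for a single GPU.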
Resources
Large Language Models, GPU
Background
Machine Learning, NLP (desired: F29AI), software development, F21NL (MSc only), F21CA (MSc only)
Url
Difficulty Level
Variable
Ethical Approval
None
Number Of Students
4
Supervisor
Ioannis Konstas
Keywords
machine learning, neural networks, large language models, natural language processing (nlp)
Degrees
Bachelor of Science in Computer Science
Master of Science in Artificial Intelligence
Master of Science in Artificial Intelligence with SMI