Details - MACS Project System

Proposer: Oliver Lemon
Title: Playpen: Training generative AI models through interactive games
Goal: Take part in the Playpen challenge and project
Description: Interaction between a learner and a feedback-giver has recently come into focus for the post-training of Large Language Models (LLMs), through reward models that judge the appropriateness of a model's response. In this project, we investigate whether Dialogue Games (goal-directed, rule-governed activities driven by verbal actions) can also serve as a source of feedback signals for learning. We use Playpen, an environment for offline and online learning through Dialogue Game self-play, and investigate post-training methods such as supervised fine-tuning (SFT), direct alignment (DPO), and reinforcement learning with GRPO. The framework and the baseline training setups are available at https://github.com/lm-playpen/playpen. You will contribute to this project by, for example, implementing new games and/or training and testing LLMs and/or VLMs using Playpen. If you take this project, you should also take F20/21CA Conversational Agents.
Resources: Playpen: https://github.com/lm-playpen/playpen
Background: AI, NLP, games design
Difficulty Level: Variable
Ethical Approval: InterfaceOnly
Number Of Students: 3
Supervisor: Oliver Lemon
Keywords: ai, llm, games, interaction, evaluation
Degrees: Bachelor of Science in Computer Science
Master of Engineering in Software Engineering
Master of Design in Games Design and Development
Master of Science in Artificial Intelligence
Master of Science in Artificial Intelligence with SMI
Master of Science in Data Science
Master of Science in Human Robot Interaction
Master of Science in Robotics
Bachelor of Engineering in Robotics
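The Description mentions reinforcement learning with GRPO. To illustrate how Dialogue Game outcomes could act as a feedback signal, here is a minimal Python sketch of GRPO-style group-relative advantages. Several self-play episodes of the same game instance are scored, and each episode's score is normalised against the mean and standard deviation of its group. This is not Playpen's API. The function name `group_relative_advantages`, the 0/1 win/loss scoring, the use of the population standard deviation and the `eps` smoothing term are all hypothetical choices made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Hypothetical helper: GRPO-style outcome advantages for one group.

    Each reward is the score of one self-play episode of the SAME game
    instance. Episodes that beat the group average get a positive advantage;
    episodes below it get a negative one.
    """
    if not rewards:
        raise ValueError("need at least one episode reward")
    mu = mean(rewards)        # group mean score
    sigma = pstdev(rewards)   # population std of the group (an assumption)
    # eps avoids division by zero when every episode scored the same
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical scores for four episodes of one game instance
    # (1.0 = game won, 0.0 = game lost).
    scores = [1.0, 0.0, 1.0, 0.0]
    print([round(a, 3) for a in group_relative_advantages(scores)])
```

If every episode in a group receives the same score, all advantages are zero, so that group contributes no learning signal. Games whose outcomes vary across episodes are therefore more informative for this kind of training.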