View Proposal


Proposer
Oliver Lemon
Title
Playpen: Training generative AI models through interactive games
Goal
Take part in the Playpen challenge and project
Description
Interaction between a learner and a feedback-giver has recently come into focus in the post-training of Large Language Models (LLMs), through the use of reward models that judge the appropriateness of a model's response. In this project, we investigate whether Dialogue Games (goal-directed, rule-governed activities driven by verbal actions) can also serve as a source of feedback signals for learning. We use Playpen, an environment for offline and online learning through Dialogue Game self-play, and investigate post-training methods such as supervised fine-tuning, direct alignment (DPO), and reinforcement learning with GRPO. The framework and the baseline training setups are available on GitHub (https://github.com/lm-playpen/playpen). You will contribute to this project by, for example, implementing new games and/or training and testing LLMs and/or VLMs using Playpen. You should take F20/21CA Conversational Agents if you take this project.
Resources
Playpen: https://github.com/lm-playpen/playpen
Background
AI, NLP, games design
Url
External Link
Difficulty Level
Variable
Ethical Approval
InterfaceOnly
Number Of Students
3
Supervisor
Oliver Lemon
Keywords
ai, llm, games, interaction, evaluation
Degrees
Bachelor of Science in Computer Science
Master of Engineering in Software Engineering
Master of Design in Games Design and Development
Master of Science in Artificial Intelligence
Master of Science in Artificial Intelligence with SMI
Master of Science in Data Science
Master of Science in Human Robot Interaction
Master of Science in Robotics
Bachelor of Engineering in Robotics