View Proposal


Proposer
Michael Lones
Title
Tools support to help people avoid machine learning pitfalls
Goal
Description
Machine learning is great, but it’s really easy to make mistakes that invalidate the outcomes of the process. In science, this has contributed to what is known as the replicability crisis: people publish papers based on the outcomes of applying machine learning, but because of mistakes made in the machine learning process, other people can’t then reproduce those outcomes. This is a very common problem, even for top-tier research published in journals such as Science and Nature. It also affects companies and people working in AI, who, after years of development, often find their models don’t work when they try to use them in the wild.

This project aims to reduce the number of mistakes being made. I’m open to ideas, but one approach might be to develop a Python tool that keeps an eye on what a data scientist is doing and warns them when they’re doing something that might affect the validity of their results. This might be a tool that runs in the Python execution environment, or something that processes notebooks offline. It might use traditional approaches to parsing code and spotting errors, or it could use a large language model to spot more subtle errors. One piece of low-hanging fruit might be preventing data leaks, which are a particularly common form of machine learning pitfall.

Some reading to get started:
How to avoid machine learning pitfalls: a guide for academic researchers - https://arxiv.org/abs/2108.02497
Leakage and the Reproducibility Crisis in ML-based Science - https://arxiv.org/abs/2207.07048
REFORMS: Reporting Standards for ML-based Science - https://reforms.cs.princeton.edu
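As a rough illustration of the "traditional" code-parsing route mentioned in the description, here is a minimal sketch of an offline check for one common data leak: fitting a preprocessing step on the whole dataset before it is split into training and test sets. It is not a proposed design. The function name `find_fit_before_split` and the method-name heuristic are hypothetical, and the only assumption about the analysed code is that it uses scikit-learn's `train_test_split` and the usual `fit` / `fit_transform` method names. The check relies solely on Python's standard `ast` module and never executes the code being inspected.

```python
from __future__ import annotations

import ast

# Hypothetical heuristic: method names that learn parameters from data.
FITTING_METHODS = {"fit", "fit_transform"}
SPLIT_FUNCTION = "train_test_split"


def _call_name(node: ast.Call) -> str:
    """Return the bare function or method name of a call node."""
    func = node.func
    if isinstance(func, ast.Attribute):
        return func.attr
    if isinstance(func, ast.Name):
        return func.id
    return ""


def find_fit_before_split(source: str) -> list[tuple[int, str]]:
    """Return (line, method) pairs for fit-like calls on lines before the first split."""
    calls = sorted(
        (n for n in ast.walk(ast.parse(source)) if isinstance(n, ast.Call)),
        key=lambda n: (n.lineno, n.col_offset),
    )
    split_lines = [c.lineno for c in calls if _call_name(c) == SPLIT_FUNCTION]
    if not split_lines:
        return []  # no split found, so this heuristic has nothing to compare against
    first_split = min(split_lines)
    return [
        (c.lineno, _call_name(c))
        for c in calls
        if _call_name(c) in FITTING_METHODS and c.lineno < first_split
    ]


# Example input: the scaler sees every row, including future test rows.
example = """\
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

X_scaled = StandardScaler().fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y)
"""

for line, method in find_fit_before_split(example):
    print(f"line {line}: {method}() is called before {SPLIT_FUNCTION}(); possible data leak")
```

A check this simple only compares line numbers within a single script, so it will raise false alarms and miss leaks hidden inside functions, pipelines, or separate notebook cells. That gap is part of what the project could explore, for example by tracking data flow or by handing ambiguous cases to a large language model.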
Resources
Background
Url
Difficulty Level
Variable
Ethical Approval
None
Number Of Students
1
Supervisor
Michael Lones
Keywords
machine learning, large language models, programming languages
Degrees
Bachelor of Science in Computer Science
Bachelor of Science in Computer Systems
Bachelor of Science in Software Development for Business (GA)
Master of Engineering in Software Engineering
Master of Science in Artificial Intelligence
Master of Science in Artificial Intelligence with SMI
Master of Science in Computer Science for Cyber Security
Master of Science in Computing (2 Years)
Master of Science in Data Science
Master of Science in Network Security
Master of Science in Software Engineering
Bachelor of Science in Computer Science (Cyber Security)