Proposer
Gavin Abercrombie
Title
Reproducibility of Human Data Collection for Machine Learning Models
Goal
Re-implement a data collection study with human participants, assess how closely the results match those of the original study, and measure the effects on downstream tasks.
Description
Much of current Machine Learning (ML) is based on human data (e.g. supervised learning and Reinforcement Learning from Human Feedback (RLHF)), and the performance of even very Large Language Models (LLMs) relies heavily on the quality of the collected data. Following reproducibility crises in other fields, such as Psychology, researchers have begun to examine the extent to which data collection for ML applications such as Natural Language Processing (NLP) is reproducible, finding that it is often difficult or impossible to reproduce such studies [1]. While ML datasets have typically assumed a single correct label for each data instance, recent work has sought to reflect a range of perspectives [2, 3]: because people do not universally agree, it is unrealistic to assume a single agreed-upon label for each data sample. This project will explore the reproducibility and validity of such data collections, i.e. how well a second data collection can reproduce the results of the original one, and the extent to which humans disagree on the labelling tasks. The project involves human data collection and analysis, as well as implementation and reassessment of the performance of state-of-the-art ML models.
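To make "the extent to which humans disagree" concrete, one common chance-corrected agreement measure between two annotators is Cohen's kappa. The sketch below is a minimal, self-contained illustration (the label values and annotator data are invented for the example; real analyses would likely use an established implementation and, for more than two annotators, a measure such as Krippendorff's alpha):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators (Cohen's kappa)."""
    assert labels_a and len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items both annotators labelled identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement, from each annotator's marginal label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Illustrative annotations for six items (hypothetical data):
ann_1 = ["offensive", "offensive", "ok", "offensive", "ok", "ok"]
ann_2 = ["offensive", "ok", "ok", "offensive", "ok", "offensive"]
print(cohens_kappa(ann_1, ann_2))  # ≈ 0.33: agreement well above chance, but far from perfect
```

A kappa near 1 indicates near-total agreement, while values near 0 suggest agreement no better than chance; perspectivist work [2, 3] treats low kappa not necessarily as annotation error but as potentially meaningful label variation.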
Resources
[1] Anya Belz, Craig Thomson, Ehud Reiter, and Simon Mille. 2023. Non-Repeatable Experiments and Non-Reproducible Results: The Reproducibility Crisis in Human Evaluation in NLP. In Findings of the Association for Computational Linguistics: ACL 2023, pages 3676–3687, Toronto, Canada. Association for Computational Linguistics.
[2] Gavin Abercrombie, Valerio Basile, Davide Bernadi, Shiran Dudy, Simona Frenda, Lucy Havens, and Sara Tonelli. 2024. Proceedings of the 3rd Workshop on Perspectivist Approaches to NLP (NLPerspectives) @ LREC-COLING 2024. ELRA and ICCL, Torino, Italia.
[3] Barbara Plank. 2022. The “Problem” of Human Label Variation: On Ground Truth in Data, Modeling and Evaluation. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 10671–10682, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Background
https://reprohum.github.io/
Url
Difficulty Level
Moderate
Ethical Approval
Full
Number Of Students
0
Supervisor
Gavin Abercrombie
Keywords
machine learning, nlp, data collection, evaluation
Degrees
Bachelor of Science in Computer Science
Master of Science in Artificial Intelligence
Master of Science in Artificial Intelligence with SMI
Master of Science in Data Science
Bachelor of Science in Computing Science
BSc Data Sciences