Proposer
Marta Vallejo
Title
Modelling Disease Progression in ALS Mouse Models Using Multimodal Data - Collaboration with the University of Zaragoza (Spain)
Goal
Work with real preclinical data to help better understand the mechanisms of ALS.
Description
Amyotrophic lateral sclerosis (ALS) is a progressive neurodegenerative disease that currently lacks effective predictive tools for stratifying patients or understanding early markers of disease severity. In this project, you will focus on applying machine learning and deep learning techniques to data obtained from transgenic SOD1G93A mice, a widely used preclinical model of ALS. The goal is to classify animals into fast vs slow disease progression categories based on multimodal data collected longitudinally. This classification will support the development of tools for early prognosis and guide translational work in human ALS patients.
Resources
The dataset comprises a comprehensive collection of preclinical information collected from SOD1G93A mice. Specifically:
• Ultrasound imaging and measurements: muscle dimensions from two anatomical locations, with upcoming experiments providing three longitudinal ultrasound measurements for each hindlimb, alongside body weight over time.
• Surgical imaging: images taken before and after muscle biopsies, as well as images of the extracted muscle tissue itself.
• Post-surgical recovery images: visual records of animal recovery 1–2 days after biopsy.
• Molecular data: gene expression profiles of selected biomarkers obtained from muscle biopsies.
• RNA metrics: quantity and quality of RNA extracted from each biopsy, measured at three key time points: early symptomatic, late symptomatic and endpoint stage.
These data form a comprehensive view of the disease's physical, molecular, and morphological progression in each subject.
Background
The project will involve the design of a classification pipeline to distinguish fast vs slow progressors using the data modalities above. You will begin with feature extraction techniques, including:
• From ultrasound images: use of radiomics (texture, intensity histograms, shape), traditional image processing (e.g., edge detection, HOG), and possibly convolutional neural networks (CNNs) pretrained on medical images (transfer learning).
• From molecular data: feature selection based on biomarker variance, correlation analysis, and unsupervised clustering to explore transcriptomic profiles.
• From clinical metadata (e.g., weight curves and RNA yield): time-series features such as rate of weight loss or variability in RNA quality.
These features will feed into classification models such as Random Forests, Support Vector Machines, and XGBoost, with the possibility of exploring early fusion strategies combining imaging and omics data. Model performance will be evaluated using accuracy, sensitivity, specificity, AUC, and MCC. Emphasis will be placed on model interpretability, using SHAP for feature attribution and Grad-CAM for CNN-based imaging analysis.
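As a rough illustration of two of the steps above, the sketch below computes one time-series feature (rate of weight loss, taken here as a least-squares slope) and the listed evaluation metrics for a binary fast vs slow classifier. It uses only the Python standard library. All function names, the label convention (1 = fast progressor, 0 = slow progressor), the 0.5 decision threshold, and the numbers are assumptions made for illustration; they are not taken from the project dataset.

```python
import math


def weight_loss_rate(days, weights):
    """Least-squares slope of body weight against time (weight units per day).

    A negative value means the animal is losing weight.
    """
    n = len(days)
    mean_d = sum(days) / n
    mean_w = sum(weights) / n
    cov = sum((d - mean_d) * (w - mean_w) for d, w in zip(days, weights))
    var = sum((d - mean_d) ** 2 for d in days)
    return cov / var


def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity and MCC from hard 0/1 predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
        # MCC is conventionally set to 0 when any marginal count is zero.
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


# Synthetic example values (not real measurements).
slope = weight_loss_rate([0, 7, 14], [25.0, 24.3, 23.6])

y_true = [1, 1, 1, 0, 0, 0]               # assumed: 1 = fast, 0 = slow
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6]   # hypothetical classifier outputs
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = binary_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(slope, metrics, auc)
```

In practice these metrics would more likely come from an established library such as scikit-learn; writing them out by hand mainly makes explicit what each one measures. With a small cohort, MCC and AUC are usually more informative than accuracy alone when the fast and slow groups are imbalanced.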
Url
Difficulty Level
Variable
Ethical Approval
Full
Number Of Students
3
Supervisor
Marta Vallejo
Keywords
machine learning, deep learning, healthcare
Degrees
Bachelor of Science in Computer Science
Master of Science in Artificial Intelligence
Master of Science in Computing (2 Years)
Master of Science in Data Science
Bachelor of Science in Computing Science
Bachelor of Science in Statistical Data Science
BSc Data Sciences