View Proposal
-
Proposer
-
Marta Vallejo
-
Title
-
Improving Class Imbalance in Parkinson’s Sensor Data via Conditional LSTM-VAE and Latent Space Analysis - Collaboration with York University
-
Goal
-
Gain an understanding of real-world research on neurodegenerative diseases using deep/machine learning. A journal or conference publication may follow if the results are of sufficient quality.
-
Description
- Parkinson’s Disease (PD) is a neurodegenerative disease of high incidence in the ageing population. This project aims to apply deep learning techniques to a clinical dataset that contains information on patients with prodromal or early-stage PD. By analysing and processing digitised movement data captured by three standard clinical assessments, a classifier is expected to characterise bradykinesia, a slowing of movement, which is the fundamental motor feature of PD. The complex nature of bradykinesia makes it difficult to identify reliably, particularly at the early stages of the disease (Ahlrichs and Lawo, 2013).
The types of clinical assessments used in this study are the following:
* Finger tapping
* Hand pronation-supination
* Hand opening-closing
* Hand movements measured by accelerometers
The given dataset is significantly imbalanced between PD patients and healthy controls. A previous student project, which used a traditional LSTM, could not converge under these circumstances and assigned all samples to the majority class. A weighted loss, SMOTE and resampling were each applied independently, and only resampling proved effective. Even so, the final performance of the model remained poor.
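Two of the rebalancing techniques mentioned above, loss weighting and random oversampling, can be sketched as follows. This is a minimal illustration and not the previous project's code: the function names, the toy labels and the inverse-frequency weighting formula are assumptions made here.

```python
import random
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


def random_oversample(sequences, labels, seed=0):
    """Duplicate randomly chosen items until every class matches the largest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_seqs, out_labels = list(sequences), list(labels)
    for cls, n in counts.items():
        pool = [s for s, y in zip(sequences, labels) if y == cls]
        for _ in range(target - n):
            out_seqs.append(rng.choice(pool))
            out_labels.append(cls)
    return out_seqs, out_labels


# Hypothetical toy data: 8 majority-class (0) and 2 minority-class (1) recordings.
labels = [0] * 8 + [1] * 2
sequences = [[float(i)] * 5 for i in range(len(labels))]  # placeholder signals

weights = inverse_frequency_weights(labels)
balanced_seqs, balanced_labels = random_oversample(sequences, labels)
print(weights)
print(Counter(balanced_labels))
```

The weights could be passed to a weighted loss so that errors on the rarer class cost more. Oversampling should be applied to the training split only; otherwise duplicated recordings can leak into the evaluation data.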
The next stage of the project will consist of the following steps:
1.- Investigate the combination of the previous techniques to improve the model's performance.
2.- Use an LSTM-VAE to generate synthetic sequences for the minority class to extend the imbalanced dataset.
3.- Combine the LSTM-VAE with class-conditional generation (e.g., Conditional VAE).
4.- Use t-SNE and/or UMAP to visualise how the synthetic sequences relate to real data in latent or feature space.
5.- Test the effect of augmentation by comparing models trained on real vs. augmented data.
6.- (Optional) Pretrain the LSTM component (e.g., as an autoencoder or sequence predictor) before integrating it into the LSTM-VAE, to improve training stability and representation learning, especially in low-data regimes.
This step of the project will open the door to investigating other techniques, such as GANs for sequences (e.g., SeqGAN, TimeGAN), that are more expressive but harder to train.
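For orientation, the training objective behind steps 2 and 3 is usually a conditional evidence lower bound. The form below is a standard textbook formulation, not one fixed by this proposal, and the notation is an assumption made for illustration: x is a sensor sequence, y its class label, z the latent vector, q_phi the LSTM encoder, p_theta the LSTM decoder, mu_j and sigma_j the encoder outputs for latent dimension j, and beta an optional weight on the KL term (beta = 1 gives the standard bound). The prior is assumed to be a standard normal that does not depend on y.

```latex
\mathcal{L}(\theta, \phi; x, y)
  = \mathbb{E}_{q_\phi(z \mid x, y)}\big[\log p_\theta(x \mid z, y)\big]
  - \beta \, D_{\mathrm{KL}}\big(q_\phi(z \mid x, y) \,\|\, p(z)\big),
  \qquad p(z) = \mathcal{N}(0, I)

D_{\mathrm{KL}}\big(\mathcal{N}(\mu, \operatorname{diag}(\sigma^2)) \,\|\, \mathcal{N}(0, I)\big)
  = -\tfrac{1}{2} \sum_{j} \big(1 + \log \sigma_j^2 - \mu_j^2 - \sigma_j^2\big)
```

Under these assumptions, a synthetic minority-class sequence would be obtained by sampling z from the prior and decoding it with y set to the minority label.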
- Resources
-
Real datasets collected in clinical settings.
-
Background
-
Interest in the topic and techniques. Good programming skills. Willingness to learn.
-
Url
-
-
Difficulty Level
-
High
-
Ethical Approval
-
Full
-
Number Of Students
-
2
-
Supervisor
-
Marta Vallejo
-
Keywords
-
machine learning, deep learning, sensor data
-
Degrees
-
Bachelor of Science in Computer Science
Master of Engineering in Software Engineering
Master of Science in Artificial Intelligence
Master of Science in Artificial Intelligence with SMI
Master of Science in Computing (2 Years)
Master of Science in Data Science
Master of Science in Robotics
Master of Science in Software Engineering
Bachelor of Science in Computing Science
Bachelor of Engineering in Robotics
Bachelor of Science in Statistical Data Science
BSc Data Sciences