All Proposals

617 proposals found

Title

Proposer

Campus

Reproducibility of Human Data Collection for Machine Learning Models

machine learning nlp data collection evaluation

Gavin Abercrombie

Edinburgh

Details

Much of current Machine Learning (ML) is based on human data (e.g. supervised learning, Reinforcement Learning from Human Feedback (RLHF)), and the performance of even very Large Language Models (LLMs) is highly reliant on the quality of the collected data. Following reproducibility crises in other fields, such as Psychology, researchers have begun to examine the extent to which data collection for ML applications such as Natural Language Processing (NLP) is reproducible, finding that it is often difficult or impossible to reproduce the studies [1]. While Machine Learning datasets have typically assumed a single correct label for each data instance, recent work has sought to reflect a range of perspectives [2, 3]. This is because people do not universally agree, so it is unrealistic to assume that there is one agreed-upon label for each data sample. This project will explore the reproducibility and validity of such data collections, i.e. how well a second data collection can reproduce the results of the original one, and the extent to which humans disagree on the labelling tasks. The project involves human data collection and analysis, as well as implementation and reassessment of the performance of state-of-the-art ML models.
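One common way to quantify how closely a second data collection reproduces the first is a chance-corrected agreement score. The sketch below computes Cohen's kappa in plain Python. It assumes each collection has already been reduced to one label per item (for example by majority vote); that reduction and the toy labels are illustrative assumptions, not part of the proposal.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two label sequences over the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each sequence's label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both sequences use one identical label throughout
    return (observed - expected) / (1 - expected)

# Hypothetical per-item labels from an original collection and a replication.
original = ["toxic", "ok", "ok", "toxic", "ok", "ok"]
replication = ["toxic", "ok", "toxic", "toxic", "ok", "ok"]
kappa = cohens_kappa(original, replication)
```

A kappa near 1 suggests the replication recovers the original labels well beyond chance; values near 0 suggest agreement is no better than chance.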

Difficulty: Moderate

Safety and Bias in Dialogue Systems and Large Language Models

safety bias nlp dialogue systems large language models

Gavin Abercrombie

Edinburgh

Details

Automated dialogue systems are becoming ubiquitous in our homes, on our smart devices, and on the internet, and, with recent advances in Large Language Models, the quality of chatbots and voice assistants is rapidly improving, to the extent that they can sometimes be mistaken for humans [1]. But as the quality of end-to-end dialogue systems improves, so does their capacity to learn unsafe behaviours from data on which they are trained [2]. They also run the risk of responding inappropriately to unsafe or toxic user input [3]. Potentially undesirable behaviours include offensive outputs (abuse, hate speech etc.), as well as the failure to detect and mitigate such language in user inputs, and generation of inappropriate responses in safety-critical situations, such as offering medical or legal advice, or responding to/generating sensitive content. In this project, we will examine one or more of the following aspects of safety for conversational AI:
- Evaluation and/or detection of unsafe user input and/or system outputs (abusive language, safety-critical topics, sensitive content etc.).
- Evaluation of societal biases in the outputs of conversational systems and Large Language Models (LLMs).
- Mitigation of unsafe content, e.g. evaluation of system response strategies, generation of appropriate responses.

Difficulty: Moderate

Learning with Disagreement

nlp nlg classification

Gavin Abercrombie

Edinburgh

Details

Traditionally, NLP datasets have consisted of data annotated with single 'gold standard' labels. But in reality, people often disagree on their interpretation of the meanings behind natural language expressions. This project will tackle the problem of how to model datasets that include multiple labels for each item and harness the disagreement information for better classification and generation performance across a range of NLP tasks. There will also be the opportunity to compare performance against entries to the 2025 shared task on Learning with Disagreements (LeWiDi).
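One simple way to use disagreement information is to train against a "soft label", i.e. the distribution of annotator votes, rather than a single gold label. The sketch below is an illustration under that assumption; the class names and votes are invented, and it is not the method prescribed by the project or the shared task.

```python
import math
from collections import Counter

def soft_label(annotations, classes):
    """Turn one item's annotator labels into a probability distribution."""
    counts = Counter(annotations)
    return [counts[c] / len(annotations) for c in classes]

def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy between a soft target and a predicted distribution."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))

classes = ["offensive", "not_offensive"]
# Four hypothetical annotators, three of whom chose "offensive".
target = soft_label(["offensive", "not_offensive", "offensive", "offensive"], classes)

# A near-certain prediction is penalised more than one matching the vote split.
confident = cross_entropy(target, [0.99, 0.01])
calibrated = cross_entropy(target, [0.75, 0.25])
```

Under this loss, a model is rewarded for reproducing the spread of human judgements rather than only the majority vote.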

Difficulty: Moderate

Machine Translation of English-Scots (or other language pairs)

machine translation nlp rule-based ml part-of-speech tagging computational linguistics

Gavin Abercrombie

Edinburgh

Details

Apertium (www.apertium.org) is a free, open-source platform for developing rule-based, shallow-transfer machine translation systems, which was initially developed for translation between closely related languages (Forcada et al., 2011). This project will aim to update and improve the English-Scots language pair (Abercrombie, 2016), or another underdeveloped language pair found in the Apertium incubator. The project will involve data collection and processing, training classifiers, and human and automated evaluation.

Difficulty: Moderate

Decision tree-based classification of red-light violation among drivers

Eric Nimako Aidoo

Dubai

Details

Traffic lights are one of the road transportation systems designed to regulate competition among road users at intersections. In the absence of traffic lights at intersections, road users are at risk of road crashes. Although traffic lights serve as a means of regulating conflicts among road users at intersections, different studies have shown that not all road users comply with red signals. Thus, classification of red-light violation among drivers will be important to support training and policies in road transportation safety. In this study, a decision tree-based model will be developed to classify red-light violations and the associated risk factors among drivers.
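A decision tree is built by repeatedly choosing the split that makes the resulting groups as pure as possible. The sketch below shows a single such step using Gini impurity on one numeric feature. The feature (driver age) and the toy data are invented for illustration; the proposal does not specify which risk factors or splitting criterion will be used.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_threshold(values, labels):
    """Return (threshold, weighted_gini) of the best 'value <= t' split."""
    best = (None, gini(labels))
    for t in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best

# Invented toy data: driver age and whether a red-light violation was recorded.
ages = [19, 22, 25, 31, 40, 52, 58, 63]
violated = [1, 1, 1, 0, 0, 0, 0, 0]
threshold, impurity = best_threshold(ages, violated)
```

A full tree would apply the same search recursively to each side of the split and across all candidate features.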

Difficulty: Variable

Machine Learning Applications in Disease Modelling

machine learning models noncommunicable diseases incidence model comparison

Eric Nimako Aidoo

Dubai

Details

Noncommunicable diseases such as cancer, diabetes, respiratory diseases, and cardiovascular diseases remain major challenges in global health management. Despite advancements in healthcare delivery and accessibility, it is estimated that 41 million people die annually due to noncommunicable diseases. Among several factors, climate change is increasingly recognized as a significant factor influencing the incidence and mortality of noncommunicable diseases. For instance, existing studies have shown that extreme temperatures impact cardiovascular diseases, while air pollution affects respiratory diseases. The need for effective and efficient machine learning ensembles to uncover complex patterns and relationships in such data has become important.

Difficulty: Moderate

Machine Learning Applications in Road Traffic Crashes

machine learning models road traffic crashes model comparison

Eric Nimako Aidoo

Dubai

Details

Road traffic crashes are one of the leading causes of death across the globe. They have a significant impact on the victims, their families, society and the economy as a whole. Despite advancements in road infrastructure and automobiles, it is estimated that approximately 1.19 million people die each year worldwide due to road traffic crashes. There are several factors that influence road traffic crashes and the associated mortality. The need for effective and efficient machine learning ensembles to uncover complex patterns and relationships in such data has become important.

Difficulty: Moderate

Machine Learning-based Street Sign Detection for Road Safety Enhancement

Eric Nimako Aidoo

Dubai

Details

Difficulty: High

Classification of maternal health risk level during pregnancy: A comparison of machine learning approaches

machine learning models maternal health risk level risk factors health parameters

Eric Nimako Aidoo

Dubai

Details

Maternal health during pregnancy remains one of the major public health concerns worldwide, particularly in low- and middle-income countries where the mortality rate is high. Early detection of maternal health risks during pregnancy is critically important for improving maternal healthcare and birth outcomes. Traditionally, maternal health risk assessment has relied on manual evaluation by clinical experts. However, with advancements in technology, machine learning models have emerged as valuable tools for supporting healthcare by identifying complex patterns within large datasets gathered from medical records. This study aims to develop and compare several machine learning models to classify maternal health risk levels during pregnancy. Additionally, it will explore the relationship between health risk levels and various clinical parameters among pregnant women to identify potential risk factors that can support early diagnosis and treatment.

Difficulty: Moderate

Bayesian modelling of maternal health risk level during pregnancies

bayesian models maternal health risk level risk factors health parameters

Eric Nimako Aidoo

Dubai

Details

Maternal health during pregnancy continues to be a significant public health issue across the globe, particularly in low- and middle-income countries where maternal mortality rates remain alarmingly high. Identifying health risks in pregnant women at an early stage is essential for timely interventions that enhance both maternal and neonatal health outcomes. Traditional methods of risk assessment often rely on clinical expertise and fixed scoring systems, which may not fully account for the inherent complexity and uncertainty within patient data. Bayesian modelling presents a compelling alternative by integrating prior clinical knowledge with observed data and explicitly quantifying uncertainty in risk predictions. This study aims to utilise Bayesian modelling techniques to predict maternal health risk levels during pregnancy and examine the relationships between these risk levels and potential risk factors that can support early diagnosis and targeted interventions, ultimately contributing to more effective maternal healthcare strategies.
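The idea of combining prior clinical knowledge with observed data can be illustrated with the simplest conjugate case, a Beta prior on the proportion of high-risk cases updated with binomial counts. This is a minimal sketch only; the prior and the counts are hypothetical, and the project itself may well use richer models with covariates.

```python
def beta_posterior(prior_alpha, prior_beta, high_risk, total):
    """Update a Beta(alpha, beta) prior with binomial data (conjugate update)."""
    if not 0 <= high_risk <= total:
        raise ValueError("high_risk must be between 0 and total")
    return prior_alpha + high_risk, prior_beta + (total - high_risk)

def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

def beta_variance(alpha, beta):
    """Variance of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    return alpha * beta / (total ** 2 * (total + 1))

# Hypothetical prior: roughly 20% of cases expected to be high risk, Beta(2, 8).
# Hypothetical data: 12 of 40 observed pregnancies classified as high risk.
alpha, beta = beta_posterior(2, 8, high_risk=12, total=40)
posterior_mean = beta_mean(alpha, beta)
posterior_var = beta_variance(alpha, beta)
```

The posterior variance is smaller than the prior variance, which is the explicit quantification of uncertainty the description refers to.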

Difficulty: Moderate

A comparison of machine learning models for breast cancer risk classification

machine learning models breast cancer risk risk level classification risk factors.

Eric Nimako Aidoo

Dubai

Details

Breast cancer remains one of the most common cancers among women, affecting millions of individuals each year. Timely identification of breast cancer is crucial for improving patient outcomes and reducing mortality rates. Traditionally, breast cancer diagnosis has relied on manual evaluation by medical professionals, including physical examinations, imaging techniques, and biopsy analysis. However, with the advent of advanced technologies, machine learning models have emerged as powerful tools in healthcare, enabling the identification of complex patterns within vast datasets derived from medical records, imaging scans, and genetic profiles. This study aims to develop and compare various machine learning models to classify breast cancer risk levels and enhance diagnostic precision. Furthermore, it will investigate the relationship between risk levels and potential risk factors to support early detection and targeted treatment strategies.

Difficulty: Moderate

Modelling risk factors for Breast cancer: a comparison of Bayesian and frequentist approaches

bayesian models breast cancer risk risk level classification risk factors.

Eric Nimako Aidoo

Dubai

Details

Breast cancer remains one of the most common cancers among women, affecting millions of individuals each year. Timely identification of breast cancer is crucial for improving patient outcomes and reducing mortality rates. Traditionally, breast cancer diagnosis has relied on manual evaluation by medical professionals, including physical examinations, imaging techniques, and biopsy analysis. These methods may not fully account for the inherent complexity and uncertainty within patient data. Bayesian modelling presents a compelling alternative by integrating prior clinical knowledge with observed data and explicitly quantifying uncertainty in risk predictions. This study aims to utilise Bayesian modelling techniques to predict breast cancer risk levels and examine the relationships between these risk levels and potential risk factors to support early detection and targeted treatment strategies. The performance of the Bayesian framework will be compared to the conventional frequentist approach to risk classification.

Difficulty: Moderate

Classification of cardiovascular disease risk: A comparison of machine learning approaches

machine learning models cardiovascular disease risk level classification risk factors.

Eric Nimako Aidoo

Dubai

Details

Cardiovascular disease (CVD) remains one of the leading causes of morbidity and mortality worldwide, affecting millions of individuals annually. Early detection of CVD is essential for improving clinical outcomes and reducing the burden on healthcare systems. Traditionally, diagnosis and risk assessment for cardiovascular conditions have relied on clinical evaluation, medical imaging, blood tests, and patient history. However, the rise of advanced technologies has paved the way for the integration of machine learning in healthcare, offering powerful capabilities to uncover complex patterns within large-scale datasets, including electronic health records, imaging data, and genetic information. This study aims to develop and compare various machine learning models to classify cardiovascular disease risk levels and improve diagnostic accuracy. Additionally, it will explore the associations between identified risk levels and contributing factors, supporting early intervention and personalized treatment strategies.

Difficulty: Moderate

Cryptanalysis of Encryption Schemes

Fadi Alhaddadin

Dubai

Details

Difficulty: Easy

Cloud architecture and health informatics

Fadi Alhaddadin

Dubai

Details

Difficulty: Easy

Data Privacy

Fadi Alhaddadin

Dubai

Details

Difficulty: Easy

Green Computing

Fadi Alhaddadin

Dubai

Details

Difficulty: Easy

Evaluate the Ability to Generate Context-Aware Responses in Customer Service Scenarios using LLMs.

machine learning large language models natural language processing (nlp) text mining

Abdullah Almasri

Malaysia

Details

The task is to utilise LLMs for generating responses to customer service queries using a dataset of customer service queries and responses, evaluate response quality metrics, compare with human-generated responses, and propose improvements for optimising LLMs' performance in customer service applications.

Difficulty: Moderate

Comparative Analysis of LLM vs. Human-Written Summaries for Scientific Articles.

machine learning large language models natural language processing (nlp) text summarization text mining

Abdullah Almasri

Malaysia

Details

The task is to use LLMs to generate summaries of scientific articles and compare them to summaries authored by human experts; evaluate the effectiveness of LLM-generated summaries based on criteria such as completeness, clarity, and accuracy; and identify the strengths and weaknesses in LLMs' summarization capabilities.
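One automatic signal that could sit alongside human judgements of completeness is word overlap between the LLM summary and the expert summary. The sketch below computes a simplified ROUGE-1 style score in plain Python. It is an illustration, not the official ROUGE implementation, and the example sentences are invented.

```python
import re
from collections import Counter

def unigrams(text):
    """Lower-cased word counts for a piece of text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def rouge1(candidate, reference):
    """Unigram-overlap precision, recall and F1 (a simplified ROUGE-1)."""
    cand, ref = unigrams(candidate), unigrams(reference)
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Invented example: a short LLM summary scored against a human summary.
human = "The model improves accuracy on three benchmarks"
llm = "The model improves accuracy"
precision, recall, f1 = rouge1(llm, human)
```

Here high precision with lower recall indicates a summary that is accurate but incomplete, which maps onto the completeness criterion; clarity and factual accuracy would still need human or other evaluation.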

Difficulty: Moderate

Graphical character for conversational interaction

face animation python

Matthew Aylett

Edinburgh

Details

In human dialog, participants are able to interrupt each other at any point. Once a dialog participant has been interrupted, they may cede the floor (let another person speak) or alter what they are saying to show they are actively listening. In this project a graphical character will be implemented using Unity and Python that allows some limited behaviour (e.g. head nods, eyebrow raises) and that is integrated with a speech synthesis system (CereProc), allowing basic lip syncing and output speech. An API will be designed and built over HTTP that will allow a client Python program to stream content (e.g. text for the character to speak and instructions for behaviour), poll the character (establish within a narrow time window what the character has said or done from the streamed content), and interrupt the character (stop it gracefully, or stop it and begin producing new content).
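The three operations described (stream, poll, interrupt) can be prototyped before any HTTP or Unity work. The sketch below is an in-memory stand-in only: the class name, method names and item format are hypothetical, and a real implementation would expose these operations as HTTP endpoints and tie them to the speech and animation engines.

```python
from collections import deque

class CharacterQueue:
    """In-memory stand-in for the character's content queue (no HTTP, no Unity)."""

    def __init__(self):
        self.pending = deque()   # streamed items not yet performed
        self.done = []           # items already spoken or acted

    def stream(self, kind, payload):
        """Queue content, e.g. ('say', 'Hello') or ('behaviour', 'head_nod')."""
        self.pending.append((kind, payload))

    def tick(self):
        """Simulate the character performing the next queued item."""
        if self.pending:
            self.done.append(self.pending.popleft())

    def poll(self):
        """Report what has been performed so far and how much is still queued."""
        return {"done": list(self.done), "remaining": len(self.pending)}

    def interrupt(self, replacement=None):
        """Drop queued content; optionally start producing new content instead."""
        dropped = len(self.pending)
        self.pending.clear()
        if replacement is not None:
            self.pending.append(replacement)
        return dropped

character = CharacterQueue()
character.stream("say", "Let me explain the schedule.")
character.stream("behaviour", "head_nod")
character.stream("say", "First, we meet at nine.")
character.tick()
status = character.poll()
dropped = character.interrupt(replacement=("say", "Sorry, go ahead."))
```

Keeping performed and pending content separate is what lets a poll answer "what has the character actually said" at the moment of interruption.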

Difficulty: Challenging

A GRAPHICAL USER INTERFACE FOR CONSTRUCTING AND EDITING VOCAL PUPPETRY

speech technology hci gui programming

Matthew Aylett

Edinburgh

Details

Neural TTS (text to speech) systems can allow very fine specification of speech output, allowing the intonation and speech rate from a source speaker to be used to guide a synthetic voice and replicate the source speaker's delivery. Much spoken output generated by TTS does not need to be in real time, for example when producing audio for a speech or media performance. CereProc Ltd has developed a prototype system for taking source speech and transcription and creating XML markup to realise the same delivery in a synthetic engine. In this project a graphical user interface (GUI) will be produced to allow interactive tuning of this output. The GUI will be evaluated with a set of users and the output compared to baseline TTS.

Difficulty: Challenging

Masterclass in cloud speech technology

Matthew Aylett

Edinburgh

Details

There are many cloud resources now available for speech synthesis and speech recognition. In a simple form they can be used to add speech functionality to web pages, standalone applications and digital games. Many of the systems can be run either on a request-by-request basis or streamed (to process before completion). Speech recognition can be configured to work slower for better recognition or faster, and it can be given pronunciations of expected and unusual words such as proper names. Speech synthesis offers many different voices and often fine control of speech style and intonation. The objective of this project is to produce teaching materials to support the use of these systems and a tutorial to give hands-on experience of using them.

Difficulty: High

Robot Voice Separation using LSTMs

social robotics speech technology dialog systems conversational user interfaces

Matthew Aylett

Edinburgh

Details

Current robots (Furhat, Haru etc.) typically stop listening to their microphone when speaking to avoid interfering with speech recognition results. This means robots can't be interrupted and can't produce back channels (the yeah/okay that shows you are listening). In this project you will extend a small audio corpus built to test robot voice separation, set up the Kaldi ASR system, and design and train a neural net solution for altering ASR parameters to remove the effect of the robot voice and maintain recognition accuracy.

Difficulty: High

Multimodal Interactive Voice Response (IVR) System

speech technology gui interface realtime programming phone api

Matthew Aylett

Edinburgh

Details

IVR systems are phone based and allow customers and clients to call a company and automate various actions such as booking an appointment or asking for details of a delivery. Many companies are keen to move over to app or web-based systems to carry out the interaction, as a visual element can often make the process faster and easier. However, a phone-based system is often required for customers and clients who are not comfortable with web-based or app-based systems, and it also allows the option of speaking to a customer service representative if the problem is difficult to resolve. In this project we explore the use of a multimodal IVR system which concurrently offers both phone-based interaction and a visual GUI-style interaction, allowing users to switch freely between the two modalities.

Difficulty: Moderate

Towards a Conversational Agent to Assist in Meetings

conversational interaction speech technology

Matthew Aylett

Edinburgh

Details

The majority of conversational systems act in a one-to-one setting (Aylett & Romeo). This allows the system to impose its own turn-taking strategy on the conversation (typically a speak-wait strategy). However, in a multi-party dialog a system will need to adopt a more fluid, human-like turn-taking approach. In addition, it faces significant challenges such as real-time diarization (who said what when), speaker overlap, and complex human turn-taking, where taking the floor requires the system to predict a point in the conversation where that could occur and signal to the other dialog partners that it wishes to do so (Gillet et al). In this project we will focus on setting up a multi-party meeting recording system based on work by Honda Research Institute Europe using a Konnect depth camera and a Respeaker USB mic array (Wange et al). You will record 4-5 meetings with four or five participants using one of the following role-plays:

Role-play 1: Your company wants to organise a Work–Life Balance day. The aim of the event is to get employees to see colleagues as people with real lives outside the workplace, and therefore to be more supportive, understanding and friendly towards each other. There is a very limited budget, and the event will take place on a normal working day, without dramatically reducing employees’ productivity during that day. You and some other junior members of staff have been asked to plan the events for the day. Hold a brainstorming meeting to plan the event.

Role-play 2: Your company wants to hold a Staff Integration event, to enable employees from different teams and work locations to get to know each other and build relationships. You and other senior managers meet to plan a budget for this event (in terms of cost per employee) and to brainstorm ideas for the event.

Using output from the array microphone and Konnect, you will analyse the recordings as if they are in real time, using directional information to diarize the recordings, and use automatic speech recognition from Azure to transcribe the data. You will compare with IBM diarization to evaluate the process. Finally, with and without diarization information, you will generate a prompt for an LLM to summarise the meeting. The recordings, transcriptions and logged positional information from the microphone and Konnect will be released as an open data resource for the community.

Difficulty: Challenging

Human-like turn-taking

speech technology conversational agents dialog systems human evaluation

Matthew Aylett

Edinburgh

Details

Current conversational systems with robots and artificial agents are typically speak-wait systems. The system uses speech synthesis to produce output, then waits for a user to say something. When the user stops speaking, the system processes the input speech and generates a response based on its content. However, human conversation does not work like this. Human response times to a dialog partner are on average 200ms; humans can also interrupt, and they can show they are listening by producing backchannels ('yeah', 'aha', 'okay', etc.). Significant progress has been made with incremental speech recognition and with the use of transformer models to create natural conversational content. This project is about putting all of this together with previous work analysing human turn-taking to produce a fluid conversational system.

Difficulty: Moderate

Human Robot Interaction for Health

Lynne Baillie

Edinburgh

Details

The honours project concept should focus on how a social robot could, through Human Robot Interaction, assist someone with a health or assisted living need. Assistance and training will be given regarding the robot selected for the project. The student should be a reasonable programmer.

Difficulty: Moderate

Social Robots Helping Children

human robot interaction

Lynne Baillie

Edinburgh

Details

This project will explore the ways in which a social robot could assist children who have severe sight issues. The project will have three phases. In the first phase of the project, the student will explore the literature on how children learn about the interactions they can have with a social robot. In the second phase, the student will implement a set of interactions on at least one social robot. Lastly, the student will deploy the social robot with the implemented interactions in order to evaluate the usefulness of the interactions with a set of students with severe sight issues. They will assess which interactions were the most successful.

Difficulty: High

Quantifying cross-sectoral discrepancies between ethnic groups to support analysis into their use of key services (health, housing and energy).

Lynne Baillie

Edinburgh

Details

Difficulty: High

Improving Outdoor Positioning Solutions

Phil Bartie

Edinburgh

Details

GNSS (e.g. GPS, GLONASS) offers a very useful positioning solution for outdoor situations. In most cases these technologies can locate a user to within 10 metres of their actual location. However, for some tasks (e.g. robotics, autonomous vehicles, pedestrian navigation guides) this accuracy is not sufficient. Positional accuracy is particularly problematic in urban environments (e.g. buildings occluding direct line of sight to the satellites). Pedestrians are also much harder than vehicles to locate accurately, as they can turn on the spot and don't have to follow road regulations. This project will explore a variety of solutions to improve on the performance using map matching techniques, particle filters, and GNSS shadowing. This will involve developing code to process GNSS positions in conjunction with geospatial datasets for roads, pavements and buildings. Building heights could also be used for modelling theoretical lines of sight to GNSS satellites. Ideally software will be written in Python and made available as open source at the end of the project.
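The simplest form of map matching snaps each noisy GNSS fix to the nearest road or pavement segment. The sketch below does this geometrically in a local planar coordinate system measured in metres. The segment names and coordinates are invented, and a practical system would add heading, history or a particle filter on top of this nearest-segment step.

```python
import math

def snap_to_segment(p, a, b):
    """Project point p onto segment a-b; return (snapped_point, distance)."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    # Parameter t in [0, 1] locates the closest point along the segment.
    t = 0.0 if length_sq == 0 else ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    sx, sy = ax + t * dx, ay + t * dy
    return (sx, sy), math.hypot(px - sx, py - sy)

def map_match(p, segments):
    """Snap p to the nearest segment in a list of (name, a, b) tuples."""
    best = None
    for name, a, b in segments:
        snapped, dist = snap_to_segment(p, a, b)
        if best is None or dist < best[2]:
            best = (name, snapped, dist)
    return best

# Hypothetical pavement centrelines in local metres (not real map data).
segments = [
    ("north_pavement", (0.0, 10.0), (100.0, 10.0)),
    ("south_pavement", (0.0, 0.0), (100.0, 0.0)),
]
name, snapped, dist = map_match((40.0, 3.0), segments)
```

With real data, the coordinates would first be projected from latitude and longitude into a metric coordinate system so that distances are meaningful.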

Difficulty: High

Location Based Diary

Phil Bartie

Edinburgh

Details

We are used to electronic diaries which issue reminders based on time; however, there are also occasions where the time element is not as important as the location. For example, 'remind me to get some superglue next time I'm at the supermarket'. This doesn't have a strict temporal filter, but a prompt next time you visit a store would be useful. This project will look to develop a service/application which allows the user to set both location and temporal triggers for reminders. This will involve using the user's location, map data for various feature types, and a basic ontology (e.g. Tesco, Sainsbury's, Aldi = all supermarkets). The application could be extended to notify when near friends, to consider mode of transport (walk, cycle, car - using Android API), and more advanced features such as modelling lines of sight and least cost paths rather than Euclidean distances.
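A location trigger of this kind can be sketched as a distance check combined with a category lookup. The example below uses the haversine formula for distance and a tiny dictionary as the ontology. The coordinates, the 100 m radius and the reminder format are assumptions made for illustration only.

```python
import math

# Toy ontology: store name -> category (store names follow the example in the text).
ONTOLOGY = {"Tesco": "supermarket", "Sainsbury's": "supermarket", "Aldi": "supermarket"}

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    r = 6371000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def due_reminders(user, places, reminders, radius_m=100.0):
    """Return reminder texts whose category matches a place within radius_m."""
    lat, lon = user
    nearby = {ONTOLOGY.get(name) for name, plat, plon in places
              if haversine_m(lat, lon, plat, plon) <= radius_m}
    return [text for category, text in reminders if category in nearby]

# Hypothetical coordinates and reminders for illustration only.
places = [("Tesco", 55.9500, -3.1900), ("Aldi", 55.9700, -3.1700)]
reminders = [("supermarket", "buy superglue"), ("pharmacy", "collect prescription")]
due = due_reminders((55.9502, -3.1901), places, reminders)
```

Because the reminder is attached to the category rather than a specific shop, any supermarket within range fires it, which is the behaviour the superglue example asks for.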

Difficulty: Moderate

Location Sharing Mobile App

Phil Bartie

Edinburgh

Details

Location sharing apps already exist; even Google Maps offers this. These need to be set up in advance, such that the person sharing the information knows the person able to receive the updates. Updates are sporadic and limited to a location, phone battery life, and minutes since the update. This project looks to extend this functionality to enable location sharing with anyone, and with groups. For example, a user may register a temporary online name and be allocated a group code with the service. They then share this group code with their friends via WhatsApp. Now everyone in that group can see the locations of all others in that group. The user can leave or rejoin the group as they wish, and can also be a member of several groups. Updates will also include the current speed of movement, direction, location, timestamp, and the username. Further functionality would include allowing a parent/guardian to automatically request a location from a child via the app. The server side could be developed with a web-based mobile client for testing purposes, but ideally the student taking on this project would use Android Studio to develop a native app which uses the background daemon capabilities of Android OS. Requirements: spatial database (e.g. PostgreSQL + PostGIS), mobile (or web) client [native Android dev. preferred], server-side dev (e.g. Python, NodeJS, PHP), web-based mapping (e.g. leaflet.js)

Difficulty: High

GeoAI - LLM and Maps

Phil Bartie

Edinburgh

Details

The aim of this project is to build a system which combines a natural language interface (text/spoken/map) with searching spatial datasets. It should allow the user to ask questions which can be answered via text and mapped (e.g. highlight results on the map, zoom and pan the map). This consists of 4 stages:
Stage 1 - load spatial datasets into a suitable data management system
Stage 2 - build a UI to support querying the data (e.g. text, speech, map GUI)
Stage 3 - develop an ontology to link space to activities (idea of 'place') (e.g. suitable places to walk include paths, parks, beaches, etc.)
Stage 4 - add functions to process the data to answer more complex questions (e.g. find the closest point on River A to Road B; how many post boxes in Edinburgh, etc.)
It would be preferred to use open/free tools rather than Google Map API etc.
Suggested Tools / Data:
- Python - e.g. Geopandas
- PostgreSQL + PostGIS - spatial data storage and analysis
- QGIS - map editing
- OpenStreetMap / Overture Maps
- Ordnance Survey Data via Digimap (EDINA)
- Wordnet / ConceptNet / DBPedia - to help develop an action + place ontology
- LM Studio / Ollama + LLM, or OpenAI API (costed)
- Leaflet.js / OpenLayers.js / Leafmap (Python) / Folium (Python)
- WhisperAI - for speech input ASR

Difficulty: High

Open Source Intelligence - Image Comparison using Computer Vision

Phil Bartie

Edinburgh

Details

A challenge in Open Source Intelligence is finding similar but not identical images which can be used to validate that an event occurred. This project will use tools such as YOLO to identify objects in scenes and OpenCV to compare images. The goal is to produce a data processing pipeline that can check a list of images and find related objects to validate that they were taken at the same event, but from different angles/times. A VirtualBox VM with the necessary tools installed can be made available if required. Coding in Python preferred.

Difficulty: Moderate

Simulated City: Spatial Agent Based Modelling

Phil Bartie

Edinburgh

Details

Build a web-based spatial agent-based modelling system for urban transport (Simulated City). This should work with real-world coordinates and a map base, implementing multiple agents including the environment (map, traffic lights, vehicles of different kinds including bicycles). The agent-based simulation should be able to run on a web platform; this might be in a Jupyter Notebook or as a web app. It should allow for user input of factors such as traffic light sequence timings and different road speeds. Preferred technologies: Python, PostgreSQL/PostGIS, Leaflet.js / OpenLayers.js
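The core of an agent-based model is a loop in which each agent updates its own state from the state of its surroundings. The sketch below shows that loop for one vehicle and one fixed-cycle traffic light on a one-dimensional road. All class names, positions, speeds and timings are invented; a real system would use map coordinates and many interacting agents.

```python
class TrafficLight:
    """Fixed-cycle light at a position along a 1-D road (metres)."""

    def __init__(self, position, green_s, red_s):
        self.position, self.green_s, self.red_s = position, green_s, red_s

    def is_green(self, t):
        # The cycle starts with the green phase at t = 0.
        return (t % (self.green_s + self.red_s)) < self.green_s

class Vehicle:
    """Agent that drives at constant speed and waits at a red light."""

    def __init__(self, position, speed):
        self.position, self.speed = position, speed

    def step(self, t, light, dt=1.0):
        target = self.position + self.speed * dt
        would_cross = self.position <= light.position < target
        if would_cross and not light.is_green(t):
            self.position = light.position   # wait at the stop line
        else:
            self.position = target

def simulate(green_s, red_s, steps=30):
    """Run one car past one light and return its final position in metres."""
    light = TrafficLight(position=50.0, green_s=green_s, red_s=red_s)
    car = Vehicle(position=0.0, speed=10.0)
    for t in range(steps):
        car.step(t, light)
    return car.position

always_green = simulate(green_s=30, red_s=0)
with_red = simulate(green_s=5, red_s=10)
```

Exposing `green_s` and `red_s` as parameters mirrors the requirement that users can change traffic light sequence timings and observe the effect.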

Difficulty: Moderate

IOT Sleep Tracker

Phil Bartie

Edinburgh

Details

IOT Sleep tracker: Most sleep trackers are based on watches, but that can be frustrating if you need to charge your watch or don't like wearing a watch to sleep at night. This project will use an Arduino and piezo sensor to build an app which measures movements from the mattress. The Arduino will send data to a server for storage and analysis. Analysis could include simple statistics such as length of sleep, but ML could also be used to determine patterns of sleep (shallow, deep, restless). A web GUI could be developed as part of the project to visualise trends.

Difficulty: Moderate

Streetview to Text

Phil Bartie

Edinburgh

Details

Using image-to-text models, this project will turn a set of streetview images (e.g. 4000 images for Edinburgh at known locations) into text descriptions using a machine learning model (e.g. Places365, OpenAI CLIP). These descriptions will then be summarised at various scales to discover similar regions.

Difficulty: High

Text to Data Tools

Phil Bartie

Edinburgh

Details

Websites can contain valuable data in the form of text (e.g. web text, PDFs, docx files). NLP (inc. LLMs) allows extraction of the data within the text, parsing it to find specified items (e.g. dates, locations, names of people, tools, costs, and other values). This project will specifically focus on producing an application that performs this job for specified target tasks (e.g. locations where natural capital tools are used). The output would be a set of web URLs, documents (e.g. PDFs), and the corresponding values stored in a database (e.g. locations, costs, tool names, organisation using the tools). A UI to search the data could also be developed, including highlighting spatial locations where tools are used and linking to the relevant documents.
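As a baseline before bringing in NLP models or LLMs, some target items such as dates and costs can be pulled out with regular expressions. The sketch below extracts both into a record keyed by source URL. The patterns, the "GBP" cost format, the sample text and the URL are all assumptions made for illustration; real documents will need broader patterns or a learned extractor.

```python
import re

# Matches dates written like "3 March 2023" (one assumed format among many).
DATE_RE = re.compile(
    r"\b(\d{1,2} (?:January|February|March|April|May|June|July|"
    r"August|September|October|November|December) \d{4})\b"
)
# Matches costs written like "GBP 12,500" or "GBP 800.50" (assumed format).
COST_RE = re.compile(r"GBP\s?(\d[\d,]*(?:\.\d+)?)")

def extract_record(url, text):
    """Pull dates and costs out of plain text into one record."""
    costs = [float(c.replace(",", "")) for c in COST_RE.findall(text)]
    return {"url": url, "dates": DATE_RE.findall(text), "costs": costs}

# Invented sample text and URL.
sample = ("The survey ran on 3 March 2023 and cost GBP 12,500. "
          "A follow-up on 14 June 2023 cost GBP 800.50.")
record = extract_record("https://example.org/report", sample)
```

Each record could then be written as one row in the database, with the URL giving the link back to the source document.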

Difficulty: Moderate

Indoor Positioning using Computer Vision

Phil Bartie

Edinburgh

Details

GNSS (eg GPS) is a wonderful solution for positioning a device outside. However, indoor tracking is much more difficult given the obstruction caused by the roof and building materials between the receiver and satellites. There are a number of options for indoor positioning including WiFi fingerprinting, setting up Bluetooth beacons, and IMU (e.g. foot tracking). There are also solutions based on VPS (visual positioning systems) which use the camera and computer vision to locate the user from a library of previously captured images. This project will develop a simple mobile client which sends image updates from the front camera of a phone held looking upwards to a server. The server will carry out comparisons looking for correspondences within a library of previously captured images. The result would be a location given back to the user which places them on an indoor map. The project will involve computer vision (eg OpenCV, scikit-image) and development of a web app that takes a camera feed from a mobile device. For this project we particularly want to focus on how well tracking a ceiling will work for positioning around indoor spaces (eg university buildings).

Difficulty: Moderate

Detection and Visualisation of Structural Pattern in Java Programs by Using Graph Isomorphism

structural design pattern java graph isomorphism

Thomas Basuki

Malaysia

Details

Design patterns have been used for many years in object-oriented software development. Their use has since been extended to represent many other patterns such as interaction patterns and security patterns. Design patterns are often described in diagrams such as UML diagrams. In general, design patterns can be divided into structural and behavioural design patterns. In our previous project, we developed software that detects the occurrence of a structural design pattern in a set of Java programs. The software accepts a design pattern represented in a UML class diagram, which is stored in an XML file. The algorithm we chose extracts graph structures from both the class diagram and the program and compares them based on cosine similarity. This technique is very efficient in detecting the pattern but not very accurate. In this project, we propose to use graph isomorphism to detect the occurrence of a structural design pattern in the Java programs. We also plan to extend the software with a capability to visualise the patterns found in the programs.
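To give a feel for the graph-matching idea, here is a minimal sketch in Python. It assumes class relationships have already been extracted as labelled directed edges; the class names, relation labels and the helper `find_pattern` are all hypothetical and are not taken from the existing software. It tries every injective mapping of pattern nodes onto program nodes, so strictly speaking it checks for a subgraph match (extra edges in the program are allowed) and is only practical for very small graphs.

```python
from itertools import permutations


def find_pattern(pattern_edges, program_edges):
    """Return every mapping of pattern nodes to program nodes that preserves all pattern edges.

    Edges are (source, relation, target) triples.
    """
    pattern_nodes = sorted({n for s, _, t in pattern_edges for n in (s, t)})
    program_nodes = sorted({n for s, _, t in program_edges for n in (s, t)})
    program_set = set(program_edges)
    matches = []
    # Brute force: try each injective assignment of program nodes to pattern nodes
    for candidate in permutations(program_nodes, len(pattern_nodes)):
        mapping = dict(zip(pattern_nodes, candidate))
        if all((mapping[s], rel, mapping[t]) in program_set
               for s, rel, t in pattern_edges):
            matches.append(mapping)
    return matches


# Hypothetical Adapter-like pattern and a hypothetical program graph
pattern = [("Adapter", "extends", "Target"), ("Adapter", "uses", "Adaptee")]
program = [
    ("JsonReader", "extends", "Reader"),
    ("JsonReader", "uses", "LegacyParser"),
    ("Main", "uses", "Reader"),
]
matches = find_pattern(pattern, program)
print(matches)
```

A real implementation would more likely rely on an established subgraph-isomorphism algorithm (for example VF2) rather than exhaustive search.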

Difficulty: Moderate

Detection of Behavioral Design Pattern in Java Programs

behavioral design pattern uml java

Thomas Basuki

Malaysia

Details

Design patterns have been used for many years in object-oriented software development. Their use has since been extended to represent many other patterns such as interaction patterns and security patterns. Design patterns are often described in diagrams such as UML diagrams. In general, design patterns can be divided into structural and behavioural design patterns. In our previous project, we developed software that detects the occurrence of a structural design pattern in a set of Java programs. The software accepts a design pattern represented in a UML class diagram, which is stored in an XML file. In this project, we propose to extend the detection of design patterns to include behavioural patterns. We may need to consider other UML diagrams for this purpose, such as activity diagrams, sequence diagrams or state machine diagrams.

Difficulty: High

XML Query Processing Based on Labeling Scheme

xml labeling schemes

Thomas Basuki

Malaysia

Details

XML has been accepted as the current standard for data exchange. Many techniques have been proposed for efficient storage and query processing on XML. In general, XML documents are usually represented as trees. One of the techniques for XML query processing is based on labelling schemes of XML nodes. Many labelling schemes have been proposed, each with its own strengths and weaknesses. This project will study the labelling schemes that have been proposed so far and implement a few selected schemes for comparison.
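As one concrete example of a labelling scheme, the sketch below assigns Dewey-style (prefix) labels to the nodes of a small XML tree and uses them to answer an ancestor query without traversing the tree. The sample document and the helper names `dewey_labels` and `is_ancestor` are illustrative assumptions; the proposal does not say which schemes will be selected.

```python
import xml.etree.ElementTree as ET


def dewey_labels(root):
    """Label the root (1,); the i-th child of a node labelled p gets p + (i,)."""
    labels = []

    def visit(node, label):
        labels.append((label, node.tag))
        for i, child in enumerate(node, start=1):
            visit(child, label + (i,))

    visit(root, (1,))
    return labels


def is_ancestor(a, b):
    """With prefix labels, a is an ancestor of b exactly when a is a proper prefix of b."""
    return len(a) < len(b) and b[:len(a)] == a


# Hypothetical sample document
doc = ET.fromstring(
    "<library><book><title/><author/></book><book><title/></book></library>"
)
labels = dewey_labels(doc)
for label, tag in labels:
    print(".".join(map(str, label)), tag)
```

Interval-based (pre/post order) schemes answer the same query with two integer comparisons instead of a prefix test, which is one of the trade-offs a comparison could measure.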

Difficulty: Moderate

EEG based authentication

eeg signals machine learning security

Hadj Batatia

Dubai

Details

Use public data sets to train machine learning models to identify individuals from their EEG signals.

Difficulty: Easy

Graph-based code generation using LLMs

llm code generation graphs

Hadj Batatia

Dubai

Details

The objective of the project is to represent the intended programme as a graph. A code generator would analyse the graph and generate, debug, and run the code.

Difficulty: High

Generative AI for embedded systems

real time ai llm embedded systems

Hadj Batatia

Dubai

Details

Difficulty: High

Physics-informed neural networks for image denoising

deep learning physics informed networks inverse problems

Hadj Batatia

Dubai

Details

Difficulty: High

Deep learning for vision-based drone localisation

aerial images satellite images drone localisation gps-less navigation deep learning

Hadj Batatia

Dubai

Details

Difficulty: High

Self-supervised deep learning for image reconstruction

self-supervised learning inverse problems high dynamic range imaging compressed sensing

Hadj Batatia

Dubai

Details

The developed self-supervised deep learning models can be applied to the reconstruction of HDR images from compressed sensing.

Difficulty: High

Aerial hyperspectral images for nutrients estimation in crops

hyperspectral images machine learning

Hadj Batatia

Dubai

Details

Difficulty: Easy

A recommendation system for learning materials

Diana Bental

Edinburgh

Details

Vocabularies and ontologies (such as Dublin Core and schema.org) exist to identify and classify learning resources so that teachers and learners can search for suitable resources. The student will investigate existing vocabularies and ontologies such as schema.org, and use these to design and implement a prototype system. https://www.dublincore.org/resources/metadata-basics/ https://schema.org/LearningResource https://dl.acm.org/doi/abs/10.1145/3041021.3054160

Difficulty: High

To evaluate and extend a "Bechdel test" for computer games

Diana Bental

Edinburgh

Details

The Bechdel test for film suggests that a film should a) have at least 2 women characters b) they should talk to each other c) about something other than men. This simple test of female roles in film has produced a lot of discussion and some change in film practice. But how about computer games? Is there an equivalent test? How would it apply? Is there any progress towards meeting such a thing? A previous student project has developed and evaluated a potential test. This project will replicate that test with new subjects, and develop and extend the test.

Difficulty: High

Tailored apps and tools

user model tailoring personalisation

Diana Bental

Edinburgh

Details

Projects in user modeling, personalisation and tailored information. Build an app that provides information in some area of interest - this could be a sport activity, craft, tourist activity. The app can provide information and recommendations. Information and recommendations can be tailored according to relevant information about the user - location, time of day the app is being used, and user preferences and skills.

Difficulty: Variable

Digital Personalities and Virtual Influencers

Diana Bental

Edinburgh

Details

We are increasingly engaging with "digital personalities" online and offline. This project will investigate the use of "digital personalities" and current awareness and attitudes towards Virtual Influencers in society. For this project you will survey recent literature about trends in Virtual Influencers and how they are described in popular media (news, film etc.). You will build on studies from the literature on related technologies such as AI and bots, and conduct interviews with members of the public to understand their attitudes and concerns. As part of this project you may also implement prototypes or small example systems which demonstrate aspects of influencer systems for use in your study.

Difficulty: Variable

Usable privacy choices

Diana Bental

Edinburgh

Details

You will need to: research different privacy mechanisms; research different metrics and mechanisms to evaluate how successful they are for users; identify suitable privacy interfaces and metrics; conduct a usability evaluation of the selected interfaces; design and prototype a privacy interface that is intended to improve on existing interfaces in some way; evaluate the improvement. Metrics for Success: Why and How to Evaluate Privacy Choice Usability. Lorrie Faith Cranor, Hana Habib. Communications of the ACM, Volume 66, Issue 3, March 2023, pp 35–37. https://doi.org/10.1145/3581764

Difficulty: High

A "what and when to study" app

Diana Bental

Edinburgh

Details

Students often find it difficult to schedule their work and decide when to study and what needs to be done. Apps exist which provide this kind of advice and reminders for other personal goals, such as exercise and wellbeing. An app could give personalised suggestions for study material, e.g. "watch the following recordings", "try the following exercises", "study these handouts before the lecture", as well as warnings of upcoming deadlines. Devise an app in which students can set learning goals and get personalised reminders relevant to their courses. The app should take into account external factors such as exam dates and coursework deadlines, weightings and content. The app design will need to consider and respect privacy. Digital twins and artificial intelligence: as pillars of personalized learning models. Furini, Gaggi, Mirri, Montangero, Pelle, Poggi, Prandi. Communications of the ACM, Volume 65, Issue 4, April 2022, pp 98–104. https://doi.org/10.1145/3478281

Difficulty: High

Heterogeneous Ensemble Topic Models

topic modelling large language models ensemble methods

Pierre Le Bras

Edinburgh

Details

Ensemble approaches to modelling topics from large text corpora have been shown to improve topic coherence and model stability, compared to traditional approaches. However, most attempts so far have concentrated on the evaluation of homogeneous ensembles. The emergence of new topic modelling systems (e.g., BERTopic, Top2Vec) offers the possibility to explore heterogeneous ensembles, mixing these new approaches with classical ones (e.g., NMF, LDA). This project would involve the integration of multiple topic modelling techniques in one system, followed by the computation of ensemble topic models (topical alignment and/or weighted term co-association), and finally the evaluation of several metrics of interest.
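One simple way to picture topical alignment is to match topics from two models by the overlap of their top terms. The sketch below does this with Jaccard similarity and greedy matching; the similarity measure, the threshold, the topic term lists and the function names are all assumptions made for illustration, not the method the project prescribes.

```python
def jaccard(terms_a, terms_b):
    """Jaccard similarity between two collections of top terms."""
    a, b = set(terms_a), set(terms_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def align_topics(model_a, model_b, threshold=0.3):
    """Greedily pair each topic in model_a with its most similar unused topic in model_b."""
    pairs, used = [], set()
    for i, terms_a in enumerate(model_a):
        best_j, best_score = None, threshold
        for j, terms_b in enumerate(model_b):
            if j in used:
                continue
            score = jaccard(terms_a, terms_b)
            if score >= best_score:
                best_j, best_score = j, score
        if best_j is not None:
            used.add(best_j)
            pairs.append((i, best_j, best_score))
    return pairs


# Hypothetical top terms from two different topic models
lda_topics = [["gene", "protein", "cell", "dna"], ["market", "stock", "price", "trade"]]
nmf_topics = [["stock", "price", "market", "bank"], ["cell", "protein", "gene", "tissue"]]
pairs = align_topics(lda_topics, nmf_topics)
print(pairs)
```

Greedy matching depends on topic order; an optimal assignment (e.g. the Hungarian algorithm) or embedding-based similarity are alternatives worth comparing.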

Difficulty: Challenging

Comparison of Topic Model Visualisations

topic model visualisation user study

Pierre Le Bras

Edinburgh

Details

The data generated by topic models is a rich multi-dimensional set of probabilities, which naturally poses challenges when presented to non-expert users. Over the years, several systems have been built to allow this data exploration by visualising the output of topic models, for example: LDAVis, BERTopic, Topic Mapping (see URL). This project proposes to establish the affordances and hindrances of these many systems empirically by designing a user-based study to quantitatively and qualitatively measure key metrics. The project would preferably involve the creation of interactive interfaces (one per method) and the iterative development of a rigorous user study, followed by the evaluation of results.

Difficulty: High

Open project in interactive data visualisations

data visualisations interactive systems

Pierre Le Bras

Edinburgh

Details

This project is open to students with an interest or idea involving data visualisations. Please contact me to discuss your ideas BEFORE selecting this project. Examples of projects include: - building bespoke interactive visualisations for complex datasets - user evaluation of interactive data visualisation systems - educational/explainable software involving intuitive data visualisation - spatial data visualisation systems

Difficulty: Variable

Empirical Study of Grid Mapping Visualisation

data visualisation grid mapping similarity data clusters

Pierre Le Bras

Edinburgh

Details

Visualising the similarity of items is a multi-dimensional problem that requires the implementation of mapping strategies, inevitably introducing errors and making compromises. This project proposes to establish a list of viable strategies, implement them and evaluate their performance against selected metrics.

Difficulty: High

Building a Corpus Analysis Pipeline in Rust

text analysis data mining rust data pipeline

Pierre Le Bras

Edinburgh

Details

While Python has established itself as the de facto data processing and analysis programming language for years, other languages and their features have been seemingly left out. The project aims to investigate how the programming language Rust can perform when building a text corpus analysis configurable pipeline.

Difficulty: High

From EDA to Data Apps using Observable

data visualisation javascript observable plots data analytics geospatial data

Pierre Le Bras

Edinburgh

Details

Observable (observablehq.com) has slowly established itself as one of the cornerstones of data visualisation and analytics tools: from providing an online interactive notebook platform, to creating a declarative charting library (Plots), then introducing their Data Analytics static site generator (Framework), and finally introducing reusable Data Visualisation components with live data (Data Apps). This project aims to build learning resources to get started with the essential tools provided by Observable. Typically, the lecture and lab should help learners get the skills needed to build "complex" visualisations using Plots within the notebook, e.g., connecting to a live database and displaying data overlaid on a geographical background (however, the exact scope can be discussed).

Difficulty: Moderate

Interactive Gantt Chart Builder

data visualisation web development ui/ux

Pierre Le Bras

Edinburgh

Details

This project will see the development, evaluation and integration of an interactive Gantt chart builder for the current HWU project system. Some of the requirements may include: - Build an interface that guides students in writing correct milestones, detailed tasks, task hierarchies, and dependencies. - Generate an interactive Gantt chart for students and supervisors to explore. - Provide an export functionality to include the chart or data in a documentation. - Provide functionalities for students and supervisors to reflect on the progress of a project and reevaluate goals if needed. - Evaluate the tool and its guidance feature with a user study. - Develop and integrate the tool within the existing technological stack on which the HWU project system is built. This project will require a willingness to organise and run meetings with stakeholders (current system developers, supervisors, students, etc.). You should also be proficient with web programming, data management, UI/UX and willing to develop bespoke interactive data visualisation systems.

Difficulty: Moderate

D3-integrated analytics JS library

data analytics data visualisation javascript

Pierre Le Bras

Edinburgh

Details

D3.js is an online library for building highly customisable and interactive data visualisations for the web. It is built around a set of modules, some for building/manipulating HTML documents, some for interactivity, others for common data transformations. The typical analytics used in conjunction with data visualisations (clustering, regression, density estimation, etc.) are not covered by those modules. This project aims to analyse the software engineering needs for a library that would provide these algorithms, write some of the most typical analytics algorithms (PCA, MDS, K-Means, DBSCAN, etc.), evaluate their performance compared to other implementations (e.g., Python), and test their integration within D3.js and Observable Plots' ecosystem.

Difficulty: Moderate

Knowledge Graphs for Medical Images

Albert Burger

Edinburgh

Details

Medical images contain a lot of information that needs to be captured for further analysis and to link it to associated other medical data. In this project you will develop a Knowledge Graph, using the graph-based database system Neo4j and RDF, to model aspects of the human gut. Based on this you will develop a set of semantic query solutions to answer typical questions on the data set provided.

Difficulty: High

Performance Evaluation of Graph Database Indexing Techniques

Albert Burger

Edinburgh

Details

Indexing is a common technique in databases to improve performance. As part of this project you will study the indexing features provided by the graph-based database system Neo4j and design and run a set of experiments to evaluate the performance improvements that can be achieved.

Difficulty: High

Visualisation-based Graph-DB Comparison

Albert Burger

Edinburgh

Details

As part of an ongoing research project we model human gut anatomy in the form of a graph database, Neo4j. Other research groups have created similar databases and it is important to be able to systematically identify and understand the variations between two different data sets over the same domain. Similarly, an existing database will evolve over time due to addition/deletion/modification of data elements. Here it is important to be able to systematically identify the differences between different versions of the same database. To achieve this, you will use visualisation techniques that can be applied in Neo4j, for example their Bloom tools, to make it easy for an end user to identify and understand variations in the underlying databases.

Difficulty: Moderate

Integration of Heterogeneous Biomedical Databases using a Virtual Knowledge Graph System

Albert Burger

Edinburgh

Details

As part of an ongoing research project there is a need to integrate data, currently stored in a number of different data repositories, into a single coherent system for analysis purposes. Specifically, in this project you will explore the integration facilities of the virtual graph system Ontop to answer queries across multiple data sets.

Difficulty: Challenging

Using Generative AI to Create a Knowledge Graph for a Biomedical Domain

generative ai knowledge graphs neo4j

Albert Burger

Edinburgh

Details

Generative AI is now widely used to tackle a variety of computational problems. In this project you will explore Neo4j’s LLM Knowledge Graph Builder to create a knowledge graph in the Neo4j Graph Database. The new knowledge graph will then be interrogated using LLM chats as well as the Cypher query language. You will have to familiarise yourself with the Neo4j graph database and generative AI tools. The use case for the project will be based on a current biomedical research project, though no previous biomedical knowledge is required.

Difficulty: Moderate

Large Language Models (LLM) for Topic Modelling

machine learning large language models topic modelling

Mike Chantler

Edinburgh

Details

To evaluate the use of LLMs to create topic models of huge document collections and to compare these against conventional tools such as LDA techniques. Particular concerns are topic stability and topic quality. See url for example of a graphical topic model of > 200,000 document repositories.

Difficulty: Moderate

CNN and transformers for Mechatronic Optimisation of Laser Systems

machine learning image processing optimisation laser systems

Mike Chantler

Edinburgh

Details

Most laser systems are currently tuned by hand. This project would produce a program that could use the intensity distribution in images of laser beams to tune their output. The program would be written in Python and would likely involve both Transformer and CNN mechanisms. A key issue would be the design of the hybrid CNN/Transformer architecture. This is part of the Heriot-Watt EPSRC Prosperity Partnership group of projects.

Difficulty: Moderate

Visualising research on Global Warming

topic modelling global warming large language models

Mike Chantler

Edinburgh

Details

This web app would provide an at-a-glance visualisation of global warming research. It would use topic modelling and Heriot-Watt's topic modelling toolkit. It would provide a similar thematic analysis to that illustrated by our "Visualising Covid-19 Research" paper (see url). The aim is to show visually which themes global warming research worldwide is (and is not) focussing on, and how these topics are developing over time. It is programmed in a mixture of Java and JavaScript. Topic modelling algorithms may be LLMs or conventional (LDA).

Difficulty: Challenging

Visual comparison of international research and/or patent data

llms machine learning javascript graphics

Mike Chantler

Edinburgh

Details

The web now makes available vast collections of research project descriptions (e.g. Gateway to research at https://gtr.ukri.org/). Different countries use different classification systems to code their national research portfolio and so it is difficult to directly compare research across nations. However, LLM-based Topic modelling provides a means to produce thematic analysis independent of any classification system. This project would therefore develop an interactive, graphics-based system to allow visual exploration and comparison of international research portfolio themes.

Difficulty: Moderate

CNN/Transformer architectures for image processing

Mike Chantler

Edinburgh

Details

As title - but on an image processing problem of your choice.

Difficulty: Easy

LLM/NLP Assistance of Brainstorming and Idea Generation

Mike Chantler

Edinburgh

Details

Use of LLMs and other NLP techniques for clustering, suggesting, stimulating and extracting ideas in brainstorming for scientific proposal generation or similar. A key challenge would be the graphical presentation of these clusters, ideas and linkages in an at-a-glance presentation.

Difficulty: Easy

Scientific Paper Analysis using LLMs etc

llms d3 javascript

Mike Chantler

Edinburgh

Details

Given a set of papers from for instance a workshop - extract the methods proposed, contributions claimed and tasks performed. Then - summarise the workshop and present a graph of relationships between methods, tasks etc in a visual, quick and easy to understand way.

Difficulty: Easy

Computer Vision and Imaging topics - AI for Multi-modality Image Processing

computer vision deep learning multi-modality image fusion

Dongdong Chen

Edinburgh

Details

Multi-modality image fusion is a technique that combines information from different sensors or modalities to produce a fused image that retains complementary features from each modality. However, effectively training such fusion models is challenging due to the lack of ground truth fusion data. This project focuses on implementing/developing advanced AI models for multi-modality image fusion, e.g. learning multi-modality image fusion without ground truth. If you’re interested in doing this project, please have a look at the paper listed below. https://arxiv.org/pdf/2305.11443.pdf

Difficulty: High

Computer Vision and Machine Learning Topics - Advanced AI models for Data Clustering

deep learning machine learning clustering analysis computer vision visualization

Dongdong Chen

Edinburgh

Details

Clustering is the task of grouping unlabeled data points together based on their similarities. It is an unsupervised machine learning task, meaning that the data points do not have any labels associated with them. In contrast, classification is the task of assigning labels to data points. If the data points are labelled, then clustering can be used as a preprocessing step to improve the performance of classification algorithms. Deep learning (Neural Networks) can learn useful representations from data. This project will focus on implementing and extending the state-of-the-art deep learning models for clustering analysis.

Difficulty: Moderate

Computer Vision and Imaging topics - Self-Supervised Learning for Image Reconstruction

computer vision image reconstruction equivariant imaging medical imaging signal processing self-supervised learning

Dongdong Chen

Edinburgh

Details

In recent years, deep learning has achieved state-of-the-art performance in various imaging inverse problems, including medical imaging and computational imaging. These methods typically require pairs of signals and their corresponding measurements for training. However, in many imaging problems, we only have access to degraded or undersampled measurements of the underlying signals, which limits the applicability of learning-based approaches. The recent equivariant imaging (EI) framework overcomes this limitation by exploiting the invariance to transformations (translations, rotations, etc.) present in natural signals. EI is fully self-supervised and can recover the signals from their measurements alone. This project focuses on applying EI to different inverse problems including but not limited to medical image (e.g. CT/MRI) reconstruction and image restoration (super-resolution, denoising, deblurring, etc.). If you’re interested in doing this project, please have a look at the papers listed below.
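For orientation, the EI training objective is commonly written along the following lines. The notation here is an assumption introduced for this sketch and does not appear in the listing: $y$ is a measurement, $A$ the forward (measurement) operator, $f_\theta$ the reconstruction network, $T_g$ a transformation such as a shift or rotation, and $\alpha$ a weighting between the two terms.

```latex
\min_{\theta}\; \mathbb{E}_{y,\,g}\Big[
    \underbrace{\big\| y - A f_\theta(y) \big\|_2^2}_{\text{measurement consistency}}
    \;+\; \alpha\,
    \underbrace{\big\| T_g x^{(1)} - f_\theta\big(A\, T_g x^{(1)}\big) \big\|_2^2}_{\text{equivariance}}
\Big],
\qquad x^{(1)} = f_\theta(y)
```

The first term only asks that reconstructions agree with the observed measurements; the second asks that a transformed reconstruction, once re-measured and reconstructed again, comes back unchanged, which is what lets training proceed without ground-truth signals.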

Difficulty: High

Computer Vision and Imaging Topics: Generative Modeling and its Application in Image Processing

computer vision diffusion models image generation image processing

Dongdong Chen

Edinburgh

Details

Recent advances in AI-based image generation spearheaded by diffusion models such as GLIDE, DALL-E 2, Imagen, and Stable Diffusion have taken the world of “AI Art generation” by storm. Generating high-quality images from text descriptions is a challenging task. It requires a deep understanding of the underlying meaning of the text and the ability to generate an image consistent with that meaning. In recent years, diffusion models have emerged as a powerful tool for addressing this problem. This project will focus on applying diffusion models for image generation, such as painting generation and image fusion. https://arxiv.org/abs/2006.11239 https://arxiv.org/abs/2011.13456 https://arxiv.org/abs/2303.06840 https://arxiv.org/abs/2305.08995

Difficulty: Variable

Deep Learning for Imaging and Low-Level Computer Vision

deep learning machine learning signal processing image reconstruction computer vision

Dongdong Chen

Edinburgh

Details

Inverse problems are ubiquitous in computer vision, image processing, and signal processing. Many research or industrial scenarios essentially involve solving inverse problems, such as image super-resolution, image denoising, computational photography, astronomical imaging, medical image reconstruction, etc. Deep learning is one of the main pillars of modern AI due to its powerful learning capabilities. Exploring deep learning and AI solutions for solving inverse problems is a frontier topic. This project will investigate the fascinating and cutting-edge topic of deep learning for inverse problems. The students will also explore and develop the "Deep Inverse" library (see https://deepinv.github.io/deepinv/) - a PyTorch-based open-source library developed by an international and growing team for solving inverse imaging problems with deep learning. Students will be encouraged to publish academic papers if their progress is excellent. DeepInverse: https://deepinv.github.io/deepinv/ Background about deep learning for inverse problems: https://ieeexplore.ieee.org/abstract/document/10004796?casa_token=OWX4u5zTQxUAAAAA:4xqldZHDeZTsQaJLArqxMQqdkHzTnAKi51x2LtAT5BVfo_zWsVCYotmynl08nnqSkvFheAgOBQ

Difficulty: High

Improving process mining methodology (research project) - no longer available

process modelling process mining

Jessica Chen-Burger

Edinburgh

Details

Improving existing process mining methodology. No programming is needed, but the use of one or more process mining tool(s) is required. If you are interested in this project, please text me using Teams.

Difficulty: High

Process Modelling and Automation for housing management (software development project) - no longer available

process modelling python programming conceptual modelling

Jessica Chen-Burger

Edinburgh

Details

To create a data and process model and a corresponding automated process system to read and execute this process model. Programming required. If you are interested in this project, please text me using Teams.

Difficulty: High

Sentiment Analysing Twitter data for stock market trends (research project) - no longer available

sentiment analysis stock market investment natural language processing

Jessica Chen-Burger

Edinburgh

Details

To analyse Twitter data for stock market trends. Normally no programming is necessary; sentiment analysis (SA) tools will be used. Two different approaches are available: 1. SA tool evaluation, 2. SA tool improvements. If you are interested in this project, please text me using Teams.

Difficulty: High

Supply chain optimisation problems (software development) - no longer available

supply chain management optimisation problems business models

Jessica Chen-Burger

Edinburgh

Details

Optimise a supply chain. Programming is required for this project. If you are interested in this project, please text me using Teams.

Difficulty: High

A Talent Finder system using Approximate Mapping (software development) - no longer available

ontology knowledge representation prolog python database

Jessica Chen-Burger

Edinburgh

Details

Create a software system to enable approximate mapping based on semantics of topics and research areas, and other features to create a recommendation system for Talent Finder. If you are interested in this project, please text me using Teams.

Difficulty: High

Stock Market Price Prediction using Hybrid ML and Sentimental Analysis Techniques - Vishal Choudhary

Jessica Chen-Burger

Edinburgh

Details

Difficulty: Easy

Prediction and Analysis of Cricket Players’ Performance and Score Forecasting Using Machine Learning Models - Madan, Sahil

Jessica Chen-Burger

Edinburgh

Details

Difficulty: High

Robust Proximity Zone Classification using Adaptive Filtering and Machine Learning for BLE Devices

bluetooth low energy (ble) rssi proximity detection machine learning iot

Zi Hau Chin

Malaysia

Details

This project develops a smart sensing system that accurately classifies the proximity of Bluetooth Low Energy (BLE) devices into defined zones. It leverages an Adaptive Kalman Filter to dynamically denoise noisy Received Signal Strength Indicator (RSSI) data and employs machine learning techniques to establish robust proximity zones resilient to environmental variations. The core work involves: 1. Add Adaptive Kalman Filter (AKF): Add an AKF to smooth the noisy raw RSSI in real time, dynamically adjusting filter parameters. 2. Modify the source code slightly to collect raw RSSI, RSSI with Kalman Filter and RSSI with AKF at the same time. 3. Train ML Classifier (Offline): Collect labelled training data using the smoothed RSSI from your AKF in various environments, then use scikit-learn to train a machine learning model to classify proximity zones. 4. Integrate & Deploy: Integrate the trained and optimised ML model into the source codebase, ensuring efficient inference. Modify the MQTT output to include the predicted proximity zone classification alongside the RSSI values. Conduct comprehensive system testing in real-world scenarios to validate the accuracy of proximity zone classification and the overall system stability. 5. Evaluation: Perform performance benchmarking on raw RSSI, KF and AKF.
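The filtering step can be pictured with a minimal one-dimensional sketch. It assumes a constant-signal state model and adapts the measurement-noise variance from the squared innovation with a forgetting factor; that adaptation rule, the class name `AdaptiveKalman1D`, the parameter values and the sample readings are all illustrative assumptions, and the project may well use a different AKF formulation.

```python
class AdaptiveKalman1D:
    """Scalar Kalman filter whose measurement-noise variance r is adapted online."""

    def __init__(self, q=0.01, r=4.0, alpha=0.9):
        self.q = q          # process-noise variance (assumed value)
        self.r = r          # measurement-noise variance, adapted on each update
        self.alpha = alpha  # forgetting factor: closer to 1 means slower adaptation
        self.x = None       # current state estimate (smoothed RSSI in dBm)
        self.p = 1.0        # variance of the state estimate

    def update(self, z):
        if self.x is None:
            self.x = z  # initialise the state with the first reading
            return self.x
        # Predict: the signal is modelled as constant, so only uncertainty grows
        p_pred = self.p + self.q
        innovation = z - self.x
        # Adapt r: the expected squared innovation is roughly p_pred + r,
        # so (innovation^2 - p_pred) is a crude sample of r; floor it to stay positive
        sample_r = max(innovation ** 2 - p_pred, 1e-3)
        self.r = self.alpha * self.r + (1 - self.alpha) * sample_r
        # Correct: blend prediction and measurement using the Kalman gain
        k = p_pred / (p_pred + self.r)
        self.x = self.x + k * innovation
        self.p = (1 - k) * p_pred
        return self.x


# Hypothetical noisy RSSI readings in dBm
raw = [-60, -75, -58, -72, -61, -74, -59, -73]
akf = AdaptiveKalman1D()
smoothed = [akf.update(z) for z in raw]
print([round(s, 2) for s in smoothed])
```

Large jumps in the readings inflate `r`, which lowers the gain and makes the filter trust noisy measurements less; a fixed-`r` Kalman filter run on the same data gives the KF baseline for the benchmarking step.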

Difficulty: Moderate

BlockVote: A Decentralised Smart Contract Voting System

blockchain smart contracts decentralized voting solidity dapp web3

Zi Hau Chin

Malaysia

Details

BlockVote is a secure, transparent, and verifiable voting platform powered by blockchain smart contracts. This project involves developing robust Solidity contracts for immutable vote casting and tallying, designing a user-friendly decentralised application (DApp) for voter interaction, and integrating advanced cryptographic techniques for enhanced privacy. The core work involves: 1. Designing and implementing secure and efficient Solidity smart contracts for decentralised election management, voter registration, and immutable vote casting and tallying. 2. Developing a robust decentralised application (DApp) frontend that seamlessly interacts with the blockchain, providing intuitive interfaces for administrators to set up elections and for voters to participate securely. 3. Integrating advanced cryptographic techniques or decentralised protocols for enhanced features such as voter anonymity (e.g., using commitment schemes or exploring Zero-Knowledge Proofs) and verifiable election results, thereby significantly increasing system trust and auditability.

Difficulty: High

Improving the Quality of Skin Lesion Data

Christos Chrysoulas

Edinburgh

Details

Literature: discussing lack of open-source, diverse, and well-labelled skin lesion data available for training VLMs. Implementation: clean and label an open-source skin lesion dataset (e.g. HAM10000 or one of the ISIC Challenge datasets). This work will be closely supervised by Tess Watt (PhD Student)

Difficulty: Moderate

Compressing VLMs for Use on Constrained Devices

Christos Chrysoulas

Edinburgh

Details

Literature: sourcing and comparing model compression techniques. Implementation: use a technique from the literature to compress a pretrained VLM for use on a constrained device (e.g. Raspberry Pi). This work will be closely supervised by Tess Watt (PhD Student)

Difficulty: High

Integrating sentiment analysis into e-commerce systems to reduce customer frustration with chatbots

artificial intelligence nlp sentiment analysis e-commerce customer experience

Santiago Chumbe

Edinburgh

Details

Chatbots are bringing innovation to e-commerce communication with customers. E-commerce companies have been adopting chatbots, particularly chatbots based on Artificial Intelligence, to provide personalised consumer assistance. However, no matter how well a chatbot has been trained and developed, there will always be cases where human intervention is necessary to resolve customer queries. This raises the question of how to identify the moment when the customer is becoming tired or frustrated by the chatbot's answers, so that a human can take over at the right time.

Difficulty: Variable

Enhancing Chatbots with AI: Its relevance and impact on customer experience

artificial intelligence chatbots e-commerce customer experience

Santiago Chumbe

Edinburgh

Details

The project examines the relevance of AI-enhanced chatbots to customer experience in the context of e-commerce. Based on an in-depth analysis of recent publications in this field, as well as our own field study, we first identify the main causes of poor user experience and frustration that customers experience as a result of their interaction with chatbots; second, we examine the AI-based techniques that have been proposed to solve these poor customer experiences with chatbots; and finally we propose a design for a chatbot enhanced with a bespoke AI-based technique for e-commerce.

Difficulty: Moderate

Information Systems Research Based Project

methodologies systems design ssm rich pictures user centred design

Jenny Coady

Edinburgh

Details

If you have taken my F21IF class in Sem 1 and are interested in Methodologies, Systems Design, SSM, Rich Pictures, User Centred Design, or something else in this area and would like to propose a project then come speak to me.

Difficulty: Challenging

Applications of Machine Learning and/or optimization in sustainability

David Corne

Edinburgh

Details

Modern optimization and machine learning (ML) methods are increasingly used, but still under-exploited, in real-life scenarios related to sustainability. Two such areas I work on are (i) optimization of vehicle fleet plans to reduce carbon emissions, (ii) ML to predict near-future energy demand (to help optimize the use of renewables in energy systems). If you are interested in either of these, we can probably discuss it and come up with a mutually interesting and challenging project.

Difficulty: Challenging

Create a map building language

gis maps mapping compilation

David Corne

Edinburgh

Details

Google Maps, Yahoo Maps, and others provide APIs that make it possible to build custom maps. For example, if you know the locations of all the bottle recycling bins in your postcode, you could use one of these APIs to produce a nice map highlighting those bins with a custom gif. Or if you were interested in cycling, and had data about road elevations in areas of interest, you could draw a colour-coded visualization of the difficulty of cycling in those areas. And so on: the world (literally) is your oyster. The finished 'map' is typically an HTML document full of JavaScript. However, all of this can be quite laborious to create, even (in fact especially) using the tools provided by the API. This project is to build a tool, probably a Linux command-line tool, which converts an input text file of simple instructions into the aforesaid HTML document. For example "10km square centred on Trafalgar Square, marker and label on each statue".
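To make the idea of compiling instructions into a map page more concrete, here is a minimal sketch. The three-keyword instruction format (`center`, `zoom`, `marker`) is invented for illustration, and the output targets the open-source Leaflet library with OpenStreetMap tiles rather than the Google or Yahoo APIs mentioned above; the CDN and tile URLs are assumptions. Unlike the Trafalgar Square example, this sketch needs explicit coordinates, since resolving place names would require a geocoding service.

```python
import json

# Assumed CDN location of Leaflet; any hosted copy would do.
LEAFLET = "https://unpkg.com/leaflet@1.9.4/dist/leaflet"


def compile_map(source: str) -> str:
    """Translate a tiny, hypothetical map language into an HTML page."""
    center, zoom, markers = (0.0, 0.0), 13, []
    for line_no, line in enumerate(source.splitlines(), start=1):
        parts = line.split()
        if not parts or parts[0].startswith("#"):
            continue  # skip blank lines and comments
        keyword, args = parts[0], parts[1:]
        if keyword == "center" and len(args) == 2:
            center = (float(args[0]), float(args[1]))
        elif keyword == "zoom" and len(args) == 1:
            zoom = int(args[0])
        elif keyword == "marker" and len(args) >= 3:
            # marker <lat> <lon> <label text...>
            markers.append((float(args[0]), float(args[1]), " ".join(args[2:])))
        else:
            raise ValueError(f"line {line_no}: cannot parse {line!r}")

    script = [
        f"var map = L.map('map').setView({json.dumps(list(center))}, {zoom});",
        "L.tileLayer('https://tile.openstreetmap.org/{z}/{x}/{y}.png', "
        "{attribution: '&copy; OpenStreetMap contributors'}).addTo(map);",
    ]
    for lat, lon, label in markers:
        # json.dumps quotes the label as a JavaScript string literal
        script.append(
            f"L.marker([{lat}, {lon}]).addTo(map).bindPopup({json.dumps(label)});"
        )
    return "\n".join([
        "<!DOCTYPE html>",
        "<html><head>",
        f'<link rel="stylesheet" href="{LEAFLET}.css">',
        f'<script src="{LEAFLET}.js"></script>',
        '</head><body><div id="map" style="height: 100vh"></div>',
        "<script>",
        *script,
        "</script>",
        "</body></html>",
    ])
```

A command-line wrapper would read the instruction file, call `compile_map`, and write the result to an `.html` file; a richer language would add shapes, colour scales and place-name lookup.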

Difficulty: Variable

Transport system simulation to support emission reduction

simulation data mining sustainability machine learning

David Corne

Edinburgh

Details

An ongoing project called TransiT (https://transit.ac.uk/) is exploring how 'Digital Twins' can help the UK find ways to redesign aspects of its transport systems to reduce carbon emissions. In the transport-systems context, a 'Digital Twin' is basically a discrete-event simulation which might be used to, for example, investigate the effects of a new traffic light system at a major junction, or the impact of more frequent and cheaper bus services on traffic congestion. This student project will build and/or investigate a transport system or component in connection with the wider TransiT project. Details are to be explored and developed in alignment with the student's particular skills or interests. The project will likely make use of (or further engineer) existing open source simulators such as MATSim.

Difficulty: Variable

Statistical and machine learning methods for temporal data for understanding weather variables affecting energy consumption

machine learning time series multivariate analysis

Sarat Dass

Malaysia

Details

The aim of this project is to discover relationships between energy consumption and weather variables using statistical models and machine learning methods for temporal data. The goal is to correlate consumption habits with weather conditions such as temperature, humidity and light. Apart from the temporal data analysis, energy measurements are also available for a group of buildings which are close to each other, which provides a measure of variability of consumption across buildings. Machine and deep learning methods are to be developed to understand all sources of variability and to make predictions. The dataset to be investigated also contains missing information, so different imputation techniques will be investigated, together with their effects on predictions.

Difficulty: Moderate

Formalizing Computational Models using the Coq Proof Assistant

Marko Doko

Edinburgh

Details

Very expressive type systems commonly used by functional programming languages can be used to specify very complex data types. Those data types can be so complex that they allow encoding of entire mathematical theorems inside a single data type! This expressiveness of type systems enables creation of tools such as the Coq proof assistant (https://coq.inria.fr/). These tools allow the user to specify mathematical structures, state results about those structures (lemmas, theorems), and prove those results correct. Using proof assistants allows us to create machine-checked proofs, which we can trust with an extremely high degree of confidence, much higher than if the proofs have only been looked over by humans. The aim of this project is to serve as an introduction to using proof assistants through first specifying a mathematical object, and then proving some basic properties of it. The project consists of picking one computational model (e.g., finite automata, λ-calculus, Turing machines), learning enough about the Coq proof assistant to specify the selected computational model within Coq, and proving some fundamental properties about it. You can choose to focus on any computational model you like - pick the one you're the most familiar with, or the one which interests you the most. The project is highly flexible in terms of scope. It can be molded according to the student's ambition and interest throughout the project's duration. What will you get out of this project? In terms of practical skills, you will gain experience in writing formal specification, using dependent types, and programming in functional languages. In terms of building up your wider understanding, you will get a first-hand experience of the deep interconnectedness between mathematics and programming. If you enjoy programming, but have always found mathematical proofs difficult and too abstract, this project will give you a completely different outlook on proofs.
If you are someone who always had a knack for constructing proofs, you will gain an even deeper appreciation for programming.

Difficulty: High

Software for Interactive Exploration of Concurrent Programs' Executions

Marko Doko

Edinburgh

Details

When it comes to defining possible behaviors of multithreaded programs accessing shared memory locations, modern programming languages often come with execution models which are (for a multitude of reasons) rather intricate. Those models represent possible executions of a program using graphs whose nodes represent actions taken by the program, and whose various types of edges represent the relations among the actions. Getting a good intuitive understanding of the models of concurrent executions is not easy because people may find it difficult to visualize how the graphs which represent the executions are being built. This project will focus on a version of the execution model used by the C/C++ language family. The task is to develop software which will enable users to input an example program and interactively explore its executions, i.e., the software will help the user build various graphs which represent legal executions of the given program. Think of this software as a teaching aid. The ideal use case is to help people learn how to think about the modern execution models for concurrent programs. Note that you do not need to know much (or anything at all for that matter) about C or C++ to work on this project. The implementation language of the software being developed can be any language you feel comfortable with.

Difficulty: Moderate

Formal treatment of Manufactoria's computation model

Marko Doko

Edinburgh

Details

In 2010, the Flash browser game Manufactoria [1] appeared and quickly gained cult status. After the end of support for Flash in 2020, a remake of the game was developed and released in 2022 [2, 3]. Manufactoria tasks the player with creating machines which do computations by manipulating a queue (reading from the head and writing to the tail). Initially, you should get familiar enough with Manufactoria's programming model to be able to implement it in a theorem prover, such as Coq [4]. Once the programming model has been implemented, we will look into proving some basic properties of the model, aiming towards proving that Manufactoria's model is as strong as the computational model provided by Turing machines. You will not be expected to spend money on the game. A copy will be provided for you. [1] http://pleasingfungus.com/Manufactoria/ [2] https://pleasingfungus.itch.io/manufactoria-2022 [3] https://store.steampowered.com/app/1276070/Manufactoria_2022/ [4] https://coq.inria.fr/

Difficulty: Challenging

Formalization of the algebraic structure of physical units in Coq

coq theorem proving formalization of mathematics

Marko Doko

Edinburgh

Details

The main goal of the project is to create a formalization of the algebraic structure of physical units in an interactive theorem prover. After specifying the structure, some fundamental theorems should be established, and an example (preferably the SI system) should be encoded. Goals with which the project can be extended include: - developing a theory of unit prefixes - developing a theory of conversions between unit systems
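For intuition about what such an algebraic structure can look like, the following informal Python sketch models units as vectors of integer exponents over the seven SI base units, so that multiplying units adds exponents and the units form an abelian group. This is only one common way to model units and is an assumption here, not the formalization the project prescribes; the class and helper names are invented, and a proof-assistant development would state and prove the group laws rather than spot-check them.

```python
from dataclasses import dataclass

# The seven SI base units, in a fixed order.
BASE_SYMBOLS = ("m", "kg", "s", "A", "K", "mol", "cd")


@dataclass(frozen=True)
class Unit:
    """A unit as a tuple of integer exponents, one per SI base unit."""
    exponents: tuple

    def __mul__(self, other):
        # Multiplying units adds exponents component-wise.
        return Unit(tuple(a + b for a, b in zip(self.exponents, other.exponents)))

    def inverse(self):
        return Unit(tuple(-a for a in self.exponents))

    def __truediv__(self, other):
        return self * other.inverse()

    def __pow__(self, n: int):
        return Unit(tuple(a * n for a in self.exponents))


def base(symbol: str) -> Unit:
    """Return the base unit with exponent 1 for `symbol` and 0 elsewhere."""
    return Unit(tuple(int(s == symbol) for s in BASE_SYMBOLS))


ONE = Unit((0,) * len(BASE_SYMBOLS))  # identity element (dimensionless)

metre, kilogram, second = base("m"), base("kg"), base("s")
newton = kilogram * metre / second ** 2
joule = newton * metre
watt = joule / second
```

Here `watt` ends up with exponents (2, 1, -3, 0, 0, 0, 0), i.e. kg·m²·s⁻³. Prefixes and conversions between unit systems would need extra structure, such as a numeric scale factor attached to each unit.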

Difficulty: Challenging

Human-Robot Interaction

human-robot interaction robotics social robots

Christian Dondrup

Edinburgh

Details

This is not a specific project but a placeholder for anyone who would like to work in the field of HRI. This could include things like: - Dialogue - Gestures (recognition and generation) - Navigation - Etc. Please get in touch to discuss possible ideas you might have around this topic.

Difficulty: Variable

Develop a middle-ware for the Pepper robot

robot robotics middleware ros python pepper

Christian Dondrup

Edinburgh

Details

The Pepper robot (https://aldebaran.com/en/pepper/) is a humanoid robot made by Softbank Robotics. Sadly, Softbank Robotics has gone out of business and there is no official support for Pepper anymore. This means that the software used to control it, i.e. NAOqi (http://doc.aldebaran.com/2-5/index_dev_guide.html), is not supported anymore. Moreover, the version of it on the Pepper robots we have is only compatible with Python 2. This makes it hard for developers to install this API on their machine and to interface with the robot. However, NAOqi is only responsible for communication with the robot and is installed on the robot itself as well. Hence, I am looking to interface with the robot without having to install NAOqi on the external machine. For this reason, this project aims to create a software library that includes a control node on the Pepper robot running in NAOqi and a communication interface to connect to the control node via the local network. This communication interface could be as simple as using socket IO or it could use something more advanced such as UDP. This interface should be compatible with Python 3 and will be the go-to solution for controlling the robot from an external PC. The control node needs to be implemented in Python 2 on the robot using NAOqi. It should be able to receive commands via the communication interface and to stream data from the robot's sensors, e.g. cameras and mic, to the external PC. Ideally, this interface would be integrated into ROS (https://www.ros.org/) but that is an optional extra.
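The control-node idea can be sketched as a small request/reply protocol over TCP. The example below is a hedged illustration only: the newline-delimited JSON message format, the `say` command and every function name are assumptions, and `dispatch` is a stub standing in for the real NAOqi calls, which are deliberately not shown. It is written in Python 3 so it can be run anywhere; the robot-side half would have to be ported to Python 2 (where, for instance, the module is named `SocketServer`).

```python
import json
import socket
import socketserver
import threading


def dispatch(request):
    """Stub for the robot-side logic; a real node would call NAOqi here."""
    if request.get("cmd") == "say":
        return {"ok": True, "echo": request.get("text", "")}
    return {"ok": False, "error": "unknown command"}


class CommandHandler(socketserver.StreamRequestHandler):
    """Serves one client: one JSON request per line, one JSON reply per line."""

    def handle(self):
        for raw in self.rfile:
            reply = dispatch(json.loads(raw.decode("utf-8")))
            self.wfile.write((json.dumps(reply) + "\n").encode("utf-8"))


def start_control_node(host="127.0.0.1", port=0):
    """Start the (hypothetical) control node in a background thread."""
    server = socketserver.ThreadingTCPServer((host, port), CommandHandler)
    server.daemon_threads = True
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def send_command(address, request):
    """Client side: send one command and wait for the reply."""
    with socket.create_connection(address, timeout=5) as conn:
        conn.sendall((json.dumps(request) + "\n").encode("utf-8"))
        return json.loads(conn.makefile("rb").readline().decode("utf-8"))
```

With a node started via `start_control_node()`, an external PC would call `send_command(server.server_address, {"cmd": "say", "text": "hello"})`. Streaming camera or microphone data would need a separate, higher-throughput channel than this line-based protocol.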

Difficulty: Variable

LLMs for robots

robot robotics llm machine learning human-robot interaction hri

Christian Dondrup

Edinburgh

Details

This is a rather generic project description exploring the use of LLMs in robotics. I mainly focus on Human-Robot Interaction and robot control. Possible topics could include things like: - Human-Robot Dialogue - Decision making in Human-Robot Interaction - Task planning and execution If you are interested in doing a project that involves robots and LLMs, please contact me so that we can talk about your idea.

Difficulty: Variable

Machine learning for robot control

robotics robot machine learning navigation control computer vision behaviour generation

Christian Dondrup

Edinburgh

Details

There is a vast number of machine learning algorithms out there and this project is relatively open to cover their use for robotics. Examples could include: - Navigation for mobile robots - Control of robotic arms - Gesture generation for social robots - Gesture recognition - Object/person detection - Action/task recognition - Planning and execution of tasks. If you are interested in robotics and machine learning, please contact me so that we can have a chat about your project idea.

Difficulty: Variable

Parkinson's detection from speech recordings

machine learning signal processing

Heba Elshimy

Dubai

Details

This study aims to detect the onset of Parkinson's disease from speech recordings.

Difficulty: Variable

Predicting Drug-Drug Interactions for Safe Prescriptions: a Deep Learning Approach

deep learning drug interactions

Heba Elshimy

Dubai

Details

Drug-drug interactions occur when two or more drugs interact with each other. These could result in a range of outcomes, from reduced efficacy of one or both drugs to dangerous side effects such as increased blood pressure or drowsiness. Impact: Increasing co-morbidities with age often result in the prescription of multiple drugs simultaneously. Predicting possible drug-drug interactions before the drugs are prescribed is thus an important step in preventing these adverse outcomes.

Difficulty: High

Personalized Cancer Treatment: Predicting Drug Response Using Genomic Profiles and Deep Learning

deep learning genomics personalized medicine

Heba Elshimy

Dubai

Details

The same drug compound can produce different levels of response in different patients. Designing drugs for an individual, or for a group with certain characteristics, is the central goal of precision medicine. For example, the same anti-cancer drug can produce different responses in different cancer cell lines. This task aims to predict the drug response rate given a pair consisting of a drug and a cell line genomics profile. Impact: The number of combinations of available drugs and cell line genomics profiles is very large, and testing each combination in the wet lab is prohibitively expensive. A machine learning model that can accurately predict a drug's response for various cell lines in silico can thus make the combination search feasible and greatly reduce the burden on experiments. The fast prediction speed also allows us to screen a large set of drugs and so avoid missing potentially potent ones.

Difficulty: High

Smart Triage: AI-Assisted Emergency Severity Assessment Using Multimodal Patient Data

Heba Elshimy

Dubai

Details

Patients visiting the emergency room / emergency department are assessed at triage by a single care provider and asked a series of questions to assess their current health status. Their vital signs are measured and a level of acuity is assigned. Based on the level of acuity, the patient either waits in the waiting room for later attention, or is prioritized for immediate care.

Difficulty: Moderate

Enabling Environmental Awareness for Hard of Hearing Users through Haptic-Enabled Wearable Devices

deep learning audio signal processing

Heba Elshimy

Dubai

Details

Deliverable: a smartwatch app that monitors ambient sounds in the background, classifies them, and alerts the user to the type of sound via haptic feedback and on-screen notifications. The haptic feedback would differ based on the severity and importance of the detected sound. This should alert users to ambient sounds that need their immediate attention.

Difficulty: Moderate

Conversational AI

dialogue systems conversational ai natural language processing natural language understanding dialogue management

Arash Eshghi

Edinburgh

Details

This project will develop and evaluate a conversational AI system in a task-based domain. The student will learn about concepts and techniques in conversational AI design and implementation, including Natural Language Understanding, Natural Language Generation, and Dialogue Management, and evaluation of dialogue systems. The focus of the project is left open initially and the student can focus on any aspect or sub-task. The dialogue system will be evaluated with human subjects recruited from the university, and it will use both subjective and objective evaluation metrics.

Difficulty: Variable

Applying confusion and diffusion to create strong ciphers (No longer available)

cyber security cryptography

Marwan Fuad

Edinburgh

Details

Diffusion and confusion are two primitive operations in block ciphers. The purpose of the project is to investigate/compare how popular encryption algorithms approached these two principles. The second purpose, for a higher level of difficulty, is to suggest new ways to enhance these principles in one or more of these popular ciphers. You are also expected to create a code/demo/interactive visual to demonstrate the above tasks.

Difficulty: Moderate

Detecting Fake Accounts on Social Media Platforms (No longer available)

fake accounts detection machine learning social media security data analysis cyber security

Marwan Fuad

Edinburgh

Details

This project will focus on creating an application for detecting fake accounts on social media platforms. The system will analyze various features such as profile information, posting frequency, network connections, account metadata, interaction patterns, and any other features, to classify accounts as either genuine or suspicious. The project will involve data collection, feature engineering, model selection, and testing on real-world datasets. The project will also explore the ethical implications of such technology and propose ways to ensure user privacy and data security. The outcome of this project should be of a level that is suitable for a peer-reviewed publication.

Difficulty: Challenging

Human or bot?

bot detection nlp machine learning social media text classification cybersecurity ai ethics

Marwan Fuad

Edinburgh

Details

With the increasing prevalence of AI-driven bots in online spaces, discerning between human and bot-generated texts has become both difficult and crucial. This project will explore natural language processing (NLP) techniques and machine learning to develop a system capable of detecting bot-generated content. The proposed solution will involve collecting a diverse dataset of human and bot conversations, training a model to identify key linguistic features and patterns, and developing an interface to demonstrate the detection capability. The final deliverable could be an application, a web-based tool, or a code repository, which provides real-time analysis and classification of text inputs. A similar project resulted in a peer-reviewed publication, and this is the expected outcome of this project.

Difficulty: Challenging

Plagiarism Detection in Student Coursework Using Digital Forensics

digital forensics plagiarism detection file metadata analysis academic integrity

Marwan Fuad

Edinburgh

Details

Plagiarism detection in academic settings has traditionally relied on textual comparison tools (e.g. TurnItIn), which often fail to catch more sophisticated forms of academic plagiarism. This project proposes to incorporate digital forensics to detect plagiarism. The proposed tool will analyze the file metadata for signs of content manipulation, such as creation and modification timestamps, reuse, file paths, and embedded data (e.g., hidden images and embedded objects), to uncover potential evidence of plagiarism. By examining these aspects, the tool will identify discrepancies that may indicate that a file has been copied, altered, or otherwise manipulated. This approach will offer an additional layer of scrutiny, making it more difficult for students to bypass plagiarism checks. The final deliverable will be a functional prototype that can be used by educational institutions to enhance the integrity of student assessments

Difficulty: Moderate

Detection of AI-Generated Content in Student Coursework Using Machine Learning

ai-generated content detection machine learning academic integrity nlp text classification chatgpt

Marwan Fuad

Edinburgh

Details

This project aims to address the increasing challenge of AI-generated content in academic settings by developing a detection system that can differentiate between human-written and AI-generated coursework. The project will involve collecting a dataset of human and AI-generated text samples, training a machine learning model on these datasets, and fine-tuning the model to recognize patterns specific to AI-generated content. The outcome will be an application that instructors and institutions can use to uphold academic integrity. The project will also explore the ethical implications and limitations of such a system, ensuring it is both fair and effective.

Difficulty: Moderate

Semantic Search of Web Pages Content

natural language processing semantic search add-on speech recognition

Marwan Fuad

Edinburgh

Details

In many cases the user is looking for information in a long webpage, but this information could come in different forms/formats. For example, they could be looking for a deadline. This could be stated as an explicit date (which could come in a wide variety of formats, e.g. 10 Sep 2024, 10 September 2024, 10/09/2024, 10/9/2024, etc, in addition to different date formats – English or American), or the date could be stated as “in three weeks” or “before classes start”. The purpose of this project is to create an add-on that performs semantic search in a webpage: the user will enter the item they’re looking for, and the app will analyse it, generate all possible “forms” of this item and look for them in the webpage. It is very important to understand that the above date example is only an example, and the app is expected to perform much more than that. It is suggested that two students work on this project, one on the natural language processing aspect of it, and the other on the software part. In the case of three students, the third student will work on adding another layer to the software, which is speech recognition, so that the user can prompt the software to perform the semantic search interactively using voice commands.

Difficulty: Moderate

Simulation of the Predator Prey Model

predator prey model robotics games

Marwan Fuad

Edinburgh

Details

The predator prey model addresses an important problem in theoretical ecology. This model has applications in robotics, economics, and other domains. The purpose of the project is to create a robotic simulation, a game, or software that simulates this model under different settings (that will be given to the student). This will help researchers working on the predator prey problem.
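As a starting point for the simulation core, the sketch below integrates the classical Lotka-Volterra equations, dx/dt = αx − βxy and dy/dt = δxy − γy, with a fourth-order Runge-Kutta step. Treating the "predator prey model" as Lotka-Volterra is an assumption, since the settings to be given to the student may differ; the parameter names and default step size are likewise illustrative.

```python
import math


def derivatives(prey, predators, alpha, beta, delta, gamma):
    """Right-hand side of the Lotka-Volterra equations."""
    return (
        alpha * prey - beta * prey * predators,       # prey growth minus predation
        delta * prey * predators - gamma * predators,  # predator growth minus death
    )


def rk4_step(prey, predators, dt, params):
    """Advance both populations by one Runge-Kutta (RK4) step."""
    def f(x, y):
        return derivatives(x, y, *params)

    k1 = f(prey, predators)
    k2 = f(prey + 0.5 * dt * k1[0], predators + 0.5 * dt * k1[1])
    k3 = f(prey + 0.5 * dt * k2[0], predators + 0.5 * dt * k2[1])
    k4 = f(prey + dt * k3[0], predators + dt * k3[1])
    return (
        prey + dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6,
        predators + dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6,
    )


def simulate(prey, predators, params, dt=0.001, steps=20000):
    """Return the list of (prey, predators) states, including the initial one."""
    trajectory = [(prey, predators)]
    for _ in range(steps):
        prey, predators = rk4_step(prey, predators, dt, params)
        trajectory.append((prey, predators))
    return trajectory


def invariant(prey, predators, alpha, beta, delta, gamma):
    """Quantity conserved by the exact solution; useful as a sanity check."""
    return (delta * prey - gamma * math.log(prey)
            + beta * predators - alpha * math.log(predators))
```

Because the exact solution conserves `invariant`, checking that it barely drifts along a simulated trajectory is a cheap test of the integrator before adding a game or robotic front end.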

Difficulty: Moderate

Machine Learning Applications in Immunology and Personalized Medicine

immunoinformatics deep learning

Marwan Fuad

Edinburgh

Details

Prediction of B- and T-cell epitopes has long been the focus of immunoinformatics. Based on information from whole-genome sequencing, exome sequencing and RNA sequencing, it is possible to characterize an individual’s human leukocyte antigen (HLA) allotype. New opportunities for translational applications of epitope prediction have arisen, such as epitope-based design of prophylactic and therapeutic vaccines, and personalized cancer treatment. Several approaches based on Artificial Neural Networks (ANN) and Support Vector Machines (SVM) have been successfully applied to HLA class I binding prediction. They have also been applied to HLA class II binding prediction and to B-cell epitope prediction, but with less success.

Difficulty: Easy

Easy To Remember

interestingness mining

Marwan Fuad

Edinburgh

Details

When starting a business and looking for a phone number among the available ones, a number like “200200200” is much preferable to “207947633”, as it’s interesting and easy to remember. If the former is no longer available, then “200201202”, or “123454321”, or “121251414” will all be preferable to “207947633”. The objective of this project is to create rules for what makes an “interesting phone number” among the available ones in a dataset, then apply these rules to offer a client an interesting phone number, or several, ranked from most interesting to least interesting. Notice that you will need to define what is “interesting”, and, more challenging, how to compare two numbers generated using different rules. For a higher level of difficulty, perform the above project on car numbers. Notice this will include elements from natural language processing.
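One possible shape for such a rule set is sketched below. The three rules and their weights are purely illustrative assumptions, chosen so that the examples in the description can be checked; defining the real rules, and deciding how numbers matched by different rules compare, is the actual work of the project.

```python
def repeated_block(number: str) -> bool:
    """True if the number is one block repeated, e.g. 200200200."""
    n = len(number)
    return any(
        n % size == 0 and number == number[:size] * (n // size)
        for size in range(1, n // 2 + 1)
    )


def palindrome(number: str) -> bool:
    """True if the number reads the same backwards, e.g. 123454321."""
    return number == number[::-1]


def consecutive_blocks(number: str, size: int = 3) -> bool:
    """True if equal-sized blocks count upwards by one, e.g. 200201202."""
    if len(number) % size or len(number) < 2 * size:
        return False
    blocks = [int(number[i:i + size]) for i in range(0, len(number), size)]
    return all(b - a == 1 for a, b in zip(blocks, blocks[1:]))


# Hypothetical weights: how much each rule contributes to "interestingness".
RULES = [(repeated_block, 3), (palindrome, 2), (consecutive_blocks, 2)]


def score(number: str) -> int:
    """Sum the weights of all rules the number satisfies."""
    return sum(weight for rule, weight in RULES if rule(number))


def rank(numbers):
    """Order available numbers from most to least interesting."""
    return sorted(numbers, key=score, reverse=True)
```

Note that these three rules give “121251414” a score of zero even though the description treats it as interesting, which shows how quickly a hand-written rule set needs extending.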

Difficulty: Moderate

Using Videos to Enhance Emotion Recognition in Speech

speech recognition emotion recognition disambiguation videos

Marwan Fuad

Edinburgh

Details

Although the tone of a voice is an important factor in emotion recognition in speech, it can sometimes be insufficient; the same tone can be used to express surprise, disapproval, anger, or even humor. The purpose of the project is to enhance emotion recognition in speech using videos.

Difficulty: Moderate

Develop an application for the monitoring and control of social robots.

human-robot interaction social robots woz interaction design

Daniel Hernandez Garcia

Edinburgh

Details

As part of the National Robotarium, which is a joint initiative between Heriot-Watt and the University of Edinburgh to further research in Human-Robot Interaction, we have recently acquired a suite of new robots. One of these robots is the ARI (https://pal-robotics.com/robots/ari/), a humanoid robot for social interaction. To be able to demonstrate the abilities of the robots to visitors and possible collaborators or run simple Human-Robot Interaction experiments, we need an interface for robot control that is easy to use and reliable. To this end, in this project you will develop an application (GUI) to allow the control of the PAL Robotics ARI robot for running demonstrations in the lab or conducting human-robot interaction experiments. You will have the option to develop one of two types of applications: a web-based interface, running on the touchscreen in the robot's chest, for use by people interacting with the robot; or a GUI for the "experimenter" to monitor the robot's performance and configure the robot's plan and goals for the interaction. Either application should be able to provide some of the following features: displaying content on the robot screen, monitoring the robot sensors, executing predefined movements (gestures) or actions, and configuring the robot's behavior.

Difficulty: Moderate

Social Navigation Strategies for Social Robots

Daniel Hernandez Garcia

Edinburgh

Details

Robots are becoming more prevalent in our society and being able to move through a crowded room or corridor is one of the most fundamental and basic tasks that these robots should be able to accomplish. Social navigation in robotics primarily involves guiding mobile robots through human-populated areas, with pedestrian comfort balanced with efficient path-finding [1]. Looking at current applications of robots, it is easy to see that a lot of them are still working in fenced-off areas or even inside cages. One of the most commonly used examples is logistics where robots either pick and place items on shelves or, because picking up things is hard, drive the whole shelf to a person to do the picking [2]. All of this happens in very structured and unpopulated environments. Existing navigation systems still face real-world challenges when deployed in the wild [3]. Although progress has been seen in this field, a solution for the seamless integration of robots into pedestrian settings remains elusive [4]. If we ever want robots to be able to move outside of these fenced-off areas, we need to make sure that they move in a safe manner. This is normally achieved by off-the-shelf navigation and localisation methods such as the ROS [5] navigation stack [6]. Of similar importance is that humans feel safe around the moving robot: a movement being safe and being perceived as safe are not always the same. In this project, we are looking for someone who wants to enable a robot (for example, the ARI robot [7]) to navigate a room, with humans, reliably, safely, and with perceived safety. This will require the set-up of the ROS navigation stack on the system and the implementation of a human-aware planner. There are off-the-shelf methods that can be used [8][9] but more advanced solutions can be implemented as well [3][4]. The resulting system would then be evaluated with participants either face to face or using videos [10].
[1] Core challenges of social robot navigation: A survey. https://arxiv.org/abs/2103.05668 [2] https://www.youtube.com/watch?v=HSA5Bq-1fU4 [3] Augmented Social Force Model for Legged Robot Social Navigation https://rpl-cs-ucl.github.io/ASFM/ [4] Learning Model Predictive Controllers with Real-Time Attention for Real-World Navigation https://arxiv.org/abs/2209.10780 [5] https://www.ros.org/ [6] http://wiki.ros.org/navigation [7] https://pal-robotics.com/robots/ari/ [8] http://wiki.ros.org/social_navigation_layers [9] https://docs.nav2.org/ [10] A Protocol for Validating Social Navigation Policies https://arxiv.org/abs/2204.05443

Difficulty: Challenging

Multi-party Social Robot Interactions with LLMs

Daniel Hernandez Garcia

Edinburgh

Details

The ability of Socially Assistive Robots (SARs) to handle dialogue with multiple people at the same time is critical to their adoption in public spaces. Tasks that are typically trivial in one-to-one interactions become considerably more complex when multiple users are involved [1, 2]. Taking part in social interactions with multiple participants, i.e. more than two, is a challenging task for an autonomous system to manage. In these situations the system must interpret and understand different social cues from multiple people at the same time while also employing proper social strategies for addressing different users and regulating the interaction. Building multi-party conversation systems presents challenges that do not exist in dyadic conversations, since the structure of the dialogue context is more complicated and the generated responses rely heavily on both interlocutors (i.e., speaker and addressee) and the history of the conversation [3]. For multi-party human-robot interactions, turn-taking and the recognition of speakers and addressees remain open challenges [4]. The work of Skantze [5] provides an overview of research in modelling turn-taking, including end-of-turn detection, handling of user interruptions, and generation of turn-taking cues with voice assistants and social robots. The use of LLMs also holds significant promise for improving HRI [6]. The main goal of our work is the development of a multi-party conversational system that would allow situated social interactions involving a robot and multiple users. To do so, we will connect the language understanding capabilities of an LLM with a robot's multi-modal perception (audio and visual) and action generation capabilities. The project will evaluate the performance of a multi-party system in a user study following the experimental methodology designed in [7].
[1] D. Traum, “Issues in multiparty dialogues,” in Advances in Agent Communication: International Workshop on Agent Communication Languages, ACL 2003, Melbourne, Australia, July 14, 2003. [2] “WHO Says WHAT to WHOM: A Survey of Multi-Party Conversations,” https://www.ijcai.org/proceedings/2022/768 [3] “HeterMPC: A Heterogeneous Graph Neural Network for Response Generation in Multi-Party Conversations,” https://arxiv.org/abs/2203.08500 [4] K. Mahajan and S. Shaikh, “On the need for thoughtful data collection for multi-party dialogue: A survey of available corpora and collection methods,” https://aclanthology.org/2021.sigdial-1.36 [5] G. Skantze, “Turn-taking in Conversational Systems and Human-Robot Interaction: A Review,” https://www.sciencedirect.com/science/article/pii/S088523082030111X#bib0118 [6] “Understanding large-language model (llm)-powered human-robot interaction,” https://doi.org/10.1145/3610977.3634966 [7] N. Gunson, A. Addlesee, D. Hernandez Garcia, M. Romeo, C. Dondrup, and O. Lemon, “A holistic evaluation methodology for multi-party spoken conversational agents,” in ACM International Conference on Intelligent Virtual Agents (IVA ’24). https://researchportal.hw.ac.uk/en/publications/a-holistic-evaluation-methodology-for-multi-party-spoken-conversa

Difficulty: Challenging

Projects on Foundational Models for Human-Robot Interaction.

Daniel Hernandez Garcia

Edinburgh

Details

Explore state-of-the-art Foundational Models [1] for real-world robotics tasks. Foundation models have unlocked major advancements in AI, and we want to explore their use with and for Human-Robot Interaction (HRI) [2]. Contact me (dh143@hw.ac.uk) if you want to discuss a possible project on Foundational Models for HRI. [1] https://aws.amazon.com/what-is/foundation-models/ [2] Carolina Parada. 2024. What Do Foundation Models have to Do With and For HRI? https://dl.acm.org/doi/10.1145/3610977.3638460

Difficulty: Variable

AI democratization

Neamat El Gayar

Dubai

Details

No-code/low-code machine learning: automating machine learning training using low-code and AutoML tools. This will involve a tool that automatically creates an ML pipeline and trains and tests models. https://geekflare.com/no-code-machine-learning-platforms/ https://www.databricks.com/discover/pages/the-democratization-of-artificial-intelligence-and-deep-learning https://www.g2.com/articles/low-code-and-no-code-machine-learning-platforms

Difficulty: Moderate

Emotion recognition using Multimodal Learning

Neamat El Gayar

Dubai

Details

In the field of natural language processing, transformer models such as BERT and T5 have produced strong results. These models are built on the idea of self-supervised learning: they are pre-trained on a large amount of unlabelled data and then fine-tuned with supervised learning on a small amount of labelled data. Self-supervised learning methods have solved many of the problems associated with unlabelled data, and their use in fields like computer vision and natural language processing has shown very good results. The recent success of Transformers in the language domain has motivated adapting them to multimodal settings (images, audio, video).
- Multimodal transformer survey: https://arxiv.org/pdf/2206.06488.pdf
- Vision transformer survey: https://arxiv.org/pdf/2012.12556.pdf
Possible applications:
- Emotion prediction (text, audio, video) in educational settings or for monitoring medical patients
- Building on recent papers on fallacy detection in political debates https://aclanthology.org/2024.eacl-short.16.pdf and affect prediction in video conversations https://dl.acm.org/doi/10.1145/3689092.3689409
Other applications are also possible. For applications combining vision and language, see this article: https://theaisummer.com/vision-language-models/

Difficulty: Easy

Large Language Models/Generative AI and Text Analytics applications

Neamat El Gayar

Dubai

Details

NLP/text analytics applications include the analysis of text sources such as emails, chats, customer feedback, medical records and scientific literature. Many application domains, such as telecommunications, healthcare, retail, travel and hospitality, and financial institutions, can benefit from this. This project falls in that domain. Industry collaboration could be sought.

Difficulty: Moderate

Machine Learning Applications

Neamat El Gayar

Dubai

Details

Machine Learning applications have found great success in domains like emotion AI, human-computer interaction, telecommunications, healthcare, retail, travel and hospitality, and financial institutions. Of special interest to me are applications in the fields of education, healthcare, sustainability, transportation and well-being. All of these depend on having or collecting personalised user data and processing that data to derive insights and provide predictions useful as suggestions or actions for the future. Find your field of interest, find your data, and get started! There are also many industry-based applications derived from research questions and use cases that are of interest to stakeholders, using approaches from machine learning and data analytics.

Difficulty: Moderate

Customer Analytics

Neamat El Gayar

Dubai

Details

Examine problems in customer behaviour, preferences and profiling to help in retail and sales. Techniques like association rule mining, visualization, clustering and classification can be applied. Text sources such as customer reviews and customer support chats can also be used for added benefit.
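As a rough illustration of the association rule mining mentioned above, the sketch below computes support and confidence for rules between pairs of items. It is a minimal example under stated assumptions, not a prescribed method: the function names, the thresholds and the `BASKETS` data are all hypothetical, and only pairs of items are considered.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)


def pair_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Return (antecedent, consequent, support, confidence) for item pairs."""
    transactions = [set(t) for t in transactions]
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b}, transactions)
        if pair_support < min_support:
            continue  # the pair is too rare to be interesting
        for lhs, rhs in ((a, b), (b, a)):
            # confidence(lhs -> rhs) = support(lhs and rhs) / support(lhs)
            confidence = pair_support / support({lhs}, transactions)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, pair_support, confidence))
    return rules


# Hypothetical shopping baskets, invented purely for illustration.
BASKETS = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

if __name__ == "__main__":
    for lhs, rhs, sup, conf in pair_rules(BASKETS):
        print(f"{lhs} -> {rhs}: support={sup:.2f}, confidence={conf:.2f}")
```

A real project would more likely use an established algorithm such as Apriori or FP-Growth over larger itemsets; the point here is only to show what the support and confidence measures mean.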

Difficulty: Easy

ML and data analytics for sustainability

Neamat El Gayar

Dubai

Details

Applications for sustainability that support saving energy, improving lifestyle and encouraging sustainable practices. The theme is intended to go well with COP28. There is a possibility to join exhibitions/competitions. https://www.cop28.com/thematic-program#:~:text=Each%20day%27s%20programming%20incorporates%20four,through%20both%20content%20and%20speakers.

Difficulty: Easy

Industry based project

Neamat El Gayar

Dubai

Details

Students are welcome to bring their own ideas or industry-based use cases. Those can be based on solving problems from their current employer or from personal or career interest. Please research the area of interest, develop a problem statement and discuss with your supervisor to guide you further.

Difficulty: Easy

Explainability in AI and Deep Learning for healthcare applications

Neamat El Gayar

Dubai

Details

The objective is to implement and compare different explainability methods on a sample image data set. Interpretable AI, or Explainable AI (XAI), usually refers to models that can be explained by a human or whose decisions can be interpreted. The main focus is usually on making the reasoning behind the model's decisions or predictions more transparent. This is of particular importance for interpreting medical or security-related decisions in machine learning models. XAI attempts to unravel the "black box" nature of machine learning and to explain why a model arrived at a specific decision. Some resources: Data set: https://www.kaggle.com/c/aptos2019-blindness-detection/data Readings: https://aclanthology.org/2021.eacl-demos.17/ https://arxiv.org/pdf/1910.10045.pdf

Difficulty: Easy

Deep learning approach to designing esthetically appealing products

Neamat El Gayar

Dubai

Details

Category and Product Enrichment: One of our most important elements of discovery is the aesthetics of the product a customer is considering buying. Product description, color, product dimensions and image are some of many other attributes the customer experiences before proceeding to purchase. We would like to leverage ML applications like image recognition and text processing. Possible collaboration with startup https://pubsonline.informs.org/doi/full/10.1287/mksc.2022.1429 https://www.sciencedirect.com/science/article/abs/pii/S1071581923000253

Difficulty: Variable

Customer attribute prediction for gift recommendations

Neamat El Gayar

Dubai

Details

Predicting Relationships and Ethnicities: A major part of the gifting philosophy is to understand the relationships between senders and recipients of a gift, as well as their ethnicities. Leveraging ML to come up with these customer attributes will help us become more relevant to our customers. This project could include collaboration with a startup.

Difficulty: Easy

Smart Travel Planner

Neamat El Gayar

Dubai

Details

Generating customized, real-time travel suggestions can still be very hard given the different preferences, goals and constraints of travelers. This project leverages the latest AI technologies, including predictive analytics, large language models and recommendation systems, to propose a smart travel planner for the city of Dubai. This system is expected to significantly improve the travel planning process by providing users with tailored, accurate, and timely information, thereby enhancing their overall travel experience. The “Smart Dubai Travel Planner” will also analyse real-time data on traffic, weather, and occupancy at tourist attractions to provide updated information and generate efficient, eco-friendly, and high-quality travel itineraries. Output: a human-like interaction system capable of understanding and responding to complex queries related to travel alternatives within Dubai. The system will provide tourists with the most up-to-date, reliable, and relevant information, ultimately enhancing their travel experience and optimizing tourist flow in the city of Dubai.

Difficulty: Moderate

LLM for customer support

Neamat El Gayar

Dubai

Details

An AI virtual assistant that uses natural language processing to understand and respond to user queries. This can be investigated in several domains:
- Law (responding to legal queries and providing recommendations)
- Lifestyle/therapy (providing lifestyle recommendations for improving minor personality disorders and enhancing mental health)
- Tourism (providing recommendations for visitors related to sightseeing, bookings and shopping, while integrating live data when possible)

Difficulty: Moderate

Product visual matching for online shopping

Neamat El Gayar

Dubai

Details

Visual Match: Leveraging image recognition technology, visual matching compares product images to find identical or similar items. This is particularly useful in the fashion and home decor sectors, where visual elements play a significant role in the purchasing decision. Scrape, match and report the differences in availability and offerings in terms of affordability, variety and quality. Possible collaboration with online startup business (e-commerce) https://blog.getmanifest.ai/product-matching-in-ecommerce/#:~:text=Visual%20Match%3A%20Leveraging%20image%20recognition,role%20in%20the%20purchasing%20decision

Difficulty: Moderate

AI applications in dentistry

Neamat El Gayar

Dubai

Details

In recent years, artificial intelligence (AI) has been gaining more relevance in the field of dentistry in general, as well as in endodontics (treating root canals). AI-guided algorithms have great potential to improve the diagnosis, treatment planning and execution of endodontic treatments, as well as outcome prediction for the various endodontic treatments and interventions. Building on a strong collaboration with a partnering institution providing dental clinical data, this project will explore using state-of-the-art machine learning models and image processing techniques for prediction and intervention in root canal treatment. Data and expert domain knowledge will be provided.

Difficulty: Easy

Experiments with theorem provers

Lilia Georgieva

Edinburgh

Details

Principal goal of the project: SPASS, Vampire, and Otter are contemporary resolution-based theorem provers implementing sophisticated reasoning techniques and decision procedures for classes of first-order formulae and formulae in modal and description logics. For certain classes of formulae more than one refinement of resolution leads to a decision procedure. The aim of this project is to use theorem provers to test the potential and limitations of the systems when applied to such classes. The project would involve studying the properties of classes of formulae, their translation into the input language of the theorem provers, running the theorem provers, and evaluating the results. Prerequisites: Knowledge of first-order logic, interest in theorem proving; References: See http://spass.mpi-sb.mpg.de/spass http://www.cs.man.ac.uk/~riazanoa/Vampire

Difficulty: Moderate

Haptic Interactions

haptics hci human interaction

Theodoros Georgiou

Edinburgh

Details

This project is for students interested in investigating haptics (sense of touch) as a mode of interaction with technology. This is an intentionally open project and the student can propose their project to Dr Georgiou for an initial discussion. For this project you will need to: research a topic, design a study to investigate your hypothesis or proposal, implement your software and possibly hardware prototype(s), and finish with a user evaluation. *Note: as this is quite an open project, if you have an interesting idea, email me at t.georgiou@hw.ac.uk to discuss it before applying for this project.

Difficulty: Variable

IoT for assisted living

iot hci assisted living sensors prototype

Theodoros Georgiou

Edinburgh

Details

IoT devices are becoming more and more popular. Small ubiquitous devices can have many uses in the modern smart home. In this project the student will research, design and implement an IoT system to be used within assisted living environments for detecting and logging people's activities of daily living in lightweight and unobtrusive ways, while maintaining people's privacy. The student taking this project will need to have knowledge of, or an interest in, hardware prototyping with Arduinos, Raspberry Pi boards, small sensors etc. The student working on this will have full access to the Makerspace at the National Robotarium to 3D print, design and implement their prototypes. *Note: as this is quite an open project, you will need to discuss your idea with me before applying for this project.

Difficulty: Variable

Investigate the acceptance of robots in the home environment.

robot robot privacy hri hci monitor

Theodoros Georgiou

Edinburgh

Details

This project is for students interested in investigating the use of robots within the context of human-robot interactions in the home environment. Specifically, you will need to investigate how people perceive elements of trust and privacy when being monitored by a robot versus other traditional monitoring devices, such as CCTV. For this project you will need to: research a topic, design a study to investigate your hypothesis or proposal, implement your software prototype(s), evaluate with users. PLEASE NOTE: you must email me first to discuss this further before applying for this project. Email me at t.georgiou@hw.ac.uk

Difficulty: Variable

Investigate wearable sensors for specialist application.

wearables sensors hci prototype

Theodoros Georgiou

Edinburgh

Details

This project is for students interested in investigating the use of wearables within the context of data capture for specialist applications ranging from activities of daily living to participating in sports. The exact sensors and/or nature of these wearable sensors will be part of the initial discussion with the interested student. For this project you will need to: research a topic, design a study to investigate your hypothesis or proposal, implement your prototype(s), evaluate with users. *Note: The proposal has a basic idea of what will be required; however, as this is going to be your project, you are advised to discuss it with me before applying for this project. Email me at t.georgiou@hw.ac.uk

Difficulty: High

Is the IoT secure? - old project, not available

iot wearables network security privacy security

Theodoros Georgiou

Edinburgh

Details

No longer offered.

Difficulty: Variable

Serious games for rehabilitation.

games serious games hci hri health

Theodoros Georgiou

Edinburgh

Details

For this project, you will design, implement, and test/evaluate a game with a purpose on rehabilitation equipment at the National Robotarium. Initial training will be provided, but the details of this project have been left purposefully open so you can come and discuss ideas and concepts that interest you specifically. We are looking for motivated students interested in making games within the various constraints of commercially available hardware and devices. PLEASE NOTE: you must email me first to discuss this further before applying for this project. The general idea here is to experiment with, and implement, a novel gameplay idea on rehabilitation equipment (upper limb), based on research in the area. You would then evaluate this idea with users and reflect upon your design using this data. This project will be co-supervised with Dr Thomas Methven.

Difficulty: Variable

Soft Interactive Displays

hci etextiles sensors design

Theodoros Georgiou

Edinburgh

Details

For this project, you will design, implement, and test/evaluate a soft wearable, holdable, or ambient display (to be discussed as part of the project) for communicating environmental data to a user. You will work closely with academics and/or students from the School of Textiles and Design at the Galashiels Campus to design the soft display using soft material before implementing an appropriate interaction using microcontrollers (Arduino, Circuit Playground, etc.) and e-textile technologies. We are looking for motivated students interested in making physical objects and working collaboratively in multidisciplinary teams. PLEASE NOTE: you must email me first to discuss this further before applying for this project. The general idea here is to design and implement an interactive display of data based on research in the area. You would then evaluate this idea with users and reflect upon your design using this data.

Difficulty: Variable

Investigate technology needs from certain communities

Theodoros Georgiou

Edinburgh

Details

Use HCI techniques to investigate technology needs from certain communities. This is a broad topic that will involve talking to certain underrepresented communities in Scotland (meetings will be arranged) to investigate what they need from the technologies currently available to them. This will lead to the creation or proposal of a prototype based on the outcomes of this investigation. As this is a complex topic, please email Dr Georgiou (T.georgiou@hw.ac.uk) to have a chat before selecting it.

Difficulty: Moderate

Dynamic Data Sharding Strategies in Distributed SQL Databases

Jeevani Goonetillake

Edinburgh

Details

This research delves into exploring dynamic sharding techniques within distributed SQL database systems. It focuses on investigating strategies that intelligently distribute data across multiple shards or partitions based on dynamic factors such as data access patterns, workload changes, and resource availability. By analysing the sharding configuration, this research mainly aims to explore the performance, scalability, and resource utilization of distributed SQL databases. The outcome of the research will be important for applications with varying data access requirements and evolving workloads, enabling efficient data management and retrieval while maintaining data integrity in complex distributed environments.
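To make the idea of placement that reacts to access patterns more concrete, here is a small sketch of a router that places keys by hash and then moves the hottest key off the busiest shard. Everything in it is an assumption made for illustration (the `ShardRouter` class, the single-key rebalancing rule, the use of access counts as the only load signal); it does not migrate any data and is not taken from a particular distributed SQL system.

```python
import hashlib
from collections import Counter


class ShardRouter:
    """Hash-based key placement with per-key overrides for hot keys."""

    def __init__(self, num_shards):
        self.num_shards = num_shards
        self.overrides = {}            # key -> shard chosen by rebalancing
        self.access_counts = Counter()  # key -> number of lookups seen

    def _hash_shard(self, key):
        # Stable default placement derived from a SHA-256 digest of the key.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.num_shards

    def _current_shard(self, key):
        return self.overrides.get(key, self._hash_shard(key))

    def shard_for(self, key):
        """Record one access to `key` and return the shard that holds it."""
        self.access_counts[key] += 1
        return self._current_shard(key)

    def shard_load(self):
        """Total recorded accesses per shard."""
        load = [0] * self.num_shards
        for key, count in self.access_counts.items():
            load[self._current_shard(key)] += count
        return load

    def rebalance(self):
        """Move the hottest key of the busiest shard to the least busy shard."""
        load = self.shard_load()
        busiest = load.index(max(load))
        idlest = load.index(min(load))
        keys = [k for k in self.access_counts
                if self._current_shard(k) == busiest]
        if busiest == idlest or len(keys) < 2:
            return None  # nothing useful to move
        hottest = max(keys, key=lambda k: self.access_counts[k])
        self.overrides[hottest] = idlest
        return hottest, busiest, idlest
```

A project in this area would have to go well beyond this, for example by accounting for the cost of moving data, for range queries, and for consistency during a move.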

Difficulty: Moderate

Security in Distributed SQL Databases: A Comprehensive Analysis

Jeevani Goonetillake

Edinburgh

Details

This research is an in-depth exploration of the security challenges and solutions within distributed SQL database systems. It involves a thorough examination of various aspects, such as data encryption, access control, authentication mechanisms, and data leakage prevention, with the goal of comprehensively understanding and mitigating potential vulnerabilities. This research aims to provide valuable insights and recommendations for enhancing the protection of sensitive data stored in distributed SQL databases, catering to the growing demand for secure data management solutions in today's interconnected digital landscape. By conducting a comprehensive analysis, this research contributes to ensuring the confidentiality and integrity of data in distributed SQL environments.

Difficulty: Moderate

Database meets Data Mining - An Analysis on the association between the two Domains

Jeevani Goonetillake

Edinburgh

Details

This research is a collaborative investigation that seeks to bridge the gap between database management systems and data mining. The link between database management and data mining holds significant promise for applications in business intelligence, data analytics, and scientific research, ultimately empowering users to harness the full potential of their data resources.

Difficulty: Moderate

Blockchain-Based Data Provenance and Auditing

Jeevani Goonetillake

Edinburgh

Details

The research aims to investigate the use of blockchain technology to establish transparent and immutable records of data's origin, history, and modifications, ensuring data integrity and trustworthiness. By employing blockchain as a decentralized and tamper-resistant ledger, this research aims to create an auditable trail for data, allowing stakeholders to trace its lineage and verify its authenticity throughout its lifecycle. This innovative approach holds the potential to enhance data transparency, reduce the risk of data manipulation or fraud, and provide a robust framework for compliance and auditing purposes in various domains, including supply chain management, healthcare and finance.
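The tamper-evidence property behind such an audit trail can be sketched with a plain hash chain, in which every record stores the hash of its predecessor. The example below is a simplified, single-process illustration only: there is no decentralization or consensus, and the record layout and function names are assumptions made for this sketch.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder predecessor for the first record


def record_hash(payload, prev_hash):
    """SHA-256 over a canonical JSON encoding of the payload and link."""
    body = json.dumps({"payload": payload, "prev_hash": prev_hash},
                      sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_record(chain, payload):
    """Append a provenance record linked to the current tail of the chain."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    chain.append({
        "payload": payload,
        "prev_hash": prev_hash,
        "hash": record_hash(payload, prev_hash),
    })
    return chain


def verify_chain(chain):
    """Return True only if every link and every stored hash is intact."""
    prev_hash = GENESIS_HASH
    for record in chain:
        if record["prev_hash"] != prev_hash:
            return False
        if record["hash"] != record_hash(record["payload"], record["prev_hash"]):
            return False
        prev_hash = record["hash"]
    return True


if __name__ == "__main__":
    trail = []
    append_record(trail, {"dataset": "d1", "action": "created"})
    append_record(trail, {"dataset": "d1", "action": "modified"})
    print(verify_chain(trail))
    trail[0]["payload"]["action"] = "deleted"  # tamper with history
    print(verify_chain(trail))
```

Changing any earlier payload invalidates its stored hash, which is what lets an auditor detect the modification; a blockchain adds distribution and agreement among parties on top of this structure.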

Difficulty: Moderate

Blockchain Integration with Distributed SQL for Immutable Data Storage

Jeevani Goonetillake

Edinburgh

Details

This research explores the fusion of blockchain technology with distributed SQL databases to create a robust system for secure and immutable data storage. By combining the inherent security features of blockchain, such as decentralization and cryptographic hashing, with the scalability and query capabilities of distributed SQL databases, this research aims to provide a solution for organizations seeking to ensure the permanence and integrity of their data records. This approach can find applications in various sectors, including supply chain management, finance, healthcare, and more, where maintaining an unchangeable history of data is essential for transparency, trust, and compliance with regulatory requirements.

Difficulty: Moderate

Collecting benchmarks of type-incorrect programs for programming language X (X could be Java, Scala, Haskell, OCaml, etc.)

programming

Jurriaan Hage

Edinburgh

Details

Typically, a student will focus on a particular programming language, which can be decided mutually. Some examples are Java, Scala, ML, OCaml, Haskell and Idris; any statically typed language will do. The idea is then to construct and collect programs that in some measurable way cover (part of) the language, and which can then be used to experimentally check the quality of the type error diagnosis produced by compilers for the language. Programs can be collected manually, or students can invest in techniques to make collecting a benchmark of this kind a more automatic process, for example by mutating type-correct programs.
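The mutation idea can be sketched as follows. This is only an illustration under assumptions: it uses type-annotated Python as a stand-in for a statically typed language, applies a single hypothetical mutation operator (an integer literal becomes a string literal), and does not itself run a type checker. Each mutant would still need to be passed to a checker to confirm that it is rejected. `ast.unparse` requires Python 3.9 or later.

```python
import ast


def _is_int_literal(node):
    # bool is a subclass of int, so True/False must be excluded explicitly.
    return (isinstance(node, ast.Constant)
            and isinstance(node.value, int)
            and not isinstance(node.value, bool))


class IntToStrMutator(ast.NodeTransformer):
    """Replace the integer literal at position `target_index` with a string."""

    def __init__(self, target_index):
        self.target_index = target_index
        self.seen = 0

    def visit_Constant(self, node):
        if _is_int_literal(node):
            index = self.seen
            self.seen += 1
            if index == self.target_index:
                mutated = ast.Constant(value=str(node.value))
                return ast.copy_location(mutated, node)
        return node


def mutants(source):
    """Yield one mutated program per integer literal in `source`."""
    count = sum(_is_int_literal(n) for n in ast.walk(ast.parse(source)))
    for index in range(count):
        tree = IntToStrMutator(index).visit(ast.parse(source))
        ast.fix_missing_locations(tree)
        yield ast.unparse(tree)


# Hypothetical type-correct seed program.
SEED = (
    "def area(w: int, h: int) -> int:\n"
    "    return w * h\n"
    "\n"
    "result: int = area(3, 4)\n"
)

if __name__ == "__main__":
    for mutant in mutants(SEED):
        print(mutant)
        print("---")
```

For a language such as Haskell or Java the same scheme would operate on that language's own syntax tree, and a useful benchmark would need a wider set of mutation operators than the single one shown here.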

Difficulty: Variable

Program plagiarism benchmarks for language X

Jurriaan Hage

Edinburgh

Details

To enable, for example, machine learning based approaches to program plagiarism detection we are in need of sizable collections of plagiarism cases (among cases that are not). The research involves coming up with ways of arriving at such sets with as little effort as possible. This project can be done for various different technologies, depending on the student (typically, Java, Python or Haskell, but others are of interest too).

Difficulty: Moderate

Type Safe Python

Jurriaan Hage

Edinburgh

Details

Look for libraries and tools for Python that can be used to increase the quality, reliability and maintainability of Python code. Explain how they work. Develop a tutorial that takes a piece of plain Python code and makes it more reliable through the use of these tools and libraries.
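As an indication of what one step of such a tutorial might look like, the sketch below contrasts a plain function with a type-annotated version that makes its input type and its empty-list behaviour explicit. The `Price` class and both functions are hypothetical examples, and the `list[Price]` annotation assumes Python 3.9 or later.

```python
from dataclasses import dataclass
from typing import Optional


# Plain version: nothing states what `prices` contains, and an empty
# list raises ZeroDivisionError at run time.
def average_plain(prices):
    return sum(prices) / len(prices)


@dataclass(frozen=True)
class Price:
    """An amount together with its currency code."""
    amount: float
    currency: str


# Annotated version: the signature documents the expected input and
# the possibility of "no result" for an empty list.
def average_typed(prices: list[Price]) -> Optional[float]:
    """Return the mean amount, or None when `prices` is empty."""
    if not prices:
        return None
    return sum(p.amount for p in prices) / len(prices)


if __name__ == "__main__":
    print(average_typed([Price(2.0, "GBP"), Price(4.0, "GBP")]))
    print(average_typed([]))
```

With annotations in place, a static checker such as mypy can report a call like `average_typed([1, 2])` before the program runs, whereas the plain version only fails, if at all, at run time.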

Difficulty: Moderate

Domain-specific Languages

Jurriaan Hage

Edinburgh

Details

Provide a general introduction to what they are, and what forms they can take. Illustrate by means of several examples, which can be across different technologies (Java, Haskell, Scheme, Ruby), or within a particular technology. Starting point would be the book by Fowler that you can borrow from me.

Difficulty: Moderate

Contributing to an Open Source project

software development software evaluation

Jurriaan Hage

Edinburgh

Details

The idea of this project is that you find an open source project, and the challenge is to improve it in some way and then evaluate the effects of your improvement. This makes this a very generic project where you will have a major say in what you will be doing. This kind of project works best if there is an open source project that you are already quite familiar with (in terms of its use, not necessarily how it is implemented).

Difficulty: Variable

Apply data mining techniques to analyze educational datasets, uncover patterns in student behavior

dubai students

Maheen Hasib

Dubai

Details

Difficulty: Moderate

Predicting heart failure mortality using machine learning models

Maheen Hasib

Dubai

Details

Difficulty: Moderate

Fraud detection in health insurance claims

Maheen Hasib

Dubai

Details

Difficulty: Moderate

GA: Third Party Provider (TPP) Application for Open Banking

Idris Ibrahim

Edinburgh

Details

Open Banking was introduced in 2019 as part of the Payment Services Directive (PSD2) regulations to give customers greater control over their financial data. A TPP is an authorised online service provider that interacts with a bank and, with the customer’s consent, allows the customer to see all their banking products in one place. Money Dashboard is one example of a TPP. Currently testing isn’t complete, as there is no TPP app in the lower environments (SIT, UAT and PPE*) to test the journey between a TPP and the Sainsbury’s Bank Credit Card app. The solution to this is to create a TPP app (for Android) that will redirect to the Sainsbury’s Bank app (in the lower environments) if the app is installed, or to the Sainsbury’s Bank website if the app is not installed, log in and give consent, and then redirect back to the TPP app and display Sainsbury’s Bank Credit Card data. This app will include the use of open banking APIs, certificate signing and a user-friendly interface, and will allow better testing of this app-to-app journey. *SIT = System Integration Testing, UAT = User Acceptance Testing, PPE = Pre-Production Environment

Difficulty: Variable

Analysing Network Traffic

Idris Ibrahim

Edinburgh

Details

This project involves collecting and examining network packets to understand usage, detect unusual behaviour, and evaluate performance. It supports better network management, security, and optimisation. Legal and Ethical Notes: Ensure all monitoring complies with privacy laws. Get permission before capturing data, inform users, and protect any sensitive information with proper security measures.

Difficulty: High

Real-time Video Processing

Idris Ibrahim

Edinburgh

Details

The 'Real-time Video Processing' project explores real-time video techniques using the Marvin Framework. With rising demand for video enhancements, this hands-on project covers video filtering, object tracking, augmented reality, motion detection, and data analysis—all through live camera feeds. Marvin Framework: Participants will connect cameras and use Marvin plug-ins for real-time processing tasks.

Difficulty: Moderate

Image Filtering and Enhancement

Idris Ibrahim

Edinburgh

Details

Difficulty: Challenging

Object Detection and Recognition

Idris Ibrahim

Edinburgh

Details

Difficulty: Moderate

Network Simulator 3 by Example: Bridging the Gap in Networking Education

Idris Ibrahim

Edinburgh

Details

Upgrade the current NS2 teaching website to NS3, creating a modern platform for F29NC and F20MX courses. The project will carefully move all content to NS3 while adding new features, making it a valuable resource that supports interactive learning and collaboration.

Difficulty: Moderate

Detecting Malware via Behavioural Analysis Using ML

Idris Ibrahim

Edinburgh

Details

Difficulty: Variable

Blind Vision

Idris Ibrahim

Edinburgh

Details

Blind Vision is a mobile application aimed at assisting visually impaired individuals in navigating their daily lives. The application utilises computer vision and machine learning to analyse the surroundings captured by the rear camera of a smartphone. By providing real-time audio feedback, turn-by-turn navigation, and warnings about obstacles, Blind Vision empowers users with enhanced independence and safety.

Difficulty: Challenging

GUI Network Scripts’ Scenarios Generator

Idris Ibrahim

Edinburgh

Details

The GUI Network Scripts’ Scenarios Generator aims to provide a user-friendly graphical interface for generating scenarios and scripts in NS-2 (Network Simulator 2). NS-2 is a widely used discrete event network simulator for research and educational purposes. This project seeks to simplify the process of creating and customizing simulation scenarios by offering a visual tool that abstracts the complexities of NS-2 script creation.

Difficulty: Moderate

SDLC Navigator

Idris Ibrahim

Edinburgh

Details

SDLC Navigator: A Comprehensive Study and Decision Support System for Software Development Life Cycle Models. In the rapidly evolving landscape of software development, choosing the most suitable SDLC model is crucial for project success. The proposed project aims to provide developers with a comprehensive understanding of various SDLC models, enabling them to make informed decisions tailored to their project requirements.

Difficulty: Moderate

A Smart Requirements Writing Assistant

Andrew Ireland

Edinburgh

Details

Most requirement specifications are written in natural language (NL). NL documents are inherently imprecise. EARS is a pattern-based method for constraining the formulation of NL requirements. The goal of this project is to develop a tool that implements the EARS method. In addition, the tool will include an 'automated critic', i.e., it will access an LLM (e.g., Google Gemini) to provide suggestions about requirements provided by a user, e.g., possible failure modes that could be added, or refinements within the context of conditional requirements.
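
As a rough illustration of what pattern-based checking could look like, the sketch below classifies a requirement sentence against simplified versions of the five basic EARS templates (ubiquitous, event-driven, state-driven, unwanted behaviour, optional feature). The regular expressions and the function name `classify_requirement` are assumptions made for this example, not part of the proposal; a real tool would need to handle combined patterns and looser phrasing.

```python
import re

# Simplified keyword templates for the basic EARS patterns (an assumption
# for illustration; real requirements may combine several patterns).
EARS_PATTERNS = {
    "event-driven": re.compile(r"^When .+, the .+ shall .+", re.IGNORECASE),
    "state-driven": re.compile(r"^While .+, the .+ shall .+", re.IGNORECASE),
    "unwanted behaviour": re.compile(r"^If .+, then the .+ shall .+", re.IGNORECASE),
    "optional feature": re.compile(r"^Where .+, the .+ shall .+", re.IGNORECASE),
    "ubiquitous": re.compile(r"^The .+ shall .+", re.IGNORECASE),
}


def classify_requirement(text):
    """Return the name of the first matching EARS pattern, or None."""
    sentence = text.strip()
    for name, pattern in EARS_PATTERNS.items():
        if pattern.match(sentence):
            return name
    return None


if __name__ == "__main__":
    examples = [
        "When the brake pedal is pressed, the controller shall activate the brake lights.",
        "If the sensor fails, then the system shall raise an alarm.",
        "Make it fast.",
    ]
    for requirement in examples:
        print(classify_requirement(requirement), "<-", requirement)
```

A sentence that matches no template (the last example) is the kind of input an automated critic might flag for rewording.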

Difficulty: Challenging

A Hazard Analysis Assistant

Andrew Ireland

Edinburgh

Details

STPA -- System Theoretic Process Analysis -- is a leading Hazard Analysis technique. Developed at MIT, STPA is used for analysing hazards within the context of safety-critical systems, e.g. transportation, medical and national infrastructure. By assisting in the identification of system-level hazards, STPA supports the development of control actions that can be used to prevent potential accidents.

Difficulty: Moderate

A Neuro-symbolic Assistant for Explaining Accidents

Andrew Ireland

Edinburgh

Details

Hazard analysis is the process of trying to anticipate and prevent or mitigate accidents. The goal of this project is to build a tool that exploits the results of hazard analysis in order to explain how an accident occurred. To achieve this, the tool will first need to capture the logical reasoning (symbolic part) that underpinned the hazard analysis. Second, the tool will need to be able to systematically disrupt the 'logical reasoning' in order to model a given accident. Such disruption alone cannot fully explain an accident; moreover, many of the 'disruptions' will be infeasible. This is where we propose the use of Large Language Models (neuro part). That is, each 'logical disruption' will be used to automatically generate a prompt for an LLM, e.g., Google Gemini. The output from the tool will take the form of an accident report. Ultimately, however, a human accident investigator would need to decide whether or not to accept the findings provided within the auto-generated accident report. Problem frames is a technique that uses common patterns that occur within problems to capture requirements within the context of software engineering. A feature of the problem frames approach is that it draws a distinction between system requirements (world view) and software requirements (software view). The consistency between the world and software views can then be verified mechanically. Note that inconsistencies at this level have historically led to catastrophic system failures. Consistency verification will typically rely upon assumptions about the 'world' in which the intended system will operate. Problem frames does not help with validating such assumptions. This is where LLMs might help, i.e., we propose to use LLMs to explain the validity (or otherwise) of a given set of assumptions. Ultimately, it will be the responsibility of a human engineer to decide if they believe the LLM's explanations.
But a tool that combines symbolic 'consistency verification' with neuro (i.e., LLM) explanations of 'assumption validation' could increase the productivity of an engineer. It would be a cool tool too! For more details, see the URL slot below that provides an External Link to a note that more fully describes the proposal and includes an example.

Difficulty: High

Projects in cyber security

authentication cryptography cyber security network security

Mike Just

Edinburgh

Details

I'm happy to supervise students who want to do a project in cyber security, particularly in areas of application design, authentication, cryptography, digital identity, network security or privacy.

Difficulty: Moderate

Cryptography Guessing Game

cryptography game

Mike Just

Edinburgh

Details

Your task is to build a game that tests users' abilities with cryptography. For different ciphers, a user will be presented with a ciphertext and then asked to answer with the correct plaintext. You might provide a multiple-choice list of potential plaintexts (where at most one is correct) and/or other hints to the user. The user will gain points based on finding each plaintext (possibly some points for partial finds) as well as on the difficulty level, which might be based on such factors as the type of cipher or the difficulty of the multiple-choice selections. As a start, your game should allow one person to play on their own (against the computer), and you can also look at variations where one or more users can play against one another in competition. You might also investigate collaborative options for the game whereby two or more users work together to guess the correct plaintext.

Difficulty: Moderate

Cracking some historical ciphertexts

cybersecurity cryptography

Mike Just

Edinburgh

Details

There are plenty of tools for using historical ciphers to encrypt plaintext to ciphertext, but very few to do the reverse and cryptanalyse ciphertext. Your aim is to build an application that takes a ciphertext as input (and possibly some partial plaintext or other information, such as the language of the plaintext), (optionally) the name of the cipher used to encrypt it, and produces one or more possible plaintexts (without knowledge of the key). This project will be a useful exercise for understanding how different ciphers work and the approaches used to crack them. There are some interesting challenges related to automatically detecting whether you have recovered a valid plaintext.
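
To make the "valid plaintext" challenge concrete, here is a minimal sketch for the simplest case, a Caesar (shift) cipher on English text. It tries all 26 shifts and ranks the candidates by an average log-likelihood under approximate English letter frequencies. The frequency table is rounded, and the function names (`shift_text`, `english_score`, `crack_caesar`) are chosen for this example only; other ciphers and languages would need different search and scoring strategies.

```python
import math

# Approximate English letter frequencies in percent (rounded values,
# used here only as an illustrative scoring model).
ENGLISH_FREQ = {
    "a": 8.2, "b": 1.5, "c": 2.8, "d": 4.3, "e": 12.7, "f": 2.2, "g": 2.0,
    "h": 6.1, "i": 7.0, "j": 0.15, "k": 0.8, "l": 4.0, "m": 2.4, "n": 6.7,
    "o": 7.5, "p": 1.9, "q": 0.10, "r": 6.0, "s": 6.3, "t": 9.1, "u": 2.8,
    "v": 1.0, "w": 2.4, "x": 0.15, "y": 2.0, "z": 0.07,
}


def shift_text(text, shift):
    """Shift ASCII letters by `shift` positions; leave other characters alone."""
    out = []
    for ch in text:
        if ch.isascii() and ch.isalpha():
            base = ord("a") if ch.islower() else ord("A")
            out.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            out.append(ch)
    return "".join(out)


def english_score(text):
    """Average log-probability per letter; higher means more English-like."""
    letters = [ch for ch in text.lower() if ch in ENGLISH_FREQ]
    if not letters:
        return float("-inf")
    return sum(math.log(ENGLISH_FREQ[ch] / 100) for ch in letters) / len(letters)


def crack_caesar(ciphertext, top_n=3):
    """Return the `top_n` most plausible (shift, plaintext) pairs."""
    candidates = []
    for key in range(26):
        plain = shift_text(ciphertext, -key)
        candidates.append((english_score(plain), key, plain))
    candidates.sort(reverse=True)
    return [(key, plain) for _, key, plain in candidates[:top_n]]


if __name__ == "__main__":
    secret = shift_text("there are plenty of tools for using historical ciphers", 7)
    for key, guess in crack_caesar(secret):
        print(key, guess)
```

Returning several ranked candidates rather than a single answer reflects the proposal's "one or more possible plaintexts"; letter-frequency scoring becomes unreliable on very short ciphertexts.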

Difficulty: Moderate

Machine learning for cyber security (MEng masterclass)

machine learning cyber security anomaly detection

Mike Just

Edinburgh

Details

There are many situations today in which cyber security decisions are made using machine learning. Most commonly it has been used for detecting anomalies, such as intrusion detection (of network traffic) and malware detection (of software), though there are other areas as well (see Section 4 of https://www.cybok.org/media/downloads/AI_for_Security_TG_v1.0.0.pdf). This is a proposal for an MEng masterclass, the output of which would involve the delivery of a lecture, and possibly a tutorial or lab, on how machine learning is used for cyber security, with a focus on one security application area.

Difficulty: Moderate

A visualiser for reductions in the lambda calculus

Fairouz Kamareddine

Edinburgh

Details

The lambda calculus is an idealised programming language. Reductions in the lambda calculus allow us to study evaluation strategies in programming languages. This project is to visualise reductions (best using Python), and to assess and compare different strategies and tradeoffs between termination and efficiency. The visualised reductions will be animated graphs that almost speak to the user. Some of these graphs will be impressive. You can demonstrate the usefulness of what you do either for educational purposes, or for measuring the efficiency of different programs.
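
The termination tradeoff between strategies can be seen in a few lines of Python. The sketch below is an illustrative assumption, not part of the proposal: it represents terms as tuples and implements single-step beta-reduction under two strategies, normal order (leftmost-outermost) and applicative order (arguments first). On the term (λx.y) Ω, where Ω = (λx.x x)(λx.x x), normal order reaches y in one step, while applicative order keeps reducing Ω and never finishes.

```python
import itertools

# Terms are tuples: ("var", name), ("lam", param, body), ("app", func, arg).
def Var(name): return ("var", name)
def Lam(param, body): return ("lam", param, body)
def App(func, arg): return ("app", func, arg)

_fresh = itertools.count()


def free_vars(t):
    if t[0] == "var":
        return {t[1]}
    if t[0] == "lam":
        return free_vars(t[2]) - {t[1]}
    return free_vars(t[1]) | free_vars(t[2])


def subst(t, name, value):
    """Capture-avoiding substitution t[name := value]."""
    if t[0] == "var":
        return value if t[1] == name else t
    if t[0] == "app":
        return App(subst(t[1], name, value), subst(t[2], name, value))
    param, body = t[1], t[2]
    if param == name:
        return t  # name is shadowed, nothing to substitute
    if param in free_vars(value):
        new_param = f"{param}_{next(_fresh)}"  # rename to avoid capture
        body = subst(body, param, Var(new_param))
        param = new_param
    return Lam(param, subst(body, name, value))


def step(t, strategy):
    """One beta step under 'normal' or 'applicative' order; None if in normal form."""
    if t[0] == "var":
        return None
    if t[0] == "lam":
        inner = step(t[2], strategy)
        return None if inner is None else Lam(t[1], inner)
    func, arg = t[1], t[2]
    if strategy == "normal" and func[0] == "lam":
        return subst(func[2], func[1], arg)  # contract the outermost redex first
    new_func = step(func, strategy)
    if new_func is not None:
        return App(new_func, arg)
    new_arg = step(arg, strategy)
    if new_arg is not None:
        return App(func, new_arg)
    if func[0] == "lam":  # applicative: contract once both subterms are normal
        return subst(func[2], func[1], arg)
    return None


def normalize(term, strategy, max_steps=100):
    """Return (term, steps_taken, reached_normal_form)."""
    for steps in range(max_steps):
        nxt = step(term, strategy)
        if nxt is None:
            return term, steps, True
        term = nxt
    return term, max_steps, False


if __name__ == "__main__":
    self_app = Lam("x", App(Var("x"), Var("x")))
    omega = App(self_app, self_app)
    term = App(Lam("x", Var("y")), omega)
    print(normalize(term, "normal"))
    print(normalize(term, "applicative", max_steps=20)[1:])
```

The step counter returned by `normalize` is one possible efficiency measure; a visualiser could record each intermediate term produced by `step` and draw it as a tree or graph.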

Difficulty: Moderate

Analysing Mathematical texts in MathLang

Fairouz Kamareddine

Edinburgh

Details

Readapt MathLang in Python and use the adaptation to analyze mathematical texts.

Difficulty: Moderate

Associating programming languages to tasks

Fairouz Kamareddine

Edinburgh

Details

Different programming languages have different features. This project is to assess when you need an object oriented language, when you need a parallel language, and to build a tool that helps this choice. Already a number of tools have been built and this project is to extend them with new features.

Difficulty: Moderate

Checking the correctness of a mathematical book in Lean/Coq/Isabelle

Fairouz Kamareddine

Edinburgh

Details

Computer checking the correctness of entire books of mathematics has moved on since the ideas were first introduced independently by de Bruijn in his Automath system and Trybulec in his Mizar system. Since then, computer systems have been created to check the correctness of not only mathematical books but of software and information in general. In this project you will investigate the checking of a mathematical book in a choice of three computer checkers: Isabelle, Coq or Lean. You will first need to familiarise yourself with one or all of these systems after you have done the background research on them. You will then need to investigate initial proving of a couple of results you have studied in one of your favorite courses (e.g., it could be your course on discrete mathematics or logic and proof or lambda calculus or Turing machines), and then decide whether you want to do all the work in one prover or compare the three provers in question.

Difficulty: High

Investigating the Languages for Mathematics

Fairouz Kamareddine

Edinburgh

Details

Languages of Mathematics have played an important role in the creation of computability theory and also of new areas such as the design and verification of programming languages. There is still no ideal language in which to write mathematics. This project investigates a number of attempts at finding a good language in which to write mathematics and also to computerise and check the correctness of mathematics.

Difficulty: Moderate

Investigations (theory and implementation) of a new computation language

Fairouz Kamareddine

Edinburgh

Details

The famous BNF notation as introduced and used in the Algol 60 report was followed by numerous notational variants (EBNF, ABNF, RBNF etc.). Subsequently, a new Computer Science Metanotation "CSM" was developed. But CSM does not yet have a full definition or a full implementation. This may be difficult to do. This project will investigate variants of CSM, extracting a full definition and giving a full implementation.

Difficulty: High

Proving correctness of Turing machine specifications

Fairouz Kamareddine

Edinburgh

Details

You learned how to write and implement Turing machines to solve certain tasks. But how do you ensure that the machine is correct and meets its specification? How do you show the properties of your Turing machine (e.g., whether it terminates)? This project is to implement the language of Turing machines in Lean (or your choice of provers) and to check the properties of different Turing machines as well as the universal machine.

Difficulty: High

Semantic annotations of Foundations course material

latex semantically annotated text parser python

Fairouz Kamareddine

Edinburgh

Details

Semantically annotated mathematical documents provide several advantages over unannotated documents. In LaTeX, a package for adding semantic annotations called sTeX can be used to semantically annotate documents. This project would involve using sTeX to create semantic annotations for the Foundations course material (slides, tutorial sheets, etc.). Semantic annotating of documents provides plenty of opportunity for automation, so part of the project would include interaction with existing automation approaches to provide annotations for the Foundations course material. This project would require some experience with LaTeX, Python, and parsing.

Difficulty: Moderate

Network Intrusion Detection System using Netflow Data and Machine Learning

nids network intrusion detection machine learning

Kayvan Karim

Dubai

Details

The project applies machine learning algorithms to analyze the attack patterns within Netflow data, distinguishing benign network behaviour from potential cyber threats. The Netflow data can be aggregated using the NTFA tool and then used to build machine learning models that detect the attacks.

Difficulty: Moderate

Retrieval Augmented Generation (RAG) using LLM

llm vector db

Kayvan Karim

Dubai

Details

For this project, the student must use an LLM and vector DB tools such as Pinecone or ChromaDB to create an LLM agent that responds to user inquiries based on the provided document(s) using Retrieval Augmented Generation.
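
The retrieval step of Retrieval Augmented Generation can be sketched without any external service. The example below is a simplified stand-in, not the proposal's method: it replaces learned embeddings and a vector DB with bag-of-words vectors and cosine similarity, and it stops at building the prompt rather than calling an LLM. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a stand-in for a real embedding model."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=2):
    """Return the k document chunks most similar to the query."""
    query_vec = Counter(tokenize(query))
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine(query_vec, Counter(tokenize(chunk))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, chunks, k=2):
    """Assemble a prompt that grounds the answer in the retrieved chunks."""
    context = "\n".join(f"- {chunk}" for chunk in retrieve(query, chunks, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


if __name__ == "__main__":
    document_chunks = [
        "The library opens at nine in the morning.",
        "Parking permits are issued by the security office.",
        "The cafeteria serves lunch from noon.",
    ]
    print(build_prompt("When does the library open?", document_chunks, k=1))
```

In the actual project, `tokenize` and `cosine` would be replaced by an embedding model and the vector DB's similarity search, and the prompt would be sent to the chosen LLM.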

Difficulty: Easy

Explainable AI for Verification

Ekaterina Komendantskaya

Edinburgh

Details

As machine learning algorithms find their way into safety-critical systems, such as autonomous cars, robot nurses, and conversational agents, the question of ensuring their safety and security becomes important. It is often difficult to explain and interpret machine learning models, yet this seems to be the key to their safe and secure implementation and application. Explainability of AI is a growing subject area with a range of applications, and several tools, such as LIME, are already available. These tools are gradually finding their place in AI verification. You will study applications of methods of explainable AI in the area of Verification, and will create your own prototype tools that support these methods. You will have a chance to collaborate with researchers in the lab for AI and Verification: LAIV.uk.

Difficulty: High

Probabilistic Verification of AI

Ekaterina Komendantskaya

Edinburgh

Details

As machine learning algorithms find their way into safety-critical systems, such as autonomous cars, robot nurses, and conversational agents, the question of ensuring their safety and security becomes important. At the same time, neural networks are known to be vulnerable to adversarial attacks --- a special kind of crafted inputs that cause unintended behaviour in trained neural networks. Due to these two factors, neural network verification has become a hot topic in both machine learning and verification communities. It is often described as one of the main challenges faced by computer science and engineering these days. Often, we cannot verify a property with certainty, but can verify it with some degree of probability. There are languages and tools for probabilistic verification, for example, PRISM or Probabilistic Prolog. You will study this area and implement your own toy examples using these methods. You will have a chance to collaborate with researchers in the lab for AI and Verification: LAIV.uk.

Difficulty: High

Verification of Neural Networks

Ekaterina Komendantskaya

Edinburgh

Details

As machine learning algorithms find their way into safety-critical systems, such as autonomous cars, robot nurses, and conversational agents, the question of ensuring their safety and security becomes important. At the same time, neural networks are known to be vulnerable to adversarial attacks --- a special kind of crafted inputs that cause unintended behaviour in trained neural networks. Due to these two factors, neural network verification has become a hot topic in both machine learning and verification communities. It is often described as one of the main challenges faced by computer science and engineering these days. In this project, you will study the existing methods of neural network verification, and will implement your own toy application/algorithm. You will have a chance to collaborate with researchers in the lab for AI and Verification: LAIV.uk.

Difficulty: Variable

Sketch to pixel art

computer graphics pixel art

Babis Koniaris

Edinburgh

Details

Pixel art is characterised by constraints, such as low image resolution and limited colour palette. Silhouettes and lines, to be visually pleasing, need to conform to certain requirements with regards to the length and orientation of the sub-segments they're composed of (see resources). Freehand drawing support for such "perfect lines" exists in tools such as Aseprite and Krita. However, conversion of existing black-and-white sketches or existing pixel art that doesn't conform to such rules, is not currently possible. Depending on interest and challenge, there's opportunity to create a tool to modify existing rough high-resolution raster sketches to pixel art with perfect lines, or modify existing line art (e.g. from an SVG file or pixel art image) to pixel art that conforms to the visually pleasing line constraints.

Difficulty: Variable

Open project in real-time large 2D/3D data visualisation

computer graphics virtual reality data visualisation hci

Babis Koniaris

Edinburgh

Details

This project is open to students with an interest or idea involving data visualisation of datasets with a spatial component (and therefore mappable to 2D/3D) that are non-trivial to visualise with conventional methods, due to the nature of the data (too large), the nature of the desired visualisation (too fancy), or even the nature of the medium (e.g. large data in VR). This needs to be driven by an existing motivation/use-case. Please contact me to discuss your ideas BEFORE selecting this project.

Difficulty: Variable

Open project in computer graphics and/or game development

computer graphics game development simulation procedural generation visualisation vr

Babis Koniaris

Edinburgh

Details

This project is open to students with an interest or idea involving implementation/evaluation of computer graphics or game development techniques, in topics such as rendering, simulation, AI, physics, procedural generation, VR, tools, etc. Please contact me to discuss your ideas BEFORE selecting this project.

Difficulty: Variable

Machine Learning approaches for Stock Market Price Prediction

machine learning

Smitha Kumar

Dubai

Details

Using modern machine learning techniques in predicting stock market prices given historical data.

Difficulty: Moderate

Facial Expression Recognition using CNN

Smitha Kumar

Dubai

Details

A mobile application that captures an image, analyzes the facial expression, and identifies the emotion using a CNN (Convolutional Neural Network). See https://www.sciencedirect.com/science/article/pii/S0031320316301753

Difficulty: High

Machine Learning in Neurological Disorders

Smitha Kumar

Dubai

Details

Applications of Machine Learning in analyzing clinical data to detect disease (Alzheimer’s, Parkinson’s disease, Cerebrovascular Disorders (Stroke)).

Difficulty: Moderate

Generative AI applications

Smitha Kumar

Dubai

Details

Here are a few sample projects that illustrate various applications: 1) Text summarization: abstractive vs extractive summarization; evaluation of open LLMs by article type (reading comprehension, scientific article, sports news). 2) LLMs in code debugging: enhance LLM responses using RAG/RLHF. 3) LLMs in climate information.

Difficulty: Moderate

LLM-based QA Models in STEM Using RAG Architecture

Smitha Kumar

Dubai

Details

Improve LLM responses using Retrieval Augmented Generation (RAG). Analyze the RAG pipeline, including the vector DB. Enhance with RLHF techniques.

Difficulty: Moderate

Smart Traffic management system

microcontrollers - arduino sensors web development

Rosalind Deena Kumari

Malaysia

Details

A smart traffic management system uses sensors or cameras to gauge vehicle flow and adjusts signals accordingly. Detectors are installed at intersections to count the number of vehicles and send data to a controller for traffic light timing. By smoothing out the traffic flow, congestion and travel delays are minimized.
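
One simple way a controller could turn vehicle counts into signal timings is sketched below. It is an assumption made for illustration, not a prescribed design: each approach gets a guaranteed minimum green time, and the rest of a fixed cycle is shared in proportion to the detected counts. The function name, the 120-second cycle and the 10-second minimum are all hypothetical values.

```python
def allocate_green_times(vehicle_counts, cycle_seconds=120, min_green=10):
    """Split one signal cycle among approaches in proportion to vehicle counts.

    vehicle_counts maps an approach name to the number of detected vehicles.
    Every approach receives at least `min_green` seconds.
    """
    approaches = list(vehicle_counts)
    spare = cycle_seconds - min_green * len(approaches)
    if spare < 0:
        raise ValueError("cycle too short for the minimum green times")
    total = sum(vehicle_counts.values())
    green = {}
    for name in approaches:
        # With no traffic at all, fall back to an equal split.
        share = vehicle_counts[name] / total if total else 1 / len(approaches)
        green[name] = min_green + spare * share
    return green


if __name__ == "__main__":
    counts = {"north": 30, "south": 10, "east": 0, "west": 0}
    for approach, seconds in allocate_green_times(counts).items():
        print(f"{approach}: {seconds:.0f} s")
```

A real intersection would also need amber and all-red phases and pedestrian stages; this sketch covers only the proportional split.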

Difficulty: Moderate

A mobile application for smart agriculture system

arduino programming sensors web development

Rosalind Deena Kumari

Malaysia

Details

Smart agriculture starts with soil health monitoring using sensors for moisture, pH, and temperature. A controller processes this data, helping optimize watering and nutrients to improve crop yields and reduce resource waste. • Sensor Placement: You will learn to position sensors at proper depths or spots for accurate readings. • Data Thresholds: You will set triggers for irrigation or alerts when conditions shift. • Automated Feedback: You might open a valve or send a notification when soil moisture drops.
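
The threshold and automated-feedback ideas above can be sketched as a small decision function. The version below is a minimal example under assumed values: it uses two moisture thresholds (hysteresis) so the valve does not switch rapidly when readings hover around a single trigger point. The threshold numbers and the function name are illustrative, not taken from the proposal.

```python
def irrigation_action(moisture_percent, valve_open, low=30.0, high=45.0):
    """Return the new valve state for one soil-moisture reading.

    Opens the valve below `low`, closes it above `high`, and otherwise
    keeps the current state (hysteresis band between the thresholds).
    """
    if moisture_percent < low:
        return True
    if moisture_percent > high:
        return False
    return valve_open


if __name__ == "__main__":
    valve = False
    for reading in [50.0, 35.0, 28.0, 40.0, 47.0]:
        valve = irrigation_action(reading, valve)
        print(f"moisture {reading:.0f}% -> valve {'open' if valve else 'closed'}")
```

On a microcontroller the same logic would run in the main loop, with the reading coming from the moisture sensor and the result driving the valve relay or a notification.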

Difficulty: Challenging

Generative AI, Spoken Language Technology, NLP (various topics)

ai generative ai llm vlm speech conversation robotics human-robot interaction nlp evaluation safety

Oliver Lemon

Edinburgh

Details

Conversational AI and spoken language technology is a lively research area encompassing a variety of problems such as language understanding, dialogue management, user modelling, language generation, multimodal interaction, and Natural Language Processing (NLP) in general. Since the arrival of ChatGPT and similar models, many conversational systems are now built using generative AI methods such as LLMs and vision-language models (VLMs). For this project you can choose from a variety of topics listed below or propose your own. All must involve an evaluation of what you have achieved (e.g. compared to previous state-of-the-art approaches / systems/ components). 1) Integrating conversational AI and SDS with graphical talking head / Virtual Characters. e.g. use the FurHat robot head, ARI robot. You may wish to use generative AI models such as GPT, LLAMA etc 2) using generative AI models in video games (e.g. NPCs, player assistant, visually-aware companion....) 3) Integrating conversational AI and SDS with vision systems. You may wish to use generative AI vision-and-language models such as LLAVA 4) Safety and ethical issues of generative AI models 5) Task-based conversational systems - Your own topic that involves state-of-the-art research in talking to computers. E.g. Personalised or Emotional Natural Language Generation, Machine Learning methods for optimization of NLP, etc. Make sure that there is a method for evaluating your proposed advances! You will need to take course F20CA / F21CA to do this project.

Difficulty: Moderate

Social robotics: HRI for multi-person conversation

generative ai llm human-robot interaction evaluation

Oliver Lemon

Edinburgh

Details

This project will develop a new conversational interface for a robot receptionist for the Robotarium building -- using the robot animated head FurHat: http://www.speech.kth.se/furhat/ which we have in the lab, or using the PAL ARI robot. A key aspect will be to handle multiple people in the conversation (e.g. 2 groups of 2 visitors, one group wants to find the cafe, the other has a meeting in room 3). The project combines audio and visual processing to create a socially intelligent robot. You can use a Vision-Language Model (VLM) such as GPT4o or Gemini to approach this task. Using e.g. the Furhat SDK and an LLM or VLM, you will develop and test a new interface that performs socially intelligent human-robot interaction for greeting visitors to a building, allowing them to find their meeting location, alerting their contacts, handling deliveries, etc. You will evaluate the system you create with real users. If you use the PAL ARI robot, you could attempt to navigate the robot to lead the human to the right location. This project should handle dialogue with multiple humans in the scene. You should take F20/21CA Conversational Agents if you do this project.

Difficulty: Moderate

Conversational speech and multimodal Interfaces for Role-playing games using Generative AI

ai games generative ai llm vlm speech multimodal evaluation

Oliver Lemon

Edinburgh

Details

There are interesting opportunities in adding speech and dialogue capabilities into video games -- for example making conversational game characters that you can really talk to, or a companion character that can assist and support the player. Such characters need to be believable, safe, and responsive to the game world. Evaluation frameworks (benchmarks) for such characters are also an open area of investigation. A related area is in using generative models to create interactive narratives/stories for players, which are coherent and immersive. New LLMs (large language models) and VLMs such as GPT 4o and Gemini are now being used to create games and conversational game characters, see Mantella and CHIM: https://www.nexusmods.com/skyrimspecialedition/mods/98631 , https://www.nexusmods.com/skyrimspecialedition/mods/126330 . This project could explore such systems and LLMs and integrate them into a game engine (e.g. Skyrim, Minecraft, etc.), for example to drive conversations with NPCs, develop a visually-aware conversational game companion, search lore, etc. See e.g. https://www.youtube.com/watch?v=OiPZpqoLs4E and https://www.youtube.com/watch?v=tVd3QYc0fU8 . You will evaluate aspects such as the usability, safety, and immersion value of this system versus the baseline game. This project can be done in collaboration with an industrial partner, for example SpeechGraphics, and is also suitable for DataLab students. See https://arxiv.org/html/2402.18659v1 for a recent survey of this area. You should take the course F20/21CA Conversational Agents.

Difficulty: Moderate

Collaborative AI: generative AI systems capable of teamwork

ai teamwork generative ai

Oliver Lemon

Edinburgh

Details

Consider the different scenarios where AI agents need to collaborate with unfamiliar teammates (other robots, AI systems, and humans) who possess varying knowledge, skills, and capabilities. This is the problem of 'ad-hoc teamwork' (AHT), which requires agents with the ability to dynamically agree and coordinate on a 'common-ground' understanding of the domain and tasks at hand. You will investigate the extent to which current generative AI systems (LLMs and VLMs) have such collaborative skills, and develop new methods to support AHT within generative AI systems. You will investigate tools such as CoELA ( https://github.com/UMass-Embodied-AGI/CoELA ), AutoGen ( https://microsoft.github.io/autogen/ ) and LangChain Agents. This project can also involve collaboration with Toshiba (Cambridge Research Lab). You should take the course F20/21CA Conversational Agents.

Difficulty: Challenging

Human-Robot Teamwork with Generative AI

generative ai llm vlm cooperative ai human-robot interaction evaluation

Oliver Lemon

Edinburgh

Details

This project will explore how humans, robots, and AIs can collaborate together in teams, to coordinate on shared tasks. This will involve generating conversational interaction to understand tasks, agree plans, resolve ambiguities, correct mistakes, and so on. You will use LLMs and/or VLMs (such as LLAMA, LLAVA, GPT, Gemini, Moshi, etc ...) to create a system which can meet and coordinate with a previously unseen other agent (a human or robot) and collaborate with them to complete a shared task -- for example to tidy up a room, make breakfast, or build a lego model. See CoELA for a start on this problem : https://github.com/UMass-Embodied-AGI/CoELA You can use real robots such as Tiago, ARI, Stretch, Furhat and/or simulations of them. You will evaluate the system's effectiveness and efficiency in completing different shared tasks with different people. This project can also involve collaboration with Toshiba (Cambridge Research Lab).

Difficulty: Moderate

Playpen: Training generative AI models through interactive games

ai llm games interaction evaluation

Oliver Lemon

Edinburgh

Details

Interaction between learner and feedback-giver has come into focus recently for post-training of Large Language Models (LLMs), through the use of reward models that judge the appropriateness of a model's response. In this project, we investigate whether Dialogue Games -- goal-directed and rule-governed activities driven by verbal actions -- can also serve as a source of feedback signals for learning. We use Playpen, an environment for off- and online learning through Dialogue Game self-play, and investigate, for example, the following post-training methods: supervised fine-tuning; direct alignment (DPO); and reinforcement learning with GRPO. The framework and the baseline training setups are available at: https://github.com/lm-playpen/playpen . You will contribute to this project by, for example, implementing new games, and/or training and testing LLMs and/or VLMs using Playpen. You should do F20/21CA Conversational Agents if you take this project.

Difficulty: Variable

Assistive Robots with Generative AI: voice-controlled robot helpers

assistive technology ai generative ai llm speech conversation interface robotics human-robot interaction nlp

Oliver Lemon

Edinburgh

Details

Some people are not able to manipulate objects but they can speak. This project explores the development of voice-controlled assistant robots to help them, for example by fetching and manipulating objects, to enable more independence. You will use vision-language-action (VLA) models such as SmolVLA, OpenVLA, or Gemini Robotics to create and evaluate such a system, either in simulation or with a robot. You will use a speech recogniser such as Whisper. You should take F20/21CA Conversational Agents to do this project.

Difficulty: Moderate

Formal specifications and verification of software requirements

formal verification model-checking automatic theorem-proving

Oleksandr Letychevskyi

Edinburgh

Details

Using the example of requirements for software, create its formal specification and apply formal verification methods, such as model checking or automatic theorem proving. Identify the properties of the program that need to be proved or disproved.

Difficulty: High

Model-based testing of software systems.

model-based testing test coverage software formal model

Oleksandr Letychevskyi

Edinburgh

Details

Write a program that generates test sequences based on a formal model of the system under test. Investigate the test coverage conditions of the model and conduct an experiment with the aim of the best coverage of the code.
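
As a starting point, the sketch below generates test sequences from a formal model given as a finite state machine and checks one coverage criterion, transition coverage (every transition of the model is exercised at least once). The dictionary encoding of the model, the turnstile example and the function names are assumptions chosen for illustration; the project itself may use a richer modelling formalism and other coverage conditions.

```python
from collections import deque


def transition_cover(model, initial):
    """Generate input sequences that together exercise every reachable transition.

    model maps each state to {input_symbol: next_state}.
    Each test is a shortest path to a state followed by one outgoing input.
    """
    # Breadth-first search for a shortest input sequence reaching each state.
    paths = {initial: []}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        for symbol, nxt in model.get(state, {}).items():
            if nxt not in paths:
                paths[nxt] = paths[state] + [symbol]
                queue.append(nxt)
    tests = []
    for state, prefix in paths.items():
        for symbol in model.get(state, {}):
            tests.append(prefix + [symbol])
    return tests


def covered_transitions(model, initial, tests):
    """Replay the tests on the model and collect the (state, input) pairs used."""
    seen = set()
    for sequence in tests:
        state = initial
        for symbol in sequence:
            seen.add((state, symbol))
            state = model[state][symbol]
    return seen


if __name__ == "__main__":
    # Hypothetical model of a coin-operated turnstile.
    turnstile = {
        "locked": {"coin": "unlocked", "push": "locked"},
        "unlocked": {"coin": "unlocked", "push": "locked"},
    }
    suite = transition_cover(turnstile, "locked")
    print(suite)
    print(len(covered_transitions(turnstile, "locked", suite)), "transitions covered")
```

Covering every transition of the model does not guarantee full code coverage of the implementation, which is the gap the proposed experiment would measure.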

Difficulty: High

Using AI methods in theorem proving

automatic theorem proving ai methods term rewriting system

Oleksandr Letychevskyi

Edinburgh

Details

Consider examples of the use of AI in competitions of proof machines. Create a neural network trained on the scenarios of proving statements in the selected theory (polynomial algebra, first-order logic). Using an existing statement proof system or term rewriting system, integrate it with a neural network that provides a hint for each step of the proof.

Difficulty: High

Problems of simplifying Boolean formulas and AI

first-order logic algebraic expression simplifying ai methods

Oleksandr Letychevskyi

Edinburgh

Details

On the basis of axioms and theorems of predicate logic (propositional or first-order) within the framework of the existing system of term rewriting, create an inference machine. Based on the output scenarios, create a neural network that determines the most efficient simplification step. The result could be an integration of a neural network and term rewriting system, or some other way of hinting at the simplification step.

Difficulty: High

Neuro-symbolic approach in finding backdoors in the program code.

backdoor software vulnerability neuro-symbolic approach

Oleksandr Letychevskyi

Edinburgh

Details

Investigate the types of backdoors in the program code and create several examples on which to simulate the behavior of the program. Based on the received scenarios, create a neural network for detecting backdoors. In parallel, create a formal description of the semantics of the program's behavior with a backdoor. Create a method that uses the formal semantics of backdoors, and confirms or rejects the result of neural network classification.

Difficulty: High

Research on the use of formal methods in the verification of the property of resistance to attacks in the blockchain.

blockchain formal verification double spending attack

Oleksandr Letychevskyi

Edinburgh

Details

Create a formal model of the selected consensus algorithm (proof of stake, delegated proof of stake, etc.), which can be represented as an automaton, a Petri net, or another type of transition system or process algebra. Consider attacks on the blockchain such as double spending, Sybil attacks, or others. Create a method, or use existing model-checking algorithms, to prove the possibility of an attack under certain conditions (for example, when attackers make up more than 50% of the network).

Difficulty: High

AI methods in detecting intrusions into software systems.

cybersecurity ai methods adversarial attacks

Oleksandr Letychevskyi

Edinburgh

Details

Create a neural network that classifies cyberattacks based on existing datasets. Create a program (wrapper) that works as a firewall for a software system to detect and prevent intrusions, using the created neural network. Estimate its accuracy and its resistance to adversarial attacks. Consider the problem of verifying the neural network.

Difficulty: Moderate

AI methods in biological research.

ai methods biological models

Oleksandr Letychevskyi

Edinburgh

Details

Consider neural networks that work with Big Data in biological research (a database of experiments, substances with certain properties). Develop a technology for searching for a substance with given properties using AI methods.

Difficulty: High

AI-based Digital Twins

digital twins neural network crisis event

Oleksandr Letychevskyi

Edinburgh

Details

Create a model that acts as a digital twin to predict the behavior of objects based on the available data set. Feed simulated input data into the model and obtain predictions of behavior.

Difficulty: Easy

Neurosymbolic based digital twins

neurosymbolic approach digital twins

Oleksandr Letychevskyi

Edinburgh

Details

Using the available dataset, create an object classification model. Create (generate) a set of constraints that will be integrated with the classification model. Conduct an experiment and evaluate its accuracy.

Difficulty: Easy

AI-based detection of vulnerabilities in binary code

ai-methods cybersecurity vulnerabilities

Oleksandr Letychevskyi

Edinburgh

Details

Create a dataset of binary code fragments with vulnerabilities. Create a neural network and conduct experiments on vulnerability detection. Evaluate accuracy.

Difficulty: Easy

A tool to monitor/inspect multi-threaded (concurrent) Java software

programming java concurrency static analysis runtime monitoring software engineering

Kostas Liaskos

Edinburgh

Details

Writing correct Java multi-threaded code [1] is a hard problem, which introduces new challenges and common concurrency bug patterns (e.g. memory consistency errors [1]). The programmer has to ensure that all accesses to shared data are coordinated. The coordination is usually done with some sort of synchronisation, which in turn might lead to further problems (e.g. deadlocks, starvation and livelock [1]). The utilisation of supporting software monitoring/inspection tools is an important countermeasure for programmers to identify such concurrency issues in programs. The aim of this project is to develop a tool that monitors/inspects multi-threaded (concurrent) Java systems. The project is open-ended in terms of the techniques that can be utilised, and a few indicative examples are: - Techniques for static program analysis [2], e.g. software code quality metrics ([3]): [4] and [5] are examples of two popular metrics for non-concurrent systems; or - Runtime system monitoring techniques [6], e.g. the native task managers [7] of Microsoft Windows and macOS are two popular examples; or - A combination of the above. The target users will be other programmers, software engineers, and testers. Requirements gathering will mainly involve researching relevant literature (including learning the basics of Java concurrency and multi-threaded programming) and existing tools in order to adapt the selected techniques for concurrent systems. GUI implementation will be a “must-have” requirement. Some students may choose to investigate turning their tool into a plug-in for a popular Java IDE, e.g. Eclipse, IntelliJ etc.

Difficulty: Variable

A tool to support independent/self-directed learning within programming/computer science

computer science education independent learning self-directed learning programming visualisation of data structures visualisation of algorithms

Kostas Liaskos

Edinburgh

Details

The project is open-ended in that you may choose the specific field(s) within programming/computer science. One example in the context of learning programming is data structure/algorithm visualisation [1]. Another popular tool, not specific to programming/computer science, is Duolingo in the context of learning foreign languages [2]. The end-product must include functionality on the following aspects [3]: - Assess readiness to learn; - Set learning goals; - Engage in the learning process; and - Evaluate learning. The target users will be learners and instructors within the field of programming/computer science. Requirements gathering and evaluation must involve users from this target audience. GUI implementation will be a “must-have” requirement.

Difficulty: Variable

Data-flow code coverage of unit tests: traditional vs. concurrent

java concurrency static analysis runtime analysis programming software engineering code coverage data flow analysis unit testing junit

Kostas Liaskos

Edinburgh

Details

Data-flow coverage is a coverage metric that has been utilised successfully to measure the adequacy of test suites. Concurrency introduces new challenges in the context of software testing; hence, a variation of the traditional data-flow coverage metric has been proposed [1]. The aim of this project is to develop a tool that calculates both versions of data-flow coverage. The core of this project is to compare the two versions and further investigate the usefulness of the concurrent version. The key tasks and challenges in this project include: 1. Reviewing data-flow coverage metrics and understanding the challenges introduced by concurrency. 2. Investigation and implementation of both versions of data-flow coverage. 3. Evaluating the tool by applying it to a range of concurrent systems. Strong Java programming skills are essential. Furthermore, the student must become familiar with the fundamentals of the Java concurrency package [2].

Difficulty: Challenging

Automated mutation testing for concurrent software

java concurrency mutation testing mutation operators code parsing static code analysis programming software engineering unit testing junit

Kostas Liaskos

Edinburgh

Details

Mutation testing is a simple technique utilised to evaluate the quality of existing software tests: faults (or mutations) are seeded into the code, then the tests are run. The quality of the tests can be calculated from the percentage of mutations killed. Tools that automate this process exist, e.g. PIT [1]. The aim of this project is to develop a tool that automates mutation testing in the context of concurrent software. The core of this project is to investigate common concurrent bug patterns [2] and suggest appropriate mutation operators. The key tasks and challenges in this project include: 1. Reviewing common concurrent bug patterns and suggesting appropriate mutation operators for mutation testing. 2. Investigation and implementation of the suggested mutation operators. 3. Evaluating the tool by applying it to a range of concurrent systems.

Difficulty: Challenging

Automated code quality metrics for concurrent software

java concurrency software quality static analysis programming software engineering code metrics

Kostas Liaskos

Edinburgh

Details

Code quality metrics (e.g. lines of code (LOC) [1], cyclomatic complexity [2] etc.) are extensively utilised in the context of Software Quality Assurance. However, concurrency introduces new challenges in terms of the adequacy of such metrics. The aim of this project is to develop a tool that automates code quality metrics in the context of concurrent software. The core of this project is to investigate common code quality metrics utilised for non-concurrent software and suggest appropriate metrics for concurrent systems. The key tasks and challenges in this project include: 1. Reviewing common code quality metrics utilised for non-concurrent software and suggesting appropriate metrics for concurrent software. 2. Investigation and implementation of the suggested metrics for concurrent software. 3. Evaluating the tool by applying it to a range of concurrent systems.

Difficulty: Variable

An Abstract Cyber Security Strategy Game

cyber-security games

Hans Wolfgang Loidl

Edinburgh

Details

Cyber security is currently one of the main priority areas of the UK government with its UK Cyber Security Strategy [2] outlining the Government’s plans for ensuring a secure and prosperous cyberspace for UK citizens and businesses. A recent boardgame, designed by Andreas Haggman, aims to model, on an abstract level, the threats and vulnerabilities of the UK and other technology-focused states in the context of on-line business and government. The main aim of this project is to implement Haggman's design for a cyber strategy game on a mobile device, preferably Android, and to evaluate the game by running a gaming session with testers and assessing their experience as well as the game dynamics. The project will proceed in the following stages: - Literature review on cyber security, board game design, and programming mobile devices. - An initial implementation of the game as a multi-player game on mobile devices. - Test and evaluation of the initial implementation, identifying areas for improvement. - A revised implementation based on the above. - Test and evaluation of the revised implementation, in terms of software quality and gaming experience.

Difficulty: Moderate

Continuous Compliance Validation Pipes for Autonomous Vehicles Safety Cases Using Bazel

fintech

Hans Wolfgang Loidl

Edinburgh

Details

In safety-critical systems, the validation stage, that is to say a retroactive analysis of the software to ensure the requirements are met, is time-consuming and expensive. Standards defined in various industries such as Automotive (ISO26262), Industrial (IEC61508), Robotics (IEC61508), Medical Devices (IEC62304), and Avionics (DO-178) all have stipulations for how software is treated and validated before it is safe to use. As expected, there are many concepts that are common across the board which can be simplified with software. Bazel is a build tool developed by Google that promises to build software quickly, reproducibly, and correctly, guaranteeing that the same input produces the same output now and forever. The applications to building safety-critical software are obvious. In the case of the automotive industry, the rise of self-driving cars has put new pressures on software validation. Project complexity is increasing as the expectations around safety grow. A new approach from MIT named Systems-Theoretic Process Analysis (STPA) presents a method for validating complex projects by analysing the control structures and working back from an accident scenario to provide traceability into the complex set of preconditions that triggered specific unsafe control actions. It is of particular use in the autonomous car industry due to the complexity of the software. Integrating reproducible builds and STPA validation into the software of the car’s control system will make compliance validation cheaper, more secure, and less time-consuming. Objectives: 1. Make Baidu Apollo’s build process reproducible using Bazel 2. Integrate STPA validation into the project’s tests 3. Establish a continuous integration pipeline to run this validation Further Reading: Basic STPA Tutorial (http://psas.scripts.mit.edu/home/wp-content/uploads/2013/04/Basic_STPA_Tutorial1.pdf), Bazel (https://bazel.build/), Baidu Apollo (http://apollo.auto/)

Difficulty: Moderate

Develop an AI agent for a multi-player on-line historical role-playing game

games ai

Hans Wolfgang Loidl

Edinburgh

Details

Role-playing games, set in an accurate historical context and supported by a scalable, distributed game engine, can provide an engaging learning environment for both players and game developers: players can learn about the historical and societal context of the game, and game developers can exercise modular design of a complex system in order to achieve scalability for a large number of players. The goal of this project is to develop an AI agent that can act as an NPC or a PC in the previously developed core game engine (JominiEngine [1]). This involves interacting with the game engine, through the same kind of API that is used for the separately developed game clients. The AI agent should be able to interact with the game world and perform basic activities in the three main areas of fief management, household management, and army management. The agent can initially be simple and rule-based, but should be extended to a version that draws on machine-learning techniques to demonstrate increased effectiveness in the game. The project will proceed in the following phases: - Literature survey on game design, machine learning and AI techniques; review of the existing game model and code base - Design of basic AI functionality (e.g. rule-based) and its interaction with the game model - Implementation of the basic AI functionality - Design of improved AI functionality, drawing on machine-learning techniques - Implementation of improved AI functionality - Evaluation of the effectiveness of the improved vs. the basic functionality

Difficulty: Moderate

Efficient stream processing and machine learning for stock market data (industry project)

fintech industry project

Hans Wolfgang Loidl

Edinburgh

Details

High volumes of data that are generated continuously are a challenge for many application domains. Efficient implementation of a streaming pipeline is crucial to make processing feasible, and opens the opportunity for applying machine learning techniques on the data stream. The goal in this project is to evaluate and extend an existing system for high-performance stream processing, and to then use simple machine learning techniques on the data stream. The main platform is a recently developed, open source library [1], [2]. In the first step, a systematic evaluation of the performance and throughput of this library, in comparison with alternatives such as Apache Kafka [3], should be performed. Possible extensions and enhancements to the performance of the library should be considered. In the second step, simple machine learning techniques should be employed to learn characteristics of the data stream. As underlying data streams, publicly available market data should be used [4], [5], [6], though commercial data integrations could also be optimised [7], [8]. If you are interested in low latency and high throughput data pipelines and want to gain experience with advanced optimization techniques, this might be the project for you! [1] https://github.com/invesdwin/invesdwin-context-integration#synchronous-channels [2] https://github.com/invesdwin/invesdwin-context-persistence#timeseries-module [3] https://kafka.apache.org/ [4] https://github.com/fxcm/ForexConnectAPI [5] https://www.alphavantage.co/documentation/ [6] https://www.dukascopy.com/wiki/en/development/strategy-api/historical-data [7] http://www.iqfeed.net/daytradersetups/index.cfm?displayaction=developer&section=main [8] https://api.tradestation.com/docs/fundamentals/http-streaming

Difficulty: Moderate

Extend an AI agent for a simple Android-based boardgame

games ai

Hans Wolfgang Loidl

Edinburgh

Details

Mobile devices increasingly attract attention as platforms for implementing boardgames. Such boardgames feature a principled game-design that offers high re-play value. An implementation on Android (or other mobile OSs) brings these games to a wide community of users, looking for simple, solitaire or networked games for entertainment. The goal of this project is to extend an existing AI agent for an existing Android implementation of the game "Guerrilla Checkers" [1] by Brian Train [2]. This is a checkers-like boardgame with an asymmetric player set-up and different victory conditions for each side. The current implementation [3], by Richard Gould, allows 2 players to play the game on an Android device. An existing AI, from a previous project, achieves basic game-play but is not competitive against a human player. The main goal is to enhance the AI to make it competitive, and (optionally) to compare it with alternative AI implementations for this game.

Difficulty: Moderate

Extending a massively multi-player on-line RPG

games

Hans Wolfgang Loidl

Edinburgh

Details

Role-playing games, set in an accurate historical context and supported by a scalable, distributed game engine, can provide an engaging learning environment for both players and game developers: players can learn about the historical and societal context of the game, and game developers can exercise modular design of a complex system in order to achieve scalability for a large number of players. The goal of this project is to extend the implementation of a previously developed core game engine. This involves adding in-game functionality of the basic game model, such as enhanced player interaction or more accurate modelling of battles, performance improvements to the core game engine, such as faster database access, and assessing the extended game engine in terms of latency, performance and scalability. The project will proceed in the following phases: - Literature survey and review of game model - Design of extensions to the core game engine - Implementation of extensions to the core game engine - Evaluation of latency, performance and scalability

Difficulty: Moderate

Extending a massively multi-player on-line RPG

games

Hans Wolfgang Loidl

Edinburgh

Details

Role-playing games, set in an accurate historical context and supported by a scalable, distributed game engine, can provide an engaging learning environment for both players and game developers: players can learn about the historical and societal context of the game, and game developers can exercise modular design of a complex system in order to achieve scalability for a large number of players. The goal of this project is to complete the implementation of a previously developed core game engine (MScCoreGame.html). This involves adding in-game functionality of the basic game model, implementing a client-server API to allow interaction of players with the core game engine, and assessing the extended game engine in terms of latency, performance and scalability. The project will proceed in the following phases: - Literature survey and review of game model - Design of extensions to the core game engine, completing the game model - Implementation of extensions to the core game engine - Evaluation of latency, performance and scalability

Difficulty: Moderate

First Class Serialization for Distributed Haskells

distributed programming

Hans Wolfgang Loidl

Edinburgh

Details

Follow this link for a detailed discussion: http://www.macs.hw.ac.uk/~hwloidl/MScProjects/MScFirstClassSer.html#spec

Difficulty: High

High-performance graph algorithms for social networks

Hans Wolfgang Loidl

Edinburgh

Details

Relationships in social networks such as Facebook are typically captured as graphs with users as nodes and relationships as edges. Such graphs become huge when used in the context of social networks. Learning non-trivial relationships and trends in such networks is very time-consuming and therefore needs efficient algorithms. In this project, application kernels should be developed for parallel graph algorithms on large graph structures, in order to learn new relationships. The core activity will be to implement parallel versions of the graph traversal algorithms and to assess performance. These application kernels should be implemented in an object-oriented language (e.g. Java or C#) and in a functional language (e.g. Haskell or ML). The performance of both implementations should be evaluated on a range of large input graphs.

Difficulty: Moderate

Extend a novel board-game and develop an AI for it

games ai

Hans Wolfgang Loidl

Edinburgh

Details

The board-game "Empires of the Skies" is a novel, unpublished board-game currently in design phase. Nowadays, playtesting of new board-games is often done online, using web-based platforms. A prototype implementation for this game exists, as a browser game using JavaScript and TypeScript. The goal of this project is to complete the implementation of the board-game "Empires of the Skies", to support the entire game, to develop a simple AI for this game, and then to evaluate both the implementation and the AI. As an optional goal, a simple AI should be implemented for the game, to allow single player usage and to facilitate playtesting.

Difficulty: Moderate

Implement a simple boardgame and develop an AI for it

games ai

Hans Wolfgang Loidl

Edinburgh

Details

The goal of this project is to first develop an implementation of a simple BOARD GAME (several options below) for Android-based tablet devices, or on a desktop/laptop, and then to DEVELOP an AI in the game for one of the factions. Choice of technologies is flexible, and should meet the requirements of casual tablet/laptop usage. There are several options of GAMES to implement: - "Kashmir Crisis" (https://brtrain.wordpress.com/2019/08/29/new-game-kashmir-crisis) - More Open Design games by Brian Train (https://brtrain.wordpress.com/free-games/) - Spellcast (https://www.andrew.cmu.edu/user/gc00/reviews/spellcaster.html) - Several games by game designer Neil McCormack - "Origins of World War I" (https://boardgamegeek.com/boardgame/17967/origins-world-war-i) - Schlieffen - Fire&Move - Agricola Express - More board-games and card-games can be discussed for implementation Whatever the game, the project will proceed in the following PHASES: - Literature survey on game design, machine learning and AI techniques; - Review of rules and game mechanisms of the board-game - Implementation of the board-game as a (multi-player) tablet-based game - Design of basic AI for one of the factions - Implementation of the basic AI functionality - Design of improved AI functionality, drawing on machine-learning techniques - Implementation of improved AI functionality - Evaluation of the effectiveness of the improved vs. the basic functionality

Difficulty: Moderate

Parallel programming on the Xeon Phi Many-core Coprocessor

parallel computing

Hans Wolfgang Loidl

Edinburgh

Details

The new Intel Xeon Phi coprocessor is an accelerator card that promises to boost the performance of the host machine by offloading parallel code to a 61-core processor. It brings affordable many-core technology to standard desktop machines, and claims to be easier to program than other accelerators such as GPGPUs. The goal of this project is to use a set of common parallel benchmark programs, to run them both on the Xeon Phi and on a departmental many-core server, in order to compare performance on both architectures. In a second phase, a simple parallel program should be developed from scratch, using the Xeon Phi's tool support, in order to assess the programmability of this new architecture.

Difficulty: Moderate

Parallel symbolic computation on distributed memory machines

parallel computing

Hans Wolfgang Loidl

Edinburgh

Details

Symbolic computation is characterised by performing compute-intensive operations on highly-structured, complex data. The parallelism in these applications is typically dynamic and irregular, i.e. it is generated throughout the computation and varies significantly in size. These characteristics make such applications difficult to handle in conventional parallel programming languages. As a high-level parallel programming language, the pure, non-strict functional language Haskell will be used. It provides extensions to support both shared-memory and distributed-memory parallelism. The focus in this project is on the latter. This project will use the SymGrid-Par infrastructure for parallel programming, together with the GAP system for computational algebra, to implement one concrete symbolic application. Candidate applications come from the area of computational algebra, and include parallel resultant computation, squarefree factorisation and solving polynomial systems of equations. The thesis will report on the process of parallelising the application, reflect on the sources of parallelism, the suitability of the language and infrastructure for parallelisation, and assess the performance of the parallelised application on our Beowulf cluster.

Difficulty: High

Software Systems for Autonomous Cars

autonomous cars machine learning

Hans Wolfgang Loidl

Edinburgh

Details

F1Tenth is a miniature race car, on a 1/10 scale, which is available in the department. The F1Tenth simulator (https://github.com/f1tenth/f1tenth_simulator), is a virtual environment for controlling this car. Control of the car builds on the ROS software which is identical to the real car software we use. In a previous project, a neural-network based AI for over-taking in the simulator has been developed. The goal in this project is to develop this AI further, extend the simulator, and test this environment for multiple cars at the same time. Optionally, the control software can be tested on the physical car as well.

Difficulty: Moderate

Unity-based client for a massively multi-player on-line historical RPG

games

Hans Wolfgang Loidl

Edinburgh

Details

Role-playing games, set in an accurate historical context and supported by a scalable, distributed game engine, can provide an engaging learning environment for both players and game developers: players can learn about the historical and societal context of the game, and game developers can exercise modular design of a complex system in order to achieve scalability for a large number of players. The objective of this project is to enhance a Unity-based client for an existing server for a historical role-playing game. This graphical client should enhance the current functionality of interacting with the game, and introduce new features to improve the general user experience, building on features provided by the Unity framework. The development of the clients should be modular, to maximise the re-use of code between the clients. The usability of these clients should be assessed through user surveys. The project will proceed in the following phases: - Literature survey on game mechanics and usage aspects of game clients - Design of the software architecture for all clients and delineation of differences - Implementation of a text-based game client (for desktops) - Implementation of a GUI game client (for desktops) - Implementation of a handheld-based game client - Assessment of all clients in terms of usability, flexibility and modularity

Difficulty: Moderate

Advanced Docker usage

cloud services

Hans Wolfgang Loidl

Edinburgh

Details

Docker [1] is a popular virtualisation technology, in particular in the context of DevOps. It allows users to manage Linux containers, each running its own image, thereby isolating the software context in a software development project. In contrast to full virtualisation, as in VirtualBox or VMware, Docker images use para-virtualisation, sharing access to the same (host) OS kernel. Para-virtualisation achieves higher performance than full virtualisation, but also restricts the usage to running Linux inside Linux. The goal of this project is to develop teaching material covering: - underlying concepts of (para-)virtualisation - basic Docker usage information - some advanced Docker usage information - a case study of Docker usage in software development - or a case study of using Docker in teaching

Difficulty: Moderate

Kubernetes usage

cloud services

Hans Wolfgang Loidl

Edinburgh

Details

Kubernetes [1] is a popular Cloud management infrastructure, specifically for the deployment of Cloud applications. It provides location transparency, which avoids tying the running of an application to one particular machine, as well as resilience and replication, which provide continuous, scalable execution of an application with high resource requirements. The goal of this master class is to develop teaching material for the practical use of Kubernetes, based on Google's web page [1], and to develop a concrete case study that demonstrates the advantage of a web application running on this platform.

Difficulty: Moderate

On-line board-game platforms as educational tools for web-development

games

Hans Wolfgang Loidl

Edinburgh

Details

Several web-based platforms for implementing board-games have become very popular, in particular during the lock-down period. These platforms typically use a mixture of web-based languages and infrastructures to provide an easy-to-use development platform. The goal of this project is to evaluate one or more of these platforms in terms of their provision of a web-based development platform, and the benefit the developer gets, in terms of a learning experience in web development, from using these tools.

Difficulty: Moderate

Using BoardGameArena for Serious Games

games

Hans Wolfgang Loidl

Edinburgh

Details

Several game engines allow for easy development of computer games. One such platform is BoardGameArena [1], which specialises in the web-based development of digital browser-games using PHP and JavaScript as implementation languages. The goal of this project is to develop teaching material on how to use BoardGameArena to implement serious games, i.e. games that are designed and developed for a particular learning purpose. To this end a case study of implementing a simple board game should be performed. [1] BoardGameArena Studio https://studio.boardgamearena.com

Difficulty: Moderate

Playful learning

gamification

Hans Wolfgang Loidl

Edinburgh

Details

The concept of playful learning uses gamification concepts to make the learning process more engaging. These can be simple, such as providing badges on achievements, or more advanced, such as space race functionality in quizzes. The main task for this master class is to develop teaching material that exemplifies the usage of platforms such as Kahoot or TopHat for playful learning, with a critical reflection on advantages and disadvantages of these platforms.

Difficulty: Moderate

The RISC-V architecture

computer architecture

Hans Wolfgang Loidl

Edinburgh

Details

The RISC-V architecture is a new, open-hardware computer architecture that is increasingly popular and becoming a competitor to established architectures, e.g. ARM. It provides advantages in terms of modularity, flexibility, and open design. The main task in this master class is to give an overview of the main characteristics of this new architecture, mainly for programmers (rather than electrical engineers). This should also critically reflect on advantages and disadvantages. The practical part of the master class should provide some simple programming exercise that elaborates on the differences in architecture: this could use RISC-V vs ARM assembler.

Difficulty: High

Deep resource monitoring and machine-learning analysis

Hans Wolfgang Loidl

Edinburgh

Details

The goal of the project is to develop tools for resource monitoring of a range of devices, perform the data collection based on these tools, and then apply machine learning techniques to the collected data to gain insight into resource usage.

Difficulty: Easy

Monitoring and Visualisation of Scalable Kubernetes Clusters using Prometheus etc

kubernetes cloud

Hans Wolfgang Loidl

Edinburgh

Details

Difficulty: Moderate

Application or verification of Attack-Defense Trees

Manuel Maarek

Edinburgh

Details

This project is to develop a tool for automatic generation of Attack-Defense Trees, or an analysis tool for Attack-Defense Trees.

Difficulty: High

Code-based in-class engagement app

Manuel Maarek

Edinburgh

Details

This project is to design and develop an in-class engagement mobile application around programming code. A Socrative-like app for code.

Difficulty: Moderate

Compiler for Continuous Integration

Manuel Maarek

Edinburgh

Details

Design and implement a compiler for complex development and code-based operations. A continuous integration system such as GitLab-CI would be the runtime target.

Difficulty: Challenging

GitHub/GitLab programming game

Manuel Maarek

Edinburgh

Details

This project is to develop a GitHub-based or GitLab-based programming game for users to improve their programming skills. The project is to use the GitHub API in the building of the game engine.

Difficulty: Variable

GitLab for security analysis (DevSecOps)

Manuel Maarek

Edinburgh

Details

This project is to build a DevSecOps workflow extension to GitLab for security analysis. Such an extension could take the form of adding Attack-Defence Tree modeling for security code review, or integrating a static code analyser such as Infer https://fbinfer.com/ for secure code analysis. This project is also a Master Class on the security analysis features that already exist within GitLab.

Difficulty: High

GitLab integration for code peer-testing

Manuel Maarek

Edinburgh

Details

This project is to build a GitLab extension for peer-testing and peer-feedback of programming code. The project aims at providing a user-friendly solution for giving and receiving feedback on programming artifacts.

Difficulty: Moderate

Implementing a Secure Application in Rust

Manuel Maarek

Edinburgh

Details

Rust is a recent systems programming language that claims to run blazingly fast, and to prevent segfaults, and to guarantee thread safety. This project is to implement a secure application in Rust to evaluate its effectiveness in secure software development.

Difficulty: High

Investigate Terms and Conditions of Open APIs

Manuel Maarek

Edinburgh

Details

Mobile and Web applications make use of Open APIs to access services. These APIs come with developer terms of service. This project is to investigate and compare the terms of service of existing APIs.

Difficulty: Moderate

Cyber Security Cards

Manuel Maarek

Edinburgh

Details

This project is about the use of a deck of Cyber Security Cards. The current deck covers software security and is based on the CyBOK (Cybersecurity Body of Knowledge) https://www.cybok.org/ . The card deck is both physical and digital. This project could take different directions. - Evaluating the deck of cards in a training or education context. - Extending the deck of cards beyond its current scope. - Linking or extending the deck towards other security knowledge bases. - Expanding the uses of the deck through knowledge extension, linkage or augmented reality.

Difficulty: High

Security and Programming Languages

Manuel Maarek

Edinburgh

Details

The choice of a programming language has major implications, including on security. This project is to investigate how secure or insecure a programming language is with regard to some known weaknesses (CWE, CAPEC, OWASP) by implementing a secure application. This project is also a Master Class on security features of programming languages.

Difficulty: Moderate

Security Benchmark of XML Libraries

Manuel Maarek

Edinburgh

Details

There exist a number of XML libraries for various programming languages and platforms. Comparisons of these XML libraries exist but focus on speed and features. The aim of this project is to compare various XML parser implementations with regard to security issues (XML entity attacks, cyclic references, remote access, encoding-based attacks, ...).

Difficulty: Moderate

Serious Games for Cyber Security or Software Engineering

Manuel Maarek

Edinburgh

Details

This project is to design, develop or evaluate serious games targeting cyber security or software engineering practices. The target audience (novice/expert), the purpose (educational, training, awareness), and the type (digital, board) of the game are to be determined. This project is also a Master Class on existing serious games for teaching cyber security.

Difficulty: Challenging

Static Security Analyser for OCaml

Manuel Maarek

Edinburgh

Details

Typed functional programming languages offer great benefits in terms of code safety and security. However, some implementation details still need care. This project involves implementing a verifier of programming rules for the OCaml language.

Difficulty: Moderate

CI/CD, DevOps, Cloud Orchestration languages

Manuel Maarek

Edinburgh

Details

This project is to explore the languages of CI/CD, DevOps, Cloud Orchestration.

Difficulty: High

Static Application Security Testing (SAST)

Manuel Maarek

Edinburgh

Details

The project is to study and develop strategies for Static Application Security Testing (SAST) using GitLab's Security Dashboard and Semgrep.

Difficulty: Challenging

Evaluating GitLab-based Programming Education Workflows

Manuel Maarek

Edinburgh

Details

A number of courses use GitLab-Student for their teaching assessment of programming-based courses. This project is to evaluate current or new education workflows. This could include feedback on code, tests (CI) outcomes, group or individual assessments, metrics, interactions with Canvas, interactions with other tools.

Difficulty: High

Master Class on PDL Prompt Declaration Language

Manuel Maarek

Edinburgh

Details

This project is to design a Master Class on the PDL Prompt Declaration Language developed by IBM, available at https://ibm.github.io/prompt-declaration-language/ and https://github.com/IBM/prompt-declaration-language (see also https://arxiv.org/abs/2410.19135).

Difficulty: High

Teaching tool for studying Finite Automata and Regular Languages

finite state automata regular languages models of computation

Radu Mardare

Edinburgh

Details

Finite automata (FA), in the form of deterministic finite automata (DFA) and non-deterministic finite automata (NFA), are a class of fundamental computational devices that recognize Regular Languages. They are used intensively in the practice of computer science, and they have been studied repeatedly in our courses. The aim of this project is to produce a teaching app that helps students study and manipulate FAs. The app should offer an appropriate interface in which students can design and work with FAs. There are many possible operations on FAs that can be automated, such as: - The graphical construction of a DFA or NFA and its mathematical specification - Given a DFA or NFA, the simulation of computations on particular inputs - The determinization of an NFA - Computing operations with FAs: Union, Concatenation, Star, Intersection, Complement - Computing the regular expression that characterizes the language recognized by an FA - Constructing an FA that recognizes a given regular expression - Constructing an FA for the reversal of the language recognised by a given FA - Minimizing a DFA
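As an illustration of one of the listed operations, the determinization of an NFA can be sketched with the standard subset construction. This is only a minimal sketch under stated assumptions, not a proposed design for the app: the NFA has no epsilon-transitions, and the names `determinize`, `accepts` and the example automaton `NFA_DELTA` are hypothetical.

```python
from collections import deque


def determinize(nfa_delta, start, accepting, alphabet):
    """Subset construction for an NFA without epsilon-transitions.

    nfa_delta maps (state, symbol) -> set of successor states.
    Returns (dfa_delta, dfa_start, dfa_accepting); every DFA state is a
    frozenset of NFA states.
    """
    dfa_start = frozenset([start])
    dfa_delta = {}
    dfa_accepting = set()
    seen = {dfa_start}
    queue = deque([dfa_start])
    while queue:
        current = queue.popleft()
        # A subset is accepting if it contains at least one accepting NFA state.
        if current & accepting:
            dfa_accepting.add(current)
        for symbol in alphabet:
            target = frozenset(
                nxt
                for state in current
                for nxt in nfa_delta.get((state, symbol), set())
            )
            dfa_delta[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return dfa_delta, dfa_start, dfa_accepting


def accepts(dfa_delta, dfa_start, dfa_accepting, word):
    """Simulate the DFA on a word and report whether it is accepted."""
    state = dfa_start
    for symbol in word:
        state = dfa_delta[(state, symbol)]
    return state in dfa_accepting


# Hypothetical example NFA over {a, b}: accepts words whose
# second-to-last symbol is 'a'.
NFA_DELTA = {
    ("q0", "a"): {"q0", "q1"},
    ("q0", "b"): {"q0"},
    ("q1", "a"): {"q2"},
    ("q1", "b"): {"q2"},
}
DFA = determinize(NFA_DELTA, "q0", {"q2"}, "ab")
```

Calling `accepts(*DFA, word)` then simulates the resulting DFA on a particular input, which corresponds to the simulation operation in the list. Representing DFA states as frozensets keeps the link to the original NFA states visible, which may be useful when explaining the construction to students.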

Difficulty: Moderate

The equivalence of Epistemic Systems

epistemic systems epistemic games epistemic logic transition systems modal logic

Radu Mardare

Edinburgh

Details

Epistemic systems are systems of agents witnessing a certain reality and computing knowledge about it. They are used intensively in modelling, for instance of security systems where one wants to understand and control the information accessed by certain agents active on a network. Due to its relevance in applications, the field of epistemic logic has evolved considerably in recent decades. This project aims at developing a couple of fundamental concepts of equivalence that might be relevant for epistemic systems. For instance, (i) what does it mean for two agents to have equivalent knowledge? or (ii) what does it mean for two societies of agents to have equivalent knowledge? or even (iii) what makes two epistemic systems equivalent? To argue for possible answers to these questions one can make use of fragments of epistemic logic. This project can be developed either as a theoretical work, or it can focus on implementing some dedicated algorithms related to the aforementioned problems.

Difficulty: Moderate

Simulating and predicting the knowledge of agents in Epistemic Systems

epistemic systems epistemic games epistemic logic transition systems modal logic

Radu Mardare

Edinburgh

Details

Epistemic systems are systems of agents witnessing a certain reality and computing knowledge about it. They are used intensively in modelling, for instance of security systems where one wants to understand and control the information accessed by certain agents active on a network. Due to its relevance in applications, the field of epistemic logic has evolved considerably in recent decades. This project aims at developing concepts of simulation that can be useful in applications. For instance, what does it mean for an agent to simulate another agent? This can be a useful concept in security, where dissimulated behaviours can be used to avoid security checks. This project can be developed either as a theoretical work, or it can focus on implementing some dedicated algorithms related to the aforementioned problems.

Difficulty: Moderate

Evolutionary Approach To Soft Robotic Design

evolutionary algorithms soft robotics

Alistair Mcconnell

Edinburgh

Details

Soft robotics is a relatively novel field of robotic design and development and due to its bio-inspired nature there are a vast number of permutations of simplistic soft robots. This project would involve using evolutionary algorithms to evolve a Voxel-based soft robot design.

Difficulty: High

Environmental Monitoring Sensor Network using LoRaWAN

lorawan environmental sensors tinyml

Alistair Mcconnell

Edinburgh

Details

The objective of this project will be to help Heriot-Watt University create an environmental sensor network at the Edinburgh Campus through the use of a LoRaWAN network. The data should be displayed using an intuitive UI for users to monitor. Work will be done on using edge processing on the sensors to maximise the sensors' battery life and compensate for the LoRa bandwidth restrictions.

Difficulty: High

Road/Infrastructure Quality Monitoring Using IMU Crowdsourced Data Gathering

android app crowdsourcing data data visualisation

Alistair Mcconnell

Edinburgh

Details

There are approximately 247,800 miles of road in the UK [1], which can vary in quality from brand new to close to destruction. There are also approximately 72,000 bridges in the UK, and recent studies put around 4.4% of those as substandard [2]. It is close to impossible to accurately monitor all of this infrastructure, and the cost of embedding sensors can be prohibitive. It has been suggested that crowdsourced IMU data could be used to passively monitor infrastructure as it is traversed by cars. This project would involve creating an application that uses a phone's IMU and GPS to monitor vehicle journeys, and creating a database and visualisation that could show the road quality.

Difficulty: High

Robotic Arms for Everyone!

robotics remote labs user evaluation

Alistair Mcconnell

Edinburgh

Details

Space can be expensive and hard to get; robots are also expensive and hard to get. Both of these factors make running robotic labs a complex and difficult venture. One way we can tackle this problem is through the implementation of Remote Labs. The project would involve: 1) The integration of a small robotic arm and the Practable.io backend 2) The testing and evaluation of the system with a suitable group of participants

Difficulty: Challenging

Development of Software for Customised 3D Printed Prosthesis

prosthetics parametric cad

Alistair Mcconnell

Edinburgh

Details

Development of a parametric algorithm that can automatically scale and generate a new prosthetic hand for a user. To make it more challenging, this could also incorporate the motors, power, etc, required for an active prosthetic.

Difficulty: High

Böhm trees

James McKinna

Edinburgh

Details

Böhm trees are infinite objects corresponding to the successive 'unfolding' of a term in lambda calculus; they can be regarded either as a possible semantics of lambda terms, or as an extended language of terms with extended notions of the usual reduction relations on lambda terms. This project, which can be appropriately scoped according to the level and skills of the student, is to study formal representations of lambda terms in a system such as the Agda implementation of type theory. Aspects of the theory of Böhm trees include: * continuity properties (trees can be given a topology for which application and lambda abstraction are continuous); * extending the Standardisation theorem for ordinary lambda calculus to the extended reduction system on trees; * other topics as they may arise. The interested student should have a good mathematical background, an interest in (the foundations of) programming languages, and preferably direct experience in the form of our CS Foundations I and II courses.

Difficulty: Challenging

Gradual typing in Python

James McKinna

Edinburgh

Details

Python is a dynamically typed language with wide application in many data-intensive areas. As such, it has no built-in support for type checking, although users may document their code with the intended types; at present these are unchecked. The aim of this project is to investigate adding types and type checking to an existing Python implementation, with a view to evaluating the behaviour and performance of existing Python code under a (more strongly) typed discipline. There is considerable scope for theoretical and practical investigations in this area; interested students should discuss with me how they would like to approach the project. Existing work on 'Reticulated Python' may be of use as background material.

Difficulty: Challenging

Interactive Theorem Proving in Agda

James McKinna

Edinburgh

Details

The Fundamental Theorem of Arithmetic (FTA) is, as its name implies, one of the elementary cornerstones of number theory. Agda is an interactive theorem prover based on intuitionistic type theory, with a rich and highly developed library of elementary mathematical and computational structures. A glaring omission from the library is a complete formalisation of FTA. A concrete objective would be a complete proof of this result, and its successful incorporation into the library.

Difficulty: Variable

Programming languages in the K framework

James McKinna

Edinburgh

Details

The K framework https://kframework.org/ (with additional tooling at https://github.com/runtimeverification/k) is, in the words of its developers, "a rewrite-based executable semantic framework in which programming languages, type systems and formal analysis tools can be defined using configurations and rules." This project concerns doing experiments with representation of, and perhaps reasoning with, programming languages not already supported by K. We also have the offer of services from runtimeverification.com for technical advice and support. Further details of the scope and difficulty of this project are available on request. There is room for more than one student to work with me on this.

Difficulty: Easy

Simplicial Objects in Dependent Type Theory

James McKinna

Edinburgh

Details

There has been much recent interest from the mathematics community in so-called *cubical type theory*, a development of intuitionistic type theory in which homotopy-theoretic structure and results may be developed synthetically. Classical structures in homotopy theory include the so-called simplicial category, $\Delta$, and simplicial structures, considered as presheaves on $\Delta$ with values in suitable categories of interest. Mac Lane's "Categories for the Working Mathematician" contains the essential material with which to begin the project. The aim of this project is to develop the general theory of such structures in the Agda theorem prover, an implementation of intuitionistic type theory, and specifically to do so on top of the existing standard library development Data.Fin of the dependent family `Fin n` (for `n : Nat`) of finite types. This project is suitable for mathematics students, and mathematically inclined computer science students interested in developing a chapter of classical mathematics in a formalised setting.

Difficulty: Challenging

XAI: explainable search procedures

James McKinna

Edinburgh

Details

Some classical AI logic/'puzzle solving' problems were among the earliest search problems for which heuristic-guided search procedures were developed; some such procedures may be described in terms of (variously elaborate) proof systems for derivation in suitable logics (as indeed, can classical proof-search procedures). But most puzzle-solving algorithms merely compute solutions, rather than explanations to the human consumer of how those solutions were arrived at. The aim of this project is to take an existing solver (or else write one of your own) for a given puzzle game (such as Sudoku etc.) and instrument it in such a way as to produce interactive explanations of how to solve a problem instance. There are lots of avenues in which such a project could then be taken: user studies of the appropriateness/intelligibility of the computed explanations; various aspects of additional learning/increase in the expressive power of explanations; etc. More than one student could do this project; but they would need to work on different puzzles/solvers.

Difficulty: Variable

Computer-assisted financial trading

Radu-Casian Mihailescu

Dubai

Details

The goal of this project is to leverage Machine Learning approaches to create an intelligent software agent that can teach itself and adapt to market conditions in order to generate successful trading strategies in the stock market. The developed algorithms will be evaluated in real-world set-ups.

Difficulty: Moderate

Context-aware object detection in computer vision

Radu-Casian Mihailescu

Dubai

Details

Surveillance cameras are typically placed in different contexts which are not known beforehand and that may change dynamically. For example, cameras should be able to cope with background variations such as light changes, weather and seasons in the specific scene, and to improve performance over time as they adapt to the user's scene. Thus, in order to contribute optimally to realizing the system goals, they should be able to adapt to the context they are placed in and its current state. In this work we are going to look into various contextual data from the environment and design algorithms that can leverage this information in order to improve classification performance.

Difficulty: Moderate

Fake news detection - a machine learning approach

Radu-Casian Mihailescu

Dubai

Details

The digital media landscape has been exposed in recent years to an increasing volume of deliberately misleading news and disinformation campaigns, a phenomenon popularly referred to as fake news. In an effort to combat the dissemination of fake news, designing machine learning models that can classify text as fake or not has become an active line of research. In this work we will investigate viable machine learning approaches for the task of fake news detection.

Difficulty: Moderate

Intelligent decision-making model for energy consumption in a smart building

Radu-Casian Mihailescu

Dubai

Details

Within the worldwide perspective of energy efficiency, it is important to highlight that buildings are responsible for 40% of total European energy consumption, which contributes 36% of greenhouse gas emissions. Since buildings are large contributors of greenhouse gases, it is critical to develop solutions that improve energy savings and achieve sustainability goals in the development of cities. In this project, an Internet of Energy (IoE) management system for intelligent decision making in a smart building is to be developed. By collecting sensor data and analysing it with machine learning techniques, such a system can make intelligent decisions to improve the energy consumption of appliances in one or several rooms of a smart building. The aim of this project is to propose a model based on the preferences and behavioural habits of the people that live in households and the interdependency of the appliances that are active at the moment. The dataset or collected data will be analysed in order to suggest how to optimise the electricity consumption per appliance.

Difficulty: Moderate

Synthetic dataset generation

Radu-Casian Mihailescu

Dubai

Details

Synthetic data generation has the potential to become an excellent source of ground truth for many computer vision applications. However, the gap between real and synthetic data remains a problem that we are going to address in this task. The goal is to use procedurally-generated data models to synthesize datasets with minimal domain gap.

Difficulty: Moderate

Image analysis for satellite data

Radu-Casian Mihailescu

Dubai

Details

In this project we will investigate various deep learning neural network architectures for image analysis on satellite data. The aim is to provide actionable insights from analysing the remote sensing data. Use cases may include one or more of the following: Qualitative analysis 1. Vegetation quality 2. Soil quality 3. Sand dune movement patterns 4. Oil spills near the sea 5. Monitoring temperature of seawater near power stations/cooling purposes/Independent sensors 6. Monitoring heat leakage in residential/commercial buildings 7. Monitoring of solar panel conditions 8. Monitoring coastal changes 9. Land use/Land cover Quantitative analysis 1. Detecting Number of Buildings 2. Cars, People etc. 3. Monitoring above the ground high voltage installation 4. Mapping of Geotechnical Investigation 5. Building permit verification 6. Base map updating 7. Disaster mitigation planning 8. Counting Palm Trees 9. Monitoring city night lights 10. Monitoring of building usage 11. Monitoring construction progress This project may involve collaboration with Eaglei71.

Difficulty: Moderate

Domain adaptation in computer vision

Radu-Casian Mihailescu

Dubai

Details

Perform domain adaptation from photometric images (visible imager) to radiometric images (thermal imager) using a neural network. Proposed methods include (but are not limited to): a. self-supervision with an attention network b. supervised learning with a GAN. The goal of the thesis is a network with good feature representation capability. This work will be carried out in collaboration with the Technology Innovation Institute (Masdar City).

Difficulty: Moderate

Forecasting Energy Consumption using Machine Learning

Radu-Casian Mihailescu

Dubai

Details

Forecasting electricity demand accurately is a critical part of ensuring optimized and cost-effective operation, especially in the context of smart buildings (office, commercial or household). Various machine learning techniques will be implemented and evaluated comparatively within this project. The project will investigate consumption prediction for different time horizons and at various levels of aggregation, customer profiling and segmentation, and will also include work on exploratory data analysis and different visualisation techniques. Possible collaboration and internship with RTA (details to be communicated later). Could entail working with real data and might require meetings with external stakeholders.

Difficulty: Moderate

Evaluation of ChatGPT and related LLMs

Radu-Casian Mihailescu

Dubai

Details

The goal of this project is to conduct a comprehensive quantitative evaluation of ChatGPT and related LLMs using publicly available datasets on various NLP tasks such as question-answering, summarisation, information extraction, natural language generation, etc.

Difficulty: Moderate

Deep Reinforcement Learning Applications

Radu-Casian Mihailescu

Dubai

Details

In (Deep) Reinforcement Learning (RL), agents are trained through a reward and punishment mechanism. The agent is rewarded for correct moves and punished for wrong ones. In doing so, the agent tries to minimize wrong moves and maximize the right ones. The general framework of RL makes it suitable for a large number of applications including autonomous driving, financial services, healthcare, natural language processing, etc. The aim of this project is to develop and evaluate an RL agent for a specific application domain.
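The reward and punishment mechanism can be illustrated with tabular Q-learning, one of the simplest RL algorithms (the proposal itself does not prescribe an algorithm, and a deep RL agent would replace the table with a neural network). The sketch below is a toy under stated assumptions: the five-cell corridor environment, the reward values, the hyperparameters and all function names are hypothetical choices made for illustration.

```python
import random

N_STATES = 5          # corridor cells 0..4; cell 4 is the goal
ACTIONS = (0, 1)      # 0 = move left, 1 = move right


def step(state, action):
    """Deterministic toy environment: returns (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    if next_state == N_STATES - 1:
        return next_state, 1.0, True      # reward for reaching the goal
    return next_state, -0.1, False        # small penalty for every other move


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Epsilon-greedy tabular Q-learning; returns the learned Q-table."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):              # cap on episode length
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)                      # explore
            else:
                action = max(ACTIONS, key=lambda a: q[state][a])  # exploit
            next_state, reward, done = step(state, action)
            # Q-learning target: immediate reward plus discounted best future value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Best action in each non-terminal state according to the Q-table."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


Q_TABLE = train()
```

After training, `greedy_policy(Q_TABLE)` gives the action the agent prefers in each cell. The small step penalty plays the role of the punishment described above, and the goal reward that of the reward.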

Difficulty: Moderate

Machine Learning for Biomedical Data and Personalised Healthcare

Radu-Casian Mihailescu

Dubai

Details

Machine Learning and AI are gradually transforming healthcare services by offering diagnosis and information tools that enable individualised patient management. Examples include providing personalised treatment recommendations for patients about the right drug and the right dose, as well as customised recommendations based on the person’s lifestyle and behaviour, in order to prevent disease and get ahead of problems that could be troublesome down the road. In this context, machine learning approaches based on medical datasets are proving to be a highly effective strategy in addressing these challenges. The thesis work will focus either on providing a comprehensive study of the application of machine learning to personalised healthcare or on addressing in more depth a specific topic such as drug discovery or image processing for automated diagnosis.

Difficulty: High

Federated learning systems for computer vision

Radu-Casian Mihailescu

Dubai

Details

The implementation specifications for federated learning (FL) frameworks can vary over a large design space, based on the intended properties and parameterization of the models. In this task we are going to systematically analyse the importance of different quality characteristics as they pertain to key computer vision tasks.

Difficulty: High

Solar Photovoltaic Characterisation & Yield Prediction

Radu-Casian Mihailescu

Dubai

Details

Renewable Energy is generally capital intensive relative to its cost of operation and maintenance. The cost of raising capital therefore has a significant impact on the Levelised Cost of Energy (LCOE). Higher confidence in yield prediction throughout the expected life of a renewable energy plant can therefore drive down the cost of energy. This project will train artificial neural networks to better predict energy yield in varying environmental conditions (irradiance, air mass, ambient and panel temperatures, etc.) and over long periods of time, to account for cleaning schedules and degradation rates.

Difficulty: Easy

Influencing User Behaviour to “Be Green”

Radu-Casian Mihailescu

Dubai

Details

What initiatives and technology can influence occupant behaviour to reduce energy consumption and carbon footprint, and improve overall sustainability? If technology is to bridge the gap between renewable energy resource and service level need, then what behaviour changes will help to narrow the gap? And can this be achieved without substantially reducing service standards? This project is likely to assess the service level needs of HWU Dubai staff and students and identify opportunities to “behave” more sustainably. The role of technology to inform sustainable choices and the potential to incentivise sustainable behaviour will also be assessed.

Difficulty: Easy

Explainable AI for Deep Learning Models: Better understanding of their decision-making processes

Radu-Casian Mihailescu

Dubai

Details

Deep neural networks are very complex, and their decisions can be hard to interpret. In this project we want to use Explainable AI (XAI) to understand why a deep neural network makes a classification decision or prediction. XAI can be used to determine the importance of features of the input data, as a proxy for the importance of the features to the deep neural network. It has a wide range of applications across various domains where the transparency, interpretability, and accountability of AI systems are essential such as health, finance, or autonomous systems.

Difficulty: Moderate

Federated Learning for IoT

Radu-Casian Mihailescu

Dubai

Details

Federated Learning (FL) is a decentralised technique for training Machine Learning (ML) models that are distributed at the edge of the network. FL aims to enable multiple actors to build a common and robust ML model over local datasets (i.e., without sharing data). A number of FL frameworks exist, showing different characteristics. This project aims to conduct experiments and evaluate some of the most popular open-source FL frameworks against criteria such as: performance (completed tasks per time unit, i.e. throughput), resource consumption (CPU, memory, GPU), convergence, deployment effort, flexibility, accuracy, and scalability.
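To make the idea of building a common model without sharing data concrete, the following is a minimal sketch of federated averaging (FedAvg), a widely used FL aggregation scheme. It is not the API of any particular FL framework: the linear model, the two client datasets and the function names are all assumptions chosen to keep the example self-contained.

```python
def local_update(weights, data, lr=0.1, epochs=5):
    """One client's local training: gradient descent on y = w*x + b (squared error)."""
    w, b = weights
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / len(data)
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / len(data)
        w, b = w - lr * grad_w, b - lr * grad_b
    return (w, b)


def federated_average(client_weights, client_sizes):
    """Server step: average client models, weighted by local dataset size."""
    total = sum(client_sizes)
    return tuple(
        sum(wts[i] * n for wts, n in zip(client_weights, client_sizes)) / total
        for i in range(len(client_weights[0]))
    )


def run_fedavg(clients, rounds=20):
    """Alternate local training and server-side averaging for a number of rounds."""
    global_weights = (0.0, 0.0)
    for _ in range(rounds):
        # Only model parameters travel to the server; raw data stays on the client.
        updates = [local_update(global_weights, data) for data in clients]
        global_weights = federated_average(updates, [len(d) for d in clients])
    return global_weights


# Hypothetical local datasets, both sampled from y = 2x + 1.
CLIENTS = [
    [(-1.0, -1.0), (1.0, 3.0)],
    [(-1.0, -1.0), (0.0, 1.0), (1.0, 3.0)],
]
GLOBAL_MODEL = run_fedavg(CLIENTS)
```

In this toy setting the averaged model approaches the slope and intercept shared by both datasets. Real frameworks add client sampling, secure communication and support for heterogeneous data, which is where the evaluation criteria listed above come into play.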

Difficulty: Moderate

Interactive learning with LLMs

Radu-Casian Mihailescu

Dubai

Details

The project aims to investigate the deployment of LLMs via efficient interactive learning strategies in order to support and enhance human learning environments. Sub-topics that will be investigated throughout the project include: - Adaptive learning experience: providing personalised feedback to learners and adapting to different learning styles, question & answer scenarios, etc. - Educational content generation: investigating the capabilities of LLMs to produce (customised) material in the form of practice exercises and relevant content for specific domains. The outcome of the project is to design and implement an interactive assistant/tutor and to provide an analysis of the potential use of LLMs in educational setups.

Difficulty: Moderate

An investigation of Bias and Fairness in LLM generated content

Radu-Casian Mihailescu

Dubai

Details

It has been well documented that LLMs have a propensity towards hallucination, i.e. generating non-factual content, as well as towards bias in their output due to the nature of their training data and/or training procedures. The goal of this project is two-fold: - to develop methods and tools aimed at detecting the occurrence of biased output/hallucinations in LLMs - to propose and experiment with different techniques and methodologies designed to mitigate bias

Difficulty: Moderate

Enhancing LLM reasoning by integrating formal planners and theorem provers

Radu-Casian Mihailescu

Dubai

Details

On the one hand, LLMs are currently the state-of-the-art method in terms of understanding and generating natural language; however, they are inherently limited for tasks that require rigorous formal reasoning or planning. On the other hand, tools such as theorem provers and formal planners are specialised in handling precise logical operations or sequential decision-making based on formal rules. The scope of this project is to combine the strengths of both approaches into a hybrid model that brings together LLMs with formal reasoning tools. The proposed approach will be evaluated in the context of complex problem-solving tasks.

Difficulty: Moderate

Efficient training techniques for fine-tuning LLMs to domain-specific applications

Radu-Casian Mihailescu

Dubai

Details

Fine-tuning LLMs involves adapting a pre-trained model for a specific downstream task, with applications to various domains such as medical, financial, educational, or legal. Several parameter-efficient methods have been proposed in order to reduce the computational resources needed during training, such as: - Adapters: small neural network modules are inserted into each layer of a pretrained model - Low-Rank Adaptation: training smaller low-rank matrices that are added to the existing weights - Prefix-tuning: adjusting the prefix embeddings without altering the model's weights - Sparse Fine-tuning: training only a subset of the model's parameters based on their relevance for a specific task. The goal of the project is to conduct a comprehensive comparative study of the different approaches, and provide insights into the strengths and drawbacks of such methods in different contexts.
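As a small illustration of the Low-Rank Adaptation idea from the list above, the sketch below computes an adapted weight matrix W + (alpha / r) * B A, following the commonly used LoRA formulation in which the pretrained W stays frozen and only the low-rank factors A and B are trained. It uses plain Python lists so that it runs without dependencies; the dimensions, the identity matrix standing in for pretrained weights, and all function names are assumptions for illustration, not part of any library.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, lora_a, lora_b, alpha, rank):
    """Return W + (alpha / rank) * B @ A without modifying the frozen W."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


def init_lora(d_out, d_in, rank, seed=0):
    """A gets small random values, B starts at zero, so the initial update is zero."""
    rng = random.Random(seed)
    lora_a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
    lora_b = [[0.0] * rank for _ in range(d_out)]
    return lora_a, lora_b


D_OUT, D_IN, RANK = 8, 8, 2
# Identity matrix used as a stand-in for frozen pretrained weights (assumption).
W = [[float(i == j) for j in range(D_IN)] for i in range(D_OUT)]
A, B = init_lora(D_OUT, D_IN, RANK)

full_params = D_OUT * D_IN             # parameters updated by full fine-tuning
lora_params = RANK * (D_IN + D_OUT)    # parameters updated by the low-rank factors
```

The parameter counts show the source of the savings: the number of trainable values grows with the rank times the sum of the layer dimensions, rather than with their product. A comparative study would measure how this trade-off affects task quality.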

Difficulty: Moderate

Evaluating Computational Creativity in Large Language Models (LLMs)

Radu-Casian Mihailescu

Dubai

Details

Over the past years, LLMs have demonstrated impressive performance across various Natural Language Processing tasks. However, evaluating creativity of LLMs remains a challenging topic due to the subjective nature of creativity itself, as well as the lack of clear-cut metrics and benchmarking datasets. The aim of this project is to propose a framework for assessing the creative abilities of LLMs, drawing attention to the capabilities and limitations across a number of state-of-the-art LLMs.

Difficulty: Moderate

Machine Learning for 3D Tooth Segmentation on Cone-beam Computed Tomography (CBCT) Image Data

Radu-Casian Mihailescu

Dubai

Details

The aim of the study is to apply Deep Learning image processing techniques to a unique CBCT dataset to evaluate adverse effects on the bone tissue surrounding the teeth following orthodontic treatment. In the study, AI-assisted interpretation of CBCT images and clinical data will be employed to identify biomarkers capable of predicting which patients are suitable for various orthodontic treatments and which are at increased risk of adverse effects and relapse after orthodontic treatment. The project involves collaboration with medical specialists. The outcome of the project is an AI-driven tool for automated tooth segmentation on CBCT imaging.

Difficulty: Moderate

A Multi-Modal Gait Analysis and Clinician-Aligned Report Generation Framework for Children with Cerebral Palsy

Radu-Casian Mihailescu

Dubai

Details

Cerebral palsy (CP) is the leading cause of motor disability in children, significantly impairing mobility and quality of life. Children with CP exhibit gait patterns that are difficult to categorize without thorough clinical assessment. Accurate identification of CP-specific gait abnormalities is crucial for effective intervention. Conventional clinical assessments are both labor-intensive and subjective in nature, whereas machine learning-based efforts lack explainability and robustness, especially in low-data settings. Moreover, existing work has predominantly used kinematic and/or kinetic data for CP gait analysis without considering skeleton joint data or multimodal data integration. This project proposes a novel, interpretable, end-to-end multi-modal framework for gait analysis and clinician-focused report generation tailored to CP, using advanced deep learning models for computer vision and LLMs for report generation. The system will be evaluated using real-world data collected at the Al Jalila Children's Hospital’s Gait Laboratory in the UAE.

Difficulty: Moderate

Language Understanding through the Integration of Large Language Models and Knowledge Graphs

Radu-Casian Mihailescu

Dubai

Details

Natural Language Processing (NLP) has seen transformative progress with the rise of large language models (LLMs), powering applications like chatbots, translation systems, and text generation. While LLMs demonstrate impressive linguistic capabilities, they remain susceptible to factual inaccuracies ("hallucinations") and often lack grounding in domain-specific knowledge. To address these limitations, knowledge graphs (KGs) offer a promising solution by providing structured, interpretable representations of real-world knowledge. Conversely, LLMs can assist in constructing, validating, and enriching KGs. This dissertation explores the complementary strengths of LLMs and KGs to build more reliable and context-aware NLP systems.

Difficulty: Moderate

Sentiment-Augmented Deep Learning for Anomaly Detection in Cryptocurrency Markets

Radu-Casian Mihailescu

Dubai

Details

Cryptocurrency markets are highly volatile, sentiment-driven, and difficult to model using traditional anomaly detection methods. Existing statistical and price-based techniques often fail to adapt to rapid regime shifts and subtle behavioural patterns. With the rise of deep learning and natural language processing, particularly large language models (LLMs), new opportunities have emerged to enhance market surveillance and risk detection. Research Problem: How can deep learning models be augmented with sentiment signals derived from LLMs to improve the detection of anomalies and regime shifts in cryptocurrency markets?

Difficulty: Moderate

Data-Driven Prediction and Categorization of Student Performance for Educational Decision Support

Radu-Casian Mihailescu

Dubai

Details

As educational institutions increasingly adopt data-driven strategies, there is growing interest in leveraging predictive analytics to improve student outcomes. Understanding and forecasting student performance enables timely interventions and informed decision-making. However, existing methods often lack scalability, adaptability, or precision when applied in dynamic educational environments. Research Problem: How can student performance be predicted and categorized accurately using data-driven approaches to support early intervention and strategic planning in educational settings?

Difficulty: Moderate

Enhancing Efficient Reasoning in Smaller Language Models through Distillation Techniques

Radu-Casian Mihailescu

Dubai

Details

Large Language Models (LLMs) have demonstrated impressive performance in complex reasoning tasks. However, their high computational cost and resource demands pose significant barriers to widespread deployment. Knowledge Distillation (KD) has emerged as a promising solution, transferring knowledge from large models to smaller, more efficient ones. Despite this, traditional KD methods often focus solely on final outputs and neglect the intermediate reasoning steps that contribute to an LLM’s performance. Research Problem: How can the reasoning capabilities of large language models be effectively transferred to smaller models in a resource-efficient manner, especially in scenarios with limited annotated data?
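For context, the output-only distillation that the paragraph above describes as the traditional approach is commonly written as a divergence between temperature-softened teacher and student output distributions. The sketch below is one such formulation in NumPy, given under stated assumptions: the function names, the temperature value, and the toy logits are illustrative and not part of the proposal.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over the last axis at a given temperature."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Mean KL(teacher || student) on softened outputs, scaled by T^2."""
    p = softmax(teacher_logits, temperature)  # soft targets from the teacher
    q = softmax(student_logits, temperature)  # student predictions
    kl = (p * (np.log(p) - np.log(q))).sum(axis=-1)
    return temperature ** 2 * kl.mean()


# Toy logits for a batch of two examples and three classes (made-up numbers).
teacher = np.array([[2.0, 0.5, -1.0], [0.1, 1.5, 0.3]])
student = np.array([[1.0, 0.8, -0.5], [0.0, 0.9, 0.6]])

loss_same = distillation_loss(teacher, teacher)
loss_diff = distillation_loss(student, teacher)
print("loss when student matches teacher:", loss_same)
print("loss for the toy student:", loss_diff)
```

This loss only compares final output distributions, so it carries no signal about intermediate reasoning steps, which is the gap the research problem targets.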

Difficulty: Moderate

Efficient Strategies for Small-Object Detection in UAV Imagery

Radu-Casian Mihailescu

Dubai

Details

Small-object detection in unmanned aerial vehicle (UAV) imagery is critical for numerous real-world applications, including search-and-rescue, traffic monitoring, and environmental surveillance. However, these tasks are challenging due to the small size of target objects, cluttered backgrounds, and the limited resolution of aerial images. Conventional multi-scale fusion techniques improve detection but often introduce computational complexity and degrade fine visual details—making them less suitable for deployment on resource-constrained UAV platforms. Research Problem: How can small-object detection in UAV imagery be improved to balance accuracy, efficiency, and robustness in visually complex and resource-limited environments?

Difficulty: Moderate

Rethinking Evaluation Strategies for Artificial General Intelligence:

Radu-Casian Mihailescu

Dubai

Details

Evaluating progress toward Artificial General Intelligence (AGI) is a fundamental yet unresolved challenge. Traditional approaches rely heavily on synthetic benchmarks inspired by human intuition about intelligence. However, such tests often fail to capture the full complexity of general intelligence or demonstrate real-world applicability. This dissertation proposes a shift in evaluation philosophy—from abstract proxies of intelligence to competence-based assessment grounded in practical task performance and deployment readiness. Research Problem: How can we design more meaningful and reliable evaluation strategies for AGI systems that reflect general competence rather than intuitive or synthetic benchmarks?

Difficulty: Moderate

Understanding and Predicting Delayed Generalization in Neural Networks

Radu-Casian Mihailescu

Dubai

Details

The phenomenon of "grokking"—the sudden, delayed onset of generalization in overparameterized neural networks—has recently attracted attention for its theoretical significance and practical implications. Despite achieving low training error early on, models may not generalize until significantly later, posing challenges for interpretability and training efficiency. This dissertation explores the underlying mechanisms behind grokking and investigates how measurable factors contribute to the emergence of generalization. Research Problem: What mechanisms drive the delayed generalization phenomenon in neural networks, and how can we predict and harness this behavior to improve training efficiency?

Difficulty: Moderate

AI-Powered Repeat Sentence Speaking Practice Platform

ai speaking interactive

Chit Su Mon

Malaysia

Details

Implement AI algorithms to accurately recognize and analyze the user's repeated sentences. Use models like Google's Speech-to-Text, DeepSpeech, or other open-source alternatives. Integrate NLP techniques to assess grammar, vocabulary usage, fluency, and pronunciation. Develop a scoring system to provide feedback based on these parameters.

Difficulty: Variable

AI-ReadSmart: Personalized Voice-Assisted News Website

voice integration text-to-speech web development

Chit Su Mon

Malaysia

Details

A web app that uses a voice assistant (e.g., using Google Text-to-Speech or Azure Speech) to read and summarize daily news articles tailored to user preferences.

Difficulty: Moderate

Designing a virtual campus tour for Heriot-Watt University Malaysia

usability-centered design hci virtual campus

Chit Su Mon

Malaysia

Details

Design and develop a virtual, interactive campus tour to help prospective students, parents, and new enrollees explore Heriot-Watt University Malaysia (HWUM) remotely.

Difficulty: Moderate

Deep Learning-based Image Recognition for Medical Diagnostics

deep learning cnn medical diagnosing image recognition

Mahmoud Mousa

Dubai

Details

This project focuses on leveraging the power of deep learning techniques for medical image recognition and diagnosis. The project will involve collecting or utilizing existing medical image datasets, preprocessing the images, and training a CNN architecture, for example, to learn meaningful representations and features from the images. The model will then be evaluated on a separate test set to assess its performance in detecting and classifying diseases or abnormalities accurately.

Difficulty: Moderate

Proposing approximation algorithms for NP-complete problems such as the Knapsack Problem

optimal algorithms approximation algorithms np-complete knapsack problem optimisation

Mahmoud Mousa

Dubai

Details

The goal of this project is to study an NP-complete problem such as the Knapsack problem and to propose algorithms that find optimal and approximate solutions for it. Optimal algorithms for NP-complete problems usually take a long time (potentially exponential in the input size) to generate optimal solutions, especially for hard instance datasets. The aim is to design approximation algorithms that find near-optimal answers in polynomial time and to compare the performance of the suggested approximation algorithm(s) to the optimal one(s).
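To make the comparison between optimal and approximate solutions concrete, here is a small Python sketch for the 0/1 Knapsack problem. It pairs a standard dynamic-programming exact solver with a simple greedy heuristic that is known to reach at least half of the optimal value. The function names and the example instance are assumptions for illustration, and positive integer weights are assumed.

```python
def knapsack_exact(values, weights, capacity):
    """Optimal value via dynamic programming over capacities.

    Runs in O(n * capacity) time, which is pseudo-polynomial: it grows
    with the numeric value of the capacity, not with its bit length.
    """
    best = [0] * (capacity + 1)
    for v, w in zip(values, weights):
        # Iterate capacities downwards so each item is used at most once.
        for c in range(capacity, w - 1, -1):
            best[c] = max(best[c], best[c - w] + v)
    return best[capacity]


def knapsack_greedy(values, weights, capacity):
    """Greedy by value density, compared against the best single item.

    Returning the better of the two gives at least half the optimal value.
    """
    items = sorted(zip(values, weights), key=lambda vw: vw[0] / vw[1], reverse=True)
    total, remaining = 0, capacity
    for v, w in items:
        if w <= remaining:
            total += v
            remaining -= w
    best_single = max((v for v, w in zip(values, weights) if w <= capacity), default=0)
    return max(total, best_single)


# Hypothetical example instance.
values = [60, 100, 120]
weights = [10, 20, 30]
capacity = 50

optimal = knapsack_exact(values, weights, capacity)
approx = knapsack_greedy(values, weights, capacity)
print("optimal:", optimal, "greedy:", approx, "ratio:", approx / optimal)
```

A project would typically run both solvers over many instances and report the quality ratio alongside running time.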

Difficulty: Moderate

Building a Trust and Reputation System Using Blockchain and Deep Learning Approaches

trust and reputation systems blockchain artificial intelligence smart contracts

Mahmoud Mousa

Dubai

Details

The goal is to use the blockchain as a trustless platform to create a trust and reputation system that assesses several services based on questionnaires. These questionnaires will be filled in by different users. The aim is to calculate digital trust values for each service which reflect the users' satisfaction levels. You can use subjective logic or deep learning approaches, written as smart contracts on the blockchain, to extract the users' opinions, which then update the trust and reputation system you build.

Difficulty: Moderate

Sign language Recognizer for English language

sign language recognizer computer vision deep learning

Mahmoud Mousa

Dubai

Details

You will need to collect videos of people signing and their corresponding text transcriptions, or use an open-source dataset from the web. You can then use this data to train a machine learning model to recognize sign language. This project will be done in two stages. The first stage is to use images to train your model so that it recognizes the sign represented by each picture. Next, you need to consider inputting videos, selecting frames, processing them, and outputting the recognized English description. Furthermore, the output description could be converted into speech using known APIs.

Difficulty: Moderate

Handwritten Text Recognition

handwritten text recognition deep-learning image recognition cnn

Mahmoud Mousa

Dubai

Details

Handwritten text recognition (HTR) is a challenging task due to the complexity of the script and the diversity of handwriting. However, deep learning techniques have enabled significant progress recently. To develop an HTR model, researchers collect a large dataset of handwritten text images written in a specific language, together with the corresponding transcripts. You preprocess the images to improve accuracy and extract features from them. Then you train a deep learning model such as a CNN or RNN on the features. The trained model is evaluated on a test set to measure accuracy. Once developed, the HTR model can be deployed in real-world applications like mobile apps or web services to recognize handwritten text.

Difficulty: Moderate

Generic Computer Vision Project

Mahmoud Mousa

Dubai

Details

Problems similar to the following could be considered. - Emotion detection for masked/unmasked faces - Object recognition - Face recognition for masked faces

Difficulty: Moderate

A comparative study on Signature Recognition

Mahmoud Mousa

Dubai

Details

The problem involves analyzing a person's handwritten signature from a dataset to verify the signer's identity. The problem could be solved using Feature Extraction, Template Creation, and Matching approaches.

Difficulty: Easy

Generic Data Analysis Project

Mahmoud Mousa

Dubai

Details

Problems similar to the following could be considered. - Energy Consumption Analysis - Climate Data Analysis - Healthcare Analytics

Difficulty: Moderate

Machine Learning Based Malware Detection

Ali Muzaffar

Dubai

Details

Difficulty: Variable

Android Application Analysis Tool

Ali Muzaffar

Dubai

Details

Difficulty: Challenging

Online Learning Based Malware Detection

Ali Muzaffar

Dubai

Details

The project will be based on Online (stream) learning-based malware detection. Focus may be on Windows, Linux, or Android-based malware detection.

Difficulty: Variable

Customer Profiling and Sentiment Analysis for E-commerce customers

Ali Muzaffar

Dubai

Details

Difficulty: Moderate

LLM based malware detection

Ali Muzaffar

Dubai

Details

The project will explore the use of LLMs in malware detection. The focus can be on Windows, Android, or Linux-based malware detection.

Difficulty: Variable

Android application dataset exploration

Ali Muzaffar

Dubai

Details

Difficulty: Variable

Sports based AI project

Ali Muzaffar

Dubai

Details

The topic can vary across different applications of AI in sports.

Difficulty: Variable

Tracing of Phishing Calls Using Voice Forensics and Network Metadata using ML

Ali Muzaffar

Dubai

Details

Difficulty: Variable

Faster LTL to Parity Automata Translation for Faster Rational Verification

formal verification model checking multiagent systems game theory

Muhammad Najib

Edinburgh

Details

Rational verification is the problem of checking whether a given temporal logic formula ϕ is satisfied in some or all game-theoretic equilibria of a multi-agent system. EVE (Equilibrium Verification Environment) is a tool for rational verification. In this project, you will modify EVE (https://github.com/eve-mas/eve-parity) to work with a faster LTL to Parity Automata translator, e.g., Owl (https://owl.model.in.tum.de/).

Difficulty: Moderate

Faster Parity Games Solver for Faster Rational Verification

formal verification model checking multiagent systems game theory

Muhammad Najib

Edinburgh

Details

Rational verification is the problem of checking whether a given temporal logic formula ϕ is satisfied in some or all game-theoretic equilibria of a multi-agent system. EVE (Equilibrium Verification Environment) is a tool for rational verification. In this project, you will modify EVE (https://github.com/eve-mas/eve-parity) to work with a faster parity games solver, e.g., Oink (https://github.com/trolando/oink). EVE currently uses PGSolver (https://github.com/tcsprojects/pgsolver).

Difficulty: Moderate

Rational Verification for Mean-Payoff Games

formal verification model checking multiagent systems game theory

Muhammad Najib

Edinburgh

Details

Rational verification is the problem of checking whether a given temporal logic formula ϕ is satisfied in some or all game-theoretic equilibria of a multi-agent system. EVE (Equilibrium Verification Environment) is a tool for rational verification. Currently, EVE only supports games with LTL objectives. In this project, you will extend EVE to support games with mean-payoff objectives. In particular, you will do the following: - Extend EVE (https://github.com/eve-mas/eve-parity) to multi-player mean-payoff games - Implement the algorithm in [1] to solve the relevant decision problems - Integrate a mean-payoff solver into EVE, e.g., https://github.com/romainbrenguier/MeanPayoffSolver Reference: [1] Gutierrez, Julian, et al. "On computational tractability for rational verification." International Joint Conferences on Artificial Intelligence, 2019.

Difficulty: Challenging

Game-theoretical Analysis of Multi-Agent Systems with PRISM-games

Muhammad Najib

Edinburgh

Details

PRISM-games is an extension of PRISM, designed for the verification of probabilistic systems. These systems can incorporate either competitive or collaborative behaviour, modelled as stochastic multi-player games. In this project, you will explore a realistic scenario of a multi-agent system and model it using PRISM-games. Subsequently, you will conduct an analysis of its game-theoretical properties, such as Nash equilibria.

Difficulty: Variable

XAI for Explainable Rational Synthesis

Muhammad Najib

Edinburgh

Details

EVE (https://github.com/eve-mas/eve-parity/) is a tool that can be used to synthesise strategies in multi-agent systems, which are modelled as concurrent games. These synthesised strategies can be viewed as conditional plans for the agents. In this project, you will employ approaches from explainable planning (XAIP) to elucidate the synthesised strategies, thereby enhancing their comprehensibility.

Difficulty: Moderate

Any appropriate topic

Muhammad Najib

Edinburgh

Details

I am open to discussing any topic of interest, as long as it is appropriate and within scope. Feel free to reach out to me to explore potential topics.

Difficulty: Variable

Experimenting with quantum algorithms using IBM quantum devices

quantum computing qiskit

Kai Lin Ong

Malaysia

Details

Students will have the opportunity to demonstrate mastery of the following: - Fundamental concepts of quantum computing and selected quantum algorithms. - The quantum workflow of running IBM devices. - Quantum circuit design for simulating quantum algorithms efficiently using IBM devices. - Post-processing and result visualization.

Difficulty: Variable

Quantum walks and their applications

quantum walks search algorithms quantum computing

Kai Lin Ong

Malaysia

Details

Random walks are stochastic processes defined on some mathematical state space, consisting of a sequence of steps described by independent identically distributed random variables. Numerous algorithms based on classical random walks have been developed, each with its own areas of importance, contributing to the emergence of various applications in computing and technology. Quantum walks are quantum analogues of classical random walks. The core difference is that, unlike classical random walks where randomness arises from the transition probability between different states, quantum walks exhibit randomness via quantum mechanical properties such as superposition and the measurement postulate. This gives them an advantage over their classical counterparts, for instance exponentially faster hitting times on certain graphs. This project is devoted to studying quantum walks comprehensively, along with their applications in a selected field, supported by simulations using Qiskit (Python) or another suitable programming framework. Some well-known applications are in developing search algorithms and quantum cryptography.
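The contrast between the two kinds of walk can be seen in a short simulation. The sketch below uses plain NumPy rather than Qiskit. It compares a classical symmetric random walk on a line with a discrete-time quantum walk driven by a Hadamard coin. The coin choice, the symmetric initial coin state, the step count, and all function names are assumptions made for this example.

```python
from math import comb

import numpy as np


def classical_walk(steps):
    """Exact position distribution of a symmetric random walk on a line."""
    probs = np.zeros(2 * steps + 1)        # index i is position i - steps
    for rights in range(steps + 1):
        probs[2 * rights] = comb(steps, rights) / 2 ** steps
    return probs


def hadamard_walk(steps):
    """Position distribution of a discrete-time Hadamard walk on a line."""
    n = 2 * steps + 1
    psi = np.zeros((n, 2), dtype=complex)  # psi[position, coin]
    psi[steps] = [1 / np.sqrt(2), 1j / np.sqrt(2)]  # symmetric initial coin
    H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    for _ in range(steps):
        psi = psi @ H.T                    # apply the coin at every position
        shifted = np.zeros_like(psi)
        shifted[:-1, 0] = psi[1:, 0]       # coin 0 moves one site left
        shifted[1:, 1] = psi[:-1, 1]       # coin 1 moves one site right
        psi = shifted
    return (np.abs(psi) ** 2).sum(axis=1)  # measure position, ignore coin


def position_std(probs, steps):
    """Standard deviation of the walker's position."""
    x = np.arange(-steps, steps + 1)
    mean = (probs * x).sum()
    return np.sqrt((probs * x ** 2).sum() - mean ** 2)


steps = 100
c = classical_walk(steps)
q = hadamard_walk(steps)
c_std = position_std(c, steps)
q_std = position_std(q, steps)
print("classical spread:", c_std)
print("quantum spread:  ", q_std)
```

The classical spread grows like the square root of the number of steps, while the quantum walk spreads linearly in the number of steps. That faster spreading is one of the properties that quantum-walk search algorithms exploit.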

Difficulty: Challenging

🟢 📦 Entrepreneurship: Develop your own idea into a product! 🟢

entrepreneurship design and development product development

Stefano Padilla

Edinburgh

Details

This project is open for you to develop any idea into a product. Note that this project does not only include development; you are also required to: * Research an issue (possible solutions, tools, etc.), * Possibly test the idea in a focus group, * Design and draft requirements for the product, * Plan development, * Develop the idea into a product, * Test and evaluate the product. As this is quite an open project, please do drop me an email to discuss the suitability of your idea as a project.

Difficulty: Challenging

🟢 🕹️ Game Design and Development: My own wonderful level! ^_^ 🟢

game design game development games development

Stefano Padilla

Edinburgh

Details

Have an idea for a game level you would like to design and develop from scratch? If yes, then this might be your project. This project is open for you to develop any level. Please note that this project does not only include development; you are also required to: * Research level design (theory, possible solutions, tools, etc.), * Possibly test your level idea in a focus group, * Design and draft requirements for the level, * Plan development, * Develop the level using your preferred tech, * Test and evaluate your level. As this is quite an open project, please do drop me an email to discuss the suitability of your idea as a project. Finally, please note that your Graphics and Games courses run a bit too late, so you will need to learn some tools and concepts quite early in the project cycle.

Difficulty: Challenging

Artificial Immune Systems

bio-inspired computing machine learning deep learning

Wei Pang

Edinburgh

Details

The human immune system can effectively protect us from viruses and other invaders. Algorithms inspired by this, the so-called artificial immune systems (AIS), have been developed. You are free to explore any ideas or existing immune-inspired algorithms to solve any problems you like. Previous projects include using AIS for the following problems (you can do something similar but not identical, or select any problems that interest you): 1. Searching for the best neural network architecture 2. Playing games 3. Making machine learning fair 4. Anomaly detection 5. Cybersecurity

Difficulty: Variable

Machine Learning for Understanding the Evolution of Topics and Public Attention

Wei Pang

Edinburgh

Details

We try to understand topics and public concerns over time from news media or other related documents (such as patent data) and gain insights into the evolution of topics and public attention on the circular economy or green innovations. So we need to use machine learning models (e.g. dynamic topic modeling and dynamic clustering) to understand how the public's opinions are changing over time. Specifically, this is related to an EPSRC project DCEE (https://dcee.org.uk/) with Imperial and Loughborough (https://gow.epsrc.ukri.org/NGBOViewGrant.aspx?GrantRef=EP/V042432/1). One task is to understand public perception of the electrochemical circular economy through social media or large amounts of online texts. For example, what are people's views and concerns on sustainable chemical products?

Difficulty: Moderate

Immune-inspired Algorithm for robust and secure machine learning

machine learning deep learning artificial immune systems robust ml safe ml

Wei Pang

Edinburgh

Details

Have you heard of the one-pixel attack? https://arxiv.org/abs/1710.08864 It can fool advanced deep learning algorithms by changing only one pixel of an image. Recently, sparse attacks have become more and more popular: https://openaccess.thecvf.com/content/CVPR2023/papers/Williams_Black-Box_Sparse_Adversarial_Attack_via_Multi-Objective_Optimisation_CVPR_2023_paper.pdf In this project, we will use immune-inspired algorithms to develop attack methods that fool existing deep learning algorithms, which can raise awareness of security and safety for machine learning. We will also develop defence methods to protect machine learning algorithms from potential attacks, and these will also be inspired by immune systems or evolutionary algorithms.

Difficulty: Variable

Swarm Intelligence and Its applications

Wei Pang

Edinburgh

Details

This is an open research question, and you can use swarm intelligence to solve any problem that you are interested in.

Difficulty: Easy

Graph Diffusion Models

graph neural networks diffusion models molecular property prediction

Wei Pang

Edinburgh

Details

This project will investigate cutting-edge diffusion models on graphs and their applications to molecular property prediction, graph classification, etc.

Difficulty: Challenging

Large Language Model based Multi-agent framework applied in Finance

large language models multi-agent systems

Wei Pang

Edinburgh

Details

We will develop an agentic framework powered by large language models. In this framework, multiple large language models will interact with each other to simulate the players in a financial market. You may get advice from a company which specialises in designing green finance markets.

Difficulty: Moderate

Collaborative Large Language Models for Problem Solving

large language models multi-agent systems swarm intelligence evolutionary computing

Wei Pang

Edinburgh

Details

We will build a system in which several large language models can collaborate with each other and solve real-world problems, such as optimisation and scheduling. We will use evolutionary computing or swarm intelligence to optimise how these models interact and collaborate. See this NeurIPS paper: https://neurips.cc/virtual/2024/poster/96692

Difficulty: Moderate

Evolving LLM-based multi-agent systems

llm evolutionary computing multi-agent system

Wei Pang

Edinburgh

Details

Inspired by Google's AlphaEvolve, we aim to investigate how evolutionary computing can be used to optimise the structure and workflow of LLM multi-agent systems for complex problem solving.

Difficulty: Challenging

LLM-based Artificial Immune Systems

llm immune systems alphaevolve

Wei Pang

Edinburgh

Details

Each LLM is treated as an antibody in an artificial immune system. The question is how we can use the immune-inspired computing paradigm to build LLM multi-agent systems, in the same way that evolutionary computing is used in Google's AlphaEvolve: https://deepmind.google/discover/blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/

Difficulty: High

Open Set Recognition for Industrial Anomaly Detection

deep learning open set recognition anomaly detection

Wei Pang

Edinburgh

Details

This research project aims to explore and apply Open Set Recognition (OSR) techniques to the field of Industrial Anomaly Detection (IAD). The goal is to address the limitations of existing industrial quality inspection systems when confronted with unknown and novel defects. Traditional anomaly detection methods typically operate under a "closed-set" assumption, meaning all possible defect types are known during the training phase. However, in real-world industrial production environments, due to the complexity and dynamic nature of manufacturing processes, new and unforeseen defect types constantly emerge. This project will tackle this challenge by researching and developing a deep learning model capable of effectively identifying known defects while simultaneously rejecting unknown defect types.

Difficulty: High

Predicting and Optimizing Rental Prices in Dubai Using Generative AI

gen-ai rental price prediction

Andres Barajas Paz

Dubai

Details

The idea is to explore the use of Gen-AI models for more accurate rental price prediction, potentially combined with optimization techniques to support smarter pricing strategies in the real estate market.

Difficulty: Moderate

A Hybrid Model for Predicting Insurance Claims Using GLM and Neural Networks

machine learning neural network car insurance claims glms

Andres Barajas Paz

Dubai

Details

In this project, the objective is to explore a hybrid machine learning model that combines a Generalized Linear Model (GLM) with a neural network to predict car insurance claims. GLMs are commonly used in insurance because they are easy to understand, but they may miss complex patterns in the data. Neural networks can find those patterns but are harder to explain. By combining both, the goal is to build a model that is accurate and still somewhat explainable. The model will be tested on an insurance dataset and compared with a regular GLM and a standalone neural network.

Difficulty: Challenging

Attitudes towards artificial intelligence in society

Ron Petrick

Edinburgh

Details

Investigate current attitudes towards artificial intelligence (AI) in different parts of society, including the general public, media, and AI practitioners. This project will involve a survey of recent research literature on AI trends as well as the portrayal of AI in the popular media. Building on recent studies in other parts of the world, interviews will be conducted with members of the public to understand concerns about recent trends towards wider deployment of AI. This project may also involve the creation of small AI artefacts (e.g., chatbots) to help guide the study.

Difficulty: Variable

Combined machine learning and automated planning

Ron Petrick

Edinburgh

Details

This project will explore the use of modern machine learning techniques (e.g., deep learning, reinforcement learning, etc.) for different problems in automated planning. While automated planners are good at making goal-directed plans of action under many challenging conditions, the addition of machine learning tools to the process could lead to optimisations in terms of more efficient planning or higher quality plans. Also, some symbolic aspects of the planning problem (e.g., action specification in PDDL) could be learnt using machine learning techniques. The approach will be applied in planning scenarios such as robot control or human-machine interaction.

Difficulty: Moderate

Large language models for automated planning

Ron Petrick

Edinburgh

Details

This project will explore the use of large language models (LLMs), vision language models (VLMs), and/or foundation models for different problems in automated planning. While automated planners are good at making goal-directed plans of action under many challenging conditions, the addition of LLMs/VLMs can help improve or replace parts of the planning process. Some symbolic aspects of the planning problem (e.g., action specification in PDDL) could be integrated with such techniques. The approach will be demonstrated in challenging planning tasks, such as robot control or human-machine interaction.

Difficulty: Moderate

Plan-based artificial intelligence and games

Ron Petrick

Edinburgh

Details

This project will explore the use of artificial intelligence techniques in games, with a particular focus on automated planning and related approaches. The student will survey the state-of-the-art in some aspect of AI game playing and focus on applying new AI algorithms to a game environment. Such techniques could be used to control artificial game players or automate particular aspects of the game environment (e.g., board layout, puzzle creation, move suggestion, etc.). This project will involve a significant amount of software development for AI techniques and the game environment.

Difficulty: High

Plan-based explainability in artificial intelligence systems

Ron Petrick

Edinburgh

Details

This project will explore the problem of explainability in AI systems (XAI) using automated planning tools (XAIP). Automated planners provide a causal model of states, actions, and plans which will serve as the underlying framework for explaining agent behaviour in particular circumstances. New approaches to plan explainability will be explored and implemented using existing planners that may be augmented and tested using representative planning domains.

Difficulty: Moderate

Human-in-the-loop automated planning

Ron Petrick

Edinburgh

Details

Automated planning systems are good at generating goal-based plans of action but typically require a fixed model. Human-in-the-loop planning enables a user to introduce constraints into the planning process that affect the plans that are generated. This project will build a suitable interface for an off-the-shelf automated planner to enable certain types of constraints to be specified by the user (e.g., preferences, temporal, goal ordering constraints) and control the planning/replanning process.

Difficulty: High

Simulation and visualisation environments for planning

Ron Petrick

Edinburgh

Details

This project will explore the problem of producing or extending simulation or visualisation environments for automated planning. Automated planners provide a causal model of states, actions, and plans which will serve as the underlying framework for the environment to be simulated/visualised. This project may explore new approaches to plan simulation/visualisation or extend existing systems (e.g., PDSim). Testing will be done using representative planning domains.

Difficulty: High

Post-quantum Cryptography

cyber security cryptography

Sasa Radomirovic

Edinburgh

Details

Governments are increasingly pushing the industry to transition to post-quantum cryptography. Several new such algorithms have been standardized, but they are still immature in comparison to the current standard public key cryptosystems. A variety of attacks have been announced that target some of the standardized algorithms. This project aims to implement and demonstrate some of the announced attacks on post-quantum crypto algorithms. It is for students interested in the mathematics of post-quantum cryptography.

Difficulty: Challenging

Your own project in Cyber Security

cyber security cryptography authentication network protocols digital forensics formal modeling formal verification

Sasa Radomirovic

Edinburgh

Details

If you have an idea for a project related to Cyber Security get in touch! Your idea does not have to be very mature or precise. I am happy to chat about it and interactively refine your idea into a possible project that you can choose to carry out. You can also email me by writing something like "I would like to do a cyber security project, but don't know what. My preference would be a project that ( does / does not ) require ( coding / user study / mathematics / cryptography / network protocols / AI / ... )"

Difficulty: Variable

Transition to post-quantum cryptography

cyber security cryptography

Sasa Radomirovic

Edinburgh

Details

Governments are increasingly pushing the industry to transition to post-quantum cryptography (PQC). Several PQC schemes have been standardized, but they are still immature in comparison to the current standard public key cryptosystems. For this reason, the new PQC schemes are used together with the current standard public key cryptosystems in a hybrid mode that ensures the scheme is at least as secure as the stronger of the two components. The implementation of such schemes is tricky for several reasons and is just one of the challenges that the industry is facing in the transition to PQC. This project aims to overcome some of the challenges the industry is facing. The first objective is to get a better understanding of these challenges. The student can then select one of them and work towards finding a solution. The project can be of a technical or non-technical nature.
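
As a rough illustration of the hybrid idea, the sketch below derives one session key from two shared secrets, so that the key stays unpredictable as long as either secret does. It is only a sketch under stated assumptions: the two secrets are random placeholders standing in for the outputs of a classical key exchange and a PQC key encapsulation, and `combine_shared_secrets` and the `hybrid-demo` label are hypothetical names. Real hybrid designs typically bind in more context than this.

```python
import hashlib
import hmac
import os


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF (RFC 5869) with HMAC-SHA256: extract a key, then expand it."""
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def combine_shared_secrets(classical_ss: bytes, pq_ss: bytes) -> bytes:
    """Hypothetical combiner: the derived key depends on both input secrets."""
    # Both secrets are assumed to have a fixed length, so plain concatenation
    # is unambiguous here.
    return hkdf_sha256(classical_ss + pq_ss, salt=b"\x00" * 32, info=b"hybrid-demo")


# Placeholders for the two key-establishment outputs (assumption: 32 bytes each).
classical_ss = os.urandom(32)
pq_ss = os.urandom(32)
session_key = combine_shared_secrets(classical_ss, pq_ss)
```

Changing either input changes the derived key, which is the property the hybrid mode relies on.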

Difficulty: Variable

Gamified User Account Management

cyber security authentication

Sasa Radomirovic

Edinburgh

Details

Password managers facilitate the use of strong passwords and make it easier to avoid password reuse, which are two critical defenses against account breaches. However, people do not like to use password managers as there is no obvious day-to-day benefit. There is a significant initial cost to starting to use a password manager - passwords for each and every account need to be entered into the password manager. Where's the fun in that? People are too busy and too tired to spend time on yet another complicated app and they're only days away from that 2000-day streak on Duolingo. This project aims to change that by redesigning and implementing a prototype password and account management tool that people actually want to use.

Difficulty: Moderate

User Account Management

cyber security authentication

Sasa Radomirovic

Edinburgh

Details

Password managers facilitate the use of strong passwords and make it easier to avoid password reuse, which are two critical defenses against account breaches. However, this is not sufficient. Our accounts can also be accessed without a password, for example by going through password recovery, biometric authentication, using a passkey, etc. To keep our online accounts secure, we need more than a password manager: we need an account manager. In this project you can design and build your own account manager, extend an open source password manager to become an account manager, or extend an existing prototype implementation of an account manager.

Difficulty: Moderate

A Security Framework for BitTorrent

Hani Ragab

Dubai

Details

BitTorrent is the de-facto peer-to-peer (P2P) standard. Unfortunately, its lack of security has led to it being mainly used for illegally exchanging gray content (e.g. copyrighted material). The objective of this project is to add authentication, authorisation, confidentiality and data integrity to BitTorrent.

Difficulty: Challenging

AI for Video Games

Hani Ragab

Dubai

Details

To be discussed face-to-face. You can get inspired by reading Saarah's dissertation, which was ranked second at an international British Computer Society competition http://www.macs.hw.ac.uk/cs/project-system/projectdata/archive/2019/ugcse/sw36_full_text.pdf

Difficulty: Challenging

AI-based Dominoes Player

Hani Ragab

Dubai

Details

The project covers a few topics, including: - Possible use of deep reinforcement learning to control the player - Potential for covering the interface with the real world by, e.g., doing image acquisition of what a player has in their hands + dominoes that were played to feed them into the AI.

Difficulty: Challenging

Anti-Malware Software

Hani Ragab

Dubai

Details

Existing anti-malware products struggle to keep up with the hundreds of thousands of malware samples that appear on a daily basis. Previous research in our University has obtained interesting results by applying machine learning techniques. In this project, you will design and build a prototype of an anti-malware tool that uses those results. The anti-malware tool can be built by either writing it from scratch, or by adding rules/signatures into an existing anti-malware tool (e.g. ClamAV). Arjun Rajeev did implement a first functional version of this software. You will add features and possibly re-implement some.

Difficulty: Variable

Attendance Monitoring System

Hani Ragab

Dubai

Details

The objective of this project is to provide an easy-to-deploy, secure and scalable student attendance monitoring system. The new system should be able to automatically collect student attendance information (e.g. by using NFC, fingerprint readers, or face recognition) without any (major) intervention from the lecturer. The lecturer will still have a fail-over interface where they can enter/edit attendance information. The system will be able to generate automatic attendance notification emails to the school administration office (e.g. when a student misses 3 lectures in a row).

Difficulty: Variable

Blockchain and Bitcoin Applications

Hani Ragab

Dubai

Details

Blockchain can be defined as an online distributed ledger, with the possibility to store virtually any information on it. Cryptocurrencies, such as Bitcoin, are based on blockchains. We aim to investigate the use of blockchains in different sectors of activity, e.g.: - Medical records and other health-related applications. - Real-estate property management (and similar applications, such as car property management) - Data Science students might want to investigate available datasets about blockchain and the different cryptocurrencies (e.g. Bitcoin) - Surprise me!

Difficulty: Variable

Building Ethical Hacking Tools (with AI?)

Hani Ragab

Dubai

Details

The ethical hacking tool could perform one or more attack types, including attacks on TCP/IP and common networking protocols, websites, Windows, Android, Ubuntu and any other software/system. It also could use any technique, such as (D)DoS, injection (e.g., SQL), overflow (e.g., buffer and heap), etc. The tool could be for any purpose related to ethical hacking, including reconnaissance (e.g., identifying open ports and available services), deceiving end-users for all sorts of social engineering, building exploits and using them, maintaining access on a system (e.g., through backdoors, remote admin tools, steganography). It would be interesting to use AI to automate or improve the hacking tool in general. For example, an AI system could be used to automatically find vulnerabilities, generate obfuscated malware, or determine which attack to carry out depending on the target. I am also interested in upgrading existing tools with new capabilities; most of them are open source and available on GitHub. Potential challenges (depend on the attack and target): parallelisation of the attack (e.g., port enumeration), training machine learning models, identifying vulnerabilities. Note: - Multiple students could work on different hacking tools in parallel. - Programming languages: C, Python, Ruby, assembly, ... (but not Java!) - The project's exact level of difficulty will depend on the agreed aim and objectives.

Difficulty: Variable

Data Synchronisation using BitTorrent

Hani Ragab

Dubai

Details

The objective of this project is to produce a file synchronisation system based on BitTorrent. The system will provide functionalities for saving and retrieving files as well as synchronising them across the devices of file owners.

Difficulty: Challenging

Design and Implementation of an Authorisation (Access Control) System

Hani Ragab

Dubai

Details

There are two types of certificates: X.509 and PGP. Public key certificates are used to identify users. Privilege certificates, on the other hand, are used to define access rights and privileges. X.509 PMI is an example implementation of privilege certificates. Privilege certificates can reliably be used in authorisation systems to grant/deny access to resources. Permis (https://en.wikipedia.org/wiki/PERMIS) is an example of an authorisation system. Our objective is to create an authorisation system. Implementation: Programming language: C, C++ or Python OS: Linux (Preferably CentOS), Unix. F21CN Computer Network Security (or equivalent course) is a pre-requisite for MSc students and co-requisite for Hons students.

Difficulty: Challenging

Exoplanet Detection

Hani Ragab

Dubai

Details

This project builds on previous projects I supervised to use several ML techniques (mostly deep learning) to detect exoplanets.

Difficulty: Challenging

Exoplanet detection using machine learning

Hani Ragab

Dubai

Details

Exoplanets are planets that orbit other stars. 3000+ exoplanets have been detected so far, and there are more to come! The most successful exoplanet detection technique so far (with 2000+ detected planets) is transit shape detection. This technique looks for drops in light intensity of a star when its planet comes between the star and earth observers. The challenge here is that there are 150K+ stars being observed and this is too many to be processed by humans. Machine learning can be applied to efficiently detect such drops in stars' brightness without human intervention. This project builds on a previous MSc project done under my supervision in 2016/17.
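
To make the transit idea concrete, here is a minimal sketch that flags brightness drops in a light curve with a simple robust threshold. It is a baseline illustration, not the machine learning approach the project targets: the light curve is synthetic, and `find_transit_dips` and its `threshold_sigmas` parameter are hypothetical names chosen for this example.

```python
import statistics


def find_transit_dips(flux, threshold_sigmas=3.0):
    """Return the indices where brightness falls well below the median level."""
    baseline = statistics.median(flux)
    # Median absolute deviation: a noise estimate that short dips barely affect.
    mad = statistics.median([abs(f - baseline) for f in flux])
    noise = 1.4826 * mad
    if noise == 0:
        noise = 1e-12  # avoid a zero threshold on perfectly flat data
    cutoff = baseline - threshold_sigmas * noise
    return [i for i, f in enumerate(flux) if f < cutoff]


# Synthetic light curve: small periodic jitter around 1.0, plus two 1% dips.
jitter = [0.001, -0.001, 0.0005, -0.0005]
flux = [1.0 + jitter[i % 4] for i in range(80)]
for start in (20, 60):
    for i in range(start, start + 3):
        flux[i] -= 0.01

print(find_transit_dips(flux))  # [20, 21, 22, 60, 61, 62]
```

The two flagged groups are 40 samples apart, which is the kind of periodic signature a learned detector would be trained to recognise in noisier data.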

Difficulty: Challenging

Fake News Detection

Hani Ragab

Dubai

Details

To be discussed in a face-to-face meeting.

Difficulty: Challenging

Machine Learning API

Hani Ragab

Dubai

Details

Several machine learning algorithms exist and can be used to, e.g. predict the value of an output based on inputs. The input is a matrix. Sparse matrices are matrices whose elements are mostly zeroes. Not taking that fact into account results in sub-optimal manipulation of the matrix and a waste of CPU time, RAM and storage. Our objective is to: - Build a library that implements one or more feature selection mechanisms - Implement one or more machine learning algorithms - (Optionally) parallelise computations. - (Optionally) integrate our library in R. Programming Language: - C, C++ - (Optionally) Assembly - Certainly not Java :) Some Wikipedia Reading: - Sparse matrices: https://en.wikipedia.org/wiki/Sparse_matrix - Feature selection: https://en.wikipedia.org/wiki/Feature_selection - Machine learning: https://en.wikipedia.org/wiki/Machine_learning The project can be taken by a group of students where each of them will be working on a particular component of the API.
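
The saving from exploiting sparsity can be illustrated with the compressed sparse row (CSR) layout, which stores only the non-zero entries. The sketch below uses Python for brevity, although the project itself calls for C or C++; `to_csr` and `csr_matvec` are illustrative names, not part of any existing library.

```python
def to_csr(dense):
    """Convert a dense row-major matrix to CSR: (values, col_indices, row_ptr)."""
    values, col_indices, row_ptr = [], [], [0]
    for row in dense:
        for j, x in enumerate(row):
            if x != 0:
                values.append(x)
                col_indices.append(j)
        # row_ptr[r + 1] marks where row r ends in the values array.
        row_ptr.append(len(values))
    return values, col_indices, row_ptr


def csr_matvec(values, col_indices, row_ptr, vector):
    """Multiply a CSR matrix by a vector, touching only the stored non-zeros."""
    result = []
    for r in range(len(row_ptr) - 1):
        total = 0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            total += values[k] * vector[col_indices[k]]
        result.append(total)
    return result


dense = [
    [0, 0, 3],
    [4, 0, 0],
    [0, 0, 0],
    [0, 5, 6],
]
values, col_indices, row_ptr = to_csr(dense)
print(values, col_indices, row_ptr)  # [3, 4, 5, 6] [2, 0, 1, 2] [0, 1, 2, 2, 4]
print(csr_matvec(values, col_indices, row_ptr, [1, 2, 3]))  # [9, 4, 0, 28]
```

Here 4 stored values replace 12 dense entries, and the multiplication performs 4 multiply-adds instead of 12.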

Difficulty: Variable

Machine Learning for Android Malware Analysis

Hani Ragab

Dubai

Details

This project will review existing work on Android malware detection. It will then investigate how to apply machine learning techniques to it. This will include identifying possible features that can characterise malware (e.g. subsets of binary code) then applying suitable techniques to them. The following books are in the library, you might want to have a look at them beforehand: - https://www.nostarch.com/malware, - https://www.nostarch.com/androidsecurity, - https://www.packtpub.com/big-data-and-business-intelligence/building-machine-learning-systems-python-second-edition
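
One possible reading of "subsets of binary code" as features is byte n-grams: short byte sequences counted across a file. The sketch below is only an assumption about what such features might look like; `byte_ngram_counts`, `feature_vector` and the tiny vocabulary are hypothetical, and the input bytes are made up rather than taken from real malware.

```python
from collections import Counter


def byte_ngram_counts(data: bytes, n: int = 2) -> Counter:
    """Count every overlapping run of n consecutive bytes."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def feature_vector(data: bytes, vocabulary, n: int = 2):
    """Relative frequency of each vocabulary n-gram, in vocabulary order."""
    counts = byte_ngram_counts(data, n)
    total = max(sum(counts.values()), 1)
    return [counts[gram] / total for gram in vocabulary]


# Made-up byte string standing in for a code section (assumption).
sample = b"\x90\x90\x90\xcc"
vocabulary = [b"\x90\x90", b"\x90\xcc", b"\x00\x00"]
features = feature_vector(sample, vocabulary)
```

A vector like `features` could then be fed to any standard classifier. Choosing which n-grams belong in the vocabulary is itself a feature selection problem.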

Difficulty: Variable

Machine Learning for Linux Malware Detection

Hani Ragab

Dubai

Details

This project will review existing work on Linux malware detection. It will then investigate how to apply machine learning techniques to it. This will include identifying possible features that can characterise malware (e.g. subsets of binary code) then applying suitable techniques to them. The following books are in the library, you might want to have a look at them beforehand: - https://www.nostarch.com/malware, - https://www.packtpub.com/networking-and-servers/learning-linux-binary-analysis, - https://www.packtpub.com/big-data-and-business-intelligence/building-machine-learning-systems-python-second-edition

Difficulty: Variable

Machine Learning for Windows Malware Analysis

Hani Ragab

Dubai

Details

This project will review several existing machine learning-based malware analysis techniques and critically appraise them. A comparative study would allow conclusions to be drawn on the suitability of the different techniques for malware analysis. The following books are in the library, you might want to have a look at them beforehand: - https://www.nostarch.com/malware, - https://www.packtpub.com/big-data-and-business-intelligence/building-machine-learning-systems-python-second-edition

Difficulty: Challenging

Machine Learning/Data Mining Applications

Hani Ragab

Dubai

Details

The project will be about one or more datasets of your choice (e.g. from your work, kaggle.com) with the objective of uncovering hidden patterns in them. Specific details to be discussed.

Difficulty: Variable

Pandemic Data Analysis

Hani Ragab

Dubai

Details

Details to be discussed.

Difficulty: Challenging

Self-driving Drones/Cars

Hani Ragab

Dubai

Details

I will provide the required hardware. Details of the project to be discussed in a face-to-face meeting.

Difficulty: Challenging

Smart Homes

Hani Ragab

Dubai

Details

Several topics can be done here. Just as an example, enable mobility of users, both at intra-house and inter-house levels. The location of the user inside the house would allow the system to adjust the house parameters to the user. For example, when the user moves from room 1 to room 2, i.e. intra-house movement, the music in room 1 is switched off, and it is switched on in room 2. When the user visits their neighbours, i.e. inter-house movement, non-confidential parts of their profile can be automatically uploaded to their destination to personalise environment parameters, e.g. A/C temperature (if the host policy allows it).

Difficulty: Variable

[New] Cybersecurity in Robotics

Hani Ragab

Dubai

Details

There are several cybersecurity challenges in robotics and we can work on one of them: Secure Communication: communication could be either with other robots (e.g., in a swarm scenario) or with controllers or base stations; compromising those communications can lead to disasters. Intrusion Detection Systems (IDS): traditional IDS are made for standard TCP/IP networks and expect a different type of traffic. Developing suitable IDS using AI would add an excellent layer of protection. Data Privacy and Ethics: robots can get access to sensitive data, especially those used in medical contexts or personal assistance. It is crucial that this data is only accessible by authorised individuals and according to a pre-defined policy. AI and Machine Learning Security: many robots are controlled (partially or entirely) by AI systems. It is important to protect those systems against adversarial attacks that could manipulate the AI controller of a robot.

Difficulty: Challenging

Application user study

user study applications qualitative research quantitative research

Dave Robb

Edinburgh

Details

This project is open to students with an interest in or idea involving how a particular population or group use a particular technology or application. Please contact me to discuss your ideas before selecting this project.

Difficulty: Moderate

Robot failure explanations

human-robot interaction failure explanations explanation systems ai autonomous systems

Dave Robb

Edinburgh

Details

The composition of Robot failure explanations can affect system uptake, user trust, and mental models of the autonomous system they relate to. Investigate the effects of varying aspects of explanation content in some AI/Robotics context. Please contact me to discuss these ideas before selecting this project.

Difficulty: Moderate

AI explanations

human-robot interaction failure explanations explanation systems ai autonomous systems

Dave Robb

Edinburgh

Details

The amount of content in, or the completeness of an autonomous system’s explanation of its actions often has to be balanced with the accuracy or depth of the explanation. This is so as to provide users with enough information without overwhelming them. In this project you will investigate how generative AI tools could be used to achieve the required balance between comprehensiveness (covering all the issues) and depth (fully explaining each issue) for different user groups (experts or new users). If a user study is involved then full ethical approval would be required. Please contact me to discuss before selecting this project.

Difficulty: Moderate

Emotion Recognition

affective computing emotion recognition natural language processing

Dave Robb

Edinburgh

Details

Research emotion recognition systems. Evaluate open source emotion recognition tools. Develop a simple chat interface which modifies its interactions in response to detected emotions.

Difficulty: Moderate

Socially assistive robots - digitalising cognitive tests

Marta Romeo

Edinburgh

Details

Socially Assistive Robots (SARs) represent a promising technology to provide assistance outside of clinical and controlled environments. In particular, their deployment in the house of prospective users can provide effective opportunities for early detection of Mild Cognitive Impairment, a condition of increasing impact in our ageing society, by means of digitalised cognitive tests. The aim of this project is to turn classical paper and pencil tests into SAR-based applications. The final outcome will be a ROS-based application that integrates a speech module, a graphical user interface and a dialogue manager.

Difficulty: Variable

Wizard-of-Oz for robotic experiments

Marta Romeo

Edinburgh

Details

In most human-robot interaction experiments the robot is not fully autonomous. In these cases, the investigator becomes the puppeteer, or Wizard, and controls the robot without letting the participants of the experiment realise that the robot is merely remote controlled. The wizard can control all the functionalities of the robot or take over from the robot only in specific circumstances. This is usually done for safety reasons or because, when the variables under observation are all human-dependent, it is much more important that the experiment carries on without any interruption caused by a robot malfunction. To be able to remote control the robot the wizard needs to have a full picture of what is happening in the experiment room in terms of flow of interaction between the robot and the participant, inner state of the robot, possible next action to take etc. The aim of the project will be to create a modular wizard interface, possibly integrating ROS, that could be easily used under different experimental conditions.

Difficulty: Moderate

Developing a digital ethnographic tool to study AI-teenager interaction

Marta Romeo

Edinburgh

Details

Teenagers are increasingly turning to AI systems, such as chatbots and recommendation systems, for information, guidance, and even emotional support. While these technologies offer cheap and readily available forms of assistance, they also introduce serious risks: some young users place too much trust in AI, disclosing sensitive information or uncritically accepting advice. Understanding how trust in AI develops within this population is essential for designing safe and equitable AI systems. This is particularly important for these younger demographics, as they face increasing mental health challenges yet are among the hardest to reach with targeted interventions, and are actively targeted by AI-based marketing. With this project, you will help advance the research around AI-teenager interactions by developing a new tool in the form of a diary, where users can report how they use AI and their personal considerations on their usage of AI-based systems. The diary will have Gemini embedded in it to further investigate whether and how it would be used in a writing task.

Difficulty: Variable

The role of expectations in the face of robots’ failures

Marta Romeo

Edinburgh

Details

Adoption of robotics solutions is still a challenging problem. This is driven by the fact that the general public is exposed to robots and their applications mainly through the media, which often exaggerate the capabilities of the platforms. This is even more true with social robots, which are expected to be our companions and everyday helpers. What happens when robots fail? Most users get discouraged and dismiss the technology. This project will look at how the perceived gravity of the robot failure changes with respect to the level of knowledge about the robot’s capabilities. This will be tested with a user study. Within the project you will have to: 1) define the different levels of information to give to participants before the study begins; 2) develop a task for a social robot (platform to be decided as a part of the study design) to complete together with the participants; 3) program the robot to fail within the task; 4) collect and analyse the data on participants’ perception of the robot following the interaction.

Difficulty: Variable

Trust modelling for human-robot interaction

Marta Romeo

Edinburgh

Details

Trust is essential for successful human-human interactions and plays a major role in human-robot interactions, influencing the human’s willingness to accept information from a robot and to cooperate with it. Modelling trust could therefore be a vehicle to build more intelligent social robots. For this reason, much work has been done in trying to identify the factors defining trust and a computational model that could encapsulate the concept. Bayesian models and reinforcement learning have shown promise in this respect. Although trust evolves as the interaction evolves, in many works time is not fully taken into consideration. The objective of this project will be to dive into the literature on trust modelling to develop and test (through simulations) a cognitive architecture on the evolution of trust in human-robot interaction.

Difficulty: Moderate

Anything contributing to a free/open source software project

open source free software foss

Adam Sampson

Edinburgh

Details

I have 20+ years' experience in running and contributing to free/open source software projects. If you have an idea for a project in any area that will make a substantial code contribution to an existing FOSS project, I'd potentially be interested in supervising it - I can advise on FOSS licensing, working with a FOSS community, tools and techniques normally used in FOSS development, and so on. (I'm not interested in projects related to generative AI or machine learning - please speak to other supervisors for those.)

Difficulty: Moderate

Anything related to analogue video decoding

video signal processing dsp open source history

Adam Sampson

Edinburgh

Details

I work on the ld-decode FOSS project, which captures high-quality digital video from analogue video sources such as LaserDisc and videotape. The ld-chroma-decoder tool is part of this: it extracts colour information from the video, converting the original "composite" signal into an RGB representation that can be shown on modern displays. This is a complex and computationally-expensive signal processing task. There are various ways ld-chroma-decoder could be improved: you could speed it up by making parts of it run on the GPU, or you could improve the PAL decoder by training it more effectively, or you could add support for the French SECAM video standard, or or or... there are more ideas on the ld-decode Wiki. I'd be interested in supervising anything related to this project; if working with analogue video sounds interesting to you then please talk to me.

Difficulty: High

Anything related to interactive fiction games

games interactive fiction modelling

Adam Sampson

Edinburgh

Details

Interactive fiction is one of the oldest genres of computer game. These days it includes text and graphic adventures, hypertext games, visual novels and other forms of game based on storytelling within a modelled world. You can build games from scratch in this genre (you might even have built a text adventure as a beginner's programming task at some point), but most creators use game engines such as Inform, Twine and RenPy. I'm interested in supervising projects that want to build or analyse this kind of game, or to work on tools for creators to use, or to resurrect classic games through emulation or reimplementation. Please talk to me if you'd like to work in this space.

Difficulty: Moderate

Automatic RAID bit flip correction

linux kernel raid storage

Adam Sampson

Edinburgh

Details

Modern hard disks and SSDs corrupt data at a fairly predictable rate. Integrity-checking filesystems and software RAID schemes protect against this by using various checksumming approaches to detect corrupt blocks; if a corrupt block is detected, it must be obtained from another copy of the data. However, since the most common type of corruption is a single bit being flipped, it should also be possible to try to repair a corrupt block by flipping individual bits and seeing whether the checksum is correct. Implement and evaluate this scheme inside Linux software RAID or btrfs.
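
The repair idea can be sketched in a few lines: flip each bit in turn and keep the first candidate whose checksum matches. This is a user-space illustration only, assuming CRC-32 (via Python's `zlib.crc32`) as the checksum; the checksums used by Linux software RAID or btrfs may differ, and `repair_single_bit_flip` is a hypothetical name.

```python
import zlib


def repair_single_bit_flip(block: bytes, expected_crc: int):
    """Try to repair a block that differs from the original by one flipped bit.

    Returns the repaired bytes, the block itself if it is already valid,
    or None if no single-bit flip yields the expected checksum.
    """
    if zlib.crc32(block) == expected_crc:
        return block
    candidate = bytearray(block)
    for byte_index in range(len(candidate)):
        for bit in range(8):
            candidate[byte_index] ^= 1 << bit  # flip one bit
            if zlib.crc32(candidate) == expected_crc:
                return bytes(candidate)
            candidate[byte_index] ^= 1 << bit  # undo the flip and move on
    return None


original = b"hello raid" * 10
stored_crc = zlib.crc32(original)

corrupted = bytearray(original)
corrupted[37] ^= 0b00010000  # simulate a single flipped bit on disk
repaired = repair_single_bit_flip(bytes(corrupted), stored_crc)
```

The search costs one checksum computation per bit in the block, so evaluating it on realistic block sizes, and deciding how to handle checksum collisions, would be part of the project.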

Difficulty: Challenging

Better folk music metadata for MusicBrainz

music metadata audio analysis

Adam Sampson

Edinburgh

Details

If you don't like folk music, this is not the project for you. MusicBrainz is an openly-licensed public database of metadata about recorded music - it's used by the BBC, for example, to provide various kinds of information about music on their web site. I'd like it to have better metadata for traditional/folk music - for example, cross-referencing songs to catalogues such as the Roud catalogue (songs) or thesession.org (tunes). Some of this information has been added manually, but you should be able to identify likely recordings of particular pieces based on their names and performers... and perhaps based on audio fingerprinting? (I am not interested in generative AI approaches to this project, but data mining may be worth investigating.)

Difficulty: Moderate

Certificate-encoding names for TLS web sites

tls security cryptography

Adam Sampson

Edinburgh

Details

"Ugly names" for TLS web sites. As an alternative to traditional CA infrastructure, encode cryptographic identifiers in DNS names as a mechanism for verifying certificates. This is how Tor hidden services work already - you end up with a long, awkward name, but you are no longer dependent on a fragile, expensive (and often corrupt/fraudulent) certificate authority. Implement this in OpenSSL or Firefox. This is a complex and technically challenging project, and you shouldn't choose it unless you've got some understanding of cryptography already.
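
A minimal sketch of the naming idea follows: derive a DNS label from a hash of the server's public key, then check that a presented key matches the name. The encoding here (unpadded lowercase base32 of a SHA-256 digest) is an assumption made for illustration and is not the format Tor uses; `ugly_label` and `key_matches_name` are hypothetical helpers, and the key bytes are a placeholder rather than a real DER-encoded key.

```python
import base64
import hashlib
import hmac


def ugly_label(public_key_der: bytes) -> str:
    """Encode a SHA-256 digest of the key as a 52-character DNS label."""
    digest = hashlib.sha256(public_key_der).digest()
    # Unpadded base32 of 32 bytes is 52 characters, within the 63-character
    # limit for a single DNS label.
    return base64.b32encode(digest).decode("ascii").rstrip("=").lower()


def key_matches_name(hostname: str, public_key_der: bytes) -> bool:
    """Check that the first label of hostname commits to the presented key."""
    first_label = hostname.split(".", 1)[0].lower()
    return hmac.compare_digest(first_label, ugly_label(public_key_der))


server_key = b"placeholder public key bytes"  # assumption: stands in for DER data
hostname = ugly_label(server_key) + ".example.org"
```

A client that trusts the name can then accept any certificate whose key passes `key_matches_name`, without consulting a certificate authority.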

Difficulty: Challenging

Customisable pointer decorator syntax for C++

c++ syntax compiler

Adam Sampson

Edinburgh

Details

Programming in modern C++ is made considerably safer by the existence of smart pointer classes such as std::shared_ptr; in most cases, these can be used as drop-in replacements for C-style pointers, avoiding many of the security and correctness faults common with pointer use. However, these are library features - there's no affordance in the language to make using them more convenient. Add support to a C++ compiler for a more convenient syntax for smart pointers, like the existing syntaxes for C-style pointers (*) and C++ references (&). For example, you might allow a shared_ptr argument to be written as "Foo% ptr". You'd need to design an appropriate syntax, modify a compiler to understand it, and evaluate whether it measurably simplifies real-world code. The "cpp2" syntax (https://hsutter.github.io/cppfront/) would be worth looking at for ideas, but the idea here is to maintain compatibility with existing C++ programs rather than designing a new, incompatible syntax.

Difficulty: High

Deterministic filesystem/archive format

filesystem security forensics

Adam Sampson

Edinburgh

Details

In a typical filesystem, the contents of the disk depend not just on the files being stored, but other factors such as the order they were written in, previously-deleted files, the size of the disk, and so on. This makes forensic analysis of disk images possible - you can extract deleted files, or tell information about how the filesystem was built. Instead, I'm proposing that for any given collection of files, there should be exactly one valid representation of them on disk - guaranteeing that no information is being accidentally leaked. You would need to design the filesystem layout and build a tool to construct and verify the filesystem. Ideally you would then build a Linux kernel filesystem to read (and maybe modify) it. The real challenge comes in making it efficient to update later on...
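
The "exactly one valid representation" property can be illustrated with a toy archive format in which entries are always written in sorted order with fixed-width length fields. The layout below, including the `DETARCH1` magic string and the `canonical_archive` function, is invented for this sketch and ignores metadata, directories and efficient updates, which is where the real design work lies.

```python
import struct


def canonical_archive(files: dict) -> bytes:
    """Serialise {path: content} so equal file sets always give equal bytes."""
    out = bytearray(b"DETARCH1")  # hypothetical magic string
    # Sorting by encoded path removes any dependence on insertion order.
    for path in sorted(files, key=lambda p: p.encode("utf-8")):
        name = path.encode("utf-8")
        data = files[path]
        # Fixed-width big-endian lengths: 4 bytes for the name, 8 for the data.
        out += struct.pack(">IQ", len(name), len(data))
        out += name
        out += data
    return bytes(out)


first = canonical_archive({"b.txt": b"two", "a.txt": b"one"})
second = canonical_archive({"a.txt": b"one", "b.txt": b"two"})
print(first == second)  # True
```

A verifier for such a format would also need to reject any non-canonical encoding, for example entries that appear out of order.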

Difficulty: Challenging

Encrypted Git storage

git version control encryption security

Adam Sampson

Edinburgh

Details

The Git version control system is widely used, and has been extended over the years to serve various purposes - for example, it's possible to cryptographically sign a commit. It would be useful to be able to encrypt some files within a repository - e.g. if you have files containing secret keys within a project that only some contributors should have access to. You could draw ideas for this from the Git-LFS large file extension, and from encryption extensions in Linux filesystems.

Difficulty: Challenging

Family tree rendering

genealogy family tree constraints graphics

Adam Sampson

Edinburgh

Details

Given a genealogical database from a system like GRAMPS, generate a high-quality vector-graphics family tree rendering - not just ancestors or descendants of a single person, but using a constraints system to represent as much of the tree as possible in a single rendering. This is something I've prototyped before and have some ideas about, but needs redoing properly using modern graphics technologies. I can provide sample data.

Difficulty: Moderate

Fix camera faults in analogue video

video signal processing history

Adam Sampson

Edinburgh

Details

I work on the ld-decode FOSS project, which produces high-quality digital captures of analogue video sources (such as LaserDisc or videotape). It's fairly common to find examples of video where faults in the original source are visible - for example, the red/green/blue sensors in the camera are misaligned, or the image is disturbed by loud noises near the camera (microphony), or the original video was played back on a misaligned video recorder so lines are offset. These show up in commercially-released video as well. Given an understanding of the structure of the video signal, it should be possible to correct for these kinds of faults in software to substantially improve picture quality.

Difficulty: Moderate

GPU malware

security malware gpu

Adam Sampson

Edinburgh

Details

The modern GPU is a high-powered general-purpose computer system, with large quantities of memory and the ability to access parts of the CPU memory space. On APU systems, it can potentially access hardware devices too. Investigate what malicious software running on the GPU might be capable of. I'd be particularly interested in the implications for APU systems such as AMD Ryzen - can you do stealth network communication from the GPU, for example?

Difficulty: High

GPU operating system

gpu operating system kernel security

Adam Sampson

Edinburgh

Details

A modern GPU is a highly capable, multicore, general-purpose computer system that just happens to be particularly good at vector arithmetic. But GPUs are generally used for graphics or for offloading maths-intensive tasks from the main CPU. What would a proper operating system designed to take advantage of a GPU's architecture look like? There's plenty of existing work in operating systems consisting of communicating parallel tasks - you could look at microkernel systems like Minix, or OSs for loosely-coupled parallel architectures like HeliOS. See what you can do to enable efficient, secure (if possible!) general-purpose computing on the GPU. This is a complex project that will require low-level understanding of GPU architecture and some experience of operating system programming.

Difficulty: High

Identify reused elements in folk tunes

music audio analysis music theory

Adam Sampson

Edinburgh

Details

If you don't like folk music, this is probably not the project for you. Thousands of folk tunes (short instrumental pieces), from many different traditions, are available in easily machine-readable ABC format in online databases like thesession.org and folktunefinder.com. As folk tune authors tend to "borrow" elements of existing tunes, it should be possible to take a collection like this and identify common elements between tunes - for example, showing how a tune has moved between different traditions (e.g. Scotland/Ireland/US) and been adapted to different instruments, or how different versions of the same tune have diverged over time. You could potentially use this to build a tool for exploring a collection of tunes, by showing links between tunes that share elements of melody.
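One possible starting point for spotting shared melodic material is sketched below. It assumes each tune has already been converted from ABC into a list of MIDI pitch numbers (the ABC parsing step is not shown), and it compares tunes by their sets of pitch-interval n-grams, so that a transposed copy of a tune still matches. The function names and the choice of n = 4 are illustrative assumptions.

```python
def interval_ngrams(pitches: list[int], n: int = 4) -> set[tuple[int, ...]]:
    """Return the set of n-grams over successive pitch intervals (in semitones)."""
    intervals = [b - a for a, b in zip(pitches, pitches[1:])]
    return {tuple(intervals[i:i + n]) for i in range(len(intervals) - n + 1)}


def similarity(tune_a: list[int], tune_b: list[int], n: int = 4) -> float:
    """Jaccard similarity of the two tunes' interval n-gram sets (0.0 to 1.0)."""
    a, b = interval_ngrams(tune_a, n), interval_ngrams(tune_b, n)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


# Made-up pitch sequences, not taken from any real tune database.
tune = [62, 64, 66, 67, 69, 66, 62]
transposed = [p + 5 for p in tune]          # same melody in a different key
unrelated = [60, 60, 67, 67, 69, 69, 67]

assert similarity(tune, transposed) == 1.0  # intervals are unchanged by transposition
assert similarity(tune, unrelated) == 0.0   # no interval 4-grams in common
```

Working on intervals rather than absolute pitches is a deliberate choice for key independence; it does not account for rhythm, ornamentation or mode changes, which a real analysis would probably need.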

Difficulty: Moderate

Implement the game Dazzle Dart

games history retrocomputing

Adam Sampson

Edinburgh

Details

Harold Abelson's Dazzle Dart (https://dl.acm.org/doi/10.1145/1216479.1216482) was one of the earliest multiplayer video games, created at MIT in the early 1970s. It's well overdue for a remake using a modern engine! The original was constrained by the hardware it ran on and had a very abstract 2D display; you'd need to think about whether to adapt it to 3D and how to take advantage of modern controls. (A previous student did a basic 3D version using Unity, which worked pretty well, but I'd like to see a more polished version without the dependency on a proprietary game engine.)

Difficulty: Moderate

Introduce randomness into kernel compilation

linux security aslr kernel

Adam Sampson

Edinburgh

Details

The Linux kernel, like Linux userspace, takes advantage of address space layout randomisation (ASLR) to make it harder for an attacker to predict memory addresses within the kernel. But we could go further than this with some help from the compiler - you could also randomise the layout of the stack frame, the layout of structs in the kernel, and so on. This would mean compiling a new kernel each time you upgrade the kernel (or even each time you reboot), but that may be a price worth paying - and Fabrice Bellard's tccboot project showed that this can be done with relatively low overhead.

Difficulty: Challenging

Linux kernel NFS over TLS or NoiseSocket

nfs filesystem linux kernel security cryptography tls noise

Adam Sampson

Edinburgh

Details

NFS is the standard network filesystem on Unix-like systems. Traditionally it's unencrypted, relying on the security of the network; it can be run over Kerberos, but that's complex, difficult to set up in small networks, and does not support modern cryptography. The Linux kernel now has good built-in support for TLS and other modern cryptographic primitives; in particular, the WireGuard VPN system uses a protocol based on the Noise framework. In this project, you would add support to Linux for running NFS over a TLS or NoiseSocket transport, making it easy to set up secure network filesystems.

Difficulty: Challenging

Make sudo less awful

linux security open source

Adam Sampson

Edinburgh

Details

The sudo tool is sadly nearly ubiquitous on modern Linux systems - sadly, because it has a long and inglorious history of appallingly bad security holes, through being written in C and doing a complex, security-critical job. Find ways to improve this! You might look at re-engineering it in a more secure language (or language subset), or redesigning it to take advantage of privilege separation or operating system sandboxing, or...?

Difficulty: High

Make X work for high-DPI displays

x graphics linux unix

Adam Sampson

Edinburgh

Details

The X11 graphics system has been widely used on Unix-like operating systems since the 1980s. It was originally designed to be resolution-independent, supporting high-DPI output devices such as printers in addition to regular displays. However, if you try using it on a high-DPI display these days, you will find that some of the libraries and server behaviour make assumptions about display resolution that are not appropriate for a modern 200+ DPI display (e.g. requiring low-resolution fonts or not computing spacing correctly in GUI layouts). As a result, people resort to ugly, inefficient Windows-style hacks such as pixel scaling - rather than using the display at its native resolution. Fix this - disable pixel scaling, configure X to run at the native DPI of a modern 4k display, try a range of applications and work out what's broken.

Difficulty: Moderate

Model-check Linux's BPF verifier

model checking bpf linux kernel security

Adam Sampson

Edinburgh

Details

BPF is a virtual machine architecture that is used for various "programmability" tasks inside the Linux kernel - for example, you can use it to specify custom firewall rules or custom scheduling conditions. It's important that BPF is *not* a general-purpose architecture, since BPF programs must execute within a fixed amount of time and resources - the BPF verifier is responsible for checking BPF programs to make sure they meet these rules. Since the BPF verifier is just a bit of code written in C, it's had several bugs where harmful BPF programs are incorrectly validated. This seems like an ideal application for some formal reasoning - can you come up with a way of making the BPF verifier itself verifiably safe, so you can prove that it can't validate an unsafe program? I'm imagining using model-checking techniques for this, but I'm sure there are other ways you could attack this problem as well.

Difficulty: Challenging

Physically modelled classic drum machines

audio synthesis signal processing dsp music watch ya bass bins

Adam Sampson

Edinburgh

Details

A musical project - 1980s drum machines like the Roland TR-808 and TR-909 are still widely used and cloned today, both in hardware and software. Software implementations are often based on samples, though, rather than on a physical simulation of the circuitry - making for less variation and flexibility. In this project, you'd use an existing FOSS electronics simulation system to build a model of a drum machine (or some parts of it), and wrap it in an LV2 software synthesiser so it could be played within a digital audio workstation such as Ardour.

Difficulty: Moderate

Programming language based on Wadler's CP calculus

language design process calculus cp concurrency

Adam Sampson

Edinburgh

Details

A process calculus is a mathematical model of the behaviour of a concurrent program. Designing a programming language's facilities to correspond to a particular process calculus is interesting because it allows you to reason mathematically about the behaviour of programs written in that language - for example, proving that they can't deadlock or livelock. The Communicating Sequential Processes (CSP) calculus has been particularly successful, with languages like occam, Go and Rust using it as the basis of their concurrency facilities. But CSP dates from the 1970s, and there have been advances in process calculi since then! Philip Wadler's Classical Processes (CP) calculus is a particularly interesting example - it makes use of ideas from the theory of session types, which has traditionally been used to reason about the safety of things like network protocols and cryptographic procedures. It would be interesting to experiment with designing and implementing a simple programming language based on CP, in the same way occam is based on CSP.

Difficulty: High

Ransomware-resistant filesystem

filesystem linux kernel security

Adam Sampson

Edinburgh

Details

Ransomware-resistant filesystem or storage device. Revisit the ideas behind log-structured filesystems in order to maintain the filesystem so that it can always be rolled back to previous states. This could even be done at the physical device level (e.g. build a device that filters SATA commands), so you can't actually destroy anything permanently without physical intervention. Implement this, either within the Linux kernel, as a FUSE userspace filesystem, or as a prototype in userspace. (A previous student did the last of these, so I'd rather see a working implementation.)
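The rollback idea can be illustrated with a toy in-memory model. This is only a sketch under strong assumptions: the `LogStore` class and its methods are hypothetical names, nothing is persisted, and there is no garbage collection. The point it shows is that when every write and delete is appended to a log rather than applied in place, any earlier state can be reconstructed by replaying a prefix of that log.

```python
from typing import Optional


class LogStore:
    """Toy append-only store: nothing is ever overwritten, only appended."""

    def __init__(self) -> None:
        # Each entry is (path, data); data of None records a deletion.
        self._log: list[tuple[str, Optional[bytes]]] = []

    def write(self, path: str, data: bytes) -> int:
        self._log.append((path, data))
        return len(self._log)  # version number after this operation

    def delete(self, path: str) -> int:
        self._log.append((path, None))
        return len(self._log)

    def snapshot(self, version: Optional[int] = None) -> dict[str, bytes]:
        """Rebuild the file collection as it stood at the given version."""
        if version is None:
            version = len(self._log)
        state: dict[str, bytes] = {}
        for path, data in self._log[:version]:
            if data is None:
                state.pop(path, None)
            else:
                state[path] = data
        return state


store = LogStore()
good = store.write("report.txt", b"quarterly figures")
store.write("report.txt", b"ENCRYPTED-BY-ATTACKER")  # simulated ransomware overwrite
store.delete("report.txt")

assert store.snapshot() == {}                                       # current state: file gone
assert store.snapshot(good) == {"report.txt": b"quarterly figures"}  # rolled back
```

A real implementation would also have to decide who is allowed to truncate the log, since that is exactly the operation an attacker wants.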

Difficulty: High

Retrocomputing support for Radare

reverse engineering security cpu history

Adam Sampson

Edinburgh

Details

Radare is a suite of tools for reverse-engineering software - for example, automatically extracting a structured disassembly from a binary. It supports a range of modern architectures, but it would also be useful to apply it to code used on historical computer architectures - for example, when understanding the code as part of a computing history project, or when porting it to a new platform. I'd be particularly interested in support for the Motorola 68000 architecture (a 16/32-bit architecture widely used in the 1980s) and DEC PDP-10 architecture (a 36-bit architecture commonly used in the 1970s), but other architectures - e.g. various IBM mainframes - would also be interesting.

Difficulty: High

Secure video conferencing support for Jamulus

audio real-time conferencing music security privacy

Adam Sampson

Edinburgh

Details

Jamulus is a FOSS system that allows musicians (like me) to play together in real time over the Internet. It has good support for high-quality audio, but it doesn't support video, so many groups that meet on Jamulus also have to use a separate video conferencing system such as Zoom or Jitsi to see each other. Add support for simple video conferencing to Jamulus. Since the Jamulus protocol is highly latency-sensitive, I suspect this would be best done by integrating a separate video-conferencing protocol into the Jamulus client (ideally an existing one for interoperability). There are some pretty substantial privacy concerns around this so it would make an interesting project in terms of security usability engineering.

Difficulty: Moderate

Security extensions for RISC-V

risc-v security cpu architecture cheri

Adam Sampson

Edinburgh

Details

RISC-V is a modern RISC architecture based on open-source principles; there's a core of instructions and a collection of extensions that provide additional facilities (e.g. vector maths). There are existing high-quality toolchains and software emulators for it - you don't need RISC-V hardware to work with it. Projects like CHERI have experimented with extending existing computer architectures to provide better security facilities - CHERI adds pointer bounds to the ARMv8 architecture. Design an extension to RISC-V to provide similar facilities, or to improve software security in other ways (e.g. bounds checking, pointer authentication, untrusted data tracking...). Implement this in an emulator to demonstrate that it can be used to detect errors in programs. (A previous student experimented with pointer authentication successfully, so maybe try a different approach.)

Difficulty: High

TV Studio Simulator game

games history tv broadcasting social history

Adam Sampson

Edinburgh

Details

There are plenty of silly "XYZ simulator" games out there - how about one that simulates the staff of a busy 1960s/1970s TV studio? You could play as a camera operator, vision mixer, producer, boom operator, etc., or play together in multiplayer mode as a team of people trying to make a complex drama or news show work. Disasters would include broken equipment, misbehaving actors, unreasonable time pressure and invasions by visiting schoolchildren. For inspiration, have a look at the ADAPT project (https://www.adapttvhistory.org.uk/) or the stories on the BBC Tech Ops site (http://www.tech-ops.co.uk/next/).

Difficulty: Moderate

Use fuzzing to automatically test narrative games

fuzzing testing games interactive fiction

Adam Sampson

Edinburgh

Details

Difficulty: Moderate

Use fuzzing to identify faults in emulators

fuzzing testing emulation cpu security

Adam Sampson

Edinburgh

Details

Coverage-directed fuzzing is a highly effective technique for testing software - it combines random input with feedback from software coverage measurement to generate input that explores all the possible paths of execution through a piece of software. An emulator such as qemu, simh or MAME executes software written for a different architecture by simulating the CPU and peripherals in software. Faults in emulation are common - either producing incorrect results, or worse, producing security holes. However, if you have two emulators for a given architecture - or an emulator and a real CPU - then you could detect faults by using fuzzing to generate code, running it on both, and comparing the results; if they don't match, or the emulator crashes, you've found a problem. I would suggest picking a simple, common architecture with lots of different emulators available (Z80, 6502...) to maximise the chance of finding an interesting problem. (A previous student had a good attempt at this with a custom emulator, so I'd like the focus to be on analysing faults in existing emulators.)
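The run-on-both-and-compare step can be sketched as follows. Everything here is an illustrative assumption: the three-instruction, 8-bit accumulator machine is invented for the example, the "suspect" emulator contains a deliberately planted bug, and the program generator is plain random rather than coverage-directed (a real project would use feedback from a fuzzer such as AFL or libFuzzer).

```python
import random
from typing import Optional

Program = list[tuple[str, int]]
OPS = ["ADD", "SUB", "XOR"]


def run_reference(program: Program) -> int:
    """Reference emulator for a made-up 8-bit accumulator machine."""
    acc = 0
    for op, k in program:
        if op == "ADD":
            acc = (acc + k) & 0xFF
        elif op == "SUB":
            acc = (acc - k) & 0xFF
        elif op == "XOR":
            acc ^= k
    return acc


def run_suspect(program: Program) -> int:
    """Second emulator with a planted fault: SUB does not wrap around on borrow."""
    acc = 0
    for op, k in program:
        if op == "ADD":
            acc = (acc + k) & 0xFF
        elif op == "SUB":
            acc = abs(acc - k) & 0xFF  # deliberate bug for the demonstration
        elif op == "XOR":
            acc ^= k
    return acc


def fuzz(trials: int = 1000, length: int = 8, seed: int = 0) -> Optional[Program]:
    """Generate random programs; return the first one on which the emulators disagree."""
    rng = random.Random(seed)
    for _ in range(trials):
        program = [(rng.choice(OPS), rng.randrange(256)) for _ in range(length)]
        if run_reference(program) != run_suspect(program):
            return program
    return None


# A hand-written case that exposes the planted bug: 0 - 1 should wrap to 255.
assert run_reference([("SUB", 1)]) == 255
assert run_suspect([("SUB", 1)]) == 1

counterexample = fuzz()
if counterexample is not None:
    print("emulators disagree on:", counterexample)
```

Comparing only the final accumulator keeps the sketch short; comparing full register and memory state after every instruction would localise faults much more precisely.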

Difficulty: Moderate

X or Wayland server in a safe language

graphics linux unix x wayland security

Adam Sampson

Edinburgh

Details

The X graphics system is widely used on Unix-like systems; its successor, Wayland, is starting to come into wide usage. Both of these have existing good-quality implementations that are written in C, and thus suffer from the usual security problems of unsafe languages. Implement a new X or Wayland server using a modern, safe programming language such as Rust, Go, Nim, Haskell or OCaml (I'm not interested in doing this with Java or C#). Alternatively, take one of the existing implementations and find a way of making it safe - for example by adding annotations to the C code to allow better safety analysis.

Difficulty: High

Anything related to emulation or the history of computing

history emulation retrocomputing games

Adam Sampson

Edinburgh

Details

I work on some open source projects that aim to preserve the history of computing. For example, I've written software to rescue data from failing floppy disks, and to reconstruct historical operating systems from the 1970s and early games engines. Previous student projects in this area have included emulation of 1980s processors and a software-hardware setup to explore the behaviour of historical microchips. I'd be interested in supervising any projects in this area.

Difficulty: Moderate

Port a classic operating system to a modern platform

operating system portability history arm risc-v

Adam Sampson

Edinburgh

Details

There are a range of older operating systems that have been released under FOSS licenses, including:
- EmuTOS and MiNT (originally 68000) - https://emutos.sourceforge.io/ and https://freemint.github.io/
- RISC OS (originally ARM) - https://www.riscosopen.org/
- Coherent (originally x86 and others) - https://gunkies.org/wiki/Coherent

As these were intended for use on computers of the 1980s, with processors running at a few MHz and at most a few MiB of memory, they would be a good fit in terms of resources for modern middle-spec microcontrollers. Take one of these systems and make it run on, say, an embedded RISC-V chip. (EmuTOS would be the simplest; MiNT the most capable; both written in C and compilable with modern toolchains. RISC OS is mostly in ARM assembler so porting some of it to ARMv8 would make for an interesting project. Coherent is 1980s-style C but was intended to be portable originally.)

Difficulty: High

Anything related to audio or music

music audio music theory signal processing

Adam Sampson

Edinburgh

Details

Projects I've supervised before in this space have included:
- Using a Raspberry Pi board to implement a guitar effects pedal
- Simulating the Hammond Novachord synthesiser as a software plugin
- Generative music for an RPG game
- Software to teach electric guitar by analysing chords played in real time
- Automated analysis of the use of musical modes in game soundtracks

If you're interested in doing something in this area please get in touch with me. (I'm not interested in anything related to generative AI.)

Difficulty: Easy

Investigating prominent factors affecting E-commerce development in Africa

e-commerce business intelligence data science digital marketing

Usman S Sanusi

Edinburgh

Details

African e-commerce users are expected to surpass half a billion by 2025, a record 40% landmark compared to about 140 million users in 2017 (13%), representing a compound annual growth rate of nearly 17%. Additionally, Africa leads mobile internet usage at over 13% above the global average, and nearly 5% above Asian mobile usage. This indicates a clear need for, and promising value in, a mobile approach to online businesses targeting African markets. However, African e-commerce is far from maturity in terms of profitability: recent reports showed that fewer than 30% of e-commerce start-ups on the continent were profitable, while most of the bigger companies have yet to record a profit in more than a decade. Meanwhile, various studies have identified challenges that include underdeveloped infrastructure, logistical constraints and limited payment gateways. This project would study the factors influencing the successes and challenges of e-commerce development in Africa, building regression models of e-commerce development and its turnover index as the contribution to the national GDP. The resulting models would be validated on relevant time series data, while business intelligence tools including Google Analytics would be employed for preliminary investigations of the leading African e-commerce platforms. To derive insights and potentially suggest how to advance e-commerce, the Weka machine learning toolkit and the SPSS software package would be used for modelling and statistical analysis of the data.

Difficulty: Easy

Sentiment Analysis: Social Media influence on stock market prediction in the developing economies

forecasting data science artificial intelligence e-commerce digital economy

Usman S Sanusi

Edinburgh

Details

Difficulty: Moderate

State-of-the-art Machine learning in Energy demands and supplies

forecasting artificial intelligence machine learning renewable and electrical energy

Usman S Sanusi

Edinburgh

Details

Difficulty: Moderate

State-of-the-art Machine learning in Finance

data science artificial intelligence machine learning finance and stocks

Usman S Sanusi

Edinburgh

Details

Difficulty: Moderate

State-of-the-art Machine learning in Healthcare

data science artificial intelligence machine learning healthcare management

Usman S Sanusi

Edinburgh

Details

Difficulty: Variable

Multicultural Inheritance Application Software

Usman S Sanusi

Edinburgh

Details

An application that provides inheritance or estate sharing recommendations to different people according to their different customs or beliefs, as well as sharing based on a few national jurisdictions.

Difficulty: Moderate

Investigating prominent factors affecting E-commerce development in Developing Nations

Usman S Sanusi

Edinburgh

Details

This project would study the factors influencing the successes and challenges of e-commerce development in a number of developing nations, building regression models of e-commerce development and its turnover index as the contribution to the national GDP. Business intelligence tools including Google Analytics (GA) could be employed for preliminary investigations of the leading African e-commerce platforms upon timely agreement. Leveraging the capability of linear models to relate a set of independent variables to a dependent variable, the project will build robust influence-factor regression models of e-commerce development, exploring different algorithms. These include Classification and Regression Trees (CART), multivariate linear regression, and partial least squares regression (PLR). The regression models would be developed from relevant time series data, including indices of access to computers, Internet penetration, mobile phone ownership, size of the middle class and levels of financial inclusion, amongst others. These indices would be derived primarily from publicly accessible data, including data from the United States trade department, the World Trade Organization (WTO) and global data platforms such as Statista. To derive insights and potentially suggest how to advance e-commerce, the Weka machine learning toolkit and the SPSS software package would be used for modelling and statistical analysis of the data, while GA would provide quick and easy indications of likely usability problems using non-identifiable and aggregate data.
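As an illustration of the multivariate linear regression step, the sketch below fits an intercept and two coefficients by ordinary least squares using only the Python standard library. The feature names and every number in it are synthetic placeholders invented for the example, not real e-commerce data, and the project itself proposes Weka and SPSS rather than hand-written code.

```python
def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_linear(rows: list[tuple[float, ...]], targets: list[float]) -> list[float]:
    """Least-squares fit of target = intercept + sum(coef_i * feature_i).

    Solves the normal equations (X^T X) beta = X^T y and returns
    [intercept, coef_1, ..., coef_k].
    """
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(k)]
    return solve(xtx, xty)


# Synthetic placeholder data: (internet penetration %, mobile ownership %) per year.
rows = [(20.0, 40.0), (35.0, 55.0), (50.0, 60.0), (65.0, 80.0), (80.0, 85.0)]
# Targets generated from a known linear rule so the fit can be checked.
targets = [1.0 + 0.05 * x1 + 0.02 * x2 for x1, x2 in rows]

coefficients = fit_linear(rows, targets)
expected = [1.0, 0.05, 0.02]
assert all(abs(c - e) < 1e-6 for c, e in zip(coefficients, expected))
```

With real indices the fit would not be exact, so the residuals and a held-out validation period would matter far more than recovering the coefficients.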

Difficulty: Variable

Self-Supervised Learning for Analysis of Facial Micro-Expressions

machine learning self-supervised learning micro-expressions affective computing

John See

Malaysia

Details

The self-supervised learning (SSL) paradigm -- which allows models to be trained on a task using the data itself to generate supervisory signals rather than relying on externally provided labels -- has opened up new possibilities of having one generalised model to cater for a range of downstream tasks. Meanwhile, computational analysis of facial micro-expressions has been gaining attention among the affective computing community in the last decade due to the proliferation of new datasets. The field comprises several associated tasks, such as spotting and recognition, which may qualify as potential downstream tasks using a specially crafted SSL method. This project aims to investigate the feasibility of SSL models in several micro-expression analysis tasks and to discover insights that will be valuable to the community. This is a research-centric project.

Difficulty: High

ηEmo - Naturalistic Emotion Analysis Application for Web Video Streams

emotion analysis web application stream processing

John See

Malaysia

Details

Today's web streaming technology enables us to connect with others and interact with them remotely -- even interviews, work, and clinical assessments can now be conducted using video conferencing platforms. In this project, the aim is to build a prototype application (running in a web browser) that can analyse the emotional state of the person in the video stream. What counts as an 'emotional state' can be decided later: categorical expressions (e.g. happy, sad), dimensional expressions (e.g. valence, arousal), and/or other related concepts like engagement levels or cognitive load. There are existing AI models that can be used directly for these tasks. The term "naturalistic" refers to real-world settings and environments where people express their emotions spontaneously rather than being instructed or told to act. *Depending on how the testing/verification process is conducted, full ethical approval may be needed if people are to be assessed by the application.

Difficulty: Moderate

Drone-based Navigation using Vision-Language Models (VLMs)

drone navigation scene understanding vision-language models

John See

Malaysia

Details

As we enter a new era of autonomous navigation for drones (also known as unmanned aerial vehicles or UAVs), the challenge lies in providing intelligent interfacing components and a deeper knowledge of the environment in which the drone operates. It is essential to equip the drone with visual perception and understanding of the surroundings, and to locate suitable paths to navigate and accomplish mission goals. This project aims to develop a prototype system/software for drone-based navigation using vision-language models (VLM). By leveraging VLMs, it would be possible to incorporate more layers of multimodal information, including map information and instruction sets, to enhance navigation in drone missions. Steps to achieve this include matching text phrases to visual objects, detecting scene elements, converting instructions into geometric goals, using a planner to determine safe waypoints and actions, and finally, executing and deploying it on a drone.

Difficulty: High

FOODIE: Deployment of Models for Food Image Understanding

artificial intelligence model deployment restful api

John See

Malaysia

Details

Artificial Intelligence (AI) models for food images use computer vision and deep learning techniques to analyse, recognise, and interpret food-related visual data. Many of these models can classify dishes, estimate portion sizes, identify ingredients, assess food quality, and even predict calorie content from images. This project aims to develop a suite of SaaS (Software-as-a-Service) RESTful API endpoints for AI models that extract information from food images. Some feasible services to start with include food image classification, food image segmentation and food calorie estimation. The implemented APIs can be demonstrated via a simple program (web or mobile) that takes in an input image (or a set of images from a gallery) and returns the results to be displayed and visualised at the front-end. *As some available models may be dependent on GPU for inference, a feasible solution should be engineered - either by cloud hosting with compute (costly), or by finding alternative lightweight models for local hosting.

Difficulty: Moderate

AI In Digital Health

Talal Shaikh

Dubai

Details

Healthcare is one of the notable industries that has been influenced by the Fourth Industrial Revolution, and Healthcare 4.0 is a term that has emerged to describe this revolution. Healthcare 4.0 is a collective term for data-driven digital health technologies such as smart health, mobile health, wireless health, eHealth, online health, medical IT, telehealth/telemedicine, digital medicine, health informatics, pervasive health, and health information systems. The revolution in the healthcare industry is already underway, yet, because of the conservative and slower pace of technological adoption by healthcare insiders compared to other industries, digitalization in this sector has not been so evident (Pace et al., 2018; Manogaran et al., 2017). Multiple projects can be done in this space. Please get back to me for further discussion.

Difficulty: Variable

AR / VR Project for Smart Spaces

Talal Shaikh

Dubai

Details

To be discussed

Difficulty: Moderate

Authentication via intrabody communication

Talal Shaikh

Dubai

Details

Biometric authentication is simply the process of verifying your identity using your measurements or other unique characteristics of your body, then logging you in to a service, an app, a device and so on. The human machine interface (HMI) is a main communication method between human and computer. Through current HMI, a machine receives and accurately responds to the commands given by users. In the next generation of HMI, machines will be required to deal with more challenging problems/decisions (such as affective evaluations, ethical quandaries, and other innovations) in a self-governing manner. We can use galvanic intrabody communication to transfer data through the human body.

Difficulty: Challenging

Embodied Agents that Chat

Talal Shaikh

Dubai

Details

Details to be discussed with the student.

Difficulty: Challenging

Emotion AI

Talal Shaikh

Dubai

Details

Artificial emotional intelligence or Emotion AI is also known as emotion recognition or emotion detection technology. Humans use a lot of non-verbal cues, such as facial expressions, gesture, body language and tone of voice, to communicate their emotions. The vision is to develop Emotion AI that can detect emotion just the way humans do, from multiple channels.

Difficulty: Challenging

Emotion Recognition in Chats / Videos

Talal Shaikh

Dubai

Details

TO BE DISCUSSED

Difficulty: Easy

Energy Aware Software Development

Talal Shaikh

Dubai

Details

Software plays an important role in battery life. The OS, firmware, drivers, and all small components are typically optimized to give better performance and energy efficiency. As notebook PCs (and smaller form factor devices, including tablets and smart phones) become pervasive compute platforms, battery life is becoming increasingly important, particularly with regard to standby or idle time. In addition, as hardware power states become more sensitive, software must be well behaved at idle so it doesn't needlessly wake components, which would limit battery life. Several case studies presented here show how software "idle" behavior can have a negative impact in this area on Windows-based systems.

Difficulty: Challenging

Energy Efficient Programming

Talal Shaikh

Dubai

Details

Software can influence the energy efficiency of hardware significantly, since all hardware is controlled by software.

Difficulty: High

Hidden Web Databases

Talal Shaikh

Dubai

Details

To be Discussed with the Student.

Difficulty: Moderate

Human Activity Detection

Talal Shaikh

Dubai

Details

Being able to detect and recognize human activities is essential for several applications, including smart homes and personal assistive robotics. We perform detection and recognition of unstructured human activity in unstructured environments. There are many different ways this can be done. We use an RGB-D sensor (Microsoft Kinect) or radio waves (Wi-Fi) as the input, and compute a set of features based on human pose and motion, as well as on image and point-cloud information.

Difficulty: Easy

Indoor Drone Assistant

Talal Shaikh

Dubai

Details

With the rapid advance of sophisticated control algorithms, the capabilities of drones to stabilise, fly and manoeuvre autonomously have dramatically improved, enabling us to pay greater attention to entire missions and the interaction of a drone with humans and with its environment during the course of such a mission. I would be interested in considering a project dealing with drones. For further discussion, please do get in touch with me.

Difficulty: Challenging

IOT on Blockchain

Talal Shaikh

Dubai

Details

To be discussed

Difficulty: Variable

IOT Systems planner

Talal Shaikh

Dubai

Details

A suggestion system which selects the devices required based on the needs of the user and provides the price, possible network maps, etc.

Difficulty: Variable

ML for finance

Talal Shaikh

Dubai

Details

This project aims to use machine learning techniques such as ensemble learning, convolutional neural networks etc. to predict spot prices for a variety of industries. Machine learning is increasingly used in finance to make predictions as well as to aggregate among existing strategies for making investments over time. We will use various free as well as proprietary data sets to assess the value of our newly developed methods in terms of both profit and risk, and compare them with state of the art techniques. This will also involve developing new “lucky factors” (features) that can be extracted from the data to inform and improve existing and new investment strategies. The expectation is that the work will lead to a conference publication.

Difficulty: Variable

Pervasive Authentication

Talal Shaikh

Dubai

Details

Description to be added later.

Difficulty: Challenging

Projects in Computer Vision

Talal Shaikh

Dubai

Details

I am interested in Computer Vision projects that could be used for applications such as real-time object identification with semantic analysis, self-driving cars, spatial data analysis, etc.

Difficulty: Challenging

Robotics Based Projects

Talal Shaikh

Dubai

Details

I am interested in any robotics-based projects or applications, such as self-driving cars, drones, navigation, robot mapping, and robot interactions. Please get in touch with me if you would like to discuss any of these topics.

Difficulty: Variable

Text To Image Synthesis

Talal Shaikh

Dubai

Details

Generating photo-realistic images from text is an important problem and has tremendous applications, including photo-editing, computer-aided design, etc. More details to be discussed with the student

Difficulty: Moderate

User Authentication Via Wifi Signals

Talal Shaikh

Dubai

Details

There has been growing interest in building smart indoor environments, such as smart homes or offices, that are capable of sensing and responding to users using Wi-Fi signals only. In this project, I would like to investigate the use of Wi-Fi for robust authentication of the user in different situations.

Difficulty: Challenging

Visual Attendance Monitoring System

Talal Shaikh

Dubai

Details

To create a student attendance monitoring system that uses face recognition as one of its major features.

Difficulty: Challenging

Visual Food Log

Talal Shaikh

Dubai

Details

To be discussed

Difficulty: Easy

Voice User Interface for a Smart House.

Talal Shaikh

Dubai

Details

To be discussed with the student

Difficulty: Challenging

Parsing with Algebraic Effects and Handlers

Filip Sieczkowski

Edinburgh

Details

Algebraic effects and their handlers are a modern approach to structuring computational effects of programs, including interaction with the outside world, but also effects internal to the program. In addition to various libraries that provide the programmer with an ability to use algebraic effects, several experimental programming languages, including Helium (https://bitbucket.org/pl-uwr/helium/src/master/), Frank (https://github.com/frank-lang), Koka (https://koka-lang.github.io/koka/doc/book.html), etc. have been recently developed. These can be used as a means to study the impact of the new programming idiom on software development. This project aims to study the impact of programming with algebraic effects on parsing technology. It would require the student to a) investigate the programming idiom to be used, b) investigate parsing techniques, and decide on a family of techniques suited to implementation using algebraic effects, c) build a tool/library for a chosen language that uses algebraic effects to provide support for parsing. The prerequisites for this project include a background in functional programming and a strong interest in cutting-edge language technology in this area, as well as a background in language technology that would allow the student to efficiently review and adapt approaches to parsing (around the level given by the Language Processors course). Understanding of formal semantics of programming languages is not strictly necessary, but even limited exposure may be helpful in understanding the research papers that will need to be studied.

Difficulty: Challenging

Applying statistical and machine learning techniques to study the joint modelling of insurance claims and lapsation for insureds who subscribed to both automobile and homeowners insurance.

Karamjeet Singh K.Ranthir Singh

Malaysia

Details

This project proposes to study the joint modelling of insurance claims and lapsation related to insureds who subscribed to both automobile and homeowners insurance. This is useful to insurers as it may provide valuable insights into the area of rate making hence improving both pricing and underwriting. In the first semester we will look to replicate the study of Guillén et al. (2021). Data is available for policyholders in the Spanish market. In the second semester we shall try to get data from another country (student’s home country, if available) and do a similar study perhaps with some machine learning. We will then make comparisons with the study by Guillén et al. (2021).

Difficulty: Moderate

An Exploration of Machine Learning Algorithms in Healthcare sector (e.g. Heart Failure, Cancer classification, COPD prediction)

Drishty Sobnath

Dubai

Details

According to the World Health Organisation (WHO), cardiovascular diseases cause around 17.9 million deaths every year, due to an increasing number of heart attacks and strokes. Therefore, this study aims to utilize data science and machine learning techniques to explore the relationships among contributing factors that may have a bearing on the risk of suffering from heart disease. It aims to accurately predict whether a patient is likely to suffer from heart disease based on existing public health datasets.

Difficulty: Variable

Currency Recognition System for the Visually Impaired with Computer Vision AI

Drishty Sobnath

Dubai

Details

This project aims to develop an AI-powered mobile application that assists visually impaired individuals in identifying currency notes through real-time image recognition. Using a smartphone camera, the system detects and classifies banknotes under various lighting conditions and orientations. The recognized denomination is then communicated via audio feedback using text-to-speech technology. The model is optimized for accuracy, speed, and offline functionality, ensuring accessibility in both urban and rural settings.

Difficulty: Variable

Data-Enabled Mental Health: From Patterns to Interventions

Drishty Sobnath

Dubai

Details

According to a new report from a project carried out by Harvard Graduate School of Education, young adults in the U.S. report twice the rates of anxiety and depression as teens. The report identifies a variety of stressors that may be driving young adults’ high rates of anxiety and depression. The proposed project can utilize a mixed-methods approach, including surveys, focus groups, and interviews, and use of machine learning and data science tools to predict or visualize patterns in young people's perceptions and experiences surrounding their mental health.

Difficulty: Variable

Artificial Intelligence of Things (AIoT) in Smart Cities for indoor air pollution prediction

Drishty Sobnath

Dubai

Details

The accelerating convergence of artificial intelligence (AI) and the Internet of Things (IoT) has sparked a recent wave of interest in Artificial Intelligence of Things (AIoT). At this point, most of society understands the issue of air pollution and its repercussions not only on the climate but human health as well. However, not many of us seem to realise that indoor air quality is as important. Unfortunately, indoor air is also susceptible to pollution, and as studies show, its presence can be up to 8 times higher than in outdoor air and most people spend around 80 to 90% of their time indoors. By analysing historical data sets, physical, chemical and biological characteristics of indoor air, different models can be evaluated to predict Indoor Air Quality.

Difficulty: Variable

BREATH PROJECT: Better indoor Respiratory Environment through AI-based Tracking and Health-control for COPD Patients in the UAE

Drishty Sobnath

Dubai

Details

Difficulty: Challenging

Intelligent Model for Cyberbullying Detection and Intervention

Drishty Sobnath

Dubai

Details

Cyberbullying has become a severe issue, facilitated by the growth of social media. It can have devastating psychological impacts on victims. This project should develop an artificial intelligence framework leveraging natural language processing and deep learning for real-time cyberbullying detection.

Difficulty: Moderate

Intensive Care Unit Admission Prediction for Cardiovascular Patients Using Machine Learning

Drishty Sobnath

Dubai

Details

Intensive Care Unit (ICU) admission for cardiovascular patients represents a critical decision-making process fraught with challenges such as subjective criteria, delayed risk identification, and resource allocation complexities. This project should leverage machine learning to address these issues, aiming to refine ICU admission practices for enhanced patient care and optimal resource utilization.

Difficulty: Easy

Yoga Pose Corrector Using Machine Learning

Drishty Sobnath

Dubai

Details

Using pose estimation algorithms (e.g., OpenPose or MediaPipe), the system tracks key body joints from video or camera input and compares them against ideal pose templates. A classification model or rule-based system then determines pose correctness and identifies misalignments. The system provides audio or visual feedback to guide users toward proper alignment, making yoga practice safer and more effective, especially for remote or self-guided sessions. Train a custom CNN/LSTM model for temporal pose sequence evaluation. Add real-time feedback via a mobile app. Include difficulty-based pose classification and a progress tracker.
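
The rule-based comparison against pose templates described above could look roughly like the following minimal sketch. It assumes 2D keypoints have already been extracted by a pose estimator. The keypoint names, the template format and the 15-degree tolerance are hypothetical choices for illustration, not part of OpenPose or MediaPipe.

```python
import math

def joint_angle(a, b, c):
    """Angle in degrees at joint b, formed by the 2D points a-b-c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    norm = math.hypot(*v1) * math.hypot(*v2)
    if norm == 0:
        raise ValueError("degenerate keypoints: two points coincide")
    cos = (v1[0] * v2[0] + v1[1] * v2[1]) / norm
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))

def check_pose(keypoints, template, tolerance_deg=15.0):
    """Return {joint: deviation in degrees} for joints outside the tolerance.

    template maps a joint name to (point_a, point_b, point_c, target_angle);
    this format is an assumption made for the sketch.
    """
    misaligned = {}
    for name, (a, b, c, target) in template.items():
        angle = joint_angle(keypoints[a], keypoints[b], keypoints[c])
        if abs(angle - target) > tolerance_deg:
            misaligned[name] = angle - target
    return misaligned

# Hypothetical template: the left arm should be straight (180 degrees at the elbow).
template = {"left_elbow": ("shoulder", "elbow", "wrist", 180.0)}
bent_arm = {"shoulder": (0.0, 0.0), "elbow": (1.0, 0.0), "wrist": (1.0, 1.0)}
print(check_pose(bent_arm, template))
```

The signed deviation per joint is what a feedback layer could turn into an audio or visual hint; a learned classifier would replace `check_pose` rather than `joint_angle`.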

Difficulty: Moderate

Serious Games

serious games

Mario Soflano

Edinburgh

Details

Serious games are a category of games designed with a primary purpose other than pure entertainment. They leverage the engaging and interactive nature of gaming to achieve specific educational, training, health, or social objectives. Here are some key aspects of serious games:

1. Educational Content: Serious games incorporate educational material to teach specific knowledge or skills. This can range from academic subjects to professional training.
2. Engagement and Motivation: By using game mechanics such as points, levels, and rewards, serious games keep users motivated and engaged in the learning process.
3. Interactive Learning: Players actively participate in the learning process, which can enhance understanding and retention of information.
4. Real-World Applications: These games often simulate real-world scenarios, allowing players to practice and apply what they have learned in a safe environment.

Serious games offer the following benefits:

• Enhanced Learning: The interactive nature of games can make learning more effective and enjoyable.
• Skill Development: Players can develop a range of skills, from cognitive abilities to social and emotional skills.
• Immediate Feedback: Games provide instant feedback, helping players understand their progress and areas for improvement.
• Accessibility: Serious games can be accessed on various platforms, making them available to a wide audience.

Serious games can be applied to:

• Education: Used in schools and universities to teach subjects like math, science, history, and languages.
• Healthcare: Help patients manage chronic diseases, undergo rehabilitation, or learn about health and wellness.
• Corporate Training: Provide employees with training in areas such as leadership, safety, and technical skills.
• Social Change: Raise awareness about social issues and promote positive behavior change.

Serious games can be implemented for PC/console, mobile, virtual reality and augmented reality.

Difficulty: Easy

Location-based System

serious games

Mario Soflano

Edinburgh

Details

Location-based systems (LBS) leverage geographic information to provide services and information tailored to a user’s location. These systems have transformative potential in both educational and commercial sectors, enhancing user experiences and operational efficiency. As technology continues to evolve, the potential applications of LBS will expand, offering even more innovative solutions for various sectors.

Difficulty: Easy

Digital Health

Mario Soflano

Edinburgh

Details

Digital health encompasses the use of digital technologies to improve health outcomes, healthcare services, and overall well-being. With the aim of enhancing the efficiency and accessibility of healthcare, digital health integrates various technological advancements, including mobile health (for example, remote health monitoring), health information technology (such as Electronic Health Records and Health Data Analytics), wearable devices (such as fitness trackers and medical wearables) and telehealth (such as virtual consultations and remote diagnostics).

Difficulty: Easy

Create interactive extended reality applications

Ryad Soobhany

Dubai

Details

To create interactive extended reality (AR, VR) applications (e.g. healthcare, education).

Difficulty: Variable

Multimedia security and forensics

Ryad Soobhany

Dubai

Details

Applications of multimedia security. Extracting features and artefacts from multimedia for forensic/security analysis.

Difficulty: Variable

Visualising and modelling Satellite data

Ryad Soobhany

Dubai

Details

To visualise high-dimensional satellite image data. The student will also need to perform ML modelling.

Difficulty: Challenging

Human Robot Interaction/Human Computer Interaction for Assistive Rehabilitation

human robot interaction hri human computer interaction hci user experience design ux design rehabilitation games health

Shenando Stals

Edinburgh

Details

This project aims to explore, test, and evaluate games and equipment for assistive rehabilitation from the Fourier Lab at the National Robotarium. This will involve a study/experiment with human participants in need of lower limb rehabilitation in collaboration with external organizations, and analyzing and reflecting upon the data collected with possible suggestions for design improvements. Assistance and basic training will be provided. Please note that this project is currently in its initial phase and we are discussing and exploring the focus and possibilities with external collaborators, so it would be advisable to email me first to have a quick chat regarding this project, see what the current state is, and what your own ideas are before applying for this project.

Difficulty: Moderate

Human Robot Interaction/Human Computer Interaction in the Urban Environment

human robot interaction hri human computer interaction hci health football memories stories social robot urban environment urban interaction design

Shenando Stals

Edinburgh

Details

This project aims to design, implement, and evaluate a social robotic intervention in the urban environment. The focus will be on using multimodal interaction with a social robot in a football stadium to share personal, football-related memories and stories, potentially for people with dementia. Besides the project outlined above, which I am currently working on, I am also open to discussing new project ideas that you might have yourself in the field of urban interaction design, which take a human-centred design approach to technology (not just robots!) and data-rich urban environments such as smart cities. Please drop me an email if you would like to discuss this further.

Difficulty: High

Autoformalization with LLMs and Proof Assistants

Kathrin Stark

Edinburgh

Details

Are you interested in the intersection of machine learning, mathematics, and verification? Do you want to explore how machine learning models and proof assistants can be combined to automatically translate natural language mathematics into formal specifications? Then this honours thesis project is for you! In this project, you will work on testing how well LLMs perform on autoformalization, i.e. translating natural language mathematics into formal language. You will explore the capabilities and limitations, as well as evaluate its performance in comparison to existing methods for autoformalization. You will also have the opportunity to learn about proof assistants and their role in formal verification. If you are interested in this project, please contact me at [k.stark@hw.ac.uk](mailto:k.stark@hw.ac.uk) or make a short appointment at https://outlook.office365.com/owa/calendar/KathrinsMeetingCalendar@heriotwatt.onmicrosoft.com/bookings/.

Difficulty: Variable

Develop solution strategies for a game or riddle

Kathrin Stark

Edinburgh

Details

The goal of this project is to develop a base solution, and then iteratively improve on this solution while reasoning on the soundness of the improved solution. See for example https://www.cs.tufts.edu/~nr/cs257/archive/richard-bird/sudoku.pdf for a solution of Sudoku in Haskell. I have some ideas, but you are very welcome to bring your own ideas. If you are interested in this project or would like to discuss your own project ideas, please contact me at [k.stark@hw.ac.uk](mailto:k.stark@hw.ac.uk) or make a short appointment at https://outlook.office365.com/owa/calendar/KathrinsMeetingCalendar@heriotwatt.onmicrosoft.com/bookings/.

Difficulty: Variable

How to ensure a program never has bugs? (Program Verification)

Kathrin Stark

Edinburgh

Details

From security protocols used in online banking, over embedded control systems, to e-mail and disk encryption: every day, we use software we rely on to be safe and secure. Many things can go wrong: there might be bugs in the program itself or the compiler can produce wrong machine code. Formal verification of a program allows us to prove indisputably, using only a small set of assumptions and deduction rules, that all inputs lead to the desired behaviour. This guarantee is particularly important if faulty software would lead to a significant loss or if the software has to withstand a determined attacker. For realistic programs, verifying rich specifications beyond shallow properties such as no out-of-bound array subscripts in a fully automated way is challenging due to the immense search space. Interactive proof assistants such as Coq or Isabelle allow humans to fill in where fully automated methods fail by allowing such proofs to be developed in an interaction between humans and computers. For this reason, proof assistants have recently gained importance also in industrial applications and are used by companies like Microsoft, Amazon, Apple, and Google. I'm looking for students who are interested in learning about the verification of programs. I also have ideas of variable difficulty from current research projects. Let's talk! If you are interested in this project or would like to discuss your own project ideas, please contact me at [k.stark@hw.ac.uk](mailto:k.stark@hw.ac.uk) or make a short appointment at https://outlook.office365.com/owa/calendar/KathrinsMeetingCalendar@heriotwatt.onmicrosoft.com/bookings/.

Difficulty: Variable

Optimizing an OCaml Compiler

Kathrin Stark

Edinburgh

Details

Are you interested in compilers and programming languages? Do you want to extend your knowledge in OCaml and optimize a compiler for a real course project? Then this honours thesis project is for you! In this project, you will work on extending the existing OCaml compiler for the F29LP course with several optimizations, such as live variable analysis and dead code removal. You will learn about compiler design and implementation, as well as gain experience with OCaml programming. Your work will involve researching and implementing the optimizations in the compiler, evaluating their effectiveness, and documenting your findings. If you are interested in this project, please contact me at [k.stark@hw.ac.uk](mailto:k.stark@hw.ac.uk) or make a short appointment at https://outlook.office365.com/owa/calendar/KathrinsMeetingCalendar@heriotwatt.onmicrosoft.com/bookings/.
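
To illustrate how live variable analysis drives dead code removal, here is a minimal sketch for straight-line code only. It is written in Python rather than OCaml and is not based on the F29LP compiler. The `(dest, op, operands)` instruction format is a hypothetical simplification, and all instructions are assumed to be free of side effects.

```python
def eliminate_dead_code(instrs, live_out):
    """Remove assignments whose result is never used.

    instrs:   list of (dest, op, operands) tuples in program order.
    live_out: variables that are needed after the last instruction.
    Returns (kept_instructions, live_in), where live_in is the set of
    variables that must already be defined before the block runs.
    """
    live = set(live_out)
    kept = []
    # Liveness is a backward analysis: walk from the last instruction up.
    for dest, op, operands in reversed(instrs):
        if dest in live:
            live.discard(dest)      # dest is defined here ...
            live.update(operands)   # ... and its operands become live
            kept.append((dest, op, operands))
        # Otherwise dest is not read later, so the instruction is dropped
        # and its operands are not made live.
    kept.reverse()
    return kept, live

program = [
    ("a", "const", ()),
    ("b", "const", ()),
    ("c", "add", ("a", "a")),
    ("d", "mul", ("b", "b")),   # d is never used afterwards
]
optimised, live_in = eliminate_dead_code(program, live_out={"c"})
for instr in optimised:
    print(instr)
```

Because the walk is backward, removing `d` also stops `b` from becoming live, so the definition of `b` is removed in the same pass. A real compiler would run the analysis over a control-flow graph and iterate to a fixed point.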

Difficulty: Variable

Verification of Compiler Optimizations

Kathrin Stark

Edinburgh

Details

Compiler optimization is the process of improving the performance of compiler-generated code by applying a series of transformations to the code. For realistic programs, verifying rich specifications beyond shallow properties such as no out-of-bound array subscripts in a fully automated way is challenging due to the immense search space. Interactive proof assistants such as Coq or Isabelle allow humans to fill in where fully automated methods fail by allowing such proofs to be developed in an interaction between humans and computers. For this reason, proof assistants have recently gained importance also in industrial applications and are used by companies like Microsoft, Amazon, Apple, and Google. In this project, you will work on proving the correctness of optimization steps in a compiler using the Coq proof assistant. Your work will involve implementing and verifying the optimization steps in the proof assistant, formalizing their correctness proofs, and documenting your findings. If you are interested in this project or would like to discuss your own project ideas, please contact me at [k.stark@hw.ac.uk](mailto:k.stark@hw.ac.uk) or make a short appointment at https://outlook.office365.com/owa/calendar/KathrinsMeetingCalendar@heriotwatt.onmicrosoft.com/bookings/.

Difficulty: Variable

Writing a Lexer Generator

Kathrin Stark

Edinburgh

Details

In this project, you will work on writing a lexer generator in a programming language of your choice. Lexer generators are tools that automatically generate the lexical analyzer component of a compiler based on a set of regular expressions. You will learn about compiler design and implementation, as well as gain experience with programming language theory and lexing. Your work will involve researching and implementing the lexer generator, evaluating its effectiveness, and documenting your findings. You will also have the opportunity to explore different programming languages and lexing techniques. If you are interested in this project or would like to discuss your own project ideas, please contact me at [k.stark@hw.ac.uk](mailto:k.stark@hw.ac.uk) or make a short appointment at https://outlook.office365.com/owa/calendar/KathrinsMeetingCalendar@heriotwatt.onmicrosoft.com/bookings/.
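
As a rough illustration of the idea, the sketch below builds a lexer function from a list of named regular expressions. It delegates matching to Python's `re` module, whereas a real lexer generator would typically compile the rules to an automaton and emit code. The rule names and the `make_lexer` helper are hypothetical. Python's alternation picks the first rule that matches rather than the longest match, so rule order matters here. Rule patterns are assumed not to contain their own capturing groups.

```python
import re

def make_lexer(rules, skip=("WS",)):
    """Build a lexer from (token_name, regex) pairs.

    Rules are tried in order; tokens whose name is in `skip` are dropped.
    """
    pattern = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in rules))

    def lex(text):
        tokens = []
        pos = 0
        while pos < len(text):
            m = pattern.match(text, pos)
            # Reject both "no rule matches" and empty matches, which
            # would otherwise loop forever.
            if m is None or m.end() == pos:
                raise SyntaxError(
                    f"unexpected character {text[pos]!r} at offset {pos}"
                )
            if m.lastgroup not in skip:
                tokens.append((m.lastgroup, m.group()))
            pos = m.end()
        return tokens

    return lex

rules = [
    ("NUM", r"\d+"),
    ("ID", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=]"),
    ("WS", r"\s+"),
]
lex = make_lexer(rules)
print(lex("x = 42 + y1"))
```

Evaluating such a tool could then compare it against a generated automaton on throughput and on how it resolves overlapping rules.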

Difficulty: Variable

Your own project around compilers

Kathrin Stark

Edinburgh

Details

If you have a project idea around compilers/F29LP, I am happy to meet and discuss it. I also have several ideas for suitable projects. Let's talk! Feel free to contact me at k.stark@hw.ac.uk or make a short appointment at https://outlook.office365.com/owa/calendar/KathrinsMeetingCalendar@heriotwatt.onmicrosoft.com/bookings/ so that we can discuss a possible project together.

Difficulty: Variable

Your own project around functional programming

Kathrin Stark

Edinburgh

Details

If you have a project idea around functional programming, I am happy to meet and discuss it. I also have several ideas for suitable projects. Let's talk! If you are interested in this project please contact me at [k.stark@hw.ac.uk](mailto:k.stark@hw.ac.uk) or make a short appointment at https://outlook.office365.com/owa/calendar/KathrinsMeetingCalendar@heriotwatt.onmicrosoft.com/bookings/.

Difficulty: Variable

Master Class in anything around Verification/Interactive Theorem Proving/Language Processors

itp; verification

Kathrin Stark

Edinburgh

Details

I'm happy to supervise any student in a master class on any of the above topics. There are many topics available; talk to me if you have an idea or are just interested in one of the topics.

Difficulty: Easy

A high level dependently typed dataflow language for hardware

Rob Stewart

Edinburgh

Details

Dataflow languages are commonly used to program embedded systems such as Field Programmable Gate Arrays (FPGAs). Static dataflow models ease reasoning and compile-time scheduling, however their lack of expressivity constrains the programmer's ability to implement complex algorithms. Various approaches have been taken to 1) identify and 2) reason about static dataflow properties of dataflow programs, including static analysis and model checking. This project will take a new approach, which is to implement a static dataflow embedded domain specific language (DSL) in the dependently typed Idris language. This approach will use Idris's type checker to prove static properties of dataflow actors, and will infer data rates and ideal scheduling policies for compilation to FPGAs. A possible implementation plan for the project is: 1) Developing the dataflow EDSL in Idris. 2) Developing scheduling policies within the types of the EDSL. 3) A Verilog backend of the EDSL to target FPGAs.

Difficulty: Challenging

Add library-level prefetching to Haskell

Rob Stewart

Edinburgh

Details

A recent paper shows that adding prefetching to the implementation of operations over data structures (i.e. container libraries) can yield a significant speedup by hiding the latency of memory access. Here's the paper: "Prefetching in Functional Languages", ISMM 2020. https://www.cl.cam.ac.uk/~tmj32/papers/docs/ainsworth20-ismm.pdf That work was done in the context of the language OCaml. This project will investigate whether adding prefetching to Haskell libraries like the `containers` library can yield speedups in the same way. An additional dimension in the design space for Haskell is its lazy evaluation semantics, which might mean that prefetching benefits are even more significant for Haskell than they are for OCaml. This project will find out.

Difficulty: Challenging

Haskell memory performance programmer feedback

Rob Stewart

Edinburgh

Details

Haskell is an almost unique language, in the sense that it has lazy-by-default evaluation semantics and it is a pure language (no side effects). Laziness poses challenges for reasoning about memory performance and memory access behaviours. Recent tooling developments have enabled more precise memory profiling of Haskell code: https://well-typed.com/blog/2024/01/ghc-eras-profiling/ This project would evaluate the usefulness of these new tools, and look to extend them by integrating their reports into IDEs for source-code annotations.

Difficulty: Moderate

Comparing deep learning accelerator hardware

Rob Stewart

Edinburgh

Details

"AI at the edge" allows autonomous devices and smart sensors to perform tasks such as object detection, classification, speech recognition and complex text processing tasks -- in real time and with very low power requirements. This project will compare the performance and usability of two neural network accelerator devices: a Google TPU on a Coral.AI USB device, and an Intel Neural Compute Stick USB device. If the student wishes to go further, a third comparator would be programming a neural network into hardware fabric with an FPGA.

Difficulty: Moderate

Compressing neural networks for FPGA based hardware accelerators

Rob Stewart

Edinburgh

Details

For executing deep learning algorithms based on neural networks, there is a shift from expensive (in finance and energy) centralised GPUs to Edge Computing devices such as embedded CPUs and programmable hardware (Field Programmable Gate Arrays, FPGAs). Trained neural networks require hundreds of megabytes of memory, and high computation resources. Emerging compression techniques are able to reduce those resource costs significantly. However, this also affects the accuracy of neural networks. This project will explore industry driven frameworks e.g. Xilinx's Python FINN framework and Intel's Distiller framework, to assess the speed, accuracy and resource use performance metrics across a set of neural network models. The outcomes of the project will inform users of neural networks and of embedded processors on how best to construct and refine deep learning models for application domains such as remote image processing, smart cameras, autonomous robots, and driverless vehicles.
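
The trade-off between resource cost and accuracy can be seen in a minimal sketch of one common compression technique, uniform weight quantization. This is plain Python for illustration only and does not use the FINN or Distiller APIs. The function names and the example weights are hypothetical.

```python
def quantize(weights, bits=8):
    """Map float weights to integer codes in [0, 2**bits - 1]."""
    lo, hi = min(weights), max(weights)
    levels = (1 << bits) - 1
    # Fall back to a scale of 1.0 when all weights are equal.
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo

def dequantize(codes, scale, lo):
    """Recover approximate float weights from the integer codes."""
    return [lo + c * scale for c in codes]

weights = [-1.0, -0.5, 0.0, 0.25, 1.0]
codes, scale, lo = quantize(weights, bits=8)
restored = dequantize(codes, scale, lo)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, max_error)
```

Storing 8-bit codes instead of 32-bit floats cuts weight storage roughly fourfold, at the price of a rounding error of at most half a quantization step per weight. Measuring how that error affects model accuracy at different bit widths is the kind of assessment the project describes.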

Difficulty: Challenging

Implementing Haskell with FPGA hardware

Rob Stewart

Edinburgh

Details

The performance of functional programming language implementations until 15 years ago relied on increasing CPU clock frequencies. The last decade has seen the rise of the multi-core era to overcome this stall. Due to a single connection from multiple cores on a CPU to main memory, general software implementations of parallel programming languages are finding the limits of CPU architectures due to the Von Neumann bottleneck. In very recent times, the fabric on which we compute has changed fundamentally. Driven by the needs of AI, Big Data and energy efficiency, industry is moving away from general purpose CPUs to efficient special purpose hardware e.g. Google's Tensor Processing Unit (TPU) in 2016, Huawei's Neural Processing Unit (NPU) in smartphones, and Graphcore's Intelligence Processing Unit (IPU) in 2017. This reflects a wider shift to special purpose hardware to improve execution efficiency. This BSc project will explore alternative approaches to implementing functional languages, by offloading runtime system components onto dedicated FPGA-based hardware implementations. For example, special purpose memory hierarchies to minimise memory traffic, prefetching in hardware, or garbage collection in hardware to reclaim memory with minimal latency. The project will work alongside the EPSRC HAFLANG project ( https://haflang.github.io ), where the aim is to develop a processor architecture for managed functional languages to significantly outperform the runtime and energy performance of CPUs. The student would be working alongside the HAFLANG project's postdoc researcher on this project, with frequent meetings with Rob Stewart.

Difficulty: Challenging

Monitoring protocol compliance of IP hardware blocks

Rob Stewart

Edinburgh

Details

Programmable hardware is increasingly used in many application areas, including Internet of Things and Smart devices, as well as safety critical systems. The correctness of hardware designs is therefore essential. When two hardware components communicate, they must do so via an agreed protocol. An example is the widely used AXI protocol. Many hardware designs are not open source, meaning you can use the generated IP hardware block from a vendor, but cannot verify its implementation by inspecting the hardware description code. How are you to trust a piece of hardware? So instead, one must inspect its communication behaviours to have confidence in its correctness. There is a tool called 'mbac', which generates hardware blocks from temporal properties that describe protocol rules. You can synthesise the generated hardware, meaning you can deploy it alongside closed source IP blocks, and it flags any time these properties are violated. This project will evaluate the mbac tool in terms of its usefulness for widely used protocol standards, its scalability to complex protocols, its effectiveness in catching hardware bugs, and potential practical deployment scenarios to make users aware of hardware bugs.

Difficulty: Challenging

Profiling Haskell's lazy evaluation

Rob Stewart

Edinburgh

Details

The Haskell functional language is based on lazy evaluation, where computations are not performed until their values are required. This has many formal and pragmatic advantages over the more common strict evaluation but carries some runtime overhead. This project will involve systematically profiling lazy, strict and lazy/strict hybrid Haskell benchmark implementations to expose the strengths and weaknesses of Haskell's non-strict evaluation.

Difficulty: Challenging

Semantic Web type providers in F#

Rob Stewart

Edinburgh

Details

Type providers are an idea in programming languages research, whereby an external data source such as a CSV file is used to generate programming constructs. Typically these constructs are types (i.e. Java classes, or types in functional languages), but they can also be properties or even functions or methods. In the semantic web world, the goal is to provide as much meaning to data as possible. In other words, what is the semantic meaning of an entity from a particular domain, and what is the relationship between entities in that domain? This project will investigate whether there is a deep connection between how domains are captured using ontologies from the semantic web, and type systems for functional languages. The student on this project may wish to explore building type providers in their favourite functional language. A starting point would be support for RDF data, converting an ontology schema into types of that programming language. For a richer investigation, the project will explore how ontological rules can be mapped to types and constraints in the target programming language. This project may raise some interesting questions: if type constraints derived from ontology rules are sufficiently strong, can we perform ontology reasoning to check that RDF semantic web data is sound and consistent? Can we do this at compile time, with type providers for OWL/schema data sources and RDF data sources? On the practical side of this work, take a look at RDF type providers for the F# language available in the Iride library: https://github.com/giacomociti/iride That may be a starting point for any implementation work.

Difficulty: High

Special purpose hardware for RDF stream processing

Rob Stewart

Edinburgh

Details

Stream processing is about doing one-pass execution of continuous queries over a potentially infinite stream of values. Streams come with different characteristics, e.g. data rates from low (one or two values an hour) to high (thousands of values a second), and are published either as a steady stream or in highly irregular bursts. Many use cases require joining values across streams and stored sources, as well as computing aggregation functions. To support these operations over potentially infinite streams, windowing operators are used to provide a scope for the operation. Additionally with RDF streams, there is also the possibility of performing inferencing over the stream of values, i.e. generating new data based on the content of the stream. All these needs pose interesting challenges for stream processing: how much data needs to be windowed to fire inference rules? And crucially, what processing hardware should be used to support throughput of thousands of RDF data items a second? Field Programmable Gate Arrays (FPGAs) are programmable hardware chips, which when configured have a circuit that is precisely designed to meet an algorithmic need. For streaming domains they offer extremely high throughput, and use very little energy. This project will investigate the use of FPGAs for developing a special purpose RDF stream processor. A key hardware design decision will be to decide how "programmable" the hardware is at runtime. In other words, should the hardware design allow runtime FPGA programming to (1) switch window operators, (2) change the window size, (3) upload new inference rules? A desirable artefact from this project will be an open source hardware design for an RDF stream processing hardware accelerator.
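
As a software-only illustration of the windowing idea described above (not a hardware design, and not part of the proposal itself), the sketch below performs a one-pass aggregation over tumbling windows. It assumes the stream arrives as (timestamp, value) pairs in timestamp order; the function name `tumbling_window_averages` and the tuple layout are hypothetical choices made for this example.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def tumbling_window_averages(
    stream: Iterable[Tuple[float, float]], size: float
) -> Iterator[Tuple[int, float]]:
    """Yield (window_index, average) for consecutive, non-overlapping windows.

    Assumes (timestamp, value) pairs arrive in timestamp order, so each
    window can be emitted as soon as the first item of the next one is seen.
    """
    current: Optional[int] = None   # index of the window being filled
    values: List[float] = []        # values seen so far in that window
    for timestamp, value in stream:
        index = int(timestamp // size)
        if current is not None and index != current:
            # The previous window is complete: emit its aggregate and reset.
            yield current, sum(values) / len(values)
            values = []
        current = index
        values.append(value)
    if values:
        # Flush the last (possibly partial) window when the stream ends.
        yield current, sum(values) / len(values)


if __name__ == "__main__":
    readings = [(0.5, 10.0), (1.5, 20.0), (2.1, 30.0), (3.9, 50.0), (4.0, 7.0)]
    for window, average in tumbling_window_averages(readings, size=2.0):
        print(window, average)
```

Only the current window is held in memory, which is what allows the operator to run over an unbounded stream. Changing `size` corresponds to option (2) in the list of runtime-programmable features above.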

Difficulty: High

Static analysis of dataflow program performance

Rob Stewart

Edinburgh

Details

Dataflow languages are good models for programming embedded and configurable hardware architectures, because the distributed memory model maps well to these architectures. There's a wide variety of dataflow models, from static dataflow all the way to dynamic dataflow. Static models are easier to reason about, e.g. how much data can be processed, but these programming models are restrictive. Dynamic models can express complex algorithms but reasoning about runtime behaviour is much harder. This exposes a sweet spot: can we expose moderately expressive programming models without losing all ability to reason about performance, or to generate efficient hardware? This project will explore the idea of abstracting properties of dataflow programs, to compute what the throughput performance capabilities of such programs are, without needing to run them. The technology to integrate could be (1) HoCL for programming, (2) Kiter to determine throughput performance and (3) PREESM to turn programs into executables. Possibly also using DIF as an interchangeable dataflow model format. https://github.com/jserot/hocl/ https://github.com/bbodin/kiter https://preesm.github.io https://www.researchgate.net/publication/220714226_DIF_An_Interchange_Format_for_Dataflow-Based_Design_Tools

Difficulty: High

Targeting FPGAs with parallel functional languages

Rob Stewart

Edinburgh

Details

FPGAs are reconfigurable chips and offer the promise of very high performance, low powered targets for accelerated computation. They have potential in many domains including High Performance Computing, Cloud Computing, embedded processing and autonomous robotics. FPGAs are usually programmed at very low levels with hardware description languages, and sometimes at the higher level C language. High level parallel array processing languages like APL, Accelerate and Chapel abstract above hardware, usually targeting multicore CPUs and GPUs. This project will involve writing an FPGA backend for the Accelerate DSL, an embedded language in Haskell. Backend options are OpenCL or C++, from which High Level Synthesis tools will generate FPGA hardware designs. This is a compiler research project that will establish how to produce high performance (outperforming parallel CPUs and GPUs) and efficient hardware from very high level array processing codes.

Difficulty: Challenging

Verification aware quantisation of neural networks for FPGAs

Rob Stewart

Edinburgh

Details

Neural networks have until recent times predominantly been trained and executed on GPUs in data centres. The recent trend is to push trained neural networks to Edge Computing devices, such as smart phones, autonomous vehicles and smart cameras. One hardware architecture in this space is the FPGA, a programmable hardware chip that can be deployed with a sensor for low powered operation, reaching extremely high throughput. This makes FPGAs ideally suited for data driven AI problems that involve high volumes of input data and where a high throughput of inferences is required. One downside of FPGAs and other Edge Computing AI accelerators is that they are limited in how much memory they have to store trained neural network parameters. The solution is to compress neural networks, e.g. by quantising the precision of their weights. Doing so has an inevitable effect on inference accuracy, but this is a surprisingly small amount. Another, more serious, effect of quantisation is the loss of robustness a neural network might have against adversarial attack. Currently, quantisation algorithms focus on the trade off between accuracy and memory requirements. This project will instead design a new quantisation algorithm, one that is guided by formal verification approaches such that reducing the precision of neural network parameters does not affect the robustness of the overall network to adversarial attack. This will be done in the context of the open source Brevitas component in the FINN compiler framework developed by Xilinx Research. This project can be done in close collaboration with Xilinx Research if the student wishes.
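
To make "quantising the precision of weights" concrete, here is a minimal sketch of symmetric uniform quantisation in plain Python. It is a textbook scheme used purely for illustration. It is not the Brevitas API and not the verification-guided algorithm this project would design, and the function names are hypothetical.

```python
from typing import List, Tuple


def quantise(weights: List[float], bits: int) -> Tuple[List[int], float]:
    """Map float weights to signed integers representable in `bits` bits.

    Symmetric uniform scheme: the largest magnitude maps to the largest
    positive integer level, and every weight is rounded to the nearest level.
    """
    levels = 2 ** (bits - 1) - 1                  # e.g. 127 for 8 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / levels if max_abs > 0 else 1.0
    return [round(w / scale) for w in weights], scale


def dequantise(codes: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from integer codes."""
    return [c * scale for c in codes]


def max_error(weights: List[float], bits: int) -> float:
    """Largest absolute difference between original and reconstructed weights."""
    codes, scale = quantise(weights, bits)
    return max(abs(w - r) for w, r in zip(weights, dequantise(codes, scale)))


if __name__ == "__main__":
    example = [0.5, -1.0, 0.25, 0.75]
    for bits in (8, 4, 2):
        print(bits, "bits: max reconstruction error =", max_error(example, bits))
```

Fewer bits mean less memory per weight but a larger reconstruction error. The proposal's point is that this error can affect adversarial robustness as well as accuracy.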

Difficulty: High

Profiling low-level memory access of Haskell

Rob Stewart

Edinburgh

Details

Haskell is an almost unique language, in the sense that it has lazy-by-default evaluation semantics and it is a pure language (no side effects). Laziness poses challenges for reasoning about memory performance and memory access behaviours. Existing profiling tools for Haskell do not measure the latency of memory access, or the contention on the memory bus for parallel Haskell programs. This project will investigate how to use low-level tooling to evaluate the cost of Haskell's properties of (1) laziness and (2) immutability (purity), when it comes to memory access costs, e.g. how long do CPU cores need to stall waiting for code and data from memory, and how much contention is there on the serial memory bus when running parallel Haskell programs on up to 64 CPU cores? This project will involve lifting very low level profiling information into the context of Haskell user code.

Difficulty: High

Mechanising English writing checks

Rob Stewart

Edinburgh

Details

William Strunk wrote the book "The Elements of Style" in 1918, and it remains influential and invaluable for English writers. It provides "principal requirements of plain English style". It aims to "lighten the task of instructor and student by concentrating attention on a few essentials, the rules of usage and principles of composition most commonly violated." This project will explore to what extent one can mechanise the English writing rules set out in this book. Similar attempts have been made to mechanise "linting" checks as software, e.g. textlint and its vast set of plugins. https://textlint.github.io/ Can William Strunk's book be implemented as one or several textlint plugins? Will users, e.g. academics writing papers and students writing dissertations, value the suggestions made? How accurate is the software on the book's given examples? Which rules in the book cannot be mechanised in this way?

Difficulty: Moderate

Hardware acceleration of skin lesion classification

Rob Stewart

Edinburgh

Details

Using smart imaging devices to diagnose skin lesions could automate early clinical diagnosis in developing countries, and provide accessible dermatological care in remote areas with limited healthcare infrastructure or where access to health services is expensive. The main challenges for such imaging technology for skin lesion classification are (1) using a deep learning model small enough to fit on a compact handheld device, (2) fast and sufficiently accurate classification and (3) operating without reliance on internet connectivity. With embedded processing platforms there is a compute spectrum, from general purpose embedded CPUs, to GPU co-processors, all the way to truly custom hardware logic, e.g. with FPGAs. Although challenging to program, the main advantage of configurable FPGA hardware is that you often get both high throughput and low power use, rather than a compromise between the two. The goal of this project is to develop a skin lesion classifier deep learning model for FPGAs. It will target the Pynq FPGA platform, using AMD's Brevitas quantisation framework for compressing the model and AMD's FINN framework for deploying to the Pynq architecture. The classification accuracy, inference time and energy use of the developed system will be compared with a similar Raspberry Pi based skin lesion classifier. This would be in collaboration with PhD student Tess Watt, whose research is closely aligned to this proposed project, and Dr Christos Chrysoulas.

Difficulty: Moderate

Automated Essay Grading using Fine-Grained Linguistic Features

essay grading machine learning dimensionality reduction feature extraction

Ian Tan

Malaysia

Details

The intricate relationship between fine-grained linguistic features and writing quality is to be reproduced. This is based on a recent paper titled "Incorporating Fine-Grained Linguistic Features and Explainable AI into Multi-Dimensional Automated Writing Assessment" by Tang et al. (2024). The paper harnessed computational analytic tools and Principal Component Analysis (PCA) to distill and refine linguistic indicators for model building and construction. This project is to reproduce the work done by Tang et al. and is to be scoped accordingly, likely with a subset of the feature extraction tool, and without the explainable AI component.

Difficulty: Moderate

Report Writer for Authors' Publications

scopus information extraction api report writing publication

Ian Tan

Malaysia

Details

This project is to use the Elsevier Developer Product API to extract a list of authors' publications based on a range of dates and either develop or use a reporting tool to allow for flexible output format, including MS-Excel readable formats. In summary: 1) An administrator interface, which will include the management of authors 2) Extract using the Elsevier API (and from Google Scholar or other indexing) using a selected list of authors 3) Store the raw format 4) Allow a configurable report output format based on the raw format

Difficulty: Moderate

Auto k-means with cluster labelling using LLM

clustering k-means llm unsupervised machine learning

Ian Tan

Malaysia

Details

Auto k-means clustering is the automated process of determining the optimal number of clusters, k, in the k-means algorithm. This is typically an iterative process that attempts to find the ideal number of clusters. In this project, the choice of k will be influenced by the domain to which it is applied. Furthermore, this unsupervised machine learning method assigns only a numeric identifier to each cluster, so the project's second part is to assign meaningful names to the clusters based on the domain and the influential features that determined the clusters.
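
The iterative search for k can be sketched as follows. This is one common approach, assumed here for illustration: run k-means for each candidate k and keep the k with the highest mean silhouette score. The proposal does not say which selection criterion the project should use, the helper names are hypothetical, and the LLM-based cluster naming is not shown.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, iters: int = 50) -> List[int]:
    """Lloyd's algorithm with deterministic farthest-point initialisation."""
    centres = [points[0]]
    while len(centres) < k:
        centres.append(max(points, key=lambda p: min(math.dist(p, c) for c in centres)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centre.
        labels = [min(range(k), key=lambda j: math.dist(p, centres[j])) for p in points]
        # Update step: move each centre to the mean of its members.
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            updated.append(
                tuple(sum(x) / len(members) for x in zip(*members)) if members else centres[j]
            )
        if updated == centres:
            break
        centres = updated
    return labels


def silhouette(points: Sequence[Point], labels: List[int]) -> float:
    """Mean silhouette score; higher means tighter, better separated clusters."""
    clusters = set(labels)
    if len(clusters) < 2:
        return -1.0
    scores = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # convention for singleton clusters
            continue
        a = sum(own) / len(own)
        b = min(
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == other)
            / labels.count(other)
            for other in clusters
            if other != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def auto_kmeans(points: Sequence[Point], k_max: int) -> Tuple[int, List[int]]:
    """Try k = 2..k_max and return the best k with its cluster labels."""
    results = {k: kmeans(points, k) for k in range(2, k_max + 1)}
    best_k = max(results, key=lambda k: silhouette(points, results[k]))
    return best_k, results[best_k]
```

The integer labels returned here are the "numeric identifiers" mentioned above. The second part of the project would replace them with names derived from the domain and the most influential features.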

Difficulty: High

Effect of Economic Uncertainty on Bitcoin prices

bitcoin economic uncertainty correlation causation

Ian Tan

Malaysia

Details

The "Kellogg economic uncertainty measure", more commonly known as the Economic Policy Uncertainty (EPU) index, was developed at the Kellogg School of Management, Northwestern University, by Scott Baker, Nicholas Bloom, and Steven J. Davis. This project is about determining whether economic uncertainty has an impact on the prices of cryptocurrency, mainly Bitcoin. The EPU index is constructed by extracting information from newspaper articles, quantifying the level of economic policy uncertainty through the frequency of articles containing specific keywords. Using this EPU index and time lagged Bitcoin (or other cryptocurrency) prices, the project is to determine whether there is a correlation and, more importantly, whether a causation can be determined using Granger causality, cross-correlation causality, or other appropriate algorithms.
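
As a rough sketch of the time-lagged correlation step, the code below computes the Pearson correlation between an index series and a price series shifted by a number of periods, then picks the lag with the strongest correlation. The function names are hypothetical and any data passed in would be the student's own. A lagged correlation does not establish causation: a Granger-style test would additionally compare autoregressive models of the price with and without the lagged index values.

```python
from statistics import mean
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def lagged_correlation(driver: Sequence[float], target: Sequence[float], lag: int) -> float:
    """Correlation between driver[t] and target[t + lag], for lag >= 0."""
    if lag == 0:
        return pearson(driver, target)
    return pearson(driver[:-lag], target[lag:])


def best_lag(driver: Sequence[float], target: Sequence[float], max_lag: int) -> int:
    """Lag in 0..max_lag with the largest absolute correlation."""
    return max(
        range(max_lag + 1),
        key=lambda lag: abs(lagged_correlation(driver, target, lag)),
    )
```

In practice both series would usually be differenced or converted to returns first, since trending price levels can produce spurious correlations.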

Difficulty: Easy

Real and Fake Face Images Detection using Machine Learning

Nurul Ain Toha

Malaysia

Details

This project focuses on developing a machine learning model to accurately detect real and fake face images using the "Real and Fake Face Detection" dataset from Yonsei University on Kaggle. With the rise of deepfake technology, which creates hyper-realistic synthetic faces through methods like GANs (Generative Adversarial Networks), this project aims to build a model that can distinguish between authentic and AI-generated faces. The model will utilize deep learning, specifically Convolutional Neural Networks (CNNs), and transfer learning techniques to enhance performance. By addressing the growing need for digital media authenticity, the project contributes to combating the misuse of AI in generating fake images.

Difficulty: Moderate

Data Science Project

Adrian Turcanu

Dubai

Details

To be updated.

Difficulty: Challenging

Event-B models of Spiking Neural P systems

Adrian Turcanu

Dubai

Details

Event-B is a modelling language that can be used to specify mathematical models of transitional systems. Spiking neural P systems are the result of introducing the idea of neurons into membrane computing. The main goal of this project is to give a methodology to obtain the Event-B model of an SN P system, to apply it to some examples and to verify the properties of the model using ProB, a model checker integrated into a platform called Rodin.

Difficulty: High

Using model checking on Event-B models of non-deterministic algorithms

Adrian Turcanu

Dubai

Details

Event-B is a mathematical modelling language that can be used to model various transitional systems. The aim of the project is to develop a methodology for constructing Event-B models of non-deterministic algorithms, to apply it to several case studies and to investigate these by using the model checker ProB.

Difficulty: Challenging

Modelling, testing and verification of robotic mechanisms

Adrian Turcanu

Dubai

Details

TBU

Difficulty: Challenging

Employing Robotic Process Automation (RPA) for Streamlining a Business Process

robotic process automation rpa

Cristina Turcanu

Dubai

Details

RPA bots could be implemented in areas as diverse as finance, compliance, legal, customer service, operations, and IT (student’s choice). Students can choose to implement attended automation, i.e., software robots that can work alongside humans to share the workload in real-time. In case of unattended robots, they should be scheduled to handle long processes or automations without the need of human interaction. Prove the usability of RPA in several use-cases (https://docs.uipath.com/robot/docs/attended-vs-unattended).

Difficulty: Variable

Process Mining and RPA: benefits and challenges in the Industry 4.0

robotic process automation rpa process mining

Cristina Turcanu

Dubai

Details

A comparative study of Process Mining and RPA, highlighting the benefits of putting them together as well as the key differences between the two. Provide an overview of the context and an end-to-end viewpoint to enhance processes and ensure the successful delivery of automation-driven outcomes. Research how process mining helps identify the most suitable processes for RPA. The project should contain some use cases.

Difficulty: Variable

Expand automation by using Robotic Process Automation combined with Machine Learning models.

rpa robotic process automation machine learning ml

Cristina Turcanu

Dubai

Details

Highlight the benefits of using Automation (RPA) combined with ML models. The research should involve some use cases. Explain Intelligent Process Automation.

Difficulty: Variable

Process Mining powered by Machine Learning

process mining machine learning

Cristina Turcanu

Dubai

Details

Highlight the benefits of using Process Mining combined with ML models. The research should involve some use cases.

Difficulty: Variable

Machine Learning Applications

Cristina Turcanu

Dubai

Details

Difficulty: Challenging

Blockchain Applications

Cristina Turcanu

Dubai

Details

Investigate the application of blockchains to different sectors of activity, e.g. - Lending - Real estate (maybe with cryptocurrency payments) - Voting - Data storage - other

Difficulty: Challenging

Cognitive architectures for autonomous robots

Adrian Turcanu

Dubai

Details

TBD

Difficulty: Challenging

Classification of Recyclable Waste Using Machine Learning

convolutional neural network (cnn) image recognition real-time detection smart waste sorting sustainable recycling computer vision intelligent system deep learning

Cristina Turcanu

Dubai

Details

The system aims to improve the efficiency of waste management and support sustainable recycling practices.

Difficulty: Challenging

Assessing the Impact of Climate Change on Insurance Claims Using Advanced Statistical Approaches

George Tzougas

Edinburgh

Details

Climate change-induced weather hazards are significantly straining property insurance, resulting in extensive damage and substantial claims, particularly in vulnerable regions. A recent report by the European Insurance and Occupational Pensions Authority (EIOPA) reveals that many non-life insurance businesses adjust their pricing annually based on recent events to implicitly consider climate change, given their short-term contract duration. However, as the report also highlights, this approach may have detrimental consequences. Specifically, the sector requires foresight to understand the impact of climate change, anticipate higher premiums, prioritize adaptation and mitigation measures, and continuously monitor trends for informed decision-making. Government interventions to share the premium burden are particularly crucial, especially for economically disadvantaged policyholders, to ensure equitable access to insurance and enhance overall societal resilience against climate risks. This project will enhance conventional regression models by incorporating advanced statistical techniques to develop more accurate climate-related claim frequency and severity models for the property insurance market. Unlike traditional models, which often overlook detailed risk factors, our approach will include specific property and content risks while exploring their complex interactions with weather-related hazards. This integration is expected to improve the prediction accuracy of claim numbers and costs, providing a more robust framework for managing the increasing risks associated with climate change and offering greater adaptability to evolving conditions. The proposed approach is expected to significantly enhance predictive performance, enabling actuaries to ensure that premiums accurately reflect the evolving weather-related risks.

Difficulty: Easy

Computer Vision related Projects where you process Image or Video Data and use ML for classification or prediction

Md Azher Uddin

Dubai

Details

Computer Vision

Difficulty: Variable

Depression Analysis from audio or video data

Md Azher Uddin

Dubai

Details

Computer Vision

Difficulty: Variable

Dynamic Facial Emotion Recognition from video data

Md Azher Uddin

Dubai

Details

Computer Vision

Difficulty: Variable

Dynamic Scene Understanding from video data

Md Azher Uddin

Dubai

Details

Computer Vision

Difficulty: Variable

Indian Hand Sign Language Recognition from video data

Md Azher Uddin

Dubai

Details

Computer Vision

Difficulty: Variable

Healthcare Research Project

Md Azher Uddin

Dubai

Details

Healthcare analysis

Difficulty: Variable

Video Annotation

Md Azher Uddin

Dubai

Details

Computer Vision

Difficulty: Variable

Automated Paddy Disease Classification Using deep and handcrafted Feature Fusion

Md Azher Uddin

Dubai

Details

Difficulty: Moderate

Handwriting-Based Gender Classification Using deep and handcrafted feature fusion

Md Azher Uddin

Dubai

Details

Difficulty: Moderate

Flood prediction system using machine learning approach

Abrar Ullah

Dubai

Details

Designing a flood prediction system using machine learning approach

Difficulty: Easy

Hybrid Authentication for eCommerce Applications

Abrar Ullah

Dubai

Details

Research and develop a hybrid authentication approach for security of eCommerce application.

Difficulty: Easy

Service Oriented Architecture for Smart homes

Abrar Ullah

Dubai

Details

Service Oriented Architecture for Smart homes

Difficulty: Easy

Developing and Fine Tuning LLM

Abrar Ullah

Dubai

Details

Developing a Q&A system on biomedical data using a Large Language Model (LLM). An existing pre-trained model can be used and fine-tuned on new data.

Difficulty: Challenging

Emotion-Aware Interfaces for Student Wellbeing Monitoring in Online Classes

Abrar Ullah

Dubai

Details

Emotion-Aware Interfaces for Student Wellbeing Monitoring in Online Classes

Difficulty: High

Automating Software Testing Using AI and Machine Learning Techniques

Abrar Ullah

Dubai

Details

Automating Software Testing Using AI and Machine Learning Techniques

Difficulty: Variable

Optimizing Cloud-Native Applications for Scalability and Performance

Abrar Ullah

Dubai

Details

Optimizing Cloud-Native Applications for Scalability and Performance

Difficulty: Easy

Improving Legacy Systems through transformation from Monolithic to Microservice Architecture

Abrar Ullah

Dubai

Details

Improving Legacy Systems through transformation from Monolithic to Microservice Architecture

Difficulty: Easy

Meta Model for Enterprise Security

Abrar Ullah

Dubai

Details

Meta Model for Enterprise Security

Difficulty: Easy

Develop AI based invigilation for online exams

Abrar Ullah

Dubai

Details

Develop AI based invigilation for online exams

Difficulty: Easy

Multi-modal Data Integration in the Characterisation of Protein Aggregation in Amyotrophic Lateral Sclerosis (ALS) using Deep Learning - Collaboration with University of Aberdeen

machine learning deep learning medical imaging

Marta Vallejo

Edinburgh

Details

Amyotrophic lateral sclerosis (ALS) is a rapidly debilitating neurodegenerative disease that affects motor neurons. Patients develop progressive muscle weakness, leading to death due to respiratory failure, which typically occurs 3–5 years after symptom onset. ALS affects 1.75 – 3 out of 100,000 individuals per year. The existence of protein aggregates (TDP-43) in affected motor neurons is still a poorly understood hallmark. This project aims to increase understanding of these structures by utilising a real clinical Immunohistochemistry dataset collected by the University of Aberdeen and applying various machine learning techniques to enhance the understanding of TDP-43 aggregates at an individual level. A previous student project (2023-24) has been published: https://www.nature.com/articles/s41598-025-90881-9 In 2024-25, we were interested in applying multiple explainability tools to assess the generated models. One of my 2023-24 MSc students is currently working as a research assistant in an EPSRC project, where outcomes from this year's student projects will be used for a future publication. This year, a new tabular dataset associated with the image dataset was given, and new student ideas are possible. Join the team!

Difficulty: Variable

Investigate Disease Severity - Improving Class Imbalance in Parkinson’s Sensor Data via Conditional LSTM-VAE and Latent Space Analysis - Collaboration with York University

machine learning deep learning sensor data

Marta Vallejo

Edinburgh

Details

Parkinson’s Disease (PD) is a neurodegenerative disease of high incidence in the ageing population. This project aims at the application of deep learning technologies to a clinical dataset that contains information on patients with prodromal or early-stage PD. By analysing and processing digitalised movement data captured by three standard clinical assessments, the classifier will be expected to characterise bradykinesia, a slowing of movement, which is the fundamental motor feature of PD. The complex nature of bradykinesia makes it difficult to identify reliably, particularly at the early stages of the disease (Ahlrichs and Lawo, 2013). The types of clinical assessments used in this study are the following: * Finger tapping * Hand pronation-supination * Hand opening-closing * Hand movements measured by accelerometers The given dataset is significantly imbalanced between PD patients and healthy controls. A previous student project, which used a traditional LSTM, could not converge under these circumstances, assigning all the labels to the majority class. A loss-weighted sum, SMOTE and resampling were applied independently, with the conclusion that only resampling was effective. Despite this effort, the final performance of the model was rather poor. The next step of the project will consist of: 1.- Investigate the combination of the previous techniques to improve the model's performance. 2.- Use an LSTM-VAE to generate synthetic sequences for the minority class to extend the imbalanced dataset. 3.- Combine with class-conditional generation (e.g., Conditional VAE). 4.- Use t-SNE and/or UMAP to visualise how the synthetic sequences relate to real data in latent or feature space. 5.- Test the effect of augmentation by comparing models trained on real vs. augmented data. 
6.- (Optional) Pretrain the LSTM component (e.g., as an autoencoder or sequence predictor) before integrating it into the LSTM-VAE, to improve training stability and representation learning, especially in low-data regimes. This step of the project will open the door to investigating other techniques, such as GANs for sequences (e.g., SeqGAN, TimeGAN), that are more expressive but harder to train.
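
For reference, the resampling baseline mentioned above can be sketched as simple random oversampling: minority-class sequences are duplicated at random until every class is as large as the majority class. This is a generic illustration with hypothetical names, not the previous project's code. The LSTM-VAE in steps 2 and 3 would replace the duplication step with newly generated synthetic sequences.

```python
import random
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def oversample(
    samples: Sequence, labels: Sequence[Hashable], seed: int = 0
) -> Tuple[List, List]:
    """Randomly duplicate minority-class samples until all classes are balanced."""
    rng = random.Random(seed)            # fixed seed for reproducible experiments
    counts = Counter(labels)
    target = max(counts.values())        # size of the majority class
    out_samples, out_labels = list(samples), list(labels)
    for cls, count in counts.items():
        pool = [s for s, lab in zip(samples, labels) if lab == cls]
        for _ in range(target - count):
            out_samples.append(rng.choice(pool))
            out_labels.append(cls)
    return out_samples, out_labels


if __name__ == "__main__":
    # Toy example: 8 healthy controls and 2 PD recordings (IDs stand in for sequences).
    recordings = list(range(10))
    classes = ["control"] * 8 + ["pd"] * 2
    balanced_x, balanced_y = oversample(recordings, classes)
    print(Counter(balanced_y))
```

Oversampling should be applied to the training split only, after the train/test split, so that duplicated recordings do not leak into the evaluation data.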

Difficulty: High

Health Data Visualisation and Monitoring with Extended Reality - Other ideas also possible

extended reality data visualisation wearables hololens 2 healthcare

Marta Vallejo

Edinburgh

Details

Description: Develop an application that visualises health data from wearables (to be decided), such as heart rate and blood pressure, in real-time through Microsoft Hololens 2. This system could help users keep track of their health status and make informed decisions or be used for carers to monitor home patients (elderly or other frailty groups). We can accommodate other ideas. This project will also be supported by Dr Alistair McConnell (alistair.mconnell@hw.ac.uk) and Dr Babis Koniaris (b.koniaris@hw.ac.uk).

Difficulty: Variable

The Pro-Act Dataset: Exploring Machine Learning Opportunities in ALS Research

machine learning deep learning healthcare

Marta Vallejo

Edinburgh

Details

The Pro-Act (Pooled Resource Open-Access ALS Clinical Trials Database) stores a wealth of data crucial for understanding Amyotrophic Lateral Sclerosis (ALS). This project aims to identify promising research questions and design proof-of-concept machine learning models utilising the Pro-Act dataset. The study of these research questions could uncover key insights into ALS progression, prognosis, and treatment response, culminating in the development of models and showcasing the potential of machine learning and the Pro-Act dataset in advancing ALS research. Objectives: 1.- Conduct exploratory data analysis to understand the structure and characteristics of the Pro-Act dataset. 2.- Identify research questions relevant to ALS prognosis, disease progression, and treatment response. 3.- Design and propose machine learning models to address the identified research questions. 4.- Develop a proof-of-concept machine learning model using a subset of the Pro-Act dataset.

Difficulty: High

Exploring Attention and Cognitive Responses During Walking Using Pupil Labs "Pupil Invisible" Glasses

machine learning data analysis computer vision eye-tracking technology pupil labs glasses visual attention

Marta Vallejo

Edinburgh

Details

Attention and cognitive responses are crucial aspects of human behaviour, especially during tasks requiring simultaneous motor and cognitive functions, such as walking. The Pupil Labs "Pupil Invisible" glasses offer a unique opportunity to capture first-person video and eye-tracking data, providing detailed insights into these processes. The primary objective is to gather insights into how individuals of various age groups navigate and respond to their environment. Secondary objectives include comparing attention levels across different age ranges, identifying environmental factors influencing cognitive responses, and developing a comprehensive dataset for future research on attention and mobility. The expected outcomes of this project include a detailed analysis of how different environments affect attention and cognitive responses during walking, insights into age-related differences in attentional focus and distraction, and a valuable dataset for future research on mobility and cognitive health.

Difficulty: Variable

Exploring Machine Learning Models to Uncover Pathways in ALS Pathogenesis Using Immunohistochemical Features

machine learning medical data

Marta Vallejo

Edinburgh

Details

This project invites students with a machine learning background to investigate the complexities of ALS pathogenesis by applying diverse machine learning models to an existing dataset. The study involves patients with a specific ALS-related mutation (C9orf72 HRE), whose data includes thousands of post-mortem tissue images with quantified immunohistochemical markers for microglial activation and protein misfolding. Using features extracted in a previous study, you will assess model performance and predictive accuracy using methods beyond the random forest approach originally applied. You will experiment with algorithms such as support vector machines, gradient boosting, and neural networks to identify relationships within the dataset and to investigate which features or feature combinations best classify disease status and predict clinical outcomes. By implementing and comparing different machine learning models, you will gain insight into feature importance and model interpretability in biomedical data, with a focus on neurodegenerative disease applications. This project offers a hands-on opportunity to contribute to the understanding of ALS clinical heterogeneity and to test innovative modelling approaches, with the potential to inform future trial designs and therapeutic strategies for ALS.

Difficulty: High

Enhancing Machine Learning Models to Analyse Immune Profiles in ALS Tissue (in collaboration with Aberdeen University)

machine learning deep learning healthcare

Marta Vallejo

Edinburgh

Details

Amyotrophic Lateral Sclerosis (ALS) is a progressive neurodegenerative disease. Recent work has revealed that ALS patients can be grouped based on immune activity levels into two profiles:

* NPS1: High adaptive immune response
* NPS2: Low adaptive immune response

This project builds on an ongoing collaboration with the Gregory Lab at the University of Aberdeen (https://gregorylaboratory.com/), which has used immunohistochemistry (IHC) on ALS brain and spinal cord tissues to explore these immune profiles. For generating the dataset, two key markers were used (POM121 (PA5-36498) and a novel in-house sCTLA-4 antibody (JMW-3B3), developed at Aberdeen). Image analysis was performed using QuPath, with features extracted via superpixel and nuclear segmentation. A simple linear model has already been implemented. The next step is to extend this model using more advanced techniques. If successful, this project could identify interpretable imaging biomarkers that distinguish immune profiles in ALS tissue, contributing to personalised disease classification and supporting future clinical decision-making or therapeutic targeting.

Aims
- Improve the baseline linear model using more robust and interpretable methods
- Investigate which features (e.g. nuclear shape, intensity, spatial patterns) best separate NPS1 and NPS2
- Explore cross-region patterns (e.g. spinal cord vs. motor cortex)
- Optionally: test models for sCTLA-4 prediction or unsupervised clustering

Tools & Skills
- Python
- Scikit-learn / XGBoost / PyTorch (for modelling)
- QuPath feature understanding (dataset already extracted)
- Experience with pandas, NumPy, and plotting libraries is a plus

Support & Collaboration
- Direct contact with researchers from the University of Aberdeen and the Gregory Lab
- Support from supervisors involved in the clinical publication effort
- Opportunity to co-author a peer-reviewed journal paper if results are strong

Who is this for?
- Students interested in machine learning in healthcare or medical imaging
- Those keen to work on a real dataset with clear clinical relevance and publication potential
- Ideal for Honours or MSc-level individual projects

Difficulty: Variable

AI-Powered Registration of Cellular ALS Images using Pretrained and Contrastive Learning Models

cellular imaging image segmentation semi-supervised learning deep learning contrastive learning transfer learning cell registration biomedical image analysis

Marta Vallejo

Edinburgh

Details

Amyotrophic lateral sclerosis (ALS) is a fatal neurodegenerative disease characterised by the progressive loss of motor neurons, leading to muscle weakness, paralysis, and ultimately respiratory failure. Most patients die within two to four years of symptom onset, though the clinical course varies widely. In addition to motor impairment, cognitive deficits, often associated with TDP-43 proteinopathy, are increasingly recognised as a core feature of the disease. Despite its severity, early diagnosis remains difficult due to clinical heterogeneity and the absence of definitive biomarkers. This project will utilise a new dataset of high-resolution cellular images provided by the Spanish National Research Council (CSIC). These images display multiple cells in grey, with nuclei stained in blue using DAPI. Currently, cell and nucleus contours are manually delineated, a labour-intensive and time-consuming process. The project aims to automate this annotation pipeline using semi-supervised learning and transfer learning techniques, thereby reducing reliance on extensive manual labelling. Specifically, the student will explore consistency regularisation methods (e.g. Mean Teacher, FixMatch), self-training strategies, and contrastive pretraining approaches (e.g. SimCLR, MoCo) on unlabelled data to enhance segmentation performance. Pre-trained encoders from biomedical models, such as BioViT, Cellpose, or UNet++ with ImageNet or Cellpose weights, will be evaluated for model initialisation. The ultimate goal is to accelerate cell and nucleus segmentation while improving the reliability and scalability of image-based analysis pipelines for ALS pathology research.
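
To give a feel for the consistency-regularisation methods named above, the sketch below shows the exponential-moving-average (EMA) update that Mean Teacher uses to derive teacher weights from student weights. The function name `ema_update`, the decay value, and the use of plain Python lists in place of real network tensors are assumptions made for illustration only; this is not the project's pipeline.

```python
def ema_update(teacher, student, decay=0.99):
    """Return new teacher weights as an EMA of the student weights.

    `teacher` and `student` are flat lists of floats standing in for
    model parameters (a hypothetical simplification).
    """
    if len(teacher) != len(student):
        raise ValueError("teacher and student must have the same length")
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


# Toy usage: the teacher drifts towards a fixed student over three steps.
teacher = [0.0, 0.0]
student = [1.0, 2.0]
for _ in range(3):
    teacher = ema_update(teacher, student, decay=0.5)
```

In a real training loop the student would change at every step and the teacher's predictions on unlabelled images would serve as consistency targets.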

Difficulty: Variable

Modelling Disease Progression in ALS Mouse Models Using Multimodal Data - Collaboration with the University of Zaragoza (Spain)

machine learning deep learning healthcare

Marta Vallejo

Edinburgh

Details

Amyotrophic lateral sclerosis (ALS) is a progressive neurodegenerative disease that currently lacks effective predictive tools for stratifying patients or understanding early markers of disease severity. In this project, you will focus on applying machine learning and deep learning techniques to data obtained from transgenic SOD1G93A mice, a widely used preclinical model of ALS. The goal is to classify animals into fast vs slow disease progression categories based on multimodal data collected longitudinally. This classification will support the development of tools for early prognosis and guide translational work in human ALS patients.

Difficulty: Variable

Learning to play games from experts

Patricia Vargas

Edinburgh

Details

In this project, we will assess the ability of classification models to create an intelligent agent for video-game playing by learning from expert data. The idea consists of first capturing a sample of play sessions from expert players to create a training data set. Next, we will apply different Machine Learning models and a Symbolic Classification model to create an intelligent agent that mimics the actions of the expert player, and evaluate its ability to extrapolate to later stages of the game. We will also evaluate different approaches that help to improve the extrapolation abilities of the model, and assess the performance of the agent by how far it can progress in the game. Additionally, we will debug the symbolic models to understand the agent's behaviour and improve its performance.

Difficulty: Challenging

Federated Learning for Social Robots performing Activity Recognition in Ambient Assisted Living environments.

python machine learning basics at least one machine learning framework

Patricia Vargas

Edinburgh

Details

In the context of assistive technologies for the elderly or people with disabilities, intelligent environments have been designed to give these users more autonomy in daily activities. To this end, sensors and actuators embedded in the environment or in social robots can be orchestrated to produce helpful behaviours. In any case, deploying this type of technology requires that the system identifies the state of the environment and of the users. This understanding can be achieved through activity recognition methods, many of which are reported in the literature with good results in several applications. However, these methods usually require data from several users to be gathered in a centralised computational base to train machine learning models. This requirement, especially for modalities such as video or audio, raises ethical and legal concerns regarding data privacy. In this work, we propose to train the models locally for each participant using a Federated Learning approach, inducing models from public datasets such as the HWU-USP dataset. This approach preserves privacy because it does not require the data to be transferred out of the user’s environment, only partially trained models. Metrics such as elapsed time and accuracy will be evaluated.
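
The idea of sharing only partially trained models can be illustrated with a small aggregation sketch. It assumes a FedAvg-style weighted average, which the proposal does not name explicitly; the function `federated_average` and the flat-list weight representation are hypothetical.

```python
def federated_average(client_weights, client_sizes):
    """Average client model weights, weighting each client by its data size.

    Only weights leave each client; the raw data stays local.
    """
    total = sum(client_sizes)
    if total == 0:
        raise ValueError("at least one client must hold data")
    n_params = len(client_weights[0])
    averaged = [0.0] * n_params
    for weights, size in zip(client_weights, client_sizes):
        for i, w in enumerate(weights):
            averaged[i] += w * size / total
    return averaged


# Two hypothetical clients holding 1 and 3 local samples respectively.
clients = [[1.0, 2.0], [3.0, 4.0]]
sizes = [1, 3]
global_weights = federated_average(clients, sizes)
```

A server would send `global_weights` back to the clients for the next round of local training.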

Difficulty: Challenging

Neurorobotics Models to Understand the Neural Mechanisms Underlying Parkinson’s Disease.

Patricia Vargas

Edinburgh

Details

This project is part of a wider project and aims to contribute to the development and understanding of a Neurorobotics model of Parkinson’s Disease (PD), the Neuro4PD project (http://www.macs.hw.ac.uk/neuro4pd/). You will work with a neurorobotics model of PD currently being developed by our team with the goal of further understanding its computational properties. In particular, your work will investigate the computational capabilities of the BG-T-C loop in generating diverse control signals that can be exploited in robotics tasks. Then, you will compare neural computations in the BG-T-C loop with and without a robotic body. Overall, your work will be a further step in understanding the neural mechanisms underlying PD.

Difficulty: Challenging

Neurorobotics Models Uncovering Neural Sensory Processing and Muscle Control.

Patricia Vargas

Edinburgh

Details

This project is part of a wider project and aims to contribute to the development and understanding of a Neurorobotics model of Parkinson’s Disease (PD), the Neuro4PD project (http://www.macs.hw.ac.uk/neuro4pd/). In this project, you will work with a neurorobotics model currently being developed by our team with the goal of further understanding how complex motor control can be accomplished from realistic computational neural models embedded in simulated and real robots engaged in challenging tasks. In particular, you will explore current neuroscience findings with the goal of translating core neural mechanisms to practical robotic controllers. Overall, your work may also lead to novel AI architectures for embedded systems.

Difficulty: Challenging

Application of behaviour assessment

Patricia Vargas

Edinburgh

Details

Difficulty: Challenging

Trustworthy serverless machine learning on heterogeneous and distributed data and devices

machine learning deep learning federated learning ai computer vision nlp multi-modality iot

Chengjia Wang

Edinburgh

Details

Deep convolutional networks have been widely deployed in modern cyber-physical systems to perform different visual classification tasks. As fog and edge devices have different computing capacities and perform different subtasks, models trained for one device may not be deployable on another. Knowledge distillation can effectively compress well-trained convolutional neural networks into lightweight models suited to different devices. However, due to privacy issues and transmission costs, the manually annotated data used to train deep learning models are usually collected gradually and archived at different sites. Simply training a model on powerful cloud servers and compressing it for particular edge devices fails to use the distributed data stored at those sites. This offline training approach is also inefficient at handling new data collected from the edge devices. To overcome these obstacles, this project will implement and develop a heterogeneous brain storming (HBS) method for object recognition tasks in real-world Internet of Things (IoT) scenarios. This method enables flexible bidirectional federated learning of heterogeneous models trained on distributed datasets, using a new “brain storming” mechanism and optimizable temperature parameters.
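
As background for the temperature parameters mentioned above, the following sketch shows the temperature-scaled softmax commonly used in knowledge distillation, together with a soft-target cross-entropy loss. It is not the HBS method itself: the function names are hypothetical and the temperature is a fixed argument here, whereas the proposal describes it as optimizable.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Softmax over logits divided by a temperature; higher values flatten it."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy between softened teacher and student distributions."""
    p_teacher = softmax_with_temperature(teacher_logits, temperature)
    p_student = softmax_with_temperature(student_logits, temperature)
    return -sum(pt * math.log(ps) for pt, ps in zip(p_teacher, p_student))


teacher_logits = [2.0, 1.0, 0.0]
sharp = softmax_with_temperature(teacher_logits, temperature=1.0)
soft = softmax_with_temperature(teacher_logits, temperature=4.0)
```

The softened distribution `soft` spreads probability over the non-maximal classes, which is the extra signal a student model learns from.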

Difficulty: High

AI-aided drug discovery using graphical neural network: Retrosynthesis with simulated restriction

ai machine learning graphical neural network deep learning drug discovery nlp

Chengjia Wang

Edinburgh

Details

Retrosynthesis, a fundamental problem in chemistry, is the process of decomposing a target molecule into readily available starting materials. It aims to design reaction pathways and intermediates for a target compound. The goal of artificial intelligence (AI)-aided retrosynthesis is to automate this process by learning from previous chemical reactions to make new predictions. Although several models have demonstrated their potential for automated retrosynthesis, there is still a significant need to raise prediction accuracy to a more practical level. This project aims to review, implement and evaluate existing retrosynthesis methods and their potential applications.

Difficulty: High

XAI in the Prediction of COVID-19 Clinical Outcome

machine learning neural network deep learning ai computer vision explainability

Chengjia Wang

Edinburgh

Details

The SARS-CoV-2 pandemic had caused more than 1.6 million deaths worldwide by the end of 2020 and has overwhelmed health care resources in most countries. Medical imaging, especially chest CT and X-ray, has played a critical role in the diagnosis and treatment planning of COVID-19. In the past two years, a large number of deep learning methods have been proposed to: 1. assist the analysis and post-processing of chest imaging data; 2. predict the possible clinical outcomes and development of the disease; 3. predict the spreading speed and pandemic status in human society; 4. address other related tasks. This project will develop deep learning methods that can directly benefit the efficiency and accuracy of clinical analysis for COVID-19 using the available public challenge dataset (the STOIC2021 competition: https://stoic2021.grand-challenge.org/stoic2021/). It will then focus on analysing and assessing the explainability of current mainstream models (ResNet variations: ConvNeXt, Transformer, gMLP, GNN, etc.). Purposes and milestones: Specifically, this project will develop DL models to predict: 1. COVID-19 positivity. 2. The occurrence of severe COVID-19 cases, defined as intubation or death within one month from the acquisition of the CT scan (metric: AUC). Milestones of this project: 1. Apply a simple ConvNeXt model to the STOIC2021 data and produce results (successfully submit to the competition). 2. Review, collect, implement and compare the SOTA deep learning models on the STOIC2021 data; you may use some extra data. 3. Review and implement different ways to assess the explainability of different models. 4. Assess the explainability of the different models.
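
Since AUC is the stated metric for the severity prediction task, the sketch below computes it directly from its pairwise definition: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not competition data or the official scoring code.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: 1 marks a severe case, scores are model outputs.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)  # 3 of the 4 pairs are ordered correctly
```

The quadratic loop is fine for a sanity check; a rank-based implementation would be preferable on a full dataset.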

Difficulty: Easy

AIGC fashion design based on state-of-the-art generative models and/or image-to-3D algorithm with GUI design

ai aigc stable diffusion machine learning generative model creative fashion design

Chengjia Wang

Edinburgh

Details

Tasks distributed to 4 students: 1. Dataset collection, diffusion model finetuning, model evaluation 2. Inject guidance information into a 2D diffusion model and compare the guided model to the unguided ones, simple 2D GUI design by modifying SD WebUI 3. Implement and explore algorithms for image-to-3D rendering, model evaluation, simple 3D GUI design by modifying 3D WebUI 4. AIGC model comparison, workflow optimization, evaluation of diffusion models

Difficulty: High

Finetune large generative models using domain-specific data

ai aigc stable diffusion machine learning generative model creative fashion design

Chengjia Wang

Edinburgh

Details

Modify the Stable Diffusion model and its WebUI for 2D creative design: inject guidance information, collect simple datasets, and evaluate the models.

Difficulty: High

Zero-shot or few-shot learning for single image to 3D object generation

ai aigc stable diffusion machine learning generative model creative design

Chengjia Wang

Edinburgh

Details

Review existing 2D-to-3D image generation and rendering methods, and integrate them into the existing stable-diffusion-webui.

Difficulty: High

Finetuning of large vision generative model using domain specific data

ai aigc stable diffusion machine learning generative model creative design

Chengjia Wang

Edinburgh

Details

1. data collection (simple) 2. diffusion model finetuning 3. design the optimal workflow for specific applications 4. evaluate the model

Difficulty: High

Image generation using Stable Diffusion and natural language prompts using CLIP

ai aigc stable diffusion machine learning generative model

Chengjia Wang

Edinburgh

Details

Difficulty: High

PCN: predictive coding network

ai machine learning network architecture vision nlp deep learning

Chengjia Wang

Edinburgh

Details

Predictive coding networks (PCNs) are not as well known as CNN, transformer, diffusion and Mamba models, yet they have strong advantages in modern computing systems. The aim of this work is to implement a PCN to solve one simple vision or NLP task (or to process other types of sequential data), and then to explore possible approaches to improve its performance and robustness. You need prior knowledge of deep learning models and components, such as CNNs, MLPs, transformers and attention, to conduct this research.

Difficulty: Challenging

Mamba in Vision

ai machine learning network architecture vision nlp deep learning mamba

Chengjia Wang

Edinburgh

Details

Mamba is a new deep learning model originally designed for processing sequence data, which soon gained popularity in vision tasks. In this work, the student will review the most recently published work and implement extended Mamba models to solve image recognition problems.

Difficulty: Challenging

Vision GNN

ai machine learning network architecture vision nlp deep learning graphical neural network

Chengjia Wang

Edinburgh

Details

Difficulty: High

Deep learning model: KAN

machine learning new deep learning models multi-agent system banking large language models fraud detection industrial collaboration

Chengjia Wang

Edinburgh

Details

KAN: Kolmogorov-Arnold Networks (https://arxiv.org/abs/2404.19756)

Difficulty: Easy

Blockchain and smart contract with chainlink

blockchain finance machine learning deep learning agentic

Chengjia Wang

Edinburgh

Details

This is a collaboration with an on-chain finance startup

Difficulty: High

Compare 8 agentic workflow tools for heterogeneous AI system

machine learning multi-agent system banking large language models workflow

Chengjia Wang

Edinburgh

Details

This workflow can be used for: 1. Healthcare applications, such as homecare, ICU systems, or drug discovery 2. AI4Science 3. Daily office automation 4. Other applications, to be discussed. Tools: Make, n8n, Dify, Coze, etc.

Difficulty: Easy

Best Fashion Design workflow with open source AIGC

Chengjia Wang

Edinburgh

Details

Collect data, design the workflow, fine-tune Stable Diffusion (with LoRA), and make it work with a clean GUI.

Difficulty: Easy

Incremental and scalable machine learning systems

machine learning new deep learning models multi-agent system

Chengjia Wang

Edinburgh

Details

A swarm of small AI models is combined to form a LEGO-like model that can beat a large AI model in complicated computer vision or NLP tasks. Such a system is intended to be more robust and flexible than conventional deep networks.

Difficulty: High

Reinforcement learning or inverse reinforcement learning for legged robots in Isaac Gym

machine learning deep learning reinforcement learning generative model

Chengjia Wang

Edinburgh

Details

Difficulty: Easy

Toward Reliable Drug-Target Interaction Predictions in Out-of-Distribution Data Scenarios

blockchain finance machine learning deep learning agentic graphic neural network

Chengjia Wang

Edinburgh

Details

Drug-target interaction (DTI) prediction is becoming increasingly complex, and out-of-distribution (OOD) data poses particular challenges. This project will work toward making DTI predictions more reliable in OOD scenarios.

Difficulty: Easy

Ongoing Grand Challenge or Kaggle competition on AI

machine learning multi-agent system workflow hackathon

Chengjia Wang

Edinburgh

Details

We organise student teams every year to take part in new AI and data science competitions on Grand Challenge, Kaggle, CodaLab, and Synapse.

Difficulty: Easy

Apply DeepSeek-R1 Training Strategies to Small Local Models

large language models

Chengjia Wang

Edinburgh

Details

Difficulty: Moderate

Heterogeneous Multi-agent Intelligent system for fraud detection and early prediction

machine learning multi-agent system banking large language models fraud detection industrial collaboration

Chengjia Wang

Edinburgh

Details

This project targets top-tier conference or journal papers, in collaboration with an industrial partner (one of the UK's top banking companies).

Difficulty: High

An Eclipse front end for the Skalpel type error explainer

Joe Wells

Edinburgh

Details

Skalpel helps explain type errors in computer programs. The project would build an additional front-end (user interface) for Skalpel by making it usable from within Eclipse. Because support for SML in Eclipse is probably not the greatest, this project could reasonably include general work improving this support.

Difficulty: Variable

assemble and test the Isabelle/Isar proof language definition

Joe Wells

Edinburgh

Details

Isabelle/Isar is a widely used proof assistant and proving environment for formal verification. There is no single place where a complete and up-to-date definition of the Isabelle/Isar input language can be found. Some of the pieces are in research publications, some pieces are in PhD dissertations, some pieces are in software documentation, and some pieces are in the Isabelle/Isar source code. And only the source code is certain to be up-to-date. This project is about gathering the pieces, assembling them, and writing some tests to confirm that the definition that the project synthesizes is correct.

Difficulty: Variable

automatically gather samples of certain mathematical notations

Joe Wells

Edinburgh

Details

The supervisor of this project is trying to develop general theories of mathematical texts. As part of this, it is necessary to see what computer scientists, logicians, mathematicians, etc., actually write. Search engines are great for finding documents by the words or phrases they use. However, they are not much use for searching for instances of BNF-like notation (M, N ::= x | lam x. M | M N), or set comprehensions ({ x | exists y. (x,y) in S }), or ellipses (x = (y1, ..., yn)), or other mathematical notations. This project is about developing programs to process documents to gather samples of the various forms of these and similar notations.
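
As a starting point, a first pass over plain text could use regular expressions for the three notations mentioned above. The patterns and the `find_notations` helper below are rough assumptions for illustration: they only handle ASCII renderings like the ones in this description, and real documents (PDF, LaTeX, MathML) would need considerably more careful processing.

```python
import re

# Deliberately simple, hypothetical patterns for ASCII-rendered notation.
PATTERNS = {
    # Grammar productions such as "M, N ::= x | lam x. M | M N"
    "bnf": re.compile(r"[\w, ]+::=[^)\n]+"),
    # Set comprehensions such as "{ x | exists y. (x,y) in S }"
    "set_comprehension": re.compile(r"\{[^{}|]*\|[^{}]*\}"),
    # Parenthesised sequences with an ellipsis such as "(y1, ..., yn)"
    "ellipsis": re.compile(r"\([^()]*\.\.\.[^()]*\)"),
}


def find_notations(text):
    """Return every match of each pattern in the given text."""
    return {name: pattern.findall(text) for name, pattern in PATTERNS.items()}


sample = (
    "Terms (M, N ::= x | lam x. M | M N), "
    "sets ({ x | exists y. (x,y) in S }), "
    "tuples (x = (y1, ..., yn))."
)
results = find_notations(sample)
```

Regular expressions will miss nested or typeset forms, so a fuller solution would likely combine document parsing with more structural matching.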

Difficulty: Variable

Generic project contributing to the Skalpel type error explainer

Joe Wells

Edinburgh

Details

Skalpel helps explain type errors in computer programs. Although there are a number of specific projects listed for Skalpel, there are lots and lots of other possible projects, far too many to write a project proposal for each one. Just ask.

Difficulty: Variable

implement constraint solving for type inference for System Fs

Joe Wells

Edinburgh

Details

System F is a type system that is embedded as part of the essential core in type systems used by many programming languages and proof systems. The key idea of System F is the forall-quantified type, e.g., the type (forall x. x to x) stands for the collection of all types of the shape (Z to Z) for any Z. System Fs extends System F with _expansion variables_ to enable a particular way of using constraint solving for finding types for programs and proof skeletons with incomplete type information. This project is about implementing the key features of System Fs and exploring possible constraint solving algorithms.
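
To make the forall-quantified type concrete, the sketch below represents plain System F types as tuples and instantiates (forall x. x to x) at a chosen type Z. The tuple encoding and the names `substitute` and `instantiate` are assumptions for illustration. The sketch does not cover the expansion variables of System Fs, and the substitution does not rename bound variables, so it is only safe when the argument type has no free variables that clash with bound names.

```python
def substitute(ty, name, replacement):
    """Replace free occurrences of the type variable `name` in `ty`."""
    kind = ty[0]
    if kind == "var":
        return replacement if ty[1] == name else ty
    if kind == "arrow":
        return ("arrow",
                substitute(ty[1], name, replacement),
                substitute(ty[2], name, replacement))
    if kind == "forall":
        if ty[1] == name:  # an inner binder shadows `name`; stop here
            return ty
        return ("forall", ty[1], substitute(ty[2], name, replacement))
    raise ValueError(f"unknown type constructor: {kind}")


def instantiate(ty, argument):
    """Instantiate the outermost forall of `ty` with `argument`."""
    if ty[0] != "forall":
        raise ValueError("only forall-quantified types can be instantiated")
    return substitute(ty[2], ty[1], argument)


# (forall x. x to x) instantiated at Int gives (Int to Int).
identity = ("forall", "x", ("arrow", ("var", "x"), ("var", "x")))
int_to_int = instantiate(identity, ("var", "Int"))
```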

Difficulty: Variable

investigate a lambda-calculus-like machine/assembly language

Joe Wells

Edinburgh

Details

The lambda-calculus is the standard theory for reasoning about computer programs. Machine language is what available CPUs actually run. This project involves investigating a machine-language-like formalism with the equational reasoning power of the lambda calculus. Useful tasks that might be part of the project include implementing the language and testing or verifying its properties.

Difficulty: Variable

Making the Skalpel type error explainer more robust

Joe Wells

Edinburgh

Details

Skalpel helps explain type errors in computer programs. The project would improve testing, find bugs, and improve robustness. Much earlier work has been on theoretical challenges, with less time spent on niceties like, for example, good error messages and test coverage. The project might also finish moving the web site to GitHub.

Difficulty: Variable

parsing/semantics for mixed English/symbolic mathematics

Joe Wells

Edinburgh

Details

Parse and/or give formal semantics to mathematical uses of English combined with symbolic formulas. The starting point is to read The Language of Mathematics by Mohan Ganesalingam and then scan the research in the decade since this book was published. Then some part of the problem must be chosen as the goal. Finally, implement, test, and evaluate a solution.

Difficulty: Variable

Reimplementing the user interface of a type error explainer

Joe Wells

Edinburgh

Details

Skalpel helps explain type errors in computer programs. Programmers of the front-end (user interface) have included me, 2 PhD students, and 5 project students, with the result being code that is fragile and hard to modify. A good project would be to rewrite it with proper care for data structure sanity, error checking, error messages, testing, etc.

Difficulty: Variable

Toward type error explanations for the Hume language

Joe Wells

Edinburgh

Details

Hume (http://hume-lang.org/) is a language using ideas from both functional programming and finite automata together with strong types to obtain guarantees on time and space usage for safety-critical systems. The project would begin the process of extending the type error explainer Skalpel so it can find the portion of a Hume program responsible for a type error. Most likely only part of Hume will be handled. This would also begin the process of extending Skalpel to analyze multiple languages.

Difficulty: Variable

Visualizing type errors with graphical type/data-flow diagrams

Joe Wells

Edinburgh

Details

Skalpel helps explain type errors in computer programs. The project would extend Skalpel's back-end to generate graphical type/data-flow diagrams that will show how the program parts causing a type error are connected. Tom Methven (RA for Mike Chantler) can help a bit. Tom and Mike recommend the D3JS library (http://d3js.org/).

Difficulty: Variable

open source implementation of PDF reader extensions for data capture

programming open source graphical user interfaces document standards forms data capture

Joe Wells

Edinburgh

Details

PDF is now much more than a system for arranging ink marks on paper. PDF now includes many new features from 3D visualizations that can be manipulated to dynamic adaptation to changes in media size and shape. One particularly important feature is fancy form filling with programmable checking of entered data. It is particularly important for there to be an open source implementation of these features, because they are often used for mandatory government reporting and it is not good for this kind of functionality to be controlled by a private company. This project aims to assess which parts of the standards in this area that Adobe has put forward are most important to implement as open source, and then to carry out and test and deliver some specific improvements to some specific pieces of open source software.

Difficulty: Variable

Refining Aspect-Based Sentiment Analysis Through Subjectivity in the Pipeline

sentiment analysis aspect-based neural network machine learning

Timothy Yap

Malaysia

Details

Aspect-Based Sentiment Analysis (ABSA) is a fine-grained approach to sentiment analysis that identifies specific aspects of a product, service, or entity within text and determines the sentiment expressed toward each one. Unlike traditional sentiment analysis, which focuses on overall sentiment, ABSA offers more detailed insights. Subjectivity, reflecting the extent to which text conveys personal opinions or emotions rather than objective facts, plays a key role in interpreting sentiment. This project aims to design and evaluate an enhanced ABSA pipeline that integrates subjectivity analysis to better process customer feedback in a specific domain. By incorporating subjectivity into the pipeline, the goal is to improve ABSA accuracy and provide actionable insights into customer sentiment on specific aspects, helping businesses respond more effectively to user concerns.

Difficulty: Moderate

Implementing Decentralized Identity Management on a Distributed Ledger

ethereum digital identity blockchain distributed ledger

Timothy Yap

Malaysia

Details

Self-sovereign identity (SSI) empowers individuals to control and manage their digital identities without relying on centralized authorities. At the core of SSI are decentralized identifiers (DIDs) and verifiable credentials, which allow users to present trustworthy claims while maintaining privacy and autonomy. This project proposes the development of an SSI management pipeline using Veramo, a modular JavaScript framework for building decentralized identity applications. The system will enable users to create DIDs, issue and store verifiable credentials, and verify those credentials in a decentralized environment. The project aims to demonstrate a working prototype of an SSI solution tailored to a specific use case, such as credential sharing for students or event attendees.

Difficulty: High

Developing a Real-Time Ethereum Network Monitoring Dashboard Using Blockchain Metrics

ethereum analytics blockchain network metrics

Timothy Yap

Malaysia

Details

Blockchain networks like Ethereum generate vast amounts of real-time data related to blocks, transactions, gas usage, and peer connectivity. This project aims to develop a custom real-time monitoring dashboard for the Ethereum network that collects and visualizes key network and performance metrics such as block times, transaction throughput, peer count, gas usage, and pending transaction queue size. Data will be collected via Ethereum’s JSON-RPC API or WebSocket interface, optionally enhanced by running a local node (e.g., Geth or Besu). The system will include a data collection layer, a time-series storage solution, and a dynamic web-based frontend for visualizing the metrics in real time. The dashboard serves as a diagnostic and educational tool to better understand Ethereum network behavior.
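
A possible shape for the data collection layer is sketched below: it builds JSON-RPC 2.0 request bodies and decodes the hex-encoded quantities that Ethereum nodes return. `eth_blockNumber` and `net_peerCount` are standard Ethereum JSON-RPC methods; the helper names and the sample response string are assumptions, and the code that actually sends the request over HTTP or WebSocket is deliberately left out.

```python
import json


def make_request(method, params=None, request_id=1):
    """Build a JSON-RPC 2.0 request body for an Ethereum node."""
    return json.dumps({
        "jsonrpc": "2.0",
        "method": method,
        "params": params or [],
        "id": request_id,
    })


def parse_quantity(hex_value):
    """Decode a 0x-prefixed hex quantity (e.g. a block number) to an int."""
    return int(hex_value, 16)


# Request bodies a collector might send on each polling cycle.
block_request = make_request("eth_blockNumber")
peer_request = make_request("net_peerCount", request_id=2)

# Decoding a hypothetical response to the block number request.
sample_response = '{"jsonrpc": "2.0", "id": 1, "result": "0x4b7"}'
latest_block = parse_quantity(json.loads(sample_response)["result"])
```

Each decoded value could then be written to the time-series store together with the time at which it was collected.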

Difficulty: Easy

Developing a Real-Time Bitcoin Network Monitoring Dashboard Using JSON-RPC and Node Metrics

bitcoin analytics blockchain network metrics

Timothy Yap

Malaysia

Details

The Bitcoin network operates as a decentralized peer-to-peer system in which thousands of nodes exchange blocks, transactions, and status messages in real time. Monitoring these network activities provides critical insight into node health, block propagation, mempool dynamics, and peer connectivity. This project aims to develop a real-time dashboard that visualizes key Bitcoin network metrics by interfacing directly with a locally hosted Bitcoin Core node. Data will be collected using Bitcoin's JSON-RPC interface and logs, optionally supported by tools like Wireshark or Prometheus exporters. The backend will periodically query live network statistics—such as block times, peer counts, mempool size, and version messages—and store them in a time-series format. A lightweight frontend will display these metrics in a visual dashboard. This project will serve as a diagnostic and educational tool for understanding Bitcoin’s decentralized architecture and real-world performance.
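
One possible shape for the backend's periodic query is sketched below: it flattens the results of the Bitcoin Core RPC calls `getblockchaininfo`, `getmempoolinfo`, and `getpeerinfo` into a single time-series row. The function name `summarise_snapshot` and the row layout are assumptions for illustration; the RPC transport and the storage layer are omitted.

```python
from collections import Counter


def summarise_snapshot(timestamp, chain_info, mempool_info, peer_info):
    """Flatten one polling cycle into a time-series row (hypothetical layout).

    chain_info   -- result of getblockchaininfo
    mempool_info -- result of getmempoolinfo
    peer_info    -- result of getpeerinfo (a list, one entry per peer)
    """
    # Group peers by the user-agent string they announced in their version message.
    versions = Counter(peer.get("subver", "unknown") for peer in peer_info)
    inbound = sum(1 for peer in peer_info if peer.get("inbound"))
    return {
        "time": timestamp,
        "height": chain_info["blocks"],
        "mempool_tx": mempool_info["size"],
        "mempool_bytes": mempool_info["bytes"],
        "peers_total": len(peer_info),
        "peers_inbound": inbound,
        "peers_outbound": len(peer_info) - inbound,
        "peer_versions": dict(versions),
    }
```

Each row can then be appended to whatever time-series store the project chooses and read back by the frontend.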

Difficulty: Easy

Smart Autonomous Delivery Vehicle

iot sensor lidar ultrasonic sensor camera obstacle detection

Chee Een Yap

Malaysia

Details

Autonomous delivery vehicles are increasingly used in urban logistics, but ensuring their safe operation in dynamic environments remains a significant challenge. IoT technology can enhance real-time data transmission and sensor fusion, allowing for quicker and more accurate environmental perception, as well as improved computer vision for obstacle recognition.

Difficulty: Moderate

Beehive Monitoring and Management System Using IoT

Chee Een Yap

Malaysia

Details

Stingless bees (meliponines) are vital pollinators and honey producers in tropical and subtropical regions. Stingless bee honey is recognized as a superfood due to its high antioxidant content and potential health benefits. However, beekeepers face the threat of hive theft, driven by the high value of the honey. Beekeepers also need to protect their hives from animals, such as bears and mice, which can damage hives and consume honey. This project proposes an IoT-based beehive monitoring and management system to support sustainable and efficient stingless beekeeping.

Difficulty: Moderate

Agentic AI Cost Optimisation

ai agent

Yingfang Yuan

Edinburgh

Details

Please feel free to contact me if you need any further information. https://arxiv.org/abs/2104.08500

Difficulty: High

Agentic AI for Discovering Technical Insights

ai agent

Yingfang Yuan

Edinburgh

Details

Please feel free to contact me if you need any further information

Difficulty: High

The Architecture Search of Agentic AI

ai agent

Yingfang Yuan

Edinburgh

Details

https://arxiv.org/pdf/2502.07373 Please feel free to contact me if you need any further information

Difficulty: High

Agentic AI for Machine Learning Coding Generation

ai agent

Yingfang Yuan

Edinburgh

Details

Please feel free to contact me if you need any further information https://arxiv.org/pdf/2411.15692

Difficulty: High

Multi/Interdisciplinary AI agents for NetZero

ai agent

Yingfang Yuan

Edinburgh

Details

Please feel free to contact me if you need any further information

Difficulty: Moderate

Visualising Data of Lift Usage at the HW Dubai campus

Hind Zantout

Dubai

Details

Working with the relevant staff at HW facilities management, create an application, possibly real-time, that visualises lift usage.

Difficulty: Moderate

Coursework Submission Deadline Visualiser System

Hind Zantout

Dubai

Details

This is a software engineering project where requirements need to be collected from both staff and students. It may be split into two separate projects. From the student perspective, it should help students meet submission deadlines. From the staff perspective, it should visualise the impact of a change to one deadline. Usability is also important.

Difficulty: Moderate

Finding schools in Dubai

Hind Zantout

Dubai

Details

KHDA has open data with details of schools in Dubai. This project can help parents decide which school to send their children to. The deliverable is a 'website' that parents can use. Computer Systems students can focus on the development aspect of such a project; Computer Science students can include the topic of visualisation. Both cohorts can explore the inclusion of analytics.

Difficulty: Moderate

Generic Topic

Hind Zantout

Dubai

Details

This is a project for building a website or an app with functionality to be determined based on the context.

Difficulty: Easy

Isolation beating App

Hind Zantout

Dubai

Details

There is a wide offering of social media. This project will investigate the strengths and weaknesses of each platform and look into which features are best suited to overcoming isolation, then develop an app that incorporates the important features. A colleague from psychology can be consulted.

Difficulty: Moderate

Library Champion

Hind Zantout

Dubai

Details

The Library is full of valuable references that can help in the various courses. Starting with the course descriptor of a course, identify the available references; whenever such a reference is consulted, it gets rated. Think: "Tripadvisor for books".

Difficulty: Moderate

Monitoring Online reading

Hind Zantout

Dubai

Details

In this project you will randomly replace very short words such as in, the, of, at... with a number of blanks equal to the number of letters in the word. As the user fills these in correctly, proving that the user is reading the text, the number of blanked words can be reduced. The text could be the honours project student handbook. There is scope to add additional features such as gamification or visualisation.
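
The blanking step could be sketched as follows. This is a minimal illustration under stated assumptions: "very short" is taken to mean words of two or three letters, `rate` is the probability that a given short word is blanked, and the names `blank_short_words` and `check_answers` are hypothetical.

```python
import random
import re

# Assumption: a "very short" word is two or three letters long.
SHORT_WORD = re.compile(r"\b[A-Za-z]{2,3}\b")


def blank_short_words(text, rate, rng):
    """Replace a random subset of short words with one underscore per letter.

    Returns the blanked text and the list of removed words in reading order.
    """
    answers = []

    def replace(match):
        word = match.group(0)
        if rng.random() < rate:
            answers.append(word)
            return "_" * len(word)
        return word

    return SHORT_WORD.sub(replace, text), answers


def check_answers(expected, given):
    """True when every blank was filled with the right word (case-insensitive)."""
    return len(expected) == len(given) and all(
        e.lower() == g.strip().lower() for e, g in zip(expected, given)
    )
```

Calling `blank_short_words(text, rate, random.Random())` with a lower `rate` after each correctly completed passage gives the behaviour described above, where the number of blanks shrinks as the reader demonstrates attention.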

Difficulty: Moderate

Student Proposed Open Project

Hind Zantout

Dubai

Details

Student-proposed project

Difficulty: Moderate

Smart Anything

Hind Zantout

Dubai

Details

A range of applications, such as:
- leveraging the availability of devices such as smart meters to reduce consumption of electricity in the home
- leveraging the availability of open data to create smart communities
and many more.

Difficulty: Moderate

Visualising/Analysing Health Data

Hind Zantout

Dubai

Details

This project can look into analysing health-related data. For visualisation it could include the development of an animation to track the progress of e.g. Covid in one country, or it could visualise a region and map one country within that region, or compare countries with similar size populations or similar climates or geographic locations.

Difficulty: Moderate

Visualising Speech

Hind Zantout

Dubai

Details

Research what keeps an audience captivated and feed back to a public speaker via an app.

Difficulty: Challenging

Tracking student progress

Hind Zantout

Dubai

Details

Difficulty: Moderate

A project around digital art

Hind Zantout

Dubai

Details

Here are two links to explore: https://journals.ub.uni-heidelberg.de/index.php/dah/article/view/21631 or perhaps https://towardsdatascience.com/tagged/digital-art The Psychology department has recently acquired a mobile eye tracker that can be incorporated into a VR headset. They are looking to use it for eye movement research related to digital/immersive art (Dr. Pik Ki Ho https://www.hw.ac.uk/dubai/profiles/teaching/dr-pik-ki-ho.htm will be co-supervisor).

Difficulty: Moderate

Graph Database application

Hind Zantout

Dubai

Details

Difficulty: Easy

Student Attendance

Hind Zantout

Dubai

Details

Students in a class are given 'green' numbers in the range 2-163. At the end of the session, the lecturer notes the number on a sheet of paper which is then scanned and the numbers added to an Excel sheet (or any other suitable file) thus forming an attendance record. The student name and H number should also be considered.

Difficulty: Moderate

Detecting Bias in Generative AI

Hind Zantout

Dubai

Details

Harvard University have compiled an Implicit Association Test https://implicit.harvard.edu/implicit/takeatest.html that can be used to measure bias. Working with Dr. Hajar Yekani in SoSS, this project will investigate bias in generative AI systems. It could use text or images. Interested students should first read up on the test and the different types of tests that are available. A suitable framework using generative AI can then be suggested to measure bias in such systems.

Difficulty: Moderate

Visualising Dissertation Topics

Hind Zantout

Dubai

Details

Looking at the abstracts of published dissertations, provide an analysis of the choice of topics, either over a three-year time frame or comparing Dubai with Edinburgh topics.

Difficulty: Easy

Image Classification Distinguishing Real vs. Fake Images

machine learning ai

Claudio Zito

Dubai

Details

Difficulty: Moderate

Reinforcement Learning for Autonomous Agents

Claudio Zito

Dubai

Details

We will use OpenAI Gym as the platform for training and evaluation.

Difficulty: Easy

ML approach to biomarkers detection and discovery

Claudio Zito

Dubai

Details

Exploratory Data Analysis on published datasets containing gene expressions for blood tumor patients. Development of ML models to identify possible biomarkers for earlier detection of the disease.

Difficulty: Moderate

Robot Grasping

robotics robot grasping machine learning

Claudio Zito

Dubai

Details

This dissertation proposes the development of advanced contact-based algorithms tailored for improving the performance and reliability of robot grasping tasks. As robots increasingly integrate into diverse environments, the ability to accurately and efficiently grasp various objects becomes paramount. Current grasping methodologies often rely on visual or geometric information, which can be limiting in dynamic or unstructured settings. By focusing on the contact dynamics between the robotic gripper and objects, this research aims to enhance grasp robustness and adaptability, ensuring more effective operation in real-world applications. The project will be implemented and evaluated using the Robot Operating System 2 (ROS2) and can be conducted in simulation or on the Kinova Gen 3 Lite robot equipped with a 2-DOF gripper available in the lab.
Introduction: The ability of robots to grasp objects is a critical aspect of autonomous functionality in fields such as manufacturing, healthcare, logistics, and service industries. Traditional grasping techniques, which rely heavily on visual perception or geometric shape analysis, may struggle to account for the complexities of real-world scenarios, such as object variability, surface friction, and unexpected environmental changes. This research seeks to address these challenges by developing contact-based algorithms that respond to real-time feedback from sensor data during the grasping process.
Research Objectives:
- Algorithm Development: Design and implement contact-based algorithms using ROS2, leveraging sensory information (e.g., force, pressure, and tactile feedback) to enhance the adaptability and reliability of robot grasping.
- Modeling Contact Dynamics: Develop mathematical models to accurately simulate contact interactions between the 2-DOF gripper of the Kinova Gen 3 Lite and a variety of objects, considering factors like shape, material properties, and environmental conditions.
- Experimental Validation: Conduct extensive experimental studies that evaluate the performance of the developed algorithms, either in simulation or on the physical Kinova Gen 3 Lite robot, across various grasping scenarios, including handling rigid, soft, and deformable objects.
- Integration with Learning Techniques: Explore the integration of machine learning methods to optimize grasping strategies based on past experiences and predictive models.
Methodology: The proposed research will involve a multi-disciplinary approach, which includes:
- Literature Review: Conducting a thorough review of existing grasping techniques and contact modeling approaches.
- Algorithm Design: Creating algorithms that utilize real-time sensor data to compute grasp strategies dynamically, specifically focusing on the capabilities of the ROS2 framework.
- Simulation and Testing: Utilizing robotic simulators to evaluate algorithm performance before real-world application. This phase will allow for parameter tuning and validation of the models in a simulated environment, followed by evaluation on the Kinova Gen 3 Lite to ensure practical applicability.
- Prototype Development: Implementing the algorithms on the Kinova Gen 3 Lite equipped with the 2-DOF gripper to perform grasping tasks and gather empirical data.
Expected Contributions: Through the successful completion of this dissertation, it is anticipated that the project will contribute to:
- A set of innovative contact-based algorithms implemented in ROS2 that significantly improve robot grasping capabilities.
- New theoretical insights into contact dynamics between the Kinova Gen 3 Lite's gripper and various object types.
- Practical guidelines for implementing adaptable grasping strategies in real-world robotic systems.
- An extensive dataset of grasping performance across multiple conditions, which can serve as a reference for future research in the field.
Conclusion: This dissertation will address a crucial gap in the existing robotic grasping technologies by focusing on contact-based methodologies implemented within the ROS2 framework. The outcomes are expected to enhance the functional capabilities of robots in diverse applications, paving the way for more intelligent and reliable robotic systems capable of operating in complex and unstructured environments. Furthermore, by integrating real-time feedback into the grasping process, this research will contribute to the broader field of robotics and automation, ultimately enhancing the interaction between robots and their environments. The evaluation, whether conducted in simulation or on the Kinova Gen 3 Lite robot, will ensure that findings are practically applicable and beneficial for advancing robot grasping technologies.

Difficulty: High