

Proposer
John See
Title
Drone-based Navigation using Vision-Language Models (VLMs)
Goal
Development and deployment of a prototype system/software for drone-based navigation using vision-language models
Description
As we enter a new era of autonomous navigation for drones (also known as unmanned aerial vehicles, or UAVs), the challenge lies in providing intelligent interface components and a deeper understanding of the environment in which the drone operates. The drone must be equipped with visual perception and an understanding of its surroundings so that it can find suitable paths to navigate and accomplish mission goals. This project aims to develop a prototype system/software for drone-based navigation using vision-language models (VLMs). By leveraging VLMs, additional layers of multimodal information, including map information and instruction sets, can be incorporated to enhance navigation in drone missions. The steps to achieve this include matching text phrases to visual objects, detecting scene elements, converting instructions into geometric goals, using a planner to determine safe waypoints and actions, and finally executing the resulting plan and deploying the system on a drone.
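The pipeline steps above can be sketched end to end in a few lines of Python. This is a minimal sketch under stated assumptions, not the project's method: the VLM is stubbed out by a hypothetical `Detection` record (a text label plus an estimated position in metres in the drone's body frame), phrase-to-object matching is a toy word-overlap score rather than a real grounding model, and the planner is a simple straight-line sampler with a clearance check, standing in for a proper planner such as A* or RRT. All names (`Detection`, `ground_phrase`, `instruction_to_goal`, `plan_waypoints`) and parameters (`standoff`, `step`, `clearance`) are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Detection:
    """Hypothetical output of a VLM / open-vocabulary detector.

    `label` is a text phrase; x, y, z is the object's estimated position in
    metres in the drone's body frame (x forward, y left, z up).
    """
    label: str
    x: float
    y: float
    z: float


def ground_phrase(phrase, detections):
    """Match a text phrase to a visual object using a toy word-overlap score."""
    words = set(phrase.lower().split())
    best, best_score = None, 0
    for det in detections:
        score = len(words & set(det.label.lower().split()))
        if score > best_score:
            best, best_score = det, score
    return best  # None if no label shares a word with the phrase


def instruction_to_goal(instruction, detections, standoff=1.0):
    """Convert an instruction into a geometric goal: a point `standoff` metres
    short of the matched object (horizontally), at the object's height."""
    target = ground_phrase(instruction, detections)
    if target is None:
        return None
    dist = math.hypot(target.x, target.y)
    if dist <= standoff:
        return (0.0, 0.0, target.z)  # already close enough: only adjust height
    scale = (dist - standoff) / dist
    return (target.x * scale, target.y * scale, target.z)


def plan_waypoints(goal, obstacles, step=0.5, clearance=0.6):
    """Straight-line planner: sample a waypoint at most `step` metres apart and
    reject the path if any waypoint is closer than `clearance` to an obstacle."""
    length = math.dist((0.0, 0.0, 0.0), goal)
    n = max(1, math.ceil(length / step))
    waypoints = []
    for i in range(1, n + 1):
        t = i / n
        p = tuple(g * t for g in goal)
        if any(math.dist(p, obs) < clearance for obs in obstacles):
            return None  # unsafe; a real planner (e.g. A* or RRT) would replan
        waypoints.append(p)
    return waypoints


if __name__ == "__main__":
    scene = [Detection("red chair", 4.0, 0.0, 0.0), Detection("blue door", 2.0, 3.0, 0.5)]
    goal = instruction_to_goal("fly to the red chair", scene)
    print(goal)  # (3.0, 0.0, 0.0)
    print(plan_waypoints(goal, obstacles=[]))
```

Keeping grounding, goal conversion and planning as separate functions mirrors the listed steps, so the toy matcher or the straight-line planner can later be swapped for a real VLM grounding call or a search-based planner without changing the other stages.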
Resources
DJI Tello drone, Tello SDK, open-source VLMs, GPU compute (access to the MACS Malaysia server or the supervisor's GPU workstation)
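For the final execution step on the DJI Tello, a hedged sketch of turning planner waypoints into Tello SDK commands is shown below. It assumes the SDK's plain-text UDP interface (commands such as `command`, `takeoff`, `forward x`, `right x`, `up x` and `land` sent to 192.168.10.1:8889, each answered with `ok` or `error`) and a 20 to 500 cm range for move distances; these should be checked against the SDK version on the drone. It also assumes the drone keeps a fixed heading, so body-frame axes map directly to forward/left/up moves. The function names `waypoints_to_tello_commands` and `send_commands` are hypothetical.

```python
import socket

# Assumed Tello SDK details: plain-text commands over UDP to this address,
# each answered with "ok" or "error"; move distances accepted in 20..500 cm.
TELLO_ADDR = ("192.168.10.1", 8889)
MIN_CM, MAX_CM = 20, 500

# (waypoint axis index, command for positive move, command for negative move)
AXES = [(0, "forward", "back"), (1, "left", "right"), (2, "up", "down")]


def waypoints_to_tello_commands(waypoints):
    """Convert body-frame waypoints in metres (heading assumed fixed) into
    Tello SDK move commands such as "forward 50"."""
    commands = []
    residual = [0.0, 0.0, 0.0]  # centimetres not yet flown, per axis
    prev = (0.0, 0.0, 0.0)
    for wp in waypoints:
        for idx, pos_cmd, neg_cmd in AXES:
            residual[idx] += (wp[idx] - prev[idx]) * 100.0
            # Moves under MIN_CM would be rejected, so they are carried over;
            # moves over MAX_CM are split into several commands.
            while abs(residual[idx]) >= MIN_CM:
                sign = 1 if residual[idx] > 0 else -1
                dist = min(round(abs(residual[idx])), MAX_CM)
                commands.append(f"{pos_cmd if sign > 0 else neg_cmd} {dist}")
                residual[idx] -= sign * dist
        prev = wp
    return commands


def send_commands(commands, addr=TELLO_ADDR, timeout=10.0):
    """Enter SDK mode, take off, run the moves and land, waiting for each reply.

    Sends a best-effort "land" and stops if any reply is not "ok". Raises
    socket.timeout if the drone does not answer within `timeout` seconds.
    """
    replies = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        for cmd in ["command", "takeoff", *commands, "land"]:
            sock.sendto(cmd.encode("utf-8"), addr)
            reply, _ = sock.recvfrom(1024)
            text = reply.decode("utf-8", errors="replace").strip()
            replies.append(text)
            if text != "ok" and cmd != "land":
                sock.sendto(b"land", addr)  # best-effort abort
                break
    return replies
```

For example, `waypoints_to_tello_commands([(0.5, 0.0, 0.0), (1.0, -0.3, 0.0)])` yields `["forward 50", "forward 50", "right 30"]`. Carrying sub-20 cm residuals forward, rather than dropping them, keeps small planner steps from silently accumulating into position error; only the pure conversion function can be tested without a drone.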
Background
Strong competency in algorithms and programming, especially Python; some familiarity with foundation models (LLMs/VLMs) would be an added advantage
URL
Difficulty Level
High
Ethical Approval
None
Number Of Students
1
Supervisor
John See
Keywords
drone navigation, scene understanding, vision-language models
Degrees
Bachelor of Science in Computing Science