View Proposal
-
Proposer
-
John See
-
Title
-
Drone-based Navigation using Vision-Language Models (VLMs)
-
Goal
-
Development and deployment of a prototype system/software for drone-based navigation using vision-language models
-
Description
-
As we enter a new era of autonomous navigation for drones (also known as unmanned aerial vehicles or UAVs), the challenge lies in providing intelligent interfacing components and a deeper knowledge of the environment in which the drone operates. It is essential to equip the drone with visual perception and understanding of the surroundings, and to locate suitable paths to navigate and accomplish mission goals.
This project aims to develop a prototype system/software for drone-based navigation using vision-language models (VLMs). Leveraging VLMs makes it possible to incorporate more layers of multimodal information, including map information and instruction sets, to enhance navigation in drone missions. Steps to achieve this include matching text phrases to visual objects, detecting scene elements, converting instructions into geometric goals, using a planner to determine safe waypoints and actions, and finally, deploying the system on a drone and executing the planned actions.
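The steps listed above can be illustrated with a minimal Python sketch. It is not the project's method, only one possible reading of the pipeline under stated assumptions: detections are taken as already produced by some open-source VLM or detector, the camera follows a simple pinhole model with an assumed horizontal field of view, the distance to the target is given rather than estimated, and no obstacle or safety checking is performed. All names here (`Detection`, `match_phrase`, `bearing_to_target`, `plan_actions`) and the action strings are hypothetical placeholders, not Tello SDK commands.

```python
import math
from dataclasses import dataclass


@dataclass
class Detection:
    label: str    # text label attached to the box by a detector/VLM (assumed input)
    box: tuple    # (x_min, y_min, x_max, y_max) in pixels
    score: float  # confidence of the detection


def match_phrase(instruction, detections):
    """Match a text instruction to a visual object: return the highest-scoring
    detection whose label occurs in the instruction, or None if nothing matches."""
    text = instruction.lower()
    candidates = [d for d in detections if d.label.lower() in text]
    return max(candidates, key=lambda d: d.score) if candidates else None


def bearing_to_target(det, image_width, horizontal_fov_deg):
    """Convert a box into a geometric goal: the horizontal angle in degrees
    (positive = target is to the right) under an assumed pinhole camera."""
    centre_x = (det.box[0] + det.box[2]) / 2.0
    focal_px = (image_width / 2.0) / math.tan(math.radians(horizontal_fov_deg) / 2.0)
    return math.degrees(math.atan((centre_x - image_width / 2.0) / focal_px))


def plan_actions(bearing_deg, distance_cm, step_cm=50):
    """Toy planner: rotate toward the target, then advance in fixed-size steps.
    Action names are placeholders; no collision checking is done here."""
    actions = []
    turn = round(bearing_deg)
    if turn > 0:
        actions.append(("rotate_clockwise", turn))
    elif turn < 0:
        actions.append(("rotate_counter_clockwise", -turn))
    remaining = distance_cm
    while remaining > 0:
        step = min(step_cm, remaining)
        actions.append(("move_forward", step))
        remaining -= step
    return actions


if __name__ == "__main__":
    # Hypothetical detections for a 960-pixel-wide frame.
    detections = [
        Detection("door", (600, 100, 800, 400), 0.9),
        Detection("chair", (100, 300, 250, 500), 0.8),
    ]
    target = match_phrase("fly to the door", detections)
    if target is not None:
        angle = bearing_to_target(target, image_width=960, horizontal_fov_deg=90.0)
        for action in plan_actions(angle, distance_cm=120):
            print(action)
```

In a real system each placeholder action would be mapped to a drone command, and the planner would need depth or map information to choose waypoints that are actually safe.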
-
Resources
-
DJI Tello drone, Tello SDK, Open source VLMs, GPU compute (access to MACS Malaysia server OR supervisor's GPU workstation)
-
Background
-
Strong competency in algorithms and programming, especially Python; some familiarity with foundation models (LLMs/VLMs) would be an added advantage
-
Url
-
-
Difficulty Level
-
High
-
Ethical Approval
-
None
-
Number Of Students
-
1
-
Supervisor
-
John See
-
Keywords
-
drone navigation, scene understanding, vision-language models
-
Degrees
-
Bachelor of Science in Computing Science