Proposer
Zi Hau Chin
Title
Enhancing Voice Synthesis Control in a Transformer-Based Speech Synthesis System through Probabilistic Latent Space Manipulation
Goal
To improve controllability and realism in voice synthesis.
Description
This project explores a method for enhancing voice synthesis in a transformer-based speech synthesis system. Instead of discrete tokens, the speech synthesis component will receive probability distributions derived from the language model's latent space. The aim is to provide finer-grained control over speaker characteristics. The core work involves:
1. Modifying the system to generate and accept these probabilistic inputs.
2. Synthesizing speech using both the original (discrete-token) method and the new (probabilistic) method.
3. Quantitatively comparing speaker identity preservation and voice quality using speaker recognition models (e.g., equal error rate (EER) and speaker embedding similarity), potentially supplemented by human evaluation.
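A minimal sketch of the two ideas in the description, under stated assumptions. It assumes the language model emits per-step logits over a discrete speech-token vocabulary and that the synthesis component consumes token embeddings. The "probabilistic input" is modelled here as the expected embedding under a temperature-scaled softmax, which is one common way to feed a distribution instead of a token; it is not necessarily the method this project will adopt. The evaluation half assumes speaker embeddings and trial scores come from an external speaker recognition model (not included), and computes EER with a simple threshold sweep. All function names are hypothetical.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def softmax(logits: Sequence[float], temperature: float = 1.0) -> Vector:
    """Temperature-scaled softmax; subtracting the max keeps exp() from overflowing."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def hard_token_embedding(logits: Sequence[float], table: Sequence[Sequence[float]]) -> Vector:
    """Original discrete path: pick the argmax token and look up its embedding row."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return list(table[best])


def soft_token_embedding(logits: Sequence[float], table: Sequence[Sequence[float]],
                         temperature: float = 1.0) -> Vector:
    """Assumed probabilistic path: probability-weighted average of all embedding rows."""
    probs = softmax(logits, temperature)
    dim = len(table[0])
    return [sum(p * row[d] for p, row in zip(probs, table)) for d in range(dim)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two speaker embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def equal_error_rate(genuine: Sequence[float], impostor: Sequence[float]) -> float:
    """Approximate EER by sweeping a threshold over all observed scores.

    A trial is accepted when score >= threshold. FRR is the fraction of genuine
    (same-speaker) trials rejected; FAR is the fraction of impostor trials accepted.
    Returns the mean of FAR and FRR at the threshold where they are closest.
    """
    best_gap, best_eer = math.inf, 1.0
    for t in sorted(set(genuine) | set(impostor)):
        frr = sum(s < t for s in genuine) / len(genuine)
        far = sum(s >= t for s in impostor) / len(impostor)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


if __name__ == "__main__":
    random.seed(0)
    vocab_size, dim = 4, 3  # toy sizes; a real speech-token vocabulary is far larger
    table = [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(vocab_size)]
    logits = [2.0, 1.0, 0.5, -1.0]
    print("hard:", hard_token_embedding(logits, table))
    print("soft:", soft_token_embedding(logits, table, temperature=0.5))
    print("EER:", equal_error_rate([0.9, 0.8, 0.75, 0.6], [0.3, 0.5, 0.65, 0.2]))  # 0.25
```

In this sketch, lowering `temperature` makes the soft embedding converge to the hard (argmax) embedding, so the same code path can produce both the original and the probabilistic condition. Raising it blends in neighbouring tokens, which is where any extra control over speaker characteristics would have to come from.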
Resources
Dedicated server for model training, evaluation, and application hosting.
Background
Python, NLP, TTS
Url
Difficulty Level
Challenging
Ethical Approval
None
Number Of Students
1
Supervisor
Zi Hau Chin
Keywords
AI, voice synthesizer, speaker recognition
Degrees
Bachelor of Science in Computing Science