View Proposal
-
Proposer
-
Zi Hau Chin
-
Title
-
Enhancing Voice Synthesis Control in a Transformer-Based Speech Synthesis System through Probabilistic Latent Space Manipulation
-
Goal
-
To advance control and realism in voice synthesis.
-
Description
-
This project explores a method for enhancing voice synthesis in a transformer-based speech synthesis system. Instead of discrete tokens, the speech synthesis component will receive probability distributions from the language model's latent space. The aim is to provide finer-grained control over speaker characteristics. The core work involves:
1. Modifying the system to generate and accept these probabilistic inputs.
2. Synthesizing speech using both the original and new methods.
3. Comparing speaker identity preservation and voice quality quantitatively, using metrics derived from speaker recognition models (e.g., equal error rate (EER), speaker embedding similarity) and potentially human evaluation.
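
The steps above can be illustrated with a minimal sketch in plain Python. It is not the project's system: every function name here is hypothetical, and it assumes one possible reading of "probabilistic input", namely that the synthesis component receives the probability-weighted average of token embeddings instead of the embedding of the single most likely token. The evaluation helpers show how speaker embedding similarity and an approximate EER could be computed from verification scores.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw language-model scores into a probability distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def hard_input(probs, embedding_table):
    """Original path (assumed): embed only the most likely token."""
    token = max(range(len(probs)), key=probs.__getitem__)
    return list(embedding_table[token])


def soft_input(probs, embedding_table):
    """Probabilistic path (assumed): expected embedding under the distribution."""
    dim = len(embedding_table[0])
    return [
        sum(p * row[d] for p, row in zip(probs, embedding_table))
        for d in range(dim)
    ]


def cosine_similarity(a, b):
    """Similarity between two speaker embeddings, in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def equal_error_rate(genuine_scores, impostor_scores):
    """Approximate EER by sweeping thresholds over the observed scores.

    Returns the mean of the false acceptance and false rejection rates
    at the threshold where the two rates are closest.
    """
    best_gap, eer = float("inf"), None
    for threshold in sorted(set(genuine_scores) | set(impostor_scores)):
        frr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
        far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy example: a 2-token vocabulary with 2-dimensional embeddings.
table = [[1.0, 0.0], [0.0, 1.0]]
probs = softmax([0.0, 0.0])             # uniform distribution over both tokens
discrete = hard_input(probs, table)     # embedding of a single token
blended = soft_input(probs, table)      # mixture of both token embeddings
```

In a comparison like the one proposed, the same utterances would be synthesized through both paths, and the cosine similarity and EER would be computed on speaker embeddings extracted from the resulting audio by a speaker recognition model.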
-
Resources
-
Dedicated server for model training, evaluation, and application hosting.
-
Background
-
Python, NLP, TTS
-
Url
-
-
Difficulty Level
-
Challenging
-
Ethical Approval
-
None
-
Number Of Students
-
1
-
Supervisor
-
Zi Hau Chin
-
Keywords
-
AI, voice synthesizer, speaker recognition
-
Degrees
-
Bachelor of Science in Computing Science