Details - MACS Project System

Proposer: Zi Hau Chin
Title: Enhancing Voice Synthesis Control in a Transformer-Based Speech Synthesis System through Probabilistic Latent Space Manipulation
Goal: To improve control and realism in voice synthesis.
Description: This project explores a method for improving voice synthesis control in a transformer-based speech synthesis system. Instead of discrete tokens, the speech synthesis component will receive probability distributions from the language model's latent space, with the aim of providing finer-grained control over speaker characteristics. The core work involves:
1. Modifying the system to generate and accept these probabilistic inputs.
2. Synthesizing speech using both the original and new methods.
3. Comparing speaker identity preservation and voice quality quantitatively using speaker recognition measures (e.g., equal error rate (EER), speaker embedding similarity), and potentially through human evaluation.
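As a rough illustration of the idea, the sketch below contrasts a discrete-token input (look up the embedding of the single most likely token) with a probabilistic input (a probability-weighted mix of all token embeddings), and shows toy versions of the two evaluation measures mentioned above. It is not the project's actual system: the function names, the tiny embedding table, the temperature parameter, and the threshold-sweep EER approximation are all assumptions made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn logits into a probability distribution (numerically stable)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def hard_embedding(logits, table):
    """Discrete-token baseline: embedding of the single most likely token."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return list(table[best])


def soft_embedding(logits, table, temperature=1.0):
    """Probabilistic input (assumed form): expected embedding under the
    token distribution, i.e. a probability-weighted sum of table rows."""
    probs = softmax(logits, temperature)
    dim = len(table[0])
    return [sum(p * row[d] for p, row in zip(probs, table)) for d in range(dim)]


def cosine_similarity(a, b):
    """Cosine similarity between two speaker embeddings (non-zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def equal_error_rate(genuine, impostor):
    """Approximate EER by sweeping thresholds over the observed scores and
    returning the operating point where the false accept rate and the
    false reject rate are closest."""
    best_gap, eer = float("inf"), None
    for threshold in sorted(set(genuine) | set(impostor)):
        frr = sum(s < threshold for s in genuine) / len(genuine)
        far = sum(s >= threshold for s in impostor) / len(impostor)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


if __name__ == "__main__":
    # Hypothetical 3-token vocabulary with 2-dimensional embeddings.
    table = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    logits = [0.0, 5.0, 0.0]
    print("hard:", hard_embedding(logits, table))
    print("soft:", soft_embedding(logits, table))
    # Same-speaker vs different-speaker similarity scores (made-up values).
    print("EER:", equal_error_rate([0.9, 0.6, 0.4, 0.8], [0.1, 0.5, 0.7, 0.2]))
```

In this sketch, lowering the temperature pushes the soft embedding towards the discrete baseline, which gives one possible knob for comparing the original and new methods on the same utterances.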
Resources: Dedicated server for model training, evaluation, and application hosting.
Background: Python, NLP, TTS
Url
Difficulty Level: Challenging
Ethical Approval: None
Number Of Students: 1
Supervisor: Zi Hau Chin
Keywords: ai, voice synthesizer, speaker recognition
Degrees: Bachelor of Science in Computing Science