I generated this text-to-speech clip using XTTS_v2: you can find it here. The sentence is “It brings Vedanta and other spiritual philosophies to common man.”
My configuration for the XTTS was:
def process_sentence(self, sentence, idx):
    print(f"Processing index {idx}...", flush=True)
    audio_file = f"temp_{idx}.wav"
    self.model.tts_to_file(
        text=sentence,
        file_path=audio_file,
        language=self.language,
        speed=0.9,
        speaker="Ana Florence",
        emotion="Happy",
    )
Can anybody help me with the crack in the sound when the speaker says “Vedanta”? How can I fix it? I am a noob to TTS and would appreciate any help with this.
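
In case it helps narrow things down, here is a minimal sketch for checking whether the generated file contains clipped samples. Clipping is only one possible cause of a crack, so this is a guess rather than a diagnosis. It uses only the Python standard library and assumes the output is a 16-bit PCM WAV file; the helper name `clipped_fraction` and its `threshold` parameter are made up for illustration and are not part of XTTS.

```python
import array
import sys
import wave


def clipped_fraction(path, threshold=32767):
    """Return the fraction of samples whose magnitude is >= threshold.

    Hypothetical helper: assumes a 16-bit PCM WAV file.
    """
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV files")
        frames = wav.readframes(wav.getnframes())

    # Interpret the raw bytes as signed 16-bit integers.
    samples = array.array("h")
    samples.frombytes(frames)
    # WAV data is little-endian; swap bytes on big-endian machines.
    if sys.byteorder == "big":
        samples.byteswap()

    if not samples:
        return 0.0

    # Count samples sitting at (or beyond) the chosen magnitude limit.
    clipped = sum(1 for s in samples if abs(s) >= threshold)
    return clipped / len(samples)
```

For example, `print(clipped_fraction("temp_0.wav"))` would report the fraction for the first sentence, assuming `idx` was 0. A value well above zero would suggest the audio is hitting full scale somewhere; a value of zero would suggest the crack comes from something else.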