Understanding usage of HiFi-GAN by VITS

I’m trying to learn AI/ML for speech synthesis and to understand how HiFi-GAN is used by VITS.