I am using Azure Text to Speech, part of the Cognitive Services.
I compose my request as SSML, and then call the function SpeakSsmlAsync.
If I choose the output format Audio24Khz160KBitRateMonoMp3, the function returns almost immediately with the speech data. But if I choose the output format Riff24Khz16BitMonoPcm, the function plays the speech through my speakers before returning the speech data.
Is there a way to use Riff24Khz16BitMonoPcm silently, so that SpeakSsmlAsync returns the speech data without playing it first?