I’m developing a Discord bot in TypeScript that uses the Google Cloud Speech API to transcribe speech to text in real time. I use the @discordjs/voice library to handle voice connections in Discord and @google-cloud/speech for the transcription itself. The goal is to capture user audio in a voice channel and stream it to the Google Cloud API for transcription.
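For context, the relevant imports and the Speech client are set up essentially like this (the key file path is a placeholder; in the real project the client is created once at startup):

```ts
import {
  joinVoiceChannel,
  createAudioPlayer,
  NoSubscriberBehavior,
} from '@discordjs/voice';
import { SpeechClient } from '@google-cloud/speech';

// Created once at startup; the key file path below is a placeholder.
const client = new SpeechClient({ keyFilename: './service-account.json' });
```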
Although the bot detects when a user starts speaking (via the `speaking` events), something seems to be wrong with how the audio is captured and sent to the Google Cloud Speech API: I get a timeout error indicating that the audio is not received in time, or not as expected. The exact error is:
```
ApiError: Audio Timeout Error: Long duration elapsed without audio. Audio should be sent close to real-time.
```
Here is the main part of the code that handles the voice connection and streams the audio:
```ts
// `voiceChannel` is the channel the invoking user is in;
// `client` is the SpeechClient shown above.
const connection = joinVoiceChannel({
  channelId: voiceChannel.id,
  guildId: voiceChannel.guild.id,
  adapterCreator: voiceChannel.guild.voiceAdapterCreator,
  selfDeaf: false, // must not be deafened, otherwise no incoming audio
  selfMute: false,
});

const audioPlayer = createAudioPlayer({
  behaviors: {
    noSubscriber: NoSubscriberBehavior.Pause,
  },
});
connection.subscribe(audioPlayer);

// Start a new streaming recognition request every time a user starts speaking.
connection.receiver.speaking.on('start', (userId) => {
  const audioStream = connection.receiver.subscribe(userId, { mode: 'pcm' });

  const request = {
    config: {
      encoding: 'LINEAR16',
      sampleRateHertz: 48000,
      languageCode: 'es-ES',
    },
    interimResults: false,
  };

  const recognizeStream = client.streamingRecognize(request)
    .on('data', (data) => {
      const transcription = data.results[0].alternatives[0].transcript;
      console.log(`Transcription: ${transcription}`);
    });

  // Pipe the Discord audio into the recognize stream and end it when the
  // user's audio stream closes.
  audioStream.on('data', (chunk) => {
    recognizeStream.write(chunk);
  });
  audioStream.on('close', () => {
    recognizeStream.end();
  });
});
```
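The ApiError quoted above surfaces on the recognize stream's `error` event, which I log roughly like this:

```ts
recognizeStream.on('error', (err) => {
  // This is where the "Audio Timeout Error" shown above ends up.
  console.error('streamingRecognize error:', err);
});
```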
What I have already tried:

- I have verified that the Google Cloud Speech API credentials are correct and that the project has all the necessary APIs enabled.
- I have tested with different `encoding` and `sampleRateHertz` settings (see the sketch after this list for the kind of combinations I tried).
- I have ensured that the `speaking` events fire as expected and that the audio streams are created and closed properly.
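The values below are illustrative of the kind of `config` combinations I tried, not an exact record; none of them made the timeout go away:

```ts
// Illustrative only: encoding/sampleRateHertz combinations tried for `config`.
const configVariants = [
  { encoding: 'LINEAR16', sampleRateHertz: 48000, languageCode: 'es-ES' },
  { encoding: 'LINEAR16', sampleRateHertz: 16000, languageCode: 'es-ES' },
  { encoding: 'OGG_OPUS', sampleRateHertz: 48000, languageCode: 'es-ES' },
];
```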
How can I ensure that the audio captured from Discord is streamed to the Google Cloud Speech API in (close to) real time so that this timeout does not occur? Are there any settings or best practices in stream handling or API configuration that I may be overlooking?
Any help or suggestions would be greatly appreciated. Thanks in advance.