I haven’t posted here more than a few times, but this one is tough. I’ve seen a ton of transcription API questions that have no answers even though they have lots of views, so this seems to be one of the more challenging topics on Stack Overflow. If you’re looking for a challenge that few can solve, this may be the post for you.
I’m having trouble with OpenAI’s transcription endpoint: https://api.openai.com/v1/audio/transcriptions
I have an audio file that I’m pulling from Telegram’s API. It’s an .m4a file. I don’t get any errors when downloading the file, but when I try to transcribe it, OpenAI keeps saying it’s not in a supported format.
My Cargo.toml:

```toml
[dependencies]
warp = "0.3"
tokio = { version = "1", features = ["full"] }
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
dotenv = "0.15"
reqwest = { version = "0.12", features = ["json", "multipart", "stream"] }
teloxide = "0.12"
log = "0.4.22"
env_logger = "0.11.3"
log4rs = "1.2.0"
lazy_static = "1.4"
anyhow = "1.0"
uuid = { version = "1.9.1", features = ["v4"] } # for unique file names
tokio-util = "0.7.11"
mime_guess = "2.0.3"
```
My transcribe_audio function:
```rust
// Imports used by the functions in this post
use anyhow::{Context, Result};
use reqwest::Client;
use serde_json::Value;
use tokio::fs::File;
use tokio::io::AsyncWriteExt;
use uuid::Uuid;

async fn transcribe_audio(openai_key: &str, file_name: &str, mime_type: Option<&str>) -> Result<String, anyhow::Error> {
    log::info!("Audio: step 4: in transcribe_audio.");
    let client = Client::new();
    log::info!("File name: {}", file_name);

    // Open file
    log::info!("Audio: step 4 initializing: opening file");
    let file_handle = tokio::fs::File::open(file_name).await
        .context("Failed to open the file")?;

    // Create a stream from the file
    let bytes_stream = tokio_util::codec::FramedRead::new(file_handle, tokio_util::codec::BytesCodec::new());

    // Create the multipart form
    let file_part = reqwest::multipart::Part::stream(reqwest::Body::wrap_stream(bytes_stream))
        .file_name(file_name.to_string()) // Use the downloaded file's name (<uuid>.<ext>)
        .mime_str(mime_type.unwrap_or("application/octet-stream"))?; // Use the provided MIME type, or a generic default
    let form = reqwest::multipart::Form::new()
        .text("model", "whisper-1")
        .part("file", file_part);
    log::info!("Audio: step 4: successfully made file_part");

    log::info!("beginning to send request to transcriptions");
    let response = client
        .post("https://api.openai.com/v1/audio/transcriptions")
        .header("Authorization", format!("Bearer {}", openai_key))
        .multipart(form)
        .send()
        .await
        .context("Failed to send the request to OpenAI")?;

    if !response.status().is_success() {
        let status = response.status();
        let text = response.text().await.unwrap_or_else(|_| String::from("Failed to read response text"));
        anyhow::bail!("Received non-200 status code ({}) from OpenAI: {}", status, text);
    }

    let response_json: serde_json::Value = response.json().await
        .context("Failed to parse the response from OpenAI")?;
    let transcription = response_json["text"]
        .as_str()
        .ok_or_else(|| anyhow::anyhow!("Transcription not found in response"))?
        .to_string();
    Ok(transcription)
}
```
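For reference, this is roughly the multipart/form-data body (RFC 7578) the request above should put on the wire: a text part for `model` and a file part carrying `filename` and `Content-Type`. The sketch below assembles such a body by hand with only the standard library. The helper names and the boundary string are illustrative assumptions, and reqwest's exact header formatting may differ; the point is just to show where the file name and MIME type end up.

```rust
/// Builds a multipart/form-data body with a `model` text field and a `file` field.
/// Hypothetical helper for illustration; reqwest builds its own body internally.
fn build_multipart_body(
    boundary: &str,
    model: &str,
    file_name: &str,
    mime_type: &str,
    file_bytes: &[u8],
) -> Vec<u8> {
    let mut body = Vec::new();

    // Text part: the model name.
    body.extend_from_slice(format!("--{boundary}\r\n").as_bytes());
    body.extend_from_slice(b"Content-Disposition: form-data; name=\"model\"\r\n\r\n");
    body.extend_from_slice(model.as_bytes());
    body.extend_from_slice(b"\r\n");

    // File part: the filename (with its extension) and Content-Type travel here.
    body.extend_from_slice(format!("--{boundary}\r\n").as_bytes());
    body.extend_from_slice(
        format!("Content-Disposition: form-data; name=\"file\"; filename=\"{file_name}\"\r\n")
            .as_bytes(),
    );
    body.extend_from_slice(format!("Content-Type: {mime_type}\r\n\r\n").as_bytes());
    body.extend_from_slice(file_bytes);
    body.extend_from_slice(b"\r\n");

    // Closing delimiter marks the end of the form.
    body.extend_from_slice(format!("--{boundary}--\r\n").as_bytes());
    body
}

/// The Content-Type request header that must accompany the body.
fn multipart_content_type(boundary: &str) -> String {
    format!("multipart/form-data; boundary={boundary}")
}
```

Comparing a body like this against what actually gets sent is one way to confirm the file part is named `file` and carries the expected name and type.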
My download_file function:
```rust
async fn download_file(url: &str, file_id: &str, mime_type: Option<&str>) -> Result<String, anyhow::Error> {
    log::info!("Audio: step 3: in download_file fn");
    let client = Client::new();
    log::info!("Audio: step 3 initializing");

    // Send a GET request to download the file itself
    let response = client
        .get(url)
        .send()
        .await
        .with_context(|| format!("Failed to send GET request to URL: {}", url))?;

    // Ensure the request was successful
    if !response.status().is_success() {
        let error_message = format!("Received non-200 status code ({}) when trying to access URL: {}", response.status(), url);
        log::error!("{}", error_message);
        anyhow::bail!(error_message);
    }

    // Determine the file extension based on the MIME type
    let file_extension = match mime_type {
        Some("audio/flac") => "flac",
        Some("audio/m4a") => "m4a",
        Some("audio/mp3") => "mp3",
        Some("audio/mp4") => "mp4",
        Some("audio/mpeg") => "mpeg",
        Some("audio/mpga") => "mpga",
        Some("audio/oga") => "oga",
        Some("audio/webm") => "webm",
        Some("audio/wav") => "wav",
        Some("audio/ogg") => "ogg",
        // Add more MIME types and their corresponding file extensions as needed
        _ => "unknown",
    };
    let filename = format!("{}.{}", Uuid::new_v4(), file_extension);
    log::info!("Audio: step 3: in download_file. filename is {filename}");

    // Create the file
    let mut file = File::create(&filename)
        .await
        .with_context(|| format!("Failed to create file: {}", filename))?;

    // Extract the response content
    let content = response
        .bytes()
        .await
        .with_context(|| "Failed to read content from response".to_string())?;

    // Write content to the file
    file.write_all(&content)
        .await
        .with_context(|| format!("Failed to write content to file: {}", filename))?;

    log::info!("Audio: step 3 completed successfully");
    Ok(filename)
}
```
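The `match` in `download_file` falls back to `"unknown"`, so an unrecognized MIME type would silently produce a file named `<uuid>.unknown`, which OpenAI would presumably reject with this same "Invalid file format" error. A minimal sketch of a stricter mapping that returns `None` instead, so the caller can fail early. The helper name `extension_for_mime` is hypothetical; the `audio/x-m4a` entry is an assumption (a non-standard alias some clients use for M4A), and stripping parameters such as `; codecs=...` is a defensive extra, not something the original code needed.

```rust
/// Maps a MIME type to an extension from the list in OpenAI's error message.
/// Returns `None` for anything unrecognized instead of a placeholder like "unknown".
fn extension_for_mime(mime_type: &str) -> Option<&'static str> {
    // Drop parameters such as "; codecs=..." and normalize case.
    let essence = mime_type
        .split(';')
        .next()
        .unwrap_or("")
        .trim()
        .to_ascii_lowercase();
    let ext = match essence.as_str() {
        "audio/flac" => "flac",
        "audio/m4a" => "m4a",
        // Assumption: non-standard alias some clients use for M4A.
        "audio/x-m4a" => "m4a",
        "audio/mp3" => "mp3",
        "audio/mp4" => "mp4",
        "audio/mpeg" => "mpeg",
        "audio/mpga" => "mpga",
        "audio/oga" => "oga",
        "audio/ogg" => "ogg",
        "audio/wav" => "wav",
        "audio/webm" => "webm",
        _ => return None,
    };
    Some(ext)
}
```

In `download_file`, `mime_type.and_then(extension_for_mime)` would yield an `Option<&str>` that anyhow's `.context(...)` can turn into an error instead of a bogus file name.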
My handle_audio_message function, so you can see how everything is pieced together:
```rust
async fn handle_audio_message(bot_token: &str, chat_id: &u64, audio: &Audio, openai_key: &str) -> Result<(), anyhow::Error> {
    log::info!("Audio: step 2: In handle_audio_message fn");

    // Get the file path from Telegram using the get_file function
    let file_path_on_telegram = get_file(bot_token, &audio.file_id).await?;
    log::info!("Audio: step 2: In handle_audio_message. got file path");
    log::info!("Audio: step 2: In handle_audio_message. File_path is {file_path_on_telegram}");

    // Download the audio file from Telegram
    let file_url = format!("https://api.telegram.org/file/bot{}/{}", bot_token, &file_path_on_telegram);
    log::info!("Audio: step 3: about to download audio file");
    let file_name = download_file(&file_url, &audio.file_id, audio.mime_type.as_deref()).await?;

    // Call OpenAI API to transcribe audio
    log::info!("Audio: step 4: about to transcribe the audio message");
    let transcription = transcribe_audio(openai_key, &file_name, audio.mime_type.as_deref()).await?;
    log::info!("audio message transcribed to: {}", transcription);

    let bot = Client::new();
    bot.post(&format!("https://api.telegram.org/bot{}/sendMessage", bot_token))
        .json(&serde_json::json!({
            "chat_id": chat_id,
            "text": transcription,
        }))
        .send()
        .await?;
    Ok(())
}
```
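As pieced together, `handle_audio_message` never deletes the downloaded `<uuid>.<ext>` file, and any cleanup placed at the end would be skipped when `transcribe_audio` returns early through `?`. A minimal sketch of a drop guard that removes the file on every exit path, once the file is no longer needed for debugging; `TempFileGuard` is a hypothetical helper, not part of any crate in the Cargo.toml above.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Deletes the wrapped file when dropped (hypothetical helper for illustration).
struct TempFileGuard {
    path: PathBuf,
}

impl TempFileGuard {
    fn new(path: impl Into<PathBuf>) -> Self {
        Self { path: path.into() }
    }

    fn path(&self) -> &Path {
        &self.path
    }
}

impl Drop for TempFileGuard {
    fn drop(&mut self) {
        // Ignore errors: the file may already be gone, and Drop cannot return a Result.
        let _ = fs::remove_file(&self.path);
    }
}
```

Usage would be `let _guard = TempFileGuard::new(&file_name);` right after `download_file` returns. The binding needs a name such as `_guard`; a bare `let _ = ...` drops the guard (and deletes the file) immediately.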
And, if you want, here’s how I’m actually getting the file’s path from Telegram:

```rust
async fn get_file(bot_token: &str, file_id: &str) -> Result<String> {
    log::info!("Audio: step 2 initializing. in get_file right now");
    let client = Client::new();
    let res: Value = client.post(&format!("https://api.telegram.org/bot{}/getFile", bot_token))
        .form(&[("file_id", file_id)])
        .send()
        .await?
        .json()
        .await?;
    // Return an error instead of panicking if the response has no file_path
    let file_path = res["result"]["file_path"]
        .as_str()
        .with_context(|| format!("No file_path in getFile response: {}", res))?
        .to_string();
    Ok(file_path)
}
```
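The logs show that Telegram's `file_path` already ends in `.m4a` (`music/file_0.m4a`). When `mime_type` is missing, the extension could be taken from that path instead of falling back to `"unknown"`. A std-only sketch; `extension_from_file_path` is a hypothetical helper, and lowercasing is an assumption about how the extension should be normalized.

```rust
use std::path::Path;

/// Returns the lowercase extension of a Telegram `file_path` such as
/// "music/file_0.m4a", or `None` if the path has no extension.
fn extension_from_file_path(file_path: &str) -> Option<String> {
    Path::new(file_path)
        .extension()
        .and_then(|ext| ext.to_str())
        .map(|ext| ext.to_ascii_lowercase())
}
```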
Okay, here are the relevant logs:

```text
2024-07-04T06:01:31.482835598+00:00 - INFO - Audio: step 2: In handle_audio_message fn
2024-07-04T06:01:31.482843609+00:00 - INFO - Audio: step 2 initializing. in get_file right now
2024-07-04T06:01:31.894554705+00:00 - INFO - Audio: step 2: In handle_audio_message. got file path
2024-07-04T06:01:31.894688175+00:00 - INFO - Audio: step 2: In handle_audio_message. File_path is music/file_0.m4a
2024-07-04T06:01:31.894730325+00:00 - INFO - Audio: step 3: about to download audio file
2024-07-04T06:01:31.894759266+00:00 - INFO - Audio: step 3: in download_file fn
2024-07-04T06:01:31.929841708+00:00 - INFO - Audio: step 3 initializing
2024-07-04T06:01:32.425059459+00:00 - INFO - Audio: step 3: in download_file. filename is 0a76013b-bafe-4bd1-9a42-613c116a8904.m4a
2024-07-04T06:01:32.548868885+00:00 - INFO - Audio: step 3 completed successfully
2024-07-04T06:01:32.548991745+00:00 - INFO - Audio: step 4: about to transcribe the audio message
2024-07-04T06:01:32.549003405+00:00 - INFO - Audio: step 4: in transcribe_audio.
2024-07-04T06:01:32.583884887+00:00 - INFO - File name: 0a76013b-bafe-4bd1-9a42-613c116a8904.m4a
2024-07-04T06:01:32.583975949+00:00 - INFO - Audio: step 4 initializing: opening file
2024-07-04T06:01:32.584274150+00:00 - INFO - Audio: step 4: successfully made file_part
2024-07-04T06:01:32.584331379+00:00 - INFO - beginning to send request to transcriptions
2024-07-04T06:01:32.998999770+00:00 - ERROR - Error handling message: Received non-200 status code (400 Bad Request) from OpenAI: {
  "error": {
    "message": "Invalid file format. Supported formats: ['flac', 'm4a', 'mp3', 'mp4', 'mpeg', 'mpga', 'oga', 'ogg', 'wav', 'webm']",
    "type": "invalid_request_error",
    "param": null,
    "code": null
  }
}
```
I’ve tried many variations of the multipart form. I’ve tried changing the file name as well. I know the file isn’t corrupt, because I recorded the audio on my phone and can play it. I suppose it could be getting corrupted while I download it, but wouldn’t I get an error in that case?
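Not necessarily: `download_file` only checks the status code, so a truncated download or a body that isn't audio would be written to disk without any error. One cheap check, sketched below with only the standard library, is to compare the written size against an expected size (for example reqwest's `response.content_length()`, when the server sends a `Content-Length`) and to confirm the file starts the way MP4/M4A files usually do: an ISO base media `ftyp` box, i.e. a 4-byte box size, the 4 bytes `ftyp`, then a 4-byte major brand such as `M4A `. The helper names are hypothetical, and a passing check does not prove OpenAI will accept the file.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Result of a quick look at a downloaded file (hypothetical helper type).
#[derive(Debug, PartialEq)]
struct FileCheck {
    size: u64,
    /// Major brand from a leading `ftyp` box, e.g. "M4A ", if present.
    major_brand: Option<String>,
}

/// Like `read_exact`, but stops at end of file instead of erroring.
fn read_up_to(reader: &mut impl Read, buf: &mut [u8]) -> io::Result<usize> {
    let mut filled = 0;
    while filled < buf.len() {
        match reader.read(&mut buf[filled..]) {
            Ok(0) => break,
            Ok(n) => filled += n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    Ok(filled)
}

/// Reads the file's size and, if present, the major brand of a leading `ftyp` box.
fn inspect_file(path: &Path) -> io::Result<FileCheck> {
    let size = std::fs::metadata(path)?.len();
    let mut header = [0u8; 12];
    let mut file = File::open(path)?;
    // 4-byte box size, 4-byte box type, 4-byte major brand.
    let n = read_up_to(&mut file, &mut header)?;
    let major_brand = if n == 12 && &header[4..8] == b"ftyp" {
        Some(String::from_utf8_lossy(&header[8..12]).into_owned())
    } else {
        None
    };
    Ok(FileCheck { size, major_brand })
}

/// True when no expected size is known, or when it matches the size on disk.
fn size_matches(check: &FileCheck, expected: Option<u64>) -> bool {
    expected.map_or(true, |e| e == check.size)
}
```

Running `inspect_file` on the downloaded file right after `download_file` and logging the result would show whether the bytes on disk still look like an M4A container.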
Apologies if I provided too much code; I’m never sure whether I’m providing too much or too little, so I try to include anything related. I’ve looked at other posts for help, but they’re mostly JavaScript. I’ve tried to adapt what they do, but it doesn’t seem to work.