Question

I've only posted here a few times, but this one is tough. I've seen a ton of Transcription API questions that have no answers even though they have plenty of views, so this topic looks like one of the more challenging ones on Stack Overflow. If you're looking for a challenge that few can solve, this may be the post for you.

So I'm having some issues with OpenAI's endpoint: https://api.openai.com/v1/audio/transcriptions
I have an audio file that I'm pulling from Telegram's API. It's a .m4a file. I'm not getting errors when I download the file, but when I try to transcribe it, the API keeps saying it's not in a supported format.
My Cargo.toml:

[dependencies]
warp = "0.3"
tokio = { version = "1", features = ["full"] }
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
dotenv = "0.15"
reqwest = { version = "0.12", features = ["json", "multipart", "stream"] }
teloxide = "0.12"
log = "0.4.22"
env_logger = "0.11.3"
log4rs = "1.2.0"
lazy_static = "1.4"
anyhow = "1.0"
uuid = { version = "1.9.1", features = ["v4"] } # for unique file names
tokio-util = "0.7.11"
mime_guess = "2.0.3"

My transcribe_audio function:

use anyhow::Context; // provides .context() / .with_context() on Result
use reqwest::Client;

async fn transcribe_audio(openai_key: &str, file_name: &str, mime_type: Option<&str>) -> Result<String, anyhow::Error> {
    log::info!("Audio: step 4: in transcribe_audio.");
    let client = Client::new();
    log::info!("File name: {}", file_name);
    
    // Open file
    log::info!("Audio: step 4 initializing: opening file");
    let file_handle = tokio::fs::File::open(file_name).await
    .context("Failed to open the file")?;

    // Create a stream from the file
    let bytes_stream = tokio_util::codec::FramedRead::new(file_handle, tokio_util::codec::BytesCodec::new());

    // Create the multipart form
    let file_part = reqwest::multipart::Part::stream(reqwest::Body::wrap_stream(bytes_stream))
        .file_name(file_name.to_string())  // Use the original file name
        .mime_str(mime_type.expect("couldn't give it a mime type"))?; // Use the provided MIME type; panics if none was provided (there is no default)

    let form = reqwest::multipart::Form::new()
        .text("model", "whisper-1")
        .part("file", file_part);

    log::info!("Audio: step 4: successfully made file_part");
    log::info!("beginning to send request to transcriptions");

    let response = client
        .post("https://api.openai.com/v1/audio/transcriptions")
        .header("Authorization", format!("Bearer {}", openai_key))
        .multipart(form)
        .send()
        .await
        .context("Failed to send the request to OpenAI")?;

    if !response.status().is_success() {
        let status = response.status();
        let text = response.text().await.unwrap_or_else(|_| String::from("Failed to read response text"));
        anyhow::bail!("Received non-200 status code ({}) from OpenAI: {}", status, text);
    }

    let response_json: serde_json::Value = response.json().await
        .context("Failed to parse the response from OpenAI")?;
    let transcription = response_json["text"]
        .as_str()
        .ok_or_else(|| anyhow::anyhow!("Transcription not found in response"))?
        .to_string();

    Ok(transcription)
}

My download_file function:

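use tokio::fs::File;
use tokio::io::AsyncWriteExt; // provides write_all() on tokio's File
use uuid::Uuid;

async fn download_file(url: &str, file_id: &str, mime_type: Option<&str>) -> Result<String, anyhow::Error> {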
    log::info!("Audio: step 3: in download_file fn");
    
    let client = Client::new();
    log::info!("Audio: step 3 initializing");
    
    // Send a GET request to the file URL to download the file's contents
    let response = client
        .get(url)
        .send()
        .await
        .with_context(|| format!("Failed to send GET request to URL: {}", url))?;
    
    // Ensure the request was successful
    if !response.status().is_success() {
        let error_message = format!("Received non-200 status code ({}) when trying to access URL: {}", response.status(), url);
        log::error!("{}", error_message);
        anyhow::bail!(error_message);
    }

    // Determine the file extension based on the MIME type
    let file_extension = match mime_type {
        Some("audio/flac") => "flac",
        Some("audio/m4a") => "m4a",
        Some("audio/mp3") => "mp3",
        Some("audio/mp4") => "mp4",
        Some("audio/mpeg") => "mpeg",
        Some("audio/mpga") => "mpga",
        Some("audio/oga") => "oga",
        Some("audio/webm") => "webm",
        Some("audio/wav") => "wav",
        Some("audio/ogg") => "ogg",
        // Add more MIME types and their corresponding file extensions as needed
        _ => "unknown",
    };

    let filename = format!("{}.{}", Uuid::new_v4(), file_extension);
    log::info!("Audio: step 3: in download_file. filename is {filename}");
    
    // Create the file
    let mut file = File::create(&filename)
        .await
        .with_context(|| format!("Failed to create file: {}", filename))?;
    
    // Extract the response content
    let content = response
        .bytes()
        .await
        .with_context(|| "Failed to read content from response".to_string())?;
    
    // Write content to the file
    file.write_all(&content)
        .await
        .with_context(|| format!("Failed to write content to file: {}", filename))?;
    
    log::info!("Audio: step 3 completed successfully");
    Ok(filename)
}
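
One thing I noticed while writing this up: the `_ => "unknown"` arm would save the file as `*.unknown` whenever Telegram reports a MIME type that isn't in my list. That isn't what happened in the logs below (the file did get an `.m4a` name), but a fallback to the extension in Telegram's `file_path` might be more robust. This is only a sketch using the standard library; `pick_extension` is a name made up for illustration, and `audio/x-m4a` and `audio/x-wav` are extra MIME types I'm assuming could show up.

```rust
use std::path::Path;

/// Pick a file extension, preferring the MIME type and falling back to the
/// extension of the path Telegram returned (e.g. "music/file_0.m4a").
/// Returns None when neither source yields an extension.
fn pick_extension(mime_type: Option<&str>, telegram_path: &str) -> Option<String> {
    // Only a subset of MIME types is listed here; extend as needed.
    let from_mime = match mime_type {
        Some("audio/flac") => Some("flac"),
        Some("audio/m4a") | Some("audio/x-m4a") => Some("m4a"),
        Some("audio/mp4") => Some("mp4"),
        Some("audio/ogg") => Some("ogg"),
        Some("audio/wav") | Some("audio/x-wav") => Some("wav"),
        Some("audio/webm") => Some("webm"),
        _ => None,
    };

    from_mime.map(str::to_string).or_else(|| {
        // Fall back to whatever extension Telegram's file_path carries.
        Path::new(telegram_path)
            .extension()
            .and_then(|ext| ext.to_str())
            .map(|ext| ext.to_ascii_lowercase())
    })
}

fn main() {
    // The path from my logs, with no MIME type supplied.
    println!("{:?}", pick_extension(None, "music/file_0.m4a"));
}
```

With this, `download_file` could bail with an error when the result is `None` instead of writing a file the API is likely to reject.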

My handle_audio_message function so you can see how everything is pieced together:

async fn handle_audio_message(bot_token: &str, chat_id: &u64, audio: &Audio, openai_key: &str) -> Result<(), anyhow::Error> {
    log::info!("Audio: step 2: In handle_audio_message fn");

    // Get the file path from Telegram using the get_file function
    let file_path_on_telegram = get_file(bot_token, &audio.file_id).await?;
    log::info!("Audio: step 2: In handle_audio_message. got file path");
    log::info!("Audio: step 2: In handle_audio_message. File_path is {file_path_on_telegram}");

    // Download the audio file from Telegram
    let file_url = format!("https://api.telegram.org/file/bot{}/{}", bot_token, &file_path_on_telegram);
    log::info!("Audio: step 3: about to download audio file");
    let file_name = download_file(&file_url, &audio.file_id, audio.mime_type.as_deref()).await?;

    // Call OpenAI API to transcribe audio
    log::info!("Audio: step 4: about to transcribe the audio message");
    let transcription = transcribe_audio(openai_key, &file_name, audio.mime_type.as_deref()).await?;
    log::info!("audio message transcribed to: {}", transcription);

    let bot = Client::new();
    bot.post(&format!("https://api.telegram.org/bot{}/sendMessage", bot_token))
        .json(&serde_json::json!({
            "chat_id": chat_id,
            "text": transcription,
        }))
        .send()
        .await?;

    Ok(())
}

And, if you want, here is how I'm actually getting the file's path from Telegram:

use anyhow::Result;
use serde_json::Value;

async fn get_file(bot_token: &str, file_id: &str) -> Result<String> {
    log::info!("Audio: step 2 initializing. in get_file right now");
    let client = Client::new();
    let res: Value = client.post(&format!("https://api.telegram.org/bot{}/getFile", bot_token))
        .form(&[("file_id", file_id)])
        .send()
        .await?
        .json()
        .await?;
    let file_path = res["result"]["file_path"].as_str().unwrap().to_string();
    Ok(file_path)
}

Okay, and here are the relevant logs:

2024-07-04T06:01:31.482835598+00:00 - INFO - Audio: step 2: In handle_audio_message fn
2024-07-04T06:01:31.482843609+00:00 - INFO - Audio: step 2 initializing. in get_file right now
2024-07-04T06:01:31.894554705+00:00 - INFO - Audio: step 2: In handle_audio_message. got file path
2024-07-04T06:01:31.894688175+00:00 - INFO - Audio: step 2: In handle_audio_message. File_path is music/file_0.m4a
2024-07-04T06:01:31.894730325+00:00 - INFO - Audio: step 3: about to download audio file
2024-07-04T06:01:31.894759266+00:00 - INFO - Audio: step 3: in download_file fn
2024-07-04T06:01:31.929841708+00:00 - INFO - Audio: step 3 initializing
2024-07-04T06:01:32.425059459+00:00 - INFO - Audio: step 3: in download_file. filename is 0a76013b-bafe-4bd1-9a42-613c116a8904.m4a
2024-07-04T06:01:32.548868885+00:00 - INFO - Audio: step 3 completed successfully
2024-07-04T06:01:32.548991745+00:00 - INFO - Audio: step 4: about to transcribe the audio message
2024-07-04T06:01:32.549003405+00:00 - INFO - Audio: step 4: in transcribe_audio.
2024-07-04T06:01:32.583884887+00:00 - INFO - File name: 0a76013b-bafe-4bd1-9a42-613c116a8904.m4a
2024-07-04T06:01:32.583975949+00:00 - INFO - Audio: step 4 initializing: opening file
2024-07-04T06:01:32.584274150+00:00 - INFO - Audio: step 4: successfully made file_part
2024-07-04T06:01:32.584331379+00:00 - INFO - beginning to send request to transcriptions
2024-07-04T06:01:32.998999770+00:00 - ERROR - Error handling message: Received non-200 status code (400 Bad Request) from OpenAI: {
  "error": {
    "message": "Invalid file format. Supported formats: ['flac', 'm4a', 'mp3', 'mp4', 'mpeg', 'mpga', 'oga', 'ogg', 'wav', 'webm']",
    "type": "invalid_request_error",
    "param": null,
    "code": null
  }
}
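
For what it's worth, the file name I'm sending does end in one of the extensions from that list, so I don't think the name alone explains the error. The check I have in mind is roughly the following. It's a standard-library-only sketch: `has_supported_extension` is a made-up helper, and the list is copied from the error message rather than from any official documentation.

```rust
/// Extensions copied from the "Supported formats" list in the error message.
const SUPPORTED: [&str; 10] = [
    "flac", "m4a", "mp3", "mp4", "mpeg", "mpga", "oga", "ogg", "wav", "webm",
];

/// Returns true if `file_name` ends in one of the listed extensions
/// (case-insensitive). A name without a dot is treated as unsupported.
fn has_supported_extension(file_name: &str) -> bool {
    match file_name.rsplit_once('.') {
        Some((_, ext)) => {
            let ext = ext.to_ascii_lowercase();
            SUPPORTED.contains(&ext.as_str())
        }
        None => false,
    }
}

fn main() {
    // The file name from my logs.
    let name = "0a76013b-bafe-4bd1-9a42-613c116a8904.m4a";
    println!("{name}: supported extension = {}", has_supported_extension(name));
}
```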

I've tried many variations of building the multipart form, and I've tried changing the file name as well. I know the source file isn't corrupt because I recorded the audio on my phone and I can play it. I suppose it could be getting corrupted while I download it, but wouldn't I get an error in that case?
Apologies if I've provided too much code. I'm never sure whether I'm providing too much or too little, but I always try to include code if it's related. I've looked at other posts for help, but they mostly use JavaScript. Even so, I've tried to piece together what they have, but it doesn't seem to work.
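
To rule out a mangled download, one check I could run on the saved file is to look at its first bytes. As far as I understand, an MP4/M4A container typically starts with a box whose type, at bytes 4 to 8, is the ASCII string `ftyp`. The sketch below assumes that layout; it is not a full validator, and `looks_like_mp4_container` and `read_header` are names invented for illustration.

```rust
use std::fs::File;
use std::io::{self, Read};

/// Returns true if `header` looks like the start of an MP4/M4A container,
/// i.e. bytes 4..8 are the ASCII box type "ftyp".
fn looks_like_mp4_container(header: &[u8]) -> bool {
    header.get(4..8) == Some(&b"ftyp"[..])
}

/// Read up to the first 12 bytes of the file at `path`.
fn read_header(path: &str) -> io::Result<Vec<u8>> {
    let mut buf = Vec::with_capacity(12);
    File::open(path)?.take(12).read_to_end(&mut buf)?;
    Ok(buf)
}

fn main() -> io::Result<()> {
    // Pass the downloaded file's path as the first argument.
    match std::env::args().nth(1) {
        Some(path) => {
            let header = read_header(&path)?;
            println!(
                "{path}: first bytes {header:02x?}, mp4-like: {}",
                looks_like_mp4_container(&header)
            );
        }
        None => eprintln!("usage: pass the downloaded file's path as the first argument"),
    }
    Ok(())
}
```

If the header turned out to be something else, for example the start of a JSON error body, that would point at the download step rather than the upload.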

Rust: Issue with OpenAI’s transcription API