python - Speech-to-text from an audio file in Streamlit

I'm working on a web app that turns audio into text using Streamlit. I am using the SpeechRecognition library, which has a limit of 3 minutes, so I am working on a fix that splits the audio into 3-minute chunks. I am testing this on a 15-minute audio file, and the first two chunks work perfectly. But when it gets to the chunks after that, I get this error:

FileNotFoundError: [WinError 2] The system cannot find the file specified
Traceback:
File "C:\Users\marcu\AppData\Roaming\Python\Python39\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 565, in _run_script
    exec(code, module.__dict__)
File "C:\Users\marcu\OneDrive\Desktop\Coding\auto notes\test.py", line 51, in <module>
    main()
File "C:\Users\marcu\OneDrive\Desktop\Coding\auto notes\test.py", line 26, in main
    audio = pydub.AudioSegment.from_file(temp_audio_file.name)
File "C:\Users\marcu\AppData\Roaming\Python\Python39\site-packages\pydub\audio_segment.py", line 728, in from_file
    info = mediainfo_json(orig_file, read_ahead_limit=read_ahead_limit)
File "C:\Users\marcu\AppData\Roaming\Python\Python39\site-packages\pydub\utils.py", line 274, in mediainfo_json
    res = Popen(command, stdin=stdin_parameter, stdout=PIPE, stderr=PIPE)
File "c:\program files\python39\lib\subprocess.py", line 951, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
File "c:\program files\python39\lib\subprocess.py", line 1420, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args)
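A note on reading this traceback: the failing call is `Popen` inside pydub's `mediainfo_json`, which launches an external probe program (normally `ffprobe` from FFmpeg). On Windows, `[WinError 2]` raised from `CreateProcess` usually means the executable being launched could not be found, not the audio file itself. One possible cause, offered as an assumption rather than a confirmed diagnosis, is that FFmpeg is not on `PATH` for the process running Streamlit. A minimal sketch for checking that, where `missing_tools` is a hypothetical helper name:

```python
import shutil


def missing_tools(names=("ffmpeg", "ffprobe")):
    """Return the names from `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        # pydub shells out to these programs, so it cannot decode without them.
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("ffmpeg and ffprobe were both found on PATH")
```

Running this with the same interpreter and environment that runs the Streamlit app matters, because `PATH` can differ between a terminal and an IDE.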
Here is the script:
import streamlit as st
import speech_recognition as sr
import os
import math


def file_selector(folder_path='.'):
    filenames = os.listdir(folder_path)
    selected_filename = st.selectbox('Select a file', filenames)
    return os.path.join(folder_path, selected_filename)


def main():
    st.title("Audio to Text Converter")
    # Upload the audio file
    audio_file = st.file_uploader("Upload an audio file", type=["mp3", "wav", "ogg"])
    if audio_file is not None:
        # Split the audio file into 5-minute chunks
        CHUNK_DURATION = 5 * 60  # 5 minutes
        r = sr.Recognizer()
        with sr.AudioFile(audio_file) as source:
            audio_duration = math.ceil(source.DURATION)
            num_chunks = math.ceil(audio_duration / CHUNK_DURATION)
            for i in range(num_chunks):
                chunk_start = i * CHUNK_DURATION
                chunk_end = min((i + 1) * CHUNK_DURATION, audio_duration)
                audio_text = r.record(source, offset=chunk_start, duration=chunk_end-chunk_start)
                text = r.recognize_google(audio_text)
                # Display the text for this chunk
                st.header(f"Text from Audio (Chunk {i+1}/{num_chunks})")
                st.write(text)


if __name__ == '__main__':
    main()
I have asked around on Discord and in other places, but no one seemed to know the fix. I wondered whether this was due to a miscalculation of how many chunks there should be, but when I print num_chunks it returns 5, which is correct for a 15-minute audio file. I also tested this with another file, but got the same error after the first 2 chunks. Thanks in advance for the help!
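One thing that may be worth checking in the script above: judging from the SpeechRecognition source, `Recognizer.record` appears to apply `offset` relative to the current read position of an already-open `AudioFile`, not to the start of the file. If that holds, passing a growing `offset` inside a single `with` block skips more audio on every iteration and runs past the end of the file after a couple of chunks. A minimal sketch under that assumption drops `offset` and lets each call continue where the previous one stopped. Here `chunk_bounds` and `transcribe_in_chunks` are hypothetical helper names, and the 180-second default simply mirrors the 3-minute chunks described above:

```python
import math


def chunk_bounds(total_seconds, chunk_seconds):
    """Return (start, length) pairs that cover total_seconds in order."""
    count = math.ceil(total_seconds / chunk_seconds)
    return [
        (i * chunk_seconds, min(chunk_seconds, total_seconds - i * chunk_seconds))
        for i in range(count)
    ]


def transcribe_in_chunks(path, chunk_seconds=180):
    """Transcribe an audio file chunk by chunk (sketch, not tested on long files)."""
    # Imported here so chunk_bounds stays usable without the library installed.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    texts = []
    with sr.AudioFile(path) as source:
        for _start, length in chunk_bounds(source.DURATION, chunk_seconds):
            # No offset: the stream is already positioned at the end of the
            # previous chunk, so only the duration is passed.
            audio = recognizer.record(source, duration=length)
            try:
                texts.append(recognizer.recognize_google(audio))
            except sr.UnknownValueError:
                # Raised when no speech could be recognized in this chunk.
                texts.append("")
    return texts
```

For a 15-minute file, `chunk_bounds(900, 180)` yields five chunks starting at 0, 180, 360, 540 and 720 seconds, which matches the num_chunks value of 5 reported above. In the Streamlit script, each returned string could be passed to `st.write` in turn.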