This script uses the Google Cloud Speech API to perform simple speech transcription (speech-to-text) from .wav audio files. It supports every language the Google Cloud Speech API supports, and it should work with any .wav file.
The Google Cloud Speech API is free to use for up to 60 minutes of audio per month. After those 60 minutes, it costs $0.006 USD per 15 seconds of audio, rounded up to the next 15 seconds. That works out to 2.4 cents per minute, or just $1.44 for an hour of audio. This code should be able to handle audio files that long, but it might break. If it does, or if something else weird happens, or if you want to help me improve the code, feel free to submit issues or tweet at me.
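To make the pricing arithmetic concrete, here is a minimal sketch of a cost estimator. It is not part of `speechrec.py`; the function name `estimate_cost` and its `free_seconds_left` parameter are made up for illustration, the prices are the ones quoted above, and how the free allowance interacts with the 15-second rounding is an assumption.

```python
import math

PRICE_PER_INCREMENT = 0.006  # USD per 15-second increment, as quoted above
INCREMENT_SECONDS = 15


def estimate_cost(audio_seconds, free_seconds_left=0):
    """Estimate the USD cost of transcribing one audio file.

    free_seconds_left is whatever remains of the 60 free minutes this month
    (hypothetical parameter; the real billing logic may differ).
    """
    # Round the audio length up to the next 15-second increment
    billed_seconds = math.ceil(audio_seconds / INCREMENT_SECONDS) * INCREMENT_SECONDS
    # Assumption: the free allowance is subtracted after rounding
    chargeable_seconds = max(0, billed_seconds - free_seconds_left)
    return round(chargeable_seconds / INCREMENT_SECONDS * PRICE_PER_INCREMENT, 3)
```

For example, `estimate_cost(3600)` gives the $1.44 per hour mentioned above, and a 61-second file with no free minutes left is billed as 75 seconds.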
- You'll need to register with Google Cloud Platform and create a project
- In the project, enable the Google Cloud Speech API
- In the Credentials section of your Google Cloud Platform project, create a Service Account Key (JSON version)
- Place the JSON file in the same folder as this code and rename it to `googlecredentials.json`
- Install the required packages using `pip install -r requirements.txt`
- You'll also need to install ffmpeg and set up a Google Cloud Storage bucket (instructions for this are still TODO. For now, set the right bucket by changing the `bucket_name` variable on line 12 of `speechrec.py` to the name of your bucket.)
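If you want to double-check the setup before running anything, a small pre-flight check along these lines might help. This is only a sketch and not part of `speechrec.py`: the helper name `check_setup` is hypothetical, and it only looks for the credentials file and for an `ffmpeg` executable on your PATH. It does not verify the Google Cloud project or the bucket.

```python
import os
import shutil


def check_setup(folder=".", credentials_name="googlecredentials.json"):
    """Return a list of setup problems found (empty list if none)."""
    problems = []
    # The credentials file should sit in the same folder as the code
    if not os.path.isfile(os.path.join(folder, credentials_name)):
        problems.append(f"{credentials_name} not found in {os.path.abspath(folder)}")
    # shutil.which returns None when the executable is not on PATH
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    return problems
```

Calling `check_setup()` from the folder that contains the code returns an empty list when both checks pass.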
From the command line, run `python speechrec.py -i <path_to_input_file> -l <language_code>`. Make sure the input file is a .wav audio file. Use this list to find the right code for your language. For quick reference:
- English (USA): en_US
- English (UK): en_GB
- Dutch: nl_NL
- French (France): fr_FR
- Etc...
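For readers curious how a command line like the one above can be parsed, here is a rough sketch using `argparse`. It is not the actual code in `speechrec.py`: the long option names `--input` and `--language`, the choice to make both flags required, and the extension check are all assumptions made for illustration.

```python
import argparse


def parse_args(argv=None):
    """Parse the -i and -l flags (illustrative sketch only)."""
    parser = argparse.ArgumentParser(description="Transcribe a .wav audio file")
    parser.add_argument("-i", "--input", required=True,
                        help="path to the input .wav file")
    parser.add_argument("-l", "--language", required=True,
                        help="language code, e.g. en_US")
    args = parser.parse_args(argv)
    # Reject anything that does not look like a .wav file
    if not args.input.lower().endswith(".wav"):
        parser.error("input file must be a .wav audio file")
    return args
```

With this sketch, `parse_args(["-i", "talk.wav", "-l", "nl_NL"])` returns an object whose `input` and `language` attributes hold the two values, while a non-.wav path exits with an error message.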
To do:
- Handle different GCS bucket names (setup.py?)
- Remove files from computer and Google Cloud Storage after use
- Add Google Cloud Storage instructions to Readme
- Clean up and refactor code
- Add a check if file is supported
- Add support for other audio file types
- Someday create a web-based version of this
- Etc...
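As a possible starting point for the "check if file is supported" item, the sketch below uses Python's standard `wave` module to confirm that a file can be read as a .wav and to report its basic properties. The helper name `describe_wav` is hypothetical, and this only tells you whether Python can read the file, not whether the Google Cloud Speech API will accept it.

```python
import wave


def describe_wav(path):
    """Return basic properties of a .wav file, or None if it cannot be read."""
    try:
        with wave.open(path, "rb") as wav:
            return {
                "channels": wav.getnchannels(),
                "sample_rate_hz": wav.getframerate(),
                "sample_width_bytes": wav.getsampwidth(),
                # Duration in seconds = number of frames / frames per second
                "duration_s": wav.getnframes() / wav.getframerate(),
            }
    except (wave.Error, EOFError, OSError):
        # Not a readable .wav file (wrong format, empty, or missing)
        return None
```

The reported duration could also be used to warn about very long files before uploading them.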