This repository contains the source code for a realtime transcriber for various Whisper models published on Hugging Face.

nico-byte/whisper-realtime-transcriber

Whisper Realtime Transcriber

Overview

This repository contains the source code of a realtime transcriber for various Whisper models published on Hugging Face.

Prerequisites

Before you begin, make sure you meet the following prerequisites:

  • Python 3.10.12 installed on your machine.
  • Microphone connected to your machine.

Installation

  1. Install torch with CUDA support (optional):
     • Follow the official PyTorch installation instructions and install version >=2.4.0.
  2. Install the package:
    pip install --upgrade whisper-realtime-transcriber
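
To confirm the installation steps took effect, a small check like the following may help. It is only a sketch using the standard library: the helper names `installed_version` and `environment_report` are made up for illustration and are not part of this package.

```python
import sys
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def environment_report() -> dict:
    """Collect the versions that matter for the prerequisites listed above."""
    return {
        # The README asks for Python 3.10.12.
        "python": ".".join(str(part) for part in sys.version_info[:3]),
        # Optional; the README asks for torch >= 2.4.0 when CUDA is wanted.
        "torch": installed_version("torch"),
        "whisper-realtime-transcriber": installed_version("whisper-realtime-transcriber"),
    }


print(environment_report())
```

A value of `None` for a package means it is not installed in the active environment.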

Usage

After completing the installation, you can now use the transcriber:

  • Necessary imports
import asyncio

from whisper_realtime_transcriber.InputStreamGenerator import InputStreamGenerator
from whisper_realtime_transcriber.WhisperModel import WhisperModel
from whisper_realtime_transcriber.RealtimeTranscriber import RealtimeTranscriber
  • Standard way - model and generator are initialized by the RealtimeTranscriber and all outputs get printed directly to the console.
transcriber = RealtimeTranscriber()

asyncio.run(transcriber.execute_event_loop())
  • Executing a custom function inside the RealtimeTranscriber. All transcriptions are saved to a list.
def print_transcription(some_transcriptions: list):
    print(some_transcriptions)

# Specifying a function sets continuous to False. This allows a custom
# function to run during the coroutine and process the transcriptions.
# After the function has finished its work, the coroutine restarts.
transcriber = RealtimeTranscriber(func=print_transcription)

asyncio.run(transcriber.execute_event_loop())
  • Loading the InputStreamGenerator and/or Whisper Model with custom values.
inputstream_generator = InputStreamGenerator(samplerate=8000, blocksize=2000, min_chunks=2)
asr_model = WhisperModel(inputstream_generator, model_id="openai/whisper-tiny", device="cuda")

transcriber = RealtimeTranscriber(inputstream_generator, asr_model)

asyncio.run(transcriber.execute_event_loop())
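
The custom-function variant can also do more than print. The following is a minimal sketch of a callback that keeps a timestamped transcript. It assumes the list passed to the function contains plain strings, which the source does not state explicitly. The names `transcript_log`, `store_transcriptions` and `run_transcriber` are hypothetical.

```python
from datetime import datetime
from typing import List

# Hypothetical in-memory transcript; every non-empty transcription ends up here.
transcript_log: List[str] = []


def store_transcriptions(some_transcriptions: list) -> None:
    """Callback sketch: timestamp each transcription and append it to the log.

    Assumption: the items in some_transcriptions are strings.
    """
    stamp = datetime.now().strftime("%H:%M:%S")
    for text in some_transcriptions:
        text = str(text).strip()
        if text:  # skip empty results
            transcript_log.append(f"[{stamp}] {text}")


def run_transcriber() -> None:
    """Start the transcriber with the callback (needs the package and a microphone)."""
    import asyncio

    from whisper_realtime_transcriber.RealtimeTranscriber import RealtimeTranscriber

    transcriber = RealtimeTranscriber(func=store_transcriptions)
    asyncio.run(transcriber.execute_event_loop())
```

Calling `run_transcriber()` starts the event loop as in the examples above. The imports sit inside the function so that the callback can be tried without the package installed.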

Feel free to reach out if you encounter any issues or have questions!

How it works

  • The transcriber consists of two modules: an Inputstream Generator and a Whisper Model.
  • The implementation of the Inputstream Generator is based on this implementation.
  • The Inputstream Generator reads the microphone input and passes it to the Whisper Model. The Whisper Model then generates the transcription.
  • This happens in an async event loop, so the Whisper Model can continuously generate transcriptions from the audio input that the Inputstream Generator captures and processes.
  • On a machine with a 12GB Nvidia RTX 3060, the distilled large-v3 model runs at a realtime factor of about 0.4, meaning 10s of audio input is transcribed in about 4s. The longer the input, the larger the realtime factor.
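
The producer/consumer pattern described above can be illustrated with a small sketch. This is not the package's implementation: `fake_input_stream` and `fake_model` are hypothetical stand-ins that pass dummy chunks through an `asyncio.Queue` instead of reading a microphone or running Whisper. The `realtime_factor` helper assumes the usual definition, processing time divided by audio duration, which matches the 4s for 10s example.

```python
import asyncio
from typing import List


def realtime_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by audio duration; below 1.0 is faster than realtime."""
    return processing_seconds / audio_seconds


async def fake_input_stream(queue: asyncio.Queue, n_chunks: int, chunk_seconds: float) -> None:
    """Stand-in for the Inputstream Generator: emits dummy audio chunks."""
    for index in range(n_chunks):
        await asyncio.sleep(0)  # yield control, as a real audio read would
        await queue.put((index, chunk_seconds))
    await queue.put(None)  # sentinel: no more audio


async def fake_model(queue: asyncio.Queue, results: List[str]) -> None:
    """Stand-in for the Whisper Model: consumes chunks as they arrive."""
    while True:
        item = await queue.get()
        if item is None:
            break
        index, chunk_seconds = item
        results.append(f"transcription of chunk {index} ({chunk_seconds}s)")


async def main() -> List[str]:
    queue: asyncio.Queue = asyncio.Queue()
    results: List[str] = []
    # Both coroutines share one event loop, so capture and transcription interleave.
    await asyncio.gather(
        fake_input_stream(queue, n_chunks=3, chunk_seconds=0.5),
        fake_model(queue, results),
    )
    return results


results = asyncio.run(main())
print(results)
print(realtime_factor(processing_seconds=4.0, audio_seconds=10.0))
```

The queue decouples the two sides, so a slow model only delays transcription and does not block audio capture.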

ToDos

  • Add functionality to transcribe from audio files.
  • Find a way to suppress the hallucinations of the Whisper models when no voice is active.
