Skip to content

Includes transcription and llm backend applications

Notifications You must be signed in to change notification settings

OpenThematic/Transcription_Pipeline

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

9 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Speaker Identification and Transcription Pipeline

This project provides a Python-based pipeline for extracting audio from video files, performing speaker diarization, and transcribing audio using OpenAI's Whisper. It identifies different speakers in the audio and aligns transcriptions with these speakers.

Features

  • Audio extraction from video files
  • Audio trimming to a specified duration
  • Speaker diarization to identify different speakers in the audio
  • Audio transcription using OpenAI's Whisper
  • Matching transcriptions with speaker diarization data

Prerequisites

  • Python 3.x
  • Libraries: pydub, yt_dlp, pyannote.audio, whisper
  • ffmpeg (for audio extraction from video)
  • Ideally CUDA GPU with 10GB for whisper large or 6-8gb for whisper medium - will run on CPU, just slower.
  • See docker file

Installation

Ideally docker compose a dev environment via the compose yaml / dockerfile

Usage Configuration is made in config.ini - there you can set a path to your input directory (where audio or video files are stored). Make changes (other than the model) to the transcription settings within transcript.py. Run the main.py script.

audio_processing.py: Handles audio extraction and trimming, noise reduction, normalization, produces a mp4 with embedded subtitles. speaker_diarization.py: Performs speaker diarization using pyannote.audio. transcription.py: Transcribes audio using Whisper. main.py: Integrates all modules and runs the pipeline - produces various output formats.

License [Your chosen license]

Acknowledgments OpenAI for providing the Whisper model. Contributors and maintainers of pyannote.audio, pydub, and yt_dlp.

About

Includes transcription and llm backend applications

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published