
📖 <--> 🎧 Comprehensive PDF-to-Audio-Visual Processing System: (1) extracts text from PDFs, (2) generates audiobooks, and (3) creates highly customizable scene visuals using an AI chatbot. Powered by PyMuPDF, Tortoise TTS, and DALL-E.


ankushgpta2/ConvertMyBook


📚 PDF Processing and Audio Visualization System

🌐 Overview

This project converts a PDF book into an audiobook and accompanying scene images. It extracts formatted text with PyMuPDF, synthesizes speech with Tortoise TTS, and generates scene visuals with DALL-E through an interactive chatbot.

✨ Key Features:

  • Preserves complex text formatting
  • Supports chapter-specific audio processing
  • Detects and handles quotes with unique voice characteristics
  • Maintains text structure in intermediate JSON format
  • Interactive chatbot for scene visualization (DALL-E)
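
The quote detection and intermediate JSON format listed above could look roughly like the following. This is a minimal sketch, not the project's actual code: the function names `split_quotes` and `chapter_to_json`, the regular expression, and the JSON layout are all assumptions made for illustration.

```python
import json
import re

# Matches text inside straight or curly double quotes. This is a simplifying
# assumption: nested or single-quoted dialogue is not handled.
QUOTE_PATTERN = re.compile(r'"[^"]*"|“[^”]*”')


def split_quotes(paragraph):
    """Split a paragraph into narration and quote segments, in reading order."""
    segments = []
    cursor = 0
    for match in QUOTE_PATTERN.finditer(paragraph):
        narration = paragraph[cursor:match.start()].strip()
        if narration:
            segments.append({"type": "narration", "text": narration})
        # Drop the surrounding quote marks and keep only the spoken text.
        segments.append({"type": "quote", "text": match.group(0)[1:-1]})
        cursor = match.end()
    tail = paragraph[cursor:].strip()
    if tail:
        segments.append({"type": "narration", "text": tail})
    return segments


def chapter_to_json(title, paragraphs):
    """Serialize one chapter into a hypothetical intermediate JSON layout."""
    chapter = {
        "chapter": title,
        "paragraphs": [split_quotes(p) for p in paragraphs],
    }
    return json.dumps(chapter, ensure_ascii=False, indent=2)
```

Keeping quotes as separate segments lets a later audio stage assign them a different voice without re-parsing the text.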

🧩 Components

📄 PDF Processor (pdf_processor.py)

  • Extracts text from PDFs using PyMuPDF
  • Preserves advanced text formatting (headings, quotes, italics)
  • Generates metadata-rich text outputs
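
As an illustration of how formatting can be recovered with PyMuPDF, the sketch below labels each text span by font size and style flags. It is not taken from `pdf_processor.py`: the helper names, the 1.3x heading threshold, and the default body size of 11 pt are assumptions. It relies on the documented `page.get_text("dict")` structure (blocks, lines, spans) and on the span `flags` bits for italic (2) and bold (16).

```python
def classify_span(span, body_size=11.0):
    """Label one PyMuPDF text span as heading, italic, bold, or body text."""
    flags = span.get("flags", 0)
    # Assumption: text at least 30% larger than body text is a heading.
    if span.get("size", body_size) >= body_size * 1.3:
        return "heading"
    if flags & 2:   # bit 1 of the span flags marks italic text
        return "italic"
    if flags & 16:  # bit 4 marks bold text
        return "bold"
    return "body"


def annotate_page(page_dict, body_size=11.0):
    """Flatten a page dict from page.get_text("dict") into labelled spans."""
    annotated = []
    for block in page_dict.get("blocks", []):
        for line in block.get("lines", []):  # image blocks have no "lines"
            for span in line.get("spans", []):
                text = span.get("text", "").strip()
                if text:
                    annotated.append(
                        {"text": text, "style": classify_span(span, body_size)}
                    )
    return annotated


def extract_pdf(pdf_path, body_size=11.0):
    """Return one list of labelled spans per page of the PDF."""
    import fitz  # PyMuPDF; imported lazily so the helpers above work without it

    with fitz.open(pdf_path) as doc:
        return [annotate_page(page.get_text("dict"), body_size) for page in doc]
```

A fixed body size is a crude heuristic; a more robust version might estimate it from the most common span size in the document.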

🔊 Audio Processor (audio_processor.py)

  • Integrates Tortoise TTS for speech generation
  • Converts formatted text to speech
  • Dynamically adjusts voice characteristics based on text formatting
  • Creates audiobooks with contextually appropriate pauses
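
One way to picture the formatting-dependent voices and pauses is as a planning step that runs before any speech is synthesized. The sketch below is hypothetical: the style names, voice names, and pause lengths are invented for illustration, and it deliberately stops short of calling Tortoise TTS, which would consume the resulting plan.

```python
from dataclasses import dataclass

# Hypothetical mapping: text style -> (voice name, pause after segment in seconds).
VOICE_PLAN = {
    "heading": ("narrator", 1.5),
    "quote": ("character", 0.5),
    "italic": ("narrator_soft", 0.25),
    "body": ("narrator", 0.25),
}


@dataclass
class Utterance:
    text: str
    voice: str
    pause_after: float


def plan_utterances(segments):
    """Turn styled text segments into an ordered list of utterances."""
    plan = []
    for segment in segments:
        style = segment.get("style", "body")
        # Unknown styles fall back to the plain narrator settings.
        voice, pause = VOICE_PLAN.get(style, VOICE_PLAN["body"])
        plan.append(Utterance(segment["text"], voice, pause))
    return plan


def total_pause(plan):
    """Total silence, in seconds, that the plan adds between utterances."""
    return sum(utterance.pause_after for utterance in plan)
```

Separating planning from synthesis keeps the slow TTS step simple: it only needs to render each utterance with the named voice and append the requested silence.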

πŸ–ΌοΈ Scene Visualizer (scene_visualizer.py)

  • Implements scene visualization using DALL-E
  • Provides interactive chatbot interface
  • Generates images based on textual descriptions
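
For orientation, a request to DALL-E through the official OpenAI Python SDK (v1 or later) might look like the sketch below. The function names, the prompt template, the `dall-e-3` model choice, and the image size are assumptions; the README does not say which model version or prompt format `scene_visualizer.py` uses.

```python
def build_scene_prompt(description, style="digital painting", mood=None):
    """Compose an image prompt from a scene description (hypothetical template)."""
    parts = [description.strip().rstrip(".")]
    if mood:
        parts.append(f"{mood} mood")
    parts.append(f"in the style of a {style}")
    return ", ".join(parts)


def generate_scene_image(api_key, description, **prompt_options):
    """Request one image from DALL-E and return its URL."""
    from openai import OpenAI  # imported lazily; requires the `openai` package

    client = OpenAI(api_key=api_key)
    response = client.images.generate(
        model="dall-e-3",  # assumption: the project may use a different model
        prompt=build_scene_prompt(description, **prompt_options),
        size="1024x1024",
        n=1,
    )
    return response.data[0].url
```

In a chat loop, each user message could adjust `style` or `mood` and trigger a new call, which is one plausible reading of the "highly customizable" visuals described above.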

🚀 Main Application (main.py)

  • Central application entry point
  • Supports multiple operational modes
  • Handles command-line arguments
  • Offers interactive chat functionality
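
The command-line handling can be sketched with `argparse`, using only the flags and mode names that appear in the usage commands of this README. Which flags are required in which mode is an assumption, as are the helper names `build_parser` and `validate_args`.

```python
import argparse

# Mode names taken from the usage commands in this README.
MODES = ("pdf2text", "text2audio", "chat")


def build_parser():
    """Build a parser for the flags shown in the usage commands."""
    parser = argparse.ArgumentParser(
        description="Convert a PDF into text, audio, and scene visuals."
    )
    parser.add_argument("--pdf_path", help="input PDF (used by pdf2text)")
    parser.add_argument("--output_dir", required=True, help="directory for outputs")
    parser.add_argument("--mode", required=True, choices=MODES)
    parser.add_argument("--openai_key", help="OpenAI API key (used by chat)")
    return parser


def validate_args(args):
    """Return a list of problems; an empty list means the arguments are usable."""
    problems = []
    # Assumption: each mode needs only the extra flag shown in its usage command.
    if args.mode == "pdf2text" and not args.pdf_path:
        problems.append("--pdf_path is required in pdf2text mode")
    if args.mode == "chat" and not args.openai_key:
        problems.append("--openai_key is required in chat mode")
    return problems
```

A caller would run `args = build_parser().parse_args()`, report anything returned by `validate_args(args)`, and then dispatch on `args.mode`.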

πŸ› οΈ Prerequisites

  • Python 3.8+
  • OpenAI API Key (for DALL-E integration)

💾 Installation

Clone the repository, then install the dependencies and download the spaCy English model:

pip install -r requirements.txt
python -m spacy download en_core_web_sm
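
After installation, a quick check like the one below can confirm that the key packages are importable. The module list is an assumption based on the tools named in this README (`fitz` is PyMuPDF's import name, and `en_core_web_sm` is installed as a regular package by the spaCy download command); the actual contents of `requirements.txt` are not shown here.

```python
import importlib.util

# Assumed top-level import names; adjust to match requirements.txt.
REQUIRED_MODULES = ("fitz", "spacy", "en_core_web_sm")


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

An empty list from `missing_modules()` means every listed module was found; any names it returns point to a package that still needs installing.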

🖥️ Usage

PDF Text Extraction:

python main.py --pdf_path book.pdf --output_dir output --mode pdf2text

Audiobook Generation:

python main.py --output_dir output --mode text2audio

Chat Interface:

python main.py --output_dir output --mode chat --openai_key YOUR_API_KEY
