AI Research Assistant

Overview

A microservice-driven implementation for transforming PDFs into engaging audio content. For a deeper dive into the system architecture, you can view a mermaid diagram of the system here.

Quick Start Guide

  1. Environment Variables: We require the following environment variables to be set:

    # Create .env file with required variables
    echo "ELEVENLABS_API_KEY=your_key" > .env
    echo "NIM_KEY=your_key" >> .env
    echo "MAX_CONCURRENT_REQUESTS=1" >> .env

    Note that in production we use the NVIDIA Eleven Labs API key, which can handle concurrent requests. For local development, you may want to set MAX_CONCURRENT_REQUESTS=1 to avoid rate-limiting issues. You can generate your own testing API key for free here.

  2. Install Dependencies: We use UV to manage Python dependencies.

    make uv

    This will:

    • Install UV if not present
    • Create virtual environment
    • Install project dependencies

    If you open a new terminal window and want to quickly reuse the same environment, you can run make uv again.
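
    Alternatively, assuming UV created the default .venv in the project root (the same path used in step 4 below), you can activate the environment directly in a new shell:

    source .venv/bin/activate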

  3. Start Development Server: You can start the entire stack with:

    make all-services

    This command will:

    • Verify environment variables are set
    • Create necessary directories
    • Start all services using Docker Compose with the --build flag

    Note: The first time you run make all-services, the docling service may take 10-15 minutes to pull and build. Subsequent runs will be much faster.

    You can also set DETACH=1 to run the services in detached mode, which allows you to continue using your terminal while the services are running.
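
    For example, one way to pass the flag (assuming the Makefile reads DETACH from the environment, as described above):

    DETACH=1 make all-services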

  4. Run Podcast Generation:

    source .venv/bin/activate
    python tests/test.py --target <pdf1.pdf> --context <pdf2.pdf>

    This will generate a two-person podcast. To generate a one-person monologue, add the --monologue flag, as shown below. Check out the test file for more examples. If you are not on a GPU machine, the PDF service may take a while to run.
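
    For example, a minimal monologue invocation based on the flag described above (see tests/test.py for the full argument list, including whether --context still applies in monologue mode):

    python tests/test.py --target <pdf1.pdf> --monologue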

Hosting the PDF service on a separate machine

As stated above, we use docling as our default PDF service. When you spin up the stack, docling will be built and run automatically.

If you would like to run the PDF service on a separate machine, you can add the following to your .env file:

echo "MODEL_API_URL=<pdf-model-service-url" >> .env

Using nv-ingest

We also support using a fork of NVIDIA's NV-Ingest as our PDF service. This requires 2 A100-SXM machines. See the repo for more information. If you would like to use this, you can add the following to your .env file:

echo "MODEL_API_URL=<nv-ingest-url>/v1" >> .env

Note the use of v1 in the URL.

Here is the workflow that we use for running this in production (disaggregated core services and pdf model service):

# On an L40s machine
make model-prod

# On a different machine
make prod

Selecting LLMs

We currently use an ensemble of three LLMs to generate these podcasts. Out of the box, we recommend using the Llama 3.1-70B NIM. If you would like to use a different model, you can update the models.json file with the desired model. The default models.json calls a NIM that we currently host; feel free to use it as you develop locally. When you deploy, please use our NIM API Catalog endpoints.
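
As a rough illustration only (the field names below are hypothetical; the authoritative schema is whatever models.json already contains), an entry might map a model to its API endpoint:

[
  {
    "_comment": "hypothetical schema, for illustration only",
    "model": "meta/llama-3.1-70b-instruct",
    "api_base": "https://integrate.api.nvidia.com/v1"
  }
]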

Optimizing for GPU usage

Due to our design, it is relatively easy to swap out different pieces of our stack to optimize for GPU usage and available hardware. For example, you could swap each model with the smaller Llama 3.1-8B NIM and disable GPU usage for docling in docker-compose.yaml.
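
For instance, disabling GPU access for docling typically amounts to removing or commenting out a device reservation along these lines in docker-compose.yaml (a sketch using standard Compose syntax; the actual service definition in this repo may differ):

services:
  docling:
    deploy:
      resources:
        reservations:
          devices:
            # remove or comment out this block to run docling on CPU only
            - driver: nvidia
              count: 1
              capabilities: [gpu]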

Development Tools

Tracing

We expose a Jaeger instance at http://localhost:16686/ for tracing. This is useful for debugging and monitoring the system.
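
As a quick sanity check that tracing is running, you can query Jaeger's UI API for the list of registered services (an internal endpoint, so it may change between Jaeger versions):

curl http://localhost:16686/api/services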

Code Quality

The project uses ruff for linting and formatting. You must run make ruff before your PR can be merged:

make ruff  # Runs both lint and format

CI/CD

We use GitHub Actions for CI/CD. We run the following actions:

  • ruff: Runs linting and formatting
  • pr-test: Runs an e2e podcast test on the PR
  • build-and-push: Builds and pushes a new container image to the remote repo. This is used to update production deployments

Production Deployment

For production deployment:

make prod

This uses the remote Docker Compose configuration and pulls pre-built images from the registry.

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Run tests: python tests/test.py --target <pdf1.pdf> --context <pdf2.pdf>
  5. Run linting: make ruff
  6. Submit a pull request
