AI Research Assistant

Overview

A microservice-driven implementation for transforming PDFs into engaging audio content. For a deeper dive into the system architecture, you can view a mermaid diagram of the system here.

Quick Start Guide

  1. Environment Variables: We require the following environment variables to be set:

    # Create .env file with required variables
    echo "ELEVENLABS_API_KEY=your_key" > .env
    echo "NIM_KEY=your_key" >> .env
    echo "MAX_CONCURRENT_REQUESTS=1" >> .env

    Note that in production we use the NVIDIA Eleven Labs API key, which can handle concurrent requests. For local development, you may want to set MAX_CONCURRENT_REQUESTS=1 to avoid rate-limiting issues. You can generate your own testing API key for free here.

  2. Install Dependencies: We use UV to manage Python dependencies.

    make uv

    This will:

    • Install UV if not present
    • Create virtual environment
    • Install project dependencies

    If you open a new terminal window and want to quickly reuse the same environment, you can run make uv again.
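
    Alternatively, assuming UV created the default .venv in the project root (the same path used in step 4 below), you can activate the environment directly in a new shell:

    source .venv/bin/activate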

  3. Start Development Server: You can start the entire stack with:

    make all-services

    This command will:

    • Verify environment variables are set
    • Create necessary directories
    • Start all services using Docker Compose with the --build flag

    Note: The first time you run make all-services, the docling service may take 10-15 minutes to pull and build. Subsequent runs will be much faster.

    You can also set DETACH=1 to run the services in detached mode, which allows you to continue using your terminal while the services are running.
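
    For example, one way to pass the flag (assuming the Makefile reads DETACH from the environment, as described above):

    DETACH=1 make all-services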

  4. Run Podcast Generation:

    source .venv/bin/activate
    python tests/test.py --target <pdf1.pdf> --context <pdf2.pdf>

    This will generate a two-person podcast. To generate a one-person monologue, add the --monologue flag, as shown below. Check out the test file for more examples. If you are not on a GPU machine, the PDF service may take a while to run.
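
    For example, a minimal monologue invocation based on the flag described above (see tests/test.py for the full argument list, including whether --context still applies in monologue mode):

    python tests/test.py --target <pdf1.pdf> --monologue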

Hosting the PDF service on a separate machine

As stated above, we use docling as our default PDF service. When you spin up the stack, docling will be built and run automatically.

If you would like to run the PDF service on a separate machine, you can add the following to your .env file:

echo "MODEL_API_URL=<pdf-model-service-url" >> .env

Using nv-ingest

We also support using a fork of NVIDIA's NV-Ingest as our PDF service. This requires 2 A100-SXM machines. See the repo for more information. If you would like to use this, you can add the following to your .env file:

echo "MODEL_API_URL=<nv-ingest-url>/v1" >> .env

Note the use of v1 in the URL.

Here is the workflow that we use for running this in production (disaggregated core services and pdf model service):

# On an L40s machine
make model-prod

# On a different machine
make prod

Selecting LLMs

We currently use an ensemble of three LLMs to generate these podcasts. Out of the box, we recommend using the Llama 3.1-70B NIM. If you would like to use a different model, you can update the models.json file with the desired model. The default models.json calls a NIM that we currently host; feel free to use it as you develop locally. When you deploy, please use our NIM API Catalog endpoints.
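
As a rough illustration only (the field names below are hypothetical; the authoritative schema is whatever models.json already contains), an entry might map a model to its API endpoint:

[
  {
    "_comment": "hypothetical schema, for illustration only",
    "model": "meta/llama-3.1-70b-instruct",
    "api_base": "https://integrate.api.nvidia.com/v1"
  }
]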

Optimizing for GPU usage

Due to our design, it is relatively easy to swap out different pieces of our stack to optimize for GPU usage and available hardware. For example, you could swap each model with the smaller Llama 3.1-8B NIM and disable GPU usage for docling in docker-compose.yaml.
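
For instance, disabling GPU access for docling typically amounts to removing or commenting out a device reservation along these lines in docker-compose.yaml (a sketch using standard Compose syntax; the actual service definition in this repo may differ):

services:
  docling:
    deploy:
      resources:
        reservations:
          devices:
            # remove or comment out this block to run docling on CPU only
            - driver: nvidia
              count: 1
              capabilities: [gpu]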

Development Tools

Tracing

We expose a Jaeger instance at http://localhost:16686/ for tracing. This is useful for debugging and monitoring the system.
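
As a quick sanity check that tracing is running, you can query Jaeger's UI API for the list of registered services (an internal endpoint, so it may change between Jaeger versions):

curl http://localhost:16686/api/services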

Code Quality

The project uses ruff for linting and formatting. You must run make ruff before your PR can be merged:

make ruff  # Runs both lint and format

CI/CD

We use GitHub Actions for CI/CD. We run the following actions:

  • ruff: Runs linting and formatting
  • pr-test: Runs an e2e podcast test on the PR
  • build-and-push: Builds and pushes a new container image to the remote repo. This is used to update production deployments

Production Deployment

For production deployment:

make prod

This uses the remote Docker Compose configuration and pulls pre-built images from the registry.

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Run tests: python tests/test.py --target <pdf1.pdf> --context <pdf2.pdf>
  5. Run linting: make ruff
  6. Submit a pull request
