Resume Parser API

A robust FastAPI-based service that processes resumes in PDF format, extracting structured information and, optionally, generating text embeddings. It combines PDF processing with AI-powered information extraction to turn unstructured resume text into structured, analyzable data.

Features

PDF text extraction with OCR support
Resume validation and verification
Structured information extraction using OpenAI
Optional text embedding generation
Hyperlink extraction
Table and image processing
Experience calculation and fresher detection
Comprehensive logging and error handling
Configurable through environment variables

Flow of the code:

|----------------|     |----------------|     |----------------|
|   PDF Upload   | --> |  PDF Service   | --> |   Validation   |
|----------------|     |----------------|     |----------------|
                           |      ^
                           v      |
                      |----------------|
                      |  OCR Service   |
                      |----------------|
                           |      ^
                           v      |
|----------------|     |----------------|     |----------------|
|   AI Service   | <-- | Text Process   | --> |   Embedding    |
|----------------|     |----------------|     |   Generation   |
       |                                      |----------------|
       v
|----------------|     |----------------|
|  Structure     | --> |    Final       |
|  Generation    |     |   Response     |
|----------------|     |----------------|
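The flow above can be read as one orchestration function. The sketch below is a hypothetical wiring of those stages, not the project's actual code: each stage (`extract_text`, `run_ocr`, `looks_like_resume`, `extract_structured`, `embed`) is an injected stub standing in for the real PDF, OCR, validation, AI, and embedding services, whose names and signatures are not shown in this README. The "Text Process" stage is assumed to be simple whitespace normalization, and this is only one possible ordering of the steps.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ResumePipeline:
    """Hypothetical wiring of the stages in the flow diagram; each stage is injected."""
    extract_text: Callable[[bytes], str]                  # PDF Service
    run_ocr: Callable[[bytes], str]                       # OCR Service
    looks_like_resume: Callable[[str], bool]              # Validation
    extract_structured: Callable[[str], dict]             # AI Service + Structure Generation
    embed: Optional[Callable[[str], List[float]]] = None  # Embedding Generation (optional)

    def run(self, pdf_bytes: bytes, process_id: str) -> dict:
        text = self.extract_text(pdf_bytes)
        if not text.strip():
            # No usable text layer (e.g. a scanned PDF): fall back to OCR
            text = self.run_ocr(pdf_bytes)
        # Text Process stage, assumed here to be plain whitespace normalization
        text = " ".join(text.split())
        if not self.looks_like_resume(text):
            return {"status": False, "process_id": process_id, "error": "Document is not a resume"}
        response = {
            "status": True,
            "process_id": process_id,
            "structured_data": self.extract_structured(text),
        }
        if self.embed is not None:
            response["embeddings"] = self.embed(text)
        return response  # Final Response


# Usage with stub stages standing in for the real services
pipeline = ResumePipeline(
    extract_text=lambda pdf: "",  # pretend the PDF has no text layer
    run_ocr=lambda pdf: "Work Experience\n  Skills: Python",
    looks_like_resume=lambda text: "Experience" in text,
    extract_structured=lambda text: {"skills": ["Python"]},
)
result = pipeline.run(b"%PDF-1.7", process_id="unique-id")
```

Injecting the stages keeps the orchestration testable without Tesseract, Poppler, or an OpenAI key.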

Project Structure

resume_parser_api/
├── app/
│   ├── __init__.py
│   ├── core/
│   │   ├── __init__.py
│   │   ├── config.py        # Configuration management
│   │   ├── dependencies.py  # Service dependencies
│   │   └── logging.py       # Logging setup
│   ├── api/
│   │   ├── __init__.py
│   │   └── routes/
│   │       ├── __init__.py
│   │       └── resume.py    # API endpoints
│   ├── services/
│   │   ├── __init__.py
│   │   ├── pdf.py          # PDF processing service
│   │   └── ai.py           # AI service for OpenAI interactions
│   └── models/
│       ├── __init__.py
│       └── schemas.py       # Data models and schemas
├── Tesseract-OCR/          # Tesseract binaries
├── poppler-23.11.0/        # Poppler binaries
├── logs/                   # Application logs
├── main.py                 # Application entry point
├── .env                    # Environment configuration
└── requirements.txt        # Project dependencies

Core Components

1. PDF Service (`app/services/pdf.py`)

Handles PDF file processing
Extracts text, tables, and images
Performs OCR on image-based content
Validates if the document is a resume
Extracts hyperlinks from the document
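How the PDF service decides that a document is a resume is not documented here. As an illustration only, the sketch below assumes a keyword heuristic (the section names and the threshold of two matches are hypothetical) plus a regex pass for URLs and email addresses in the extracted text. The real service may instead read link annotations directly from the PDF.

```python
import re

# Hypothetical section headings that commonly appear in resumes
RESUME_SECTIONS = ("experience", "education", "skills", "projects", "certifications", "summary")

URL_RE = re.compile(r"https?://[^\s)>\]]+|www\.[^\s)>\]]+", re.IGNORECASE)
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def looks_like_resume(text: str, min_sections: int = 2) -> bool:
    """Return True if at least `min_sections` common resume headings occur in the text."""
    lowered = text.lower()
    hits = sum(1 for section in RESUME_SECTIONS if section in lowered)
    return hits >= min_sections


def extract_links(text: str) -> dict:
    """Collect URLs and email addresses, de-duplicated in order of appearance."""
    # Strip trailing sentence punctuation that the URL pattern swallows
    urls = list(dict.fromkeys(u.rstrip(".,;") for u in URL_RE.findall(text)))
    emails = list(dict.fromkeys(EMAIL_RE.findall(text)))
    return {"urls": urls, "emails": emails}


sample = "John Doe | john@example.com | https://github.com/johndoe.\nEXPERIENCE ... EDUCATION ..."
print(looks_like_resume(sample))  # True
print(extract_links(sample))  # {'urls': ['https://github.com/johndoe'], 'emails': ['john@example.com']}
```

A keyword check is cheap but easy to fool; an LLM-based classification would be an alternative if the AI service is already in the loop.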

2. AI Service (`app/services/ai.py`)

Manages OpenAI API interactions
Generates text embeddings
Extracts structured information from resume text
Calculates total experience
Determines fresher/experienced status
Handles token counting and rate limiting
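The README does not say how total experience is computed. One plausible approach, sketched here under that assumption, is to merge overlapping employment periods so concurrent jobs are not double-counted, then sum the remaining months. The `(start, end)` date tuples, the inclusive month counting, and the 12-month fresher threshold are all illustrative choices.

```python
from datetime import date
from typing import Iterable, Optional, Tuple

Period = Tuple[date, Optional[date]]  # (start, end); end=None means "present"


def _month_index(d: date) -> int:
    return d.year * 12 + (d.month - 1)


def total_experience_in_months(periods: Iterable[Period], today: date) -> int:
    """Sum employment months, counting overlapping periods only once.

    Months are inclusive: Jan 2020 to Mar 2020 counts as 3 months.
    """
    spans = sorted((_month_index(start), _month_index(end or today)) for start, end in periods)
    total = 0
    current_start: Optional[int] = None
    current_end: Optional[int] = None
    for start, end in spans:
        if current_end is not None and start <= current_end + 1:
            # Overlapping or adjacent period: extend the current span
            current_end = max(current_end, end)
        else:
            if current_end is not None:
                total += current_end - current_start + 1
            current_start, current_end = start, end
    if current_end is not None:
        total += current_end - current_start + 1
    return total


def is_fresher(months: int, threshold: int = 12) -> bool:
    """Hypothetical rule: fewer than `threshold` months of experience counts as a fresher."""
    return months < threshold


jobs = [
    (date(2020, 1, 1), date(2021, 12, 31)),
    (date(2021, 6, 1), date(2022, 12, 31)),  # overlaps the first job
]
months = total_experience_in_months(jobs, today=date(2024, 1, 1))
print(months, is_fresher(months))  # 36 False
```

Merging intervals matters because resumes often list a side project or part-time role alongside a full-time job.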

3. Configuration (`app/core/config.py`)

Manages environment variables
Configures application settings
Sets up logging and paths
Handles external tool configurations

4. API Routes (`app/api/routes/resume.py`)

Defines API endpoints
Handles file uploads
Orchestrates services
Manages response formatting
Implements error handling
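Upload checks like these can live outside the route so they are easy to test. This sketch assumes a PDF is recognized by its `%PDF-` header and that oversize files are rejected against the `MAX_FILE_SIZE` setting; the function name `check_upload` is hypothetical, and returning 413 for oversize files is an assumption, since the README only documents 400, 422, and 500.

```python
from typing import Optional, Tuple

PDF_MAGIC = b"%PDF-"
DEFAULT_MAX_FILE_SIZE = 10_485_760  # 10 MiB, matching MAX_FILE_SIZE in the example .env


def check_upload(filename: str, content: bytes,
                 max_size: int = DEFAULT_MAX_FILE_SIZE) -> Tuple[int, Optional[str]]:
    """Return (status_code, error_message); (200, None) means the upload may be processed."""
    # Check both the extension and the file header, since either can be wrong
    if not filename.lower().endswith(".pdf") or not content.startswith(PDF_MAGIC):
        return 400, "Only PDF files are accepted"
    if len(content) > max_size:
        return 413, f"File exceeds {max_size} bytes"  # 413 is an assumption
    return 200, None


print(check_upload("cv.pdf", b"%PDF-1.7 ..."))  # (200, None)
print(check_upload("cv.docx", b"PK\x03\x04"))  # (400, 'Only PDF files are accepted')
```

Inside a FastAPI route, a non-200 result would typically be raised as `fastapi.HTTPException(status_code=..., detail=...)`.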

Development Setup

Prerequisites

Python 3.8 or higher
Tesseract OCR
Poppler
OpenAI API key

Environment Setup

Create and activate a virtual environment:

For Windows:

# Create virtual environment
python -m venv resume_api_env

# Activate virtual environment
.\resume_api_env\Scripts\activate

For Linux/Mac:

# Create virtual environment
python -m venv resume_api_env

# Activate virtual environment
source resume_api_env/bin/activate

Install dependencies:

pip install -r requirements.txt

Set up external tools:

Download Tesseract OCR
Download Poppler
Place them in the project root directory under Tesseract-OCR and poppler-23.11.0 respectively
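Once the binaries sit in the project root, the application has to be told where they are. The sketch below only builds the expected paths with `pathlib` and falls back to the system PATH when they are missing. The folder names come from the project structure; the `tesseract.exe` name and the `Library/bin` subfolder assume the usual Windows builds, so check them against your download. The commented lines assume the PDF service uses `pytesseract` and `pdf2image`, which this README does not confirm.

```python
from pathlib import Path
from typing import Dict, Optional


def tool_paths(project_root: Path) -> Dict[str, Optional[str]]:
    """Locate the bundled Tesseract and Poppler binaries, falling back to the system PATH."""
    # Folder names come from the project layout; the files inside assume the Windows builds
    tesseract_exe = project_root / "Tesseract-OCR" / "tesseract.exe"
    poppler_bin = project_root / "poppler-23.11.0" / "Library" / "bin"
    return {
        # "tesseract" alone means "look it up on PATH"
        "tesseract_cmd": str(tesseract_exe) if tesseract_exe.is_file() else "tesseract",
        # pdf2image searches PATH for Poppler when poppler_path is None
        "poppler_path": str(poppler_bin) if poppler_bin.is_dir() else None,
    }


paths = tool_paths(Path.cwd())
# Assuming pytesseract and pdf2image are the libraries in use:
# import pytesseract
# pytesseract.pytesseract.tesseract_cmd = paths["tesseract_cmd"]
# from pdf2image import convert_from_path
# pages = convert_from_path("resume.pdf", poppler_path=paths["poppler_path"])
```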

Create a .env file in the project root (the repository's .env.template can serve as a starting point):

# OpenAI Configuration
OPENAI_API_KEY=your_api_key_here

# Application Configuration
DEBUG=True
LOG_LEVEL=INFO

# API Settings
MAX_RETRIES=3
REQUEST_TIMEOUT=30.0
MAX_FILE_SIZE=10485760

# Feature Flags
ENABLE_OCR=True
ENABLE_TABLE_EXTRACTION=True
ENABLE_LINK_EXTRACTION=True
GENERATE_EMBEDDINGS=False
ENABLE_BACKGROUND_TASKS=True
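The variables above could be loaded into typed settings roughly as follows. This is a standard-library sketch, not the contents of `app/core/config.py`, which may use another approach such as `pydantic-settings`. The `Settings` class and `load_settings` helper are hypothetical, the defaults mirror the example values, and the .env values are assumed to already be in the process environment (for example via python-dotenv's `load_dotenv()`).

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


def _as_bool(value: str) -> bool:
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    openai_api_key: str
    debug: bool = True
    log_level: str = "INFO"
    max_retries: int = 3
    request_timeout: float = 30.0
    max_file_size: int = 10_485_760
    enable_ocr: bool = True
    enable_table_extraction: bool = True
    enable_link_extraction: bool = True
    generate_embeddings: bool = False
    enable_background_tasks: bool = True


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from environment variables (os.environ by default)."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set")  # fail fast at startup
    get = env.get
    return Settings(
        openai_api_key=key,
        debug=_as_bool(get("DEBUG", "True")),
        log_level=get("LOG_LEVEL", "INFO").upper(),
        max_retries=int(get("MAX_RETRIES", "3")),
        request_timeout=float(get("REQUEST_TIMEOUT", "30.0")),
        max_file_size=int(get("MAX_FILE_SIZE", "10485760")),
        enable_ocr=_as_bool(get("ENABLE_OCR", "True")),
        enable_table_extraction=_as_bool(get("ENABLE_TABLE_EXTRACTION", "True")),
        enable_link_extraction=_as_bool(get("ENABLE_LINK_EXTRACTION", "True")),
        generate_embeddings=_as_bool(get("GENERATE_EMBEDDINGS", "False")),
        enable_background_tasks=_as_bool(get("ENABLE_BACKGROUND_TASKS", "True")),
    )


settings = load_settings({"OPENAI_API_KEY": "sk-test", "GENERATE_EMBEDDINGS": "true"})
print(settings.generate_embeddings, settings.max_file_size)  # True 10485760
```

Passing a mapping instead of always reading `os.environ` makes the loader easy to exercise in tests.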

Start the application:

python main.py

The API will be available at http://localhost:8000

API Documentation

Once running, access the interactive API documentation at:

Swagger UI: http://localhost:8000/docs
ReDoc: http://localhost:8000/redoc

Main Endpoint

POST /api/v1/resume/parse

Accepts PDF file uploads
Returns structured resume information
Validates resume content
Optional embedding generation
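A minimal standard-library client for this endpoint might look like the sketch below. It assumes the upload form field is named `file` (the route's parameter name is not documented here) and that the server runs locally on port 8000. `build_parse_request` and `parse_resume` are hypothetical helpers; only `parse_resume` actually sends anything.

```python
import json
import urllib.request
import uuid
from pathlib import Path

PARSE_URL = "http://localhost:8000/api/v1/resume/parse"


def build_parse_request(pdf_bytes: bytes, filename: str = "resume.pdf",
                        url: str = PARSE_URL, field: str = "file") -> urllib.request.Request:
    """Build a multipart/form-data POST carrying one PDF.

    `field` defaults to "file", an assumption about the endpoint's form field name.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return urllib.request.Request(
        url,
        data=head + pdf_bytes + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def parse_resume(path: str, timeout: float = 30.0) -> dict:
    """Send a PDF to a running server and return the decoded JSON response.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses such as 400 or 422.
    """
    request = build_parse_request(Path(path).read_bytes(), filename=Path(path).name)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())
```

Libraries such as `requests` or `httpx` build the multipart body for you; the manual version just shows what goes over the wire.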

Example response structure (abridged; the // comments are annotations, not part of the JSON):

{
    "status": true,
    "process_id": "unique-id",
    "structured_data": {
        "name": "John Doe",
        "email": ["john@example.com"],
        "skills": ["Python", "FastAPI", "AI"],
        "is_fresher": false,
        "total_experience_in_months": 36
        // ... other extracted information
    },
    "embeddings": [...],  // If enabled
    "token_metrics": {
        "extraction": 1000,
        "embedding": 500
    }
}
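A client consuming this response should tolerate optional parts such as `embeddings`, which only appear when enabled. The helper below is a hypothetical sketch that relies only on the field names shown in the example; the sample payload reuses the example values.

```python
import json
from typing import Any, Dict


def summarize_response(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Pull a few documented fields out of a parse response, tolerating missing ones."""
    data = payload.get("structured_data") or {}
    months = data.get("total_experience_in_months")
    return {
        "ok": bool(payload.get("status")),
        "process_id": payload.get("process_id"),
        "name": data.get("name"),
        "years_of_experience": round(months / 12, 1) if months is not None else None,
        "is_fresher": data.get("is_fresher"),
        "has_embeddings": bool(payload.get("embeddings")),  # absent unless enabled
        "total_tokens": sum((payload.get("token_metrics") or {}).values()),
    }


raw = """{"status": true, "process_id": "unique-id",
          "structured_data": {"name": "John Doe", "is_fresher": false,
                              "total_experience_in_months": 36},
          "token_metrics": {"extraction": 1000}}"""
summary = summarize_response(json.loads(raw))
print(summary)
```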

Error Handling

The API implements comprehensive error handling:

Invalid file types (400)
Non-resume documents (422)
Processing errors (500)
Rate limiting
Token usage tracking
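The README does not describe how rate limiting is handled. A common pattern, sketched here under that assumption, is to retry a rate-limited call up to `MAX_RETRIES` times with exponential backoff and jitter. `RateLimitedError` and `call_with_retries` are hypothetical names; with the OpenAI Python SDK (1.x) the exception to catch would be `openai.RateLimitError`.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitedError(Exception):
    """Stand-in for a provider's rate-limit exception."""


def call_with_retries(fn: Callable[[], T], max_retries: int = 3, base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying on RateLimitedError with exponential backoff plus jitter.

    Makes at most max_retries + 1 attempts; the last error is re-raised.
    """
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimitedError:
            if attempt == max_retries:
                raise
            # Waits base_delay * 1, 2, 4, ... plus up to 10% random jitter
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable when max_retries >= 0")


attempts = {"n": 0}


def flaky() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimitedError("429 Too Many Requests")
    return "ok"


print(call_with_retries(flaky, sleep=lambda seconds: None))  # ok
```

Jitter spreads out retries from concurrent requests so they do not hit the API in lockstep; the injectable `sleep` keeps tests fast.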

Logging

Detailed logging is implemented throughout the application:

Request tracking with process IDs
Error logging with stack traces
Performance metrics
Token usage tracking
Service status logging

Logs are stored in the logs/ directory and rotated automatically.
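Rotating file logs tagged with the request's process ID can be set up with the standard library alone, as in the sketch below. The file name `app.log`, the 5 MB size limit, and the backup count of 5 are assumptions; the real `app/core/logging.py` may rotate by time instead. The usage writes to a temporary directory rather than `logs/`.

```python
import logging
import tempfile
from logging.handlers import RotatingFileHandler
from pathlib import Path


class _DefaultProcessId(logging.Filter):
    """Give records logged outside a request a placeholder process ID."""

    def filter(self, record: logging.LogRecord) -> bool:
        if not hasattr(record, "process_id"):
            record.process_id = "-"
        return True


def setup_logging(log_dir: str = "logs", level: str = "INFO") -> logging.Logger:
    """Log to <log_dir>/app.log, rotating at ~5 MB and keeping 5 old files (assumed values)."""
    logger = logging.getLogger("resume_parser")
    logger.setLevel(level)
    if not logger.handlers:  # avoid duplicate handlers if called twice
        Path(log_dir).mkdir(parents=True, exist_ok=True)
        handler = RotatingFileHandler(
            Path(log_dir) / "app.log", maxBytes=5 * 1024 * 1024, backupCount=5, encoding="utf-8"
        )
        handler.addFilter(_DefaultProcessId())
        handler.setFormatter(logging.Formatter(
            "%(asctime)s %(levelname)s [%(process_id)s] %(name)s: %(message)s"
        ))
        logger.addHandler(handler)
    return logger


def request_logger(logger: logging.Logger, process_id: str) -> logging.LoggerAdapter:
    """Attach a process ID so every line for one request can be grepped together."""
    return logging.LoggerAdapter(logger, {"process_id": process_id})


log_dir = tempfile.mkdtemp()
log = request_logger(setup_logging(log_dir), process_id="unique-id")
log.info("Resume parsed")
```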

Contributing

Fork the repository
Create your feature branch (git checkout -b feature/AmazingFeature)
Commit your changes (git commit -m 'Add some AmazingFeature')
Push to the branch (git push origin feature/AmazingFeature)
Open a Pull Request

Make sure to:

Follow the existing code style
Add tests if applicable
Update documentation as needed
Run tests before submitting

Repository: vishwajeetdabholkar/resume_parser_api (license terms in LICENSE.txt)
