PDF-Extractor

About

PDF-Extractor is a web application that lets users upload PDF files, extract their text, and correct that text with LanguageTool (via language_tool_python). The frontend is built with React and the backend with FastAPI. English and French are supported.

Getting Started

Prerequisites

Node.js and npm
Python 3.12.3
FastAPI
pdfplumber
language_tool_python
uvicorn
langdetect

Installation

Clone the repository:

git clone https://github.com/arij01/PDF-Extractor.git
cd PDF-Extractor

Install the Python dependencies:

pip install -r requirements.txt

Running the Application

Start the backend server:

uvicorn app:app --reload

In a second terminal, install the required npm packages from the frontend directory:

cd frontend
npm install

Start the frontend development server:

npm start

Open your browser and navigate to http://localhost:3000 to view the application.

Usage

A sample PDF file is provided in the sample directory. You can use this file to test the application.

Uploading a PDF

Open the application in your browser.
Drag and drop a PDF file into the designated area or click to select a file.
Click the "Submit" button to upload the file.
Wait for the text extraction and correction process to complete.
The corrected text will be displayed on the screen.
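
The extraction step that runs after an upload can be sketched as below. This is a minimal illustration, not the project's actual backend code: the function names `extract_text` and `join_pages` are hypothetical, and it assumes pdfplumber is used page by page, with pages that yield no text skipped.

```python
from typing import BinaryIO, Iterable, Optional, Union


def join_pages(page_texts: Iterable[Optional[str]]) -> str:
    """Join per-page text with blank lines, skipping pages that produced no text."""
    kept = [text.strip() for text in page_texts if text and text.strip()]
    return "\n\n".join(kept)


def extract_text(pdf_source: Union[str, BinaryIO]) -> str:
    """Extract text from a PDF given a file path or a binary file object.

    Hypothetical helper: the real backend may structure this differently.
    """
    # Third-party dependency, imported lazily so join_pages works without it.
    import pdfplumber

    with pdfplumber.open(pdf_source) as pdf:
        # extract_text() can yield no text for image-only pages;
        # join_pages drops those entries.
        return join_pages(page.extract_text() for page in pdf.pages)
```

Calling `extract_text("path/to/file.pdf")` returns the document text as a single string. Inside a FastAPI handler, the uploaded file's underlying file object could plausibly be passed in place of a path, since pdfplumber accepts file-like objects.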

Correcting Text

The application uses language_tool_python to correct the text extracted from the PDF. The language is automatically detected using langdetect.
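
A rough sketch of how detection and correction might fit together is shown below. The mapping `TOOL_LANGUAGES`, the fallback `DEFAULT_LANGUAGE`, and the function names are assumptions made for illustration and are not taken from the project's code. Note that language_tool_python runs LanguageTool locally by default, which requires Java.

```python
# Hypothetical mapping from langdetect codes to LanguageTool language codes.
TOOL_LANGUAGES = {"en": "en-US", "fr": "fr"}
# Assumed fallback when the detected language is not supported.
DEFAULT_LANGUAGE = "en-US"


def tool_language(detected: str) -> str:
    """Translate a langdetect code into a LanguageTool language code."""
    return TOOL_LANGUAGES.get(detected, DEFAULT_LANGUAGE)


def correct_text(text: str) -> str:
    """Detect the language of `text` and return a corrected version."""
    # langdetect raises an exception on input with no detectable features,
    # so blank text is returned unchanged.
    if not text.strip():
        return text

    # Third-party dependencies, imported lazily.
    from langdetect import detect
    import language_tool_python

    tool = language_tool_python.LanguageTool(tool_language(detect(text)))
    try:
        # correct() applies the suggested replacements and returns the result.
        return tool.correct(text)
    finally:
        # Shut down the local LanguageTool server started by the library.
        tool.close()
```

Creating a new `LanguageTool` instance per call keeps the sketch simple but is slow. A long-running backend would more likely create one instance per language and reuse it.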
