PDF-Extractor is a web application that lets users upload PDF files, extract their text, and correct that text with language tools. The frontend is built with React, and the backend is built with FastAPI. The application supports English and French.
- Node.js and npm
- Python 3.12.3
- FastAPI
- pdfplumber
- language_tool_python
- uvicorn
- langdetect
- Clone the repository:
git clone https://github.com/arij01/PDF-Extractor.git
cd PDF-Extractor
- Install requirements:
pip install -r requirements.txt
- Start the backend server:
uvicorn app:app --reload
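- In a separate terminal (the backend server keeps running in the first one), change to the frontend directory and install the required npm packages:
cd frontend
npm install
- Start the frontend development server:
npm start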
- Open your browser and navigate to http://localhost:3000 to view the application.
A sample PDF file is provided in the sample directory. You can use this file to test the application.
- Open the application in your browser.
- Drag and drop a PDF file into the designated area or click to select a file.
- Click the "Submit" button to upload the file.
- Wait for the text extraction and correction process to complete.
- The corrected text will be displayed on the screen.
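The extraction step behind this workflow can be sketched roughly as follows. This is a minimal illustration that assumes the backend reads the uploaded file page by page with pdfplumber; the function names `join_page_texts` and `extract_pdf_text` are hypothetical and not taken from the repository.

```python
from typing import Iterable, Optional


def join_page_texts(page_texts: Iterable[Optional[str]]) -> str:
    """Join per-page text, skipping pages that produced no text."""
    cleaned = [t.strip() for t in page_texts if t and t.strip()]
    return "\n\n".join(cleaned)


def extract_pdf_text(path: str) -> str:
    """Extract text from every page of the PDF at `path` (hypothetical helper)."""
    # Imported here so join_page_texts stays usable without pdfplumber installed.
    import pdfplumber

    with pdfplumber.open(path) as pdf:
        # extract_text() may yield nothing for image-only pages; those are skipped.
        return join_page_texts(page.extract_text() for page in pdf.pages)
```

Scanned PDFs that contain only images would come back empty from a sketch like this, since pdfplumber reads the text layer and does not perform OCR.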
The application uses language_tool_python to correct the text extracted from the PDF. The language is automatically detected using langdetect.
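As a rough sketch of how detection and correction might fit together: langdetect returns a short language code, which is then mapped to a LanguageTool language before correcting. The mapping table, the English fallback, and the function names below are assumptions made for illustration, not the repository's actual code.

```python
# Hypothetical mapping from langdetect codes to LanguageTool language codes.
LANGUAGETOOL_CODES = {"en": "en-US", "fr": "fr"}
DEFAULT_CODE = "en-US"  # assumption: fall back to English for other languages


def to_languagetool_code(detected: str) -> str:
    """Translate a langdetect code such as 'fr' into a LanguageTool code."""
    return LANGUAGETOOL_CODES.get(detected, DEFAULT_CODE)


def correct_text(text: str) -> str:
    """Detect the language of `text` and return a corrected version."""
    if not text.strip():
        # langdetect raises an exception on text with no detectable features.
        return text

    # Imported here so the mapping helper works without these packages installed.
    from langdetect import detect
    import language_tool_python

    tool = language_tool_python.LanguageTool(to_languagetool_code(detect(text)))
    try:
        return tool.correct(text)
    finally:
        tool.close()  # shut down the LanguageTool process started above
```

Creating a `LanguageTool` instance is comparatively slow, so a long-running server would likely keep one instance per language rather than building one per request as this sketch does.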