sThapaswi/Automated-Document-Analysis

Automated-Document-Analysis

A Python-based system that automates the extraction, processing, and analysis of text from unstructured PDF and DOCX documents. It uses Natural Language Processing (NLP) techniques for sentence tokenization, pattern matching, and identification of specific text elements such as numeric lists and references.
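The tokenization and pattern-matching steps described above might look roughly like the sketch below. It is not the repository's code: the regexes `NUMERIC_ITEM` and `REFERENCE`, and the helpers `split_sentences` and `analyze`, are hypothetical names chosen for illustration. A naive regex splitter stands in for NLTK's sentence tokenizer so the example runs with the standard library alone.

```python
import re

# Hypothetical patterns; the project's actual regexes are not shown in the README.
NUMERIC_ITEM = re.compile(r"^\s*\(?\d+[.)]\s+\S")  # "1. text", "2) text", "(3) text"
REFERENCE = re.compile(
    r"\[\d+\]"                                       # bracketed citation such as [3]
    r"|\([A-Z][A-Za-z]+(?: et al\.)?,? \d{4}\)"      # author-year such as (Smith, 2020)
)


def split_sentences(text):
    """Naive stand-in for an NLP sentence tokenizer.

    Splits after '.', '!' or '?' when followed by whitespace and a capital letter.
    """
    parts = re.split(r"(?<=[.!?])\s+(?=[A-Z])", text.strip())
    return [part for part in parts if part]


def analyze(document):
    """Return (kind, text) pairs, where kind is 'numeric_item', 'reference' or 'text'."""
    rows = []
    for line in document.splitlines():
        line = line.strip()
        if not line:
            continue
        # Numbered list items are kept whole so "1." is not mistaken for a sentence end.
        if NUMERIC_ITEM.match(line):
            rows.append(("numeric_item", line))
            continue
        for sentence in split_sentences(line):
            kind = "reference" if REFERENCE.search(sentence) else "text"
            rows.append((kind, sentence))
    return rows


sample = (
    "Results improved over the baseline [3]. The gain was small.\n"
    "1. Load the file.\n"
    "2) Clean the text.\n"
)
for kind, text in analyze(sample):
    print(f"{kind:13s} {text}")
```

Handling numbered lines before sentence splitting is a design choice of this sketch: it prevents the period in "1." from being treated as a sentence boundary.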

• Uses regular expressions (regex) together with NLTK and pandas to clean and filter document content, extracting key information and discarding irrelevant text. The pipeline is designed to process large volumes of documents efficiently.

• Uses pre-trained transformer sentence embeddings to compute semantic similarity between sections of text, enabling content comparison and context understanding across documents.
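As a rough illustration of the similarity step, the sketch below ranks candidate passages by cosine similarity to a query. It makes one loud assumption: `toy_embed` is a bag-of-words stand-in, not a transformer model, used only so the example runs without downloading anything. The names `toy_embed`, `cosine`, and `most_similar` are hypothetical. With real sentence embeddings the vectors would be dense arrays, but the cosine formula is the same.

```python
import math
import re
from collections import Counter


def toy_embed(text):
    """Stand-in embedder: sparse word counts. NOT a pre-trained transformer."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as token -> weight mappings."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


def most_similar(query, candidates, embed=toy_embed):
    """Return (score, candidate) for the candidate closest to the query."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(text)), text) for text in candidates]
    return max(scored, key=lambda pair: pair[0])


score, match = most_similar(
    "The contract ends in March",
    ["The contract terminates in March", "Invoices are paid monthly"],
)
print(f"{score:.2f} {match}")
```

Because the stand-in only counts shared words, it cannot tell that "ends" and "terminates" mean the same thing. Closing that gap is what a pre-trained sentence-embedding model would be used for.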
