GitHub - devpatel30/web-crawler

Introduction: This Python script performs web crawling on a specified domain, clusters the crawled pages based on text similarity using K-means clustering, and conducts sentiment analysis on the clusters. The project offers insights into the content structure, prevalent themes, and emotional tones of a website.
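
As a rough illustration of the crawling step, the sketch below shows one way to keep a crawl inside a single domain: extract the links from a page and keep only those on the same host. It is not taken from main.py. It uses only the standard library rather than requests and beautifulsoup4, and the names `LinkCollector` and `same_domain_links` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_domain_links(html, page_url):
    """Return absolute, de-duplicated links that stay on page_url's host."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(page_url).netloc
    seen, links = set(), []
    for href in collector.hrefs:
        # Resolve relative links and drop "#fragment" parts so the same
        # page is not queued twice.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        if urlparse(absolute).netloc == host and absolute not in seen:
            seen.add(absolute)
            links.append(absolute)
    return links
```

A crawler could call `same_domain_links` on each fetched page and push the returned URLs onto its queue of pages to visit, skipping any it has already seen.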

Prerequisites: Python 3.x and the following Python libraries: requests, beautifulsoup4, scikit-learn, matplotlib, afinn, nltk

Install the required libraries using: pip install requests beautifulsoup4 scikit-learn matplotlib afinn nltk

Usage: Change to the directory of the downloaded project, then run the script: python main.py

Folders and files (3 commits):
.idea
output_files
.gitignore
.readme
README.md
kmeans_plot_k3.png
kmeans_plot_k6.png
main.py
top_terms_3_clusters.txt
top_terms_6_clusters.txt
