ScrapEZ Version 2.0
We are excited to announce Version 2.0 of ScrapEZ! This major update adds user-agent rotation, a set of new scraping functions, file-based data storage, and more robust error handling to make your web scraping tasks more efficient. Here’s what’s new:
Major Changes
- Introduced User-Agent Rotation: Added support for rotating user-agents to reduce the risk of being blocked by websites.
- Enhanced Error Handling: Improved error handling and logging for better tracking of issues and more robust error management.
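- New Scraping Features: Added functions for metadata extraction, content analysis, broken-link checking, performance metrics, cookie handling, sitemap parsing, language detection, and JavaScript-rendered content.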
- Improved Data Storage: Added the ability to store scraped data into a file for better organization and accessibility.
- Updated User Interface: Expanded the menu with new scraping options and improved user prompts and input handling for a better user experience.
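As an illustration of the user-agent rotation listed above, here is a minimal standard-library sketch. It is not ScrapEZ's actual implementation, which these notes do not show. The `USER_AGENTS` pool and the `next_headers` helper are hypothetical names chosen for this example, and round-robin rotation is only one possible strategy (random selection is another).

```python
import itertools

# Hypothetical pool of user-agent strings; the list ScrapEZ actually uses
# is not shown in these release notes.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]

# Endless round-robin iterator over the pool.
_ua_cycle = itertools.cycle(USER_AGENTS)


def next_headers() -> dict:
    """Return request headers carrying the next user-agent in the pool."""
    return {"User-Agent": next(_ua_cycle)}
```

Each outgoing request would call `next_headers()` and pass the result to whatever HTTP client is in use, so consecutive requests present different user-agents.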
Detailed Changes
New Features
- Extract Metadata: Added functionality to extract and display the title and description meta tags from a webpage using the `get_metadata` function.
- Analyze Content: Introduced a feature to analyze and extract headers (`h1`, `h2`) and main content paragraphs with the `get_content_analysis` function.
- Check Broken Links: Added a method to check and report broken links on a page with the `check_links` function.
- Performance Metrics: Added functionality to measure page load time and page size using the `get_performance_metrics` function.
- Handle Cookies: Implemented functionality to handle and retrieve cookies with the `handle_cookies` function.
- Parse Sitemap: Added support to parse and retrieve URLs from the sitemap with the `parse_sitemap` function.
- Detect Language: Included language detection based on the page content using `langdetect` with the `detect` function.
- Get JavaScript Content: Enabled retrieval and saving of JavaScript-rendered content using Playwright and Selenium with the `get_js_content` function.
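To give a feel for what title and description extraction involves, the following is a rough sketch using only Python's standard library. It is not the code behind `get_metadata`, whose signature and internals are not documented in these notes. The `_MetaParser` class and `extract_metadata` function are hypothetical names, and the sketch works on an HTML string rather than fetching a URL.

```python
from html.parser import HTMLParser


class _MetaParser(HTMLParser):
    """Collects the <title> text and the description meta tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            # Keep the content attribute of <meta name="description" ...>.
            self.description = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Accumulate text only while inside <title>...</title>.
        if self._in_title:
            self.title += data


def extract_metadata(html: str) -> dict:
    """Return the page title and meta description found in an HTML string."""
    parser = _MetaParser()
    parser.feed(html)
    return {
        "title": parser.title.strip(),
        "description": parser.description.strip(),
    }
```

Calling `extract_metadata(page_source)` on downloaded HTML returns a dictionary with `title` and `description` keys. Either value is an empty string when the corresponding tag is missing.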
Improvements
- Enhanced Logging and Retry Logic: Improved logging and retry mechanisms for more reliable handling of request failures.
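The retry-with-logging pattern mentioned above can be sketched as follows. This is a generic illustration under stated assumptions, not ScrapEZ's own code. The `fetch_with_retry` name, its parameters, the logger name, and the exponential backoff schedule are all hypothetical. The `fetch` argument stands for any callable that performs the request.

```python
import logging
import time

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("scrapez_sketch")


def fetch_with_retry(fetch, url, retries=3, backoff=0.5, sleep=time.sleep):
    """Call fetch(url), retrying on failure up to `retries` attempts in total.

    Each failure is logged; the wait between attempts doubles each time
    (backoff, 2 * backoff, 4 * backoff, ...). The last error is re-raised.
    """
    for attempt in range(1, retries + 1):
        try:
            return fetch(url)
        except Exception as exc:  # a real tool would catch narrower error types
            logger.warning(
                "Attempt %d/%d for %s failed: %s", attempt, retries, url, exc
            )
            if attempt == retries:
                raise
            sleep(backoff * 2 ** (attempt - 1))
```

Passing the request function in as an argument keeps the retry logic independent of the HTTP client. The `sleep` parameter exists so the delay can be replaced in tests.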
Previous Releases
- v1.0.1: Fixed an issue that caused the application banner to be displayed multiple times.
- v1.0: Initial release of ScrapEZ, including basic scraping methods.
Future Plans
- Add support for more advanced scraping techniques and further performance improvements.
License
- Licensed under a Creative Commons Attribution 4.0 International License.
Contribution Guidelines
- Please fork the repository and submit a pull request. You can also report issues or suggest new features on the GitHub issues page.