
Version 2.0

@jakk-er released this on 15 Sep 04:15 (commit efdc57c)

ScrapEZ Version 2.0

We are excited to announce Version 2.0 of ScrapEZ! This major update adds new features and improvements that make your web scraping tasks more efficient and effective. Here's what's new:

Major Changes

  • User-Agent Rotation: Added support for rotating user-agent strings to reduce the risk of being blocked by websites.
  • Enhanced Error Handling: Improved error handling and logging so issues are easier to track and failures are handled more robustly.
  • New Scraping Features: Added eight new scraping functions covering metadata extraction, content analysis, broken-link checks, performance metrics, cookie handling, sitemap parsing, language detection, and JavaScript-rendered content (see Detailed Changes).
  • Improved Data Storage: Added the ability to save scraped data to a file for easier organization and access.
  • Updated User Interface: Expanded the menu with new scraping options and improved user prompts and input handling for a better user experience.
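The notes don't show how ScrapEZ rotates user-agents. As a minimal sketch, assuming a fixed pool of browser strings that is cycled once per request, a generator can hand out a fresh headers dict each time. The pool contents and the name rotating_headers are hypothetical.

```python
import itertools
import random
from typing import Dict, Iterator, Sequence

# Hypothetical pool of user-agent strings; ScrapEZ's real list is not published here.
USER_AGENTS: Sequence[str] = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 13_5) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/16.6 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:118.0) Gecko/20100101 Firefox/118.0",
)


def rotating_headers(
    agents: Sequence[str] = USER_AGENTS, shuffle: bool = False
) -> Iterator[Dict[str, str]]:
    """Yield a fresh headers dict per request, cycling through the user-agent pool."""
    pool = list(agents)
    if shuffle:
        random.shuffle(pool)  # randomize the rotation order once, up front
    for agent in itertools.cycle(pool):
        yield {"User-Agent": agent}
```

Each request then takes the next entry, e.g. `urllib.request.Request(url, headers=next(headers))`. Round-robin keeps the spread even; shuffling only changes the starting order.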
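The release doesn't say which file format ScrapEZ writes. One plausible approach, assumed here purely for illustration, is JSON Lines: one JSON object per scraped page, appended so repeated runs accumulate. The helpers save_results and load_results are hypothetical names.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterable, List, Mapping, Union


def save_results(records: Iterable[Mapping[str, Any]], path: Union[str, Path]) -> int:
    """Append each record as one JSON object per line; return how many were written."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)  # create the output folder if needed
    count = 0
    with path.open("a", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count


def load_results(path: Union[str, Path]) -> List[Dict[str, Any]]:
    """Read a JSON Lines file back into a list of dicts, skipping blank lines."""
    with Path(path).open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Appending line by line means a crash mid-scrape loses at most the record being written, which is one reason JSON Lines is a common choice for scrapers.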

Detailed Changes

New Features

  • Extract Metadata: Added functionality to extract and display a page's title and meta description using the get_metadata function.
  • Analyze Content: Introduced a feature to analyze and extract headers (h1, h2) and main content paragraphs with the get_content_analysis function.
  • Check Broken Links: Added a method to check and report broken links on a page with the check_links function.
  • Performance Metrics: Added functionality to measure page load time and page size using the get_performance_metrics function.
  • Handle Cookies: Implemented functionality to handle and retrieve cookies with the handle_cookies function.
  • Parse Sitemap: Added support for parsing a site's sitemap and retrieving its URLs with the parse_sitemap function.
  • Detect Language: Added language detection for page content, using the detect function from the langdetect library.
  • Get JavaScript Content: Enabled retrieval and saving of JavaScript-rendered content using Playwright and Selenium with the get_js_content function.
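The notes describe get_metadata but don't include its source. As a rough illustration only, the sketch below pulls the same two fields (the `<title>` text and `<meta name="description">`) from an HTML string with the standard library's html.parser; ScrapEZ may use a different parser, and extract_metadata is a hypothetical name.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional


class _MetadataParser(HTMLParser):
    """Collect the first <title> text and the first meta description."""

    def __init__(self) -> None:
        super().__init__()
        self.title: Optional[str] = None
        self.description: Optional[str] = None
        self._in_title = False
        self._title_parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and self.title is None:
            self._in_title = True
        elif tag == "meta" and self.description is None:
            attr_map = {k: (v or "") for k, v in attrs}  # names arrive lowercased
            if attr_map.get("name", "").lower() == "description":
                self.description = attr_map.get("content", "").strip()

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self.title = "".join(self._title_parts).strip()

    def handle_data(self, data):
        if self._in_title:
            self._title_parts.append(data)


def extract_metadata(html: str) -> Dict[str, Optional[str]]:
    parser = _MetadataParser()
    parser.feed(html)
    parser.close()
    return {"title": parser.title, "description": parser.description}
```

Missing tags come back as None rather than raising, so pages without a description can still be processed.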
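For content analysis, a similar standard-library sketch (assuming the goal is simply to group the text of h1, h2, and p elements) looks like this. analyze_content is a hypothetical name, not the get_content_analysis function from the release.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional


class _ContentParser(HTMLParser):
    """Collect whitespace-normalized text inside <h1>, <h2>, and <p> elements."""

    TRACKED = ("h1", "h2", "p")

    def __init__(self) -> None:
        super().__init__()
        self.found: Dict[str, List[str]] = {tag: [] for tag in self.TRACKED}
        self._current: Optional[str] = None  # tracked element we are inside
        self._buffer: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.TRACKED:
            self._flush()  # an unclosed <p> ends when the next tracked tag starts
            self._current = tag

    def handle_endtag(self, tag):
        if tag == self._current:
            self._flush()

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)  # includes text from nested tags like <b>

    def close(self):
        super().close()
        self._flush()  # keep text from a final element that was never closed

    def _flush(self):
        if self._current is not None:
            text = " ".join("".join(self._buffer).split())
            if text:
                self.found[self._current].append(text)
        self._current = None
        self._buffer = []


def analyze_content(html: str) -> Dict[str, List[str]]:
    parser = _ContentParser()
    parser.feed(html)
    parser.close()
    return parser.found
```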
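Broken-link checking generally means collecting every `<a href>`, resolving it against the page URL, and requesting each target. The sketch below assumes HEAD requests via urllib and treats network errors or status codes of 400 and above as broken; the names find_broken_links and http_status are hypothetical. The status checker is a parameter so it can be swapped out or tested offline.

```python
from html.parser import HTMLParser
from http.client import HTTPException
from typing import Callable, List, Optional, Tuple
from urllib.error import HTTPError
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import Request, urlopen


class _LinkParser(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def http_status(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the status code of a HEAD request, or None if the request fails outright."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as resp:
            return resp.status
    except HTTPError as exc:  # 4xx/5xx responses arrive as exceptions
        return exc.code
    except (OSError, ValueError, HTTPException):  # DNS, timeout, bad URL, protocol errors
        return None


def find_broken_links(
    html: str, base_url: str, status_of: Callable[[str], Optional[int]] = http_status
) -> List[Tuple[str, Optional[int]]]:
    parser = _LinkParser()
    parser.feed(html)
    parser.close()
    broken, seen = [], set()
    for href in parser.hrefs:
        url = urldefrag(urljoin(base_url, href)).url  # absolute URL without #fragment
        if urlparse(url).scheme not in ("http", "https") or url in seen:
            continue  # skip mailto:, javascript:, and repeats
        seen.add(url)
        status = status_of(url)
        if status is None or status >= 400:
            broken.append((url, status))
    return broken
```

Some servers answer HEAD with 405 even when the page exists, so a real checker may want to retry those with GET.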
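A simple way to approximate load time and page size, assumed here rather than taken from get_performance_metrics, is to time one full download with a monotonic clock and measure the body length. measure_page and fetch_bytes are hypothetical names; the fetch function is injectable so the timing logic can be tested without a network.

```python
import time
from typing import Callable, Dict
from urllib.request import urlopen


def fetch_bytes(url: str, timeout: float = 10.0) -> bytes:
    """Download the raw response body."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read()


def measure_page(url: str, fetch: Callable[[str], bytes] = fetch_bytes) -> Dict[str, float]:
    """Time one full download of `url` and report the size of the body received.

    Only the HTML document is measured; images, scripts, and rendering are not.
    """
    start = time.perf_counter()  # monotonic, high-resolution clock
    body = fetch(url)
    elapsed = time.perf_counter() - start
    return {
        "load_time_s": round(elapsed, 3),
        "size_bytes": len(body),
        "size_kb": round(len(body) / 1024, 2),
    }
```

This measures transfer of the HTML only, so it will usually be lower than the load time a browser reports for the full page.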
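How handle_cookies works internally isn't documented here. A minimal sketch, assuming cookies arrive as raw Set-Cookie header values, parses them with the standard library's http.cookies.SimpleCookie; parse_set_cookie_headers is a hypothetical name.

```python
from http.cookies import CookieError, SimpleCookie
from typing import Any, Dict, Iterable


def parse_set_cookie_headers(headers: Iterable[str]) -> Dict[str, Dict[str, Any]]:
    """Map each cookie name to its value plus any attributes the server set.

    If the same cookie name appears more than once, the later header wins.
    """
    jar: Dict[str, Dict[str, Any]] = {}
    for header in headers:
        cookie = SimpleCookie()
        try:
            cookie.load(header)
        except CookieError:
            continue  # skip a malformed header instead of failing the whole scrape
        for name, morsel in cookie.items():
            info: Dict[str, Any] = {"value": morsel.value}
            # Morsel keys are lowercase attribute names (path, domain, max-age, ...);
            # unset ones are empty strings and flags such as HttpOnly are True.
            info.update({key: val for key, val in morsel.items() if val})
            jar[name] = info
    return jar
```

With urllib, the raw header values for a response are available from `resp.headers.get_all("Set-Cookie")`.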
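Sitemaps that follow the sitemaps.org protocol list URLs in `<loc>` elements under a fixed XML namespace, either in a `<urlset>` (pages) or a `<sitemapindex>` (links to further sitemap files). A sketch of that parsing step with xml.etree.ElementTree, assuming the XML has already been downloaded, follows; sitemap_urls is a hypothetical name, not the release's parse_sitemap.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple, Union

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_urls(xml_text: Union[str, bytes]) -> Tuple[List[str], List[str]]:
    """Return (page_urls, child_sitemaps) from a <urlset> or <sitemapindex> document."""
    root = ET.fromstring(xml_text)
    locs = [el.text.strip() for el in root.iter(f"{SITEMAP_NS}loc") if el.text]
    if root.tag == f"{SITEMAP_NS}sitemapindex":
        # An index lists further sitemap files, which must be fetched and parsed in turn.
        return [], locs
    return locs, []
```

For untrusted XML, the Python documentation points to defusedxml as a hardened alternative to the standard parsers.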

Improvements

  • Enhanced Logging and Retry Logic: Improved logging and retry mechanisms for more reliable handling of request failures.
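The exact retry policy isn't described. A common pattern, assumed here, is exponential backoff: retry a failing request a few times, doubling the wait after each failure and logging every attempt. The names retry and "scrapez.sketch" are hypothetical, and `sleep` is a parameter only so the delays can be checked without actually waiting.

```python
import logging
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")
log = logging.getLogger("scrapez.sketch")  # hypothetical logger name


def retry(
    action: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `action`, retrying on `retry_on` exceptions with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except retry_on as exc:
            if attempt == attempts:
                log.error("giving up after %d attempts: %s", attempts, exc)
                raise  # re-raise the last error for the caller to handle
            delay = base_delay * 2 ** (attempt - 1)  # 1s, 2s, 4s, ...
            log.warning("attempt %d/%d failed (%s); retrying in %.1fs",
                        attempt, attempts, exc, delay)
            sleep(delay)
```

A call might look like `retry(lambda: urlopen(url, timeout=10).read())`. Since urllib's HTTPError is a subclass of OSError, the default `retry_on` also retries 4xx responses; narrowing it avoids hammering a page that is simply missing.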

Previous Releases

  • v1.0.1: Fixed an issue where the application banner was displayed multiple times.
  • v1.0: Initial release of ScrapEZ, including basic scraping methods.

Future Plans

  • Add support for more advanced scraping techniques and further performance improvements.

License

Contribution Guidelines

  • Please fork the repository and submit a pull request. You can also report issues or suggest new features on the GitHub issues page.