
How do I change the browser to Firefox? #772

Open
chung1912 opened this issue Oct 28, 2024 · 8 comments

Comments

@chung1912

How do I change the browser to Firefox?

@SwapnilSonker
Contributor

@chung1912 What kind of issue are you having, specifically? I'd like to know more about it.

@SwapnilSonker
Contributor

@VinciGit00 Are there any details about this issue? If so, I'd like to work on it.

@VinciGit00
Collaborator

@SwapnilSonker Please add just Firefox there, via the docloader.

@SwapnilSonker
Contributor

PR: #848
@VinciGit00, have a look at it and let me know if anything else needs to be done.

@PeriniM
Collaborator

PeriniM commented Jan 12, 2025

Hey @chung1912, from the new version v1.36 you can change it with:

graph_config = {
    "llm": {
        "model": "ollama/llama3.2",
        "temperature": 0,
        "model_tokens": 4096
    },
    "loader_kwargs": {
        "backend": "selenium",
        "browser_name": "firefox"
    },
    "verbose": True,
    "headless": False
}
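
For reference, a minimal sketch of how that config might be wired into a graph (assuming SmartScraperGraph as used later in this thread; the prompt and source values below are placeholders):

from scrapegraphai.graphs import SmartScraperGraph

# graph_config is the dictionary shown above; prompt and source are placeholder values
smart_scraper_graph = SmartScraperGraph(
    prompt="List the main headings on the page",
    source="https://example.com",
    config=graph_config,
)
result = smart_scraper_graph.run()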

PeriniM closed this as completed Jan 12, 2025
@chung1912
Author

Hey @PeriniM, I tried the config above (v1.36, with "backend": "selenium" and "browser_name": "firefox"), but I get:

AttributeError: 'ChromiumLoader' object has no attribute 'ascrape_selenium'
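
For context, this error is consistent with the getattr-based dispatch discussed further below: the loader looks up a method named ascrape_<backend>, and for backend "selenium" there is no ascrape_selenium method. A simplified, runnable sketch of the pattern (not the real ChromiumLoader):

# simplified stand-in for the loader's backend dispatch
class LoaderSketch:
    def __init__(self, backend):
        self.backend = backend

    async def ascrape_playwright(self, url, browser_name="chromium"):
        return "<html>...</html>"

    def pick_scraper(self):
        # mirrors getattr(self, f"ascrape_{self.backend}") in lazy_load
        return getattr(self, f"ascrape_{self.backend}")

LoaderSketch("playwright").pick_scraper()  # works
LoaderSketch("selenium").pick_scraper()    # AttributeError: no attribute 'ascrape_selenium'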

PeriniM reopened this Jan 12, 2025
@SwapnilSonker
Contributor

@chung1912 Looking into it; I'll most likely fix the bug.

@mark-antal-csizmadia

mark-antal-csizmadia commented Jan 20, 2025

Hey! Great package, good experience using it.

I think the issue is that in ChromiumLoader's lazy_load and alazy_load methods, when getattr is used to look up the scraping function for a given backend (scraping_fn), that scraping_fn is always called with only the url, and the browser_name parameter is never passed from the ChromiumLoader's self namespace. You can see it here.

So I guess, in the code below:

def lazy_load(self) -> Iterator[Document]:
    """
    Lazily load text content from the provided URLs.

    This method yields Documents one at a time as they're scraped,
    instead of waiting to scrape all URLs before returning.

    Yields:
        Document: The scraped content encapsulated within a Document object.
    """
    scraping_fn = (
        self.ascrape_with_js_support
        if self.requires_js_support
        else getattr(self, f"ascrape_{self.backend}")
    )

    for url in self.urls:
        html_content = asyncio.run(scraping_fn(url))
        metadata = {"source": url}
        yield Document(page_content=html_content, metadata=metadata)

the scraping_fn = ( self.ascrape_with_js_support if self.requires_js_support else getattr(self, f"ascrape_{self.backend}") ) line should be changed so that, for some of the scraping functions such as ascrape_playwright (but not ascrape_undetected_chromedriver), browser_name is also passed (so it can, for instance, be set to firefox), as sketched below.
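
A minimal sketch of that kind of change (a hypothetical helper, not the library's code; it assumes the backend-specific scraper is a coroutine that accepts browser_name as a keyword, like ascrape_playwright):

import inspect
from functools import partial


def bind_browser_name(scraping_fn, browser_name):
    """Hypothetical helper: forward browser_name only to scrapers whose signature accepts it."""
    if "browser_name" in inspect.signature(scraping_fn).parameters:
        return partial(scraping_fn, browser_name=browser_name)
    return scraping_fn

# inside lazy_load, the existing dispatch line could then be wrapped as:
# scraping_fn = bind_browser_name(scraping_fn, self.browser_name)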

Great work, thanks a lot!

Edit

A possible temporary solution while the fix is on the way is to patch the scrapegraphai.docloaders.chromium.ChromiumLoader.lazy_load function yourself. This function is the intermediary between the backend-specific scraper functions and your code. Looking at the code, the firefox browser is meant to work with both the selenium and playwright backends; I'll use playwright in my code below. So, for instance, if you add a new file called my_patch.py with the code below:

""" This is a patch for the ScrapeGraphAI library. As soon as a fix is in the main library, this file can be removed. 
See more at: https://github.com/ScrapeGraphAI/Scrapegraph-ai/issues/772
Particularly, note the comment: https://github.com/ScrapeGraphAI/Scrapegraph-ai/issues/772#issuecomment-2603320393
"""
import asyncio
from functools import partial
from typing import Iterator
from langchain_core.documents import Document


def lazy_load_patched(self) -> Iterator[Document]:
    """
    Patches https://github.com/ScrapeGraphAI/Scrapegraph-ai/blob/31087937bef20eadcb83e28688077eff13ed2780/scrapegraphai/docloaders/chromium.py#L438
    so that self.browser_name is passed to ascrape_playwright.
    See discussion at https://github.com/ScrapeGraphAI/Scrapegraph-ai/issues/772#issuecomment-2603320393.
    """
    scraping_fn = (
        self.ascrape_with_js_support
        if self.requires_js_support
        else getattr(self, f"ascrape_{self.backend}")
    )
    
    # the patch: the partial function is used to pass the browser_name argument
    # to ascrape_playwright
    if self.backend == "playwright":
        scraping_fn = partial(scraping_fn, browser_name=self.browser_name)
    # end of patch

    for url in self.urls:
        html_content = asyncio.run(scraping_fn(url))
        metadata = {"source": url}
        yield Document(page_content=html_content, metadata=metadata)

and then, whenever you call the scraper (for instance in a main.py script), you patch the mentioned method as shown below:

from unittest.mock import patch

from my_patch import lazy_load_patched
from scrapegraphai.graphs import SmartScraperGraph

# your code: define prompt, source, schema, etc.

graph_config = {
    # your config such as llm name, temperature, headless, etc.
    "loader_kwargs": {
        "backend": "playwright",
        "browser_name": "firefox"
    }
}

smart_scraper_graph = SmartScraperGraph(
    prompt=prompt,
    source=source,
    config=graph_config,  # note: pass the graph_config defined above
    schema=schema
)

with patch('scrapegraphai.docloaders.chromium.ChromiumLoader.lazy_load', new=lazy_load_patched):
    some_data = smart_scraper_graph.run()

This should patch the original code so that the firefox browser can be used with the playwright backend. Feel free to modify the code to accommodate your needs.

I hope this helps! Happy hacking!
