More control / settings from GPT Researcher to Tavily as a retriever #923

cwang · 2024-10-18T10:13:06Z

Is your feature request related to a problem? Please describe.
Right now the use of Tavily is pretty basic in the sense that many arguments in

https://docs.tavily.com/docs/python-sdk/tavily-search/api-reference#methods

are not being used. In particular, useful settings such as include_domains and exclude_domains cannot be set via GPTResearcher initialisation.

Describe the solution you'd like

If feature parity between retrievers is never going to be possible (which I think is the case), then find a way to allow retriever-specific kwargs to be set at the top-level GPTResearcher initialisation phase.

I don't have a clear idea of what the best approach is here though, so I'm open to suggestions.

Describe alternatives you've considered

Once we establish a pattern for setting additional arguments per retriever, it could give finer control over the behaviour of all the retrievers going forward.

Additional context

Happy to author the PR once we have a clear direction.


assafelovic · 2024-10-18T13:27:46Z

Hey @cwang, this is a great point, and I definitely agree we need a way to allow this. My idea is to start by building out the features that are likely to have parity across providers. For example, include_domains and exclude_domains are very likely to be supported by all search providers, so a PR would expose those options as env vars in the top-level GPTR config and pass them down to each search provider according to its API. I know it's a bit tedious, but perhaps worth it. I do see great value in preserving feature parity to keep GPTR generic and not coupled to a specific provider. Let me know what you think; I would love your help with a PR!

One more thought: if certain APIs don't support these options, we can catch that at the specific API level and raise a warning.
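To make the idea concrete, here is a rough sketch of how shared domain filters could be passed down with a warning fallback. Everything in it is an assumption for illustration: `BaseRetriever`, `TavilyLikeRetriever`, `NoFilterRetriever`, `supports_domain_filters`, and `search_kwargs` are hypothetical names, not the actual GPT Researcher retriever interface.

```python
import warnings


class BaseRetriever:
    """Hypothetical base class, not the real GPT Researcher interface."""

    # Subclasses flip this to True when their provider can filter by domain.
    supports_domain_filters = False

    def __init__(self, query, include_domains=None, exclude_domains=None):
        self.query = query
        self.include_domains = list(include_domains or [])
        self.exclude_domains = list(exclude_domains or [])
        requested = self.include_domains or self.exclude_domains
        if requested and not self.supports_domain_filters:
            # Warn at the provider level instead of failing the whole run.
            warnings.warn(
                f"{type(self).__name__} does not support domain filters; "
                "ignoring them.",
                UserWarning,
                stacklevel=2,
            )
            self.include_domains, self.exclude_domains = [], []

    def search_kwargs(self):
        """Provider-specific keyword arguments; overridden by subclasses."""
        return {}


class TavilyLikeRetriever(BaseRetriever):
    """Stand-in for a provider whose API accepts both filter lists."""

    supports_domain_filters = True

    def search_kwargs(self):
        kwargs = {}
        if self.include_domains:
            kwargs["include_domains"] = self.include_domains
        if self.exclude_domains:
            kwargs["exclude_domains"] = self.exclude_domains
        return kwargs


class NoFilterRetriever(BaseRetriever):
    """Stand-in for a provider with no domain filtering at all."""
```

The point of the design is that the top-level config stays generic, and each provider decides how (or whether) to translate the filters.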

ElishaKay · 2024-11-03T11:11:18Z

Welcome @cwang,

Agreed - this is a common use case for me as well (especially when I want to extract context from official API docs & ask questions about the sources). It would be great to abstract this away to the GPTResearcher(config) stage.

What do you guys think about a nested object:

config = {
    "retriever": {
        "domains_to_include": x,  # e.g. a list of domains
        "domains_to_exclude": y,  # e.g. a list of domains
    }
}

report = GPTResearcher(config)...

In my case, I'm using Google Search as the Retriever.
For that provider, including and excluding domains is done by appending a string to the query param, like so:

"machine learning (site:wikipedia.org OR site:medium.com) -site:buzzfeed.com -site:vogue.com"
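A minimal sketch of building that query string, assuming the filters arrive as plain lists of domains. The helper name `build_site_filtered_query` is hypothetical and not part of GPT Researcher.

```python
def build_site_filtered_query(query, include_domains=None, exclude_domains=None):
    """Append Google-style site: operators to a search query (hypothetical helper)."""
    parts = [query.strip()]
    if include_domains:
        # Included domains are OR-ed together inside one parenthesised group.
        included = " OR ".join(f"site:{domain}" for domain in include_domains)
        parts.append(f"({included})")
    if exclude_domains:
        # Each excluded domain gets its own negated site: operator.
        parts.extend(f"-site:{domain}" for domain in exclude_domains)
    return " ".join(parts)


query = build_site_filtered_query(
    "machine learning",
    include_domains=["wikipedia.org", "medium.com"],
    exclude_domains=["buzzfeed.com", "vogue.com"],
)
print(query)
```

With these inputs it prints the same string as the example above.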

Feel free to get the party started on your end with a PR.
I can build on your commits & also add the filter to the NextJS frontend.

Looking forward to it 🙏
