More control / settings from GPT Researcher to Tavily as a retriever #923

Open
cwang opened this issue Oct 18, 2024 · 2 comments

Comments

cwang commented Oct 18, 2024

Is your feature request related to a problem? Please describe.
Right now the use of Tavily is pretty basic, in the sense that many of the arguments documented at

https://docs.tavily.com/docs/python-sdk/tavily-search/api-reference#methods

are not being used. In particular, useful settings such as include_domains and exclude_domains cannot be set via GPTResearcher initialisation.

Describe the solution you'd like

If feature parity between retrievers is never going to be possible (which I think is the case), then we should find a way to allow setting retriever-specific kwargs at the top-level GPTResearcher initialisation phase.

I don't have a clear idea of the best approach here though, so I'm open to suggestions.
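One possible shape for this, as a minimal sketch only: the top-level object takes a mapping of retriever name to extra kwargs and forwards them untouched to the selected retriever. The `Researcher` class, the `retriever_kwargs` parameter, and `fake_tavily_search` are all hypothetical names made up for illustration; they are not GPT Researcher's or Tavily's actual API.

```python
from typing import Any, Callable, Dict, Optional


def fake_tavily_search(query: str, max_results: int = 5,
                       include_domains: Optional[list] = None,
                       exclude_domains: Optional[list] = None) -> Dict[str, Any]:
    """Hypothetical stand-in for a retriever call; it only echoes its arguments."""
    return {
        "query": query,
        "max_results": max_results,
        "include_domains": include_domains or [],
        "exclude_domains": exclude_domains or [],
    }


class Researcher:
    """Hypothetical top-level class that forwards per-retriever kwargs."""

    def __init__(self, query: str, retriever_name: str,
                 retriever_kwargs: Optional[Dict[str, Dict[str, Any]]] = None):
        self.query = query
        self.retriever_name = retriever_name
        # Maps a retriever name to the extra kwargs meant for that retriever only.
        self.retriever_kwargs = retriever_kwargs or {}

    def search(self, retrievers: Dict[str, Callable[..., Any]]) -> Any:
        search_fn = retrievers[self.retriever_name]
        # Only the kwargs registered under the active retriever are passed down.
        extra = self.retriever_kwargs.get(self.retriever_name, {})
        return search_fn(self.query, **extra)


RETRIEVERS = {"tavily": fake_tavily_search}

researcher = Researcher(
    "machine learning",
    retriever_name="tavily",
    retriever_kwargs={
        "tavily": {
            "include_domains": ["wikipedia.org"],
            "exclude_domains": ["buzzfeed.com"],
        }
    },
)
result = researcher.search(RETRIEVERS)
```

Keying the kwargs by retriever name means settings for one retriever are never passed to another that would not understand them, at the cost of giving up a provider-agnostic config.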

Describe alternatives you've considered

Once we establish a pattern for setting additional arguments per retriever, it could give finer control over the behaviour of all the retrievers going forward.

Additional context

Happy to author the PR once we have a clear direction.

Owner

assafelovic commented Oct 18, 2024

Hey @cwang, this is a great point and I definitely agree we need to consider a way to allow this. My idea is to start by building out features that are likely to have feature parity. For example, include_domains and exclude_domains are very likely to have feature parity across all search providers, so a PR would add those options as env vars in the top-level gptr config and pass them down to each of the different search providers based on their API. I know it's a bit tedious, but perhaps worth it. I do see great value in preserving feature parity to keep GPTR generic and not coupled to a specific provider. Lmk wdyt and would love your help with a PR!

Another thought is that if certain APIs don't support an option, we can catch that at the specific API level and throw a warning.
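A rough sketch of how that could look, under stated assumptions: the env var names `INCLUDE_DOMAINS` and `EXCLUDE_DOMAINS`, the `PROVIDER_SUPPORT` table, and the provider names in it are all hypothetical and do not reflect the current gptr config or any real provider's capabilities.

```python
import os
import warnings
from typing import Dict, List, Mapping, Optional


def parse_domains(value: str) -> List[str]:
    """Split a comma-separated env value into a clean list of domains."""
    return [d.strip() for d in value.split(",") if d.strip()]


def load_domain_filters(env: Optional[Mapping[str, str]] = None) -> Dict[str, List[str]]:
    """Read the generic, provider-agnostic filters from the environment."""
    env = os.environ if env is None else env
    return {
        # Hypothetical env var names, chosen for illustration only.
        "include_domains": parse_domains(env.get("INCLUDE_DOMAINS", "")),
        "exclude_domains": parse_domains(env.get("EXCLUDE_DOMAINS", "")),
    }


# Hypothetical capability table: which generic options each provider understands.
PROVIDER_SUPPORT = {
    "provider_a": {"include_domains", "exclude_domains"},
    "provider_b": {"include_domains"},
}


def filters_for_provider(provider: str,
                         filters: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Keep only the options a provider supports and warn about the rest."""
    supported = PROVIDER_SUPPORT.get(provider, set())
    kept = {}
    for name, domains in filters.items():
        if not domains:
            continue  # Nothing was configured for this option.
        if name in supported:
            kept[name] = domains
        else:
            warnings.warn(f"{provider} does not support {name}; ignoring it")
    return kept


filters = load_domain_filters(
    {"INCLUDE_DOMAINS": "wikipedia.org, medium.com", "EXCLUDE_DOMAINS": "buzzfeed.com"}
)
provider_a_kwargs = filters_for_provider("provider_a", filters)
```

Each provider adapter would then translate the kept options into whatever its own API expects, so the top-level config stays generic.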

Collaborator

ElishaKay commented Nov 3, 2024

Welcome @cwang,

Agreed - this is a common use case for me as well (especially when I want to extract context from official API docs & ask questions about the sources). It would be great to abstract this away to the GPTResearcher(config) stage.

What do you guys think about a nested object:

config = {
    "retriever": {
        "domains_to_include": x,
        "domains_to_exclude": y,
    }
}

report = GPTResearcher(config)...

In my case, I'm using Google Search as the Retriever.
For that provider, including and excluding domains works by appending a string to the query param, like so:

"machine learning (site:wikipedia.org OR site:medium.com) -site:buzzfeed.com -site:vogue.com"
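The string above could be assembled with a small helper along these lines. This is only a sketch: `build_site_filtered_query` is a hypothetical name, not an existing GPT Researcher function, and it assumes the `site:` and `-site:` operators are combined exactly as in the example query.

```python
from typing import Sequence


def build_site_filtered_query(query: str,
                              include_domains: Sequence[str] = (),
                              exclude_domains: Sequence[str] = ()) -> str:
    """Append site: / -site: operators to a plain search query."""
    parts = [query]
    if include_domains:
        # Included domains are OR-ed together inside one parenthesised group.
        included = " OR ".join(f"site:{domain}" for domain in include_domains)
        parts.append(f"({included})")
    # Each excluded domain becomes its own negated operator.
    parts.extend(f"-site:{domain}" for domain in exclude_domains)
    return " ".join(parts)


filtered_query = build_site_filtered_query(
    "machine learning",
    include_domains=["wikipedia.org", "medium.com"],
    exclude_domains=["buzzfeed.com", "vogue.com"],
)
```

With no domains given, the helper returns the query unchanged, so it can be called unconditionally.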

Feel free to get the party started on your end with a PR.
I can build on your commits & also add the filter to the NextJS frontend.

Looking forward to it 🙏
