Bypass OpenAI server overload #11

Closed
wants to merge 41 commits into from

Conversation

iQuxLE
Member

@iQuxLE iQuxLE commented Dec 2, 2023

When loading ontologies into CurateGPT, the insertion of the data into ChromaDB is very often interrupted by a server overload on the OpenAI API side:

openai.error.ServiceUnavailableError: The server is overloaded or not ready yet.

Implementing exponential_backoff_request helped me bypass this by retrying the request with an additional small sleep each time it fails.
It's not a fancy solution, but it gets the job done.
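
A minimal sketch of what I mean (the function name, defaults, and structure here are illustrative, not the exact code in this PR):

```python
import time

import openai  # pre-1.0 client, which raises openai.error.ServiceUnavailableError


def exponential_backoff_request(request_fn, *args, max_retries=5, base_sleep=10, **kwargs):
    """Call request_fn, retrying with a growing sleep on OpenAI overload errors."""
    sleep = base_sleep
    for attempt in range(max_retries):
        try:
            return request_fn(*args, **kwargs)
        except openai.error.ServiceUnavailableError:
            if attempt == max_retries - 1:
                raise  # out of retries, surface the error
            # Server overloaded or not ready yet: wait, then try again with a longer sleep.
            time.sleep(sleep)
            sleep *= 2
```

In the insert path this wraps whatever call actually requests the embeddings from the OpenAI API.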

Additionally, while experimenting it helped to reduce the sleep (from 60s to 10s) once the document text grows beyond 3,000,000 characters.

Also, a batch size of 1000 worked better for me than a batch size of 100. Batch sizes smaller than 100 mostly ended in this:
requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
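
Roughly how the batched insertion looked for me (reusing the helper above; BATCH_SIZE, the id/document lists, and where the 3,000,000-char check is applied are illustrative, not the exact PR code):

```python
BATCH_SIZE = 1000  # worked better for me than 100


def insert_in_batches(collection, ids, docs, batch_size=BATCH_SIZE):
    """Insert documents into a ChromaDB collection in batches, retrying on overload."""
    for start in range(0, len(docs), batch_size):
        batch_ids = ids[start:start + batch_size]
        batch_docs = docs[start:start + batch_size]
        # Shorter sleep once the text gets very large (the 60s -> 10s tweak above);
        # exactly where this threshold applies is a detail I experimented with.
        sleep = 10 if sum(len(d) for d in batch_docs) > 3_000_000 else 60
        exponential_backoff_request(
            collection.add,
            ids=batch_ids,
            documents=batch_docs,
            base_sleep=sleep,
        )
```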

This was all pretty experimental and it might also depend on other factors.

@cmungall
@justaddcoffee

cmungall added 30 commits May 26, 2023 17:56
Added flow between UI components.

Added in_memory_adapter.

Simplified collection model.
additional evaluation methods
WARNING:curate_gpt.wrappers.legal.reusabledata_wrapper:bad link http://catalogueoflife.org/content/terms-use for catalogueoflife
WARNING:curate_gpt.wrappers.legal.reusabledata_wrapper:base link https://civic.genome.wustl.edu/about for civicdb
WARNING:curate_gpt.wrappers.legal.reusabledata_wrapper:base link TODO for dictybase
WARNING:curate_gpt.wrappers.legal.reusabledata_wrapper:bad link http://flybase.org/wiki/FlyBase:About#FlyBase_Copyright for flybase
WARNING:curate_gpt.wrappers.legal.reusabledata_wrapper:base link ftp://ftp.nextprot.org/pub/README for nextprot
WARNING:curate_gpt.wrappers.legal.reusabledata_wrapper:base link TODO for nextstrain
WARNING:curate_gpt.wrappers.legal.reusabledata_wrapper:bad link http://www.orphadata.org/cgi-bin/contact.php for orphanet-academic
WARNING:curate_gpt.wrappers.legal.reusabledata_wrapper:bad link http://www.orphadata.org/cgi-bin/inc/legal.inc.php for orphanet-open
WARNING:curate_gpt.wrappers.legal.reusabledata_wrapper:bad link http://www.rhea-db.org/licensedisclaimer for rhea
WARNING:curate_gpt.wrappers.legal.reusabledata_wrapper:base link http://www.supfam.org/about for supfam
WARNING:curate_gpt.wrappers.legal.reusabledata_wrapper:base link ftp://ftp.jcvi.org/pub/data/TIGR
@justaddcoffee
Member

+1 I ran into this issue that @iQuxLE mentions above:

openai.error.ServiceUnavailableError: The server is overloaded or not ready yet.

A fix here would help me a lot, as I can't seem to get make all to complete because of this issue.

Member

@cmungall cmungall left a comment


Thanks for this! I wish chromadb would handle this natively...

Can you remove the .idea file?

@cmungall
Member

Hi Carlo @iQuxLE!

It looks like the .idea changes are still part of the diff. Ideally a PR only has a single concern; see the Monarch/BBOP best practice doc: https://berkeleybop.github.io/best_practice

If it's too much of a hassle to change, we can merge and then delete later, but I prefer to keep the history clear.

@justaddcoffee
Member

justaddcoffee commented Jan 17, 2024

@iQuxLE could you remove the .idea/ per @cmungall's comment above?

Also on this ticket: @iQuxLE is now observing a different error from OpenAI when retrieving LLM embeddings, a 500 server error. @iQuxLE, possibly we could catch this error too when doing the exponential backoff.
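
For reference, assuming the 500 surfaces as openai.error.APIError in the pre-1.0 client, the retry loop could simply catch both (a sketch, not tested):

```python
import openai

RETRYABLE_ERRORS = (
    openai.error.ServiceUnavailableError,  # "server is overloaded or not ready yet"
    openai.error.APIError,                 # generic 5xx responses such as the 500 above
)

# ...and in the backoff loop:
#     except RETRYABLE_ERRORS:
#         time.sleep(sleep)
#         sleep *= 2
```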

@iQuxLE iQuxLE closed this Jan 22, 2024
@iQuxLE
Member Author

iQuxLE commented Jan 22, 2024

Hi @justaddcoffee and @cmungall,

It seemed to be fairly complicated. I thought I had figured it out, but I think I broke something.
