Bypass OpenAI server overload #11
Added flow between UI components. Added in_memory_adapter. Simplified collection model.
additional evaluation methods
Adding a gsheets wrapper
WARNING:curate_gpt.wrappers.legal.reusabledata_wrapper:bad link http://catalogueoflife.org/content/terms-use for catalogueoflife
WARNING:curate_gpt.wrappers.legal.reusabledata_wrapper:base link https://civic.genome.wustl.edu/about for civicdb
WARNING:curate_gpt.wrappers.legal.reusabledata_wrapper:base link TODO for dictybase
WARNING:curate_gpt.wrappers.legal.reusabledata_wrapper:bad link http://flybase.org/wiki/FlyBase:About#FlyBase_Copyright for flybase
WARNING:curate_gpt.wrappers.legal.reusabledata_wrapper:base link ftp://ftp.nextprot.org/pub/README for nextprot
WARNING:curate_gpt.wrappers.legal.reusabledata_wrapper:base link TODO for nextstrain
WARNING:curate_gpt.wrappers.legal.reusabledata_wrapper:bad link http://www.orphadata.org/cgi-bin/contact.php for orphanet-academic
WARNING:curate_gpt.wrappers.legal.reusabledata_wrapper:bad link http://www.orphadata.org/cgi-bin/inc/legal.inc.php for orphanet-open
WARNING:curate_gpt.wrappers.legal.reusabledata_wrapper:bad link http://www.rhea-db.org/licensedisclaimer for rhea
WARNING:curate_gpt.wrappers.legal.reusabledata_wrapper:base link http://www.supfam.org/about for supfam
WARNING:curate_gpt.wrappers.legal.reusabledata_wrapper:base link ftp://ftp.jcvi.org/pub/data/TIGR
Adding a reusabledata.org wrapper
Adding a wrapper for mediadive
removing replicate
Refactored notebooks
+1 I ran into this issue that @iQuxLE mentions above:
a fix here would help me a lot, as I can't seem to get
Thanks for this! I wish chromadb would handle this natively...
Can you remove the .idea file?
Hi Carlo @iQuxLE! It looks like the .idea changes are still part of the diff. Ideally a PR addresses only a single concern; see the monarch/bbop best practice doc at https://berkeleybop.github.io/best_practice. If it's too much of a hassle to change, we can merge and then delete it later, but I prefer to keep the history clear.
Hi @justaddcoffee and @cmungall, it turned out to be fairly complicated. I thought I had figured it out, but I think I broke something.
When loading ontologies into CurateGPT, inserting the data into ChromaDB is very often interrupted by a server overload on the OpenAI API side:
openai.error.ServiceUnavailableError: The server is overloaded or not ready yet.
Implementing `exponential_backoff_request` helped me bypass this by retrying with an additional short sleep each time a request failed. It's not a fancy solution, but it gets the job done.
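For anyone who wants to reproduce the retry idea outside this PR, here is a minimal sketch of exponential backoff. It is not the PR's `exponential_backoff_request`; the name `retry_with_backoff`, its parameters, the default delays, and the small random jitter are all hypothetical choices. It retries on whatever exception types you pass in; with the pre-1.0 `openai` client that raised the error above, that would presumably be `(openai.error.ServiceUnavailableError,)`.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    request: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Hypothetical helper: call `request`, retrying on `retry_on` exceptions.

    The delay doubles after each failure (base_delay, 2*base_delay, ...),
    is capped at max_delay, and gets up to 10% random jitter so parallel
    clients do not retry in lockstep.
    """
    for attempt in range(max_retries + 1):
        try:
            return request()
        except retry_on:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 10))
    raise AssertionError("unreachable")


# Usage: a stand-in request that fails twice, then succeeds.
attempts = {"n": 0}


def flaky() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("server overloaded")
    return "ok"


delays = []  # record the delays instead of actually sleeping
result = retry_with_backoff(flaky, retry_on=(ConnectionError,), sleep=delays.append)
print(result, attempts["n"], len(delays))  # ok 3 2
```

Injecting `sleep` keeps the sketch fast to test; in real use the default `time.sleep` applies.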
Additionally, while experimenting, I found that using a shorter sleep (10 s instead of 60 s) helped once the document length exceeds 3,000,000 characters.
A batch size of 1000 also worked better for me than a batch size of 100. Batch sizes smaller than 100 mostly ended in this error:
requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
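The batching and throttling described above could be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: `batched`, `insert_in_batches`, and the `insert_batch` callback are hypothetical names, and the sketch interprets "doc length" as the total number of characters sent since the last pause. The defaults (batch size 1000, a 3,000,000-character threshold, a 10 s pause) are the values reported above.

```python
import time
from typing import Callable, Iterable, Iterator, List


def batched(items: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most batch_size items."""
    batch: List[str] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly shorter, batch


def insert_in_batches(
    docs: Iterable[str],
    insert_batch: Callable[[List[str]], None],
    batch_size: int = 1000,
    char_threshold: int = 3_000_000,
    pause_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Hypothetical helper: send docs to insert_batch in fixed-size batches,
    pausing whenever the characters sent since the last pause exceed
    char_threshold. Returns the number of pauses taken."""
    chars_since_pause = 0
    pauses = 0
    for batch in batched(docs, batch_size):
        insert_batch(batch)
        chars_since_pause += sum(len(d) for d in batch)
        if chars_since_pause > char_threshold:
            sleep(pause_seconds)  # give the embedding API a breather
            chars_since_pause = 0
            pauses += 1
    return pauses


# Usage: 2500 docs of 2000 characters each (5,000,000 characters in total).
docs = ["x" * 2000 for _ in range(2500)]
sizes = []
pauses = insert_in_batches(docs, lambda b: sizes.append(len(b)), sleep=lambda s: None)
print(sizes, pauses)  # [1000, 1000, 500] 1
```

As with any such heuristic, the right batch size and pause length will likely depend on the API's current load and rate limits.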
This was all pretty experimental and it might also depend on other factors.
@cmungall
@justaddcoffee