
Code optimizations #20

Open
LazaroHurtado opened this issue Mar 6, 2024 · 0 comments

LazaroHurtado (Contributor) commented Mar 6, 2024

There are a few optimizations we can make for the LLM tester; I've laid them out below:

  • the `results_exists` method checks whether a task has finished for a specific context length and document depth by iterating over every file in `results/`. Since we know the file-name format being used, we can check for the specific file directly instead (first sketch below).
  • the `insert_needle` method finds the most recent `.` token in the context and inserts the needle right after it. This search is done with a while loop that rebuilds `tokens_new_context`, which can be large, on every iteration. An optimization, which won't give much of a performance boost but is still worth doing, is to move only the index inside the loop and slice once after the search is complete (second sketch below).
  • the `read_context_files` method re-tokenizes the entire accumulated context to measure its length after every file it appends. Instead, we can tokenize only the newest file's content and keep a running total, avoiding tokenizing the same pieces of text repeatedly (third sketch below).
  • moving from `asyncio.gather(*tasks)` to `async with asyncio.TaskGroup() as tg`, as suggested here (fourth sketch below).
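For the first point, a minimal sketch of a direct existence check. The file-name format, `results_dir`, and `model_name` below are assumptions for illustration; they should mirror whatever format the tester actually uses when it saves results:

```python
import os

def results_exists(results_dir: str, model_name: str,
                   context_length: int, depth_percent: int) -> bool:
    # Build the one expected file name instead of scanning every file in results/.
    # NOTE: this naming scheme is an assumption; it should match the exact
    # format the tester writes results with.
    filename = f"{model_name}_len_{context_length}_depth_{depth_percent}.json"
    return os.path.exists(os.path.join(results_dir, filename))
```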
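For the second point, a sketch of the loop change, with `tokens_context`, `insertion_point`, and `period_tokens` assumed from the description above:

```python
# Toy setup so the sketch runs standalone (names assumed from the description).
tokens_context = [11, 42, 13, 7, 13, 99, 5]  # 13 stands in for the "." token id
period_tokens = {13}
insertion_point = 6

# Before (shape assumed from the description): the slice is rebuilt on every step.
# tokens_new_context = tokens_context[:insertion_point]
# while tokens_new_context and tokens_new_context[-1] not in period_tokens:
#     insertion_point -= 1
#     tokens_new_context = tokens_context[:insertion_point]

# After: only move the index while searching, then slice exactly once.
while insertion_point > 0 and tokens_context[insertion_point - 1] not in period_tokens:
    insertion_point -= 1
tokens_new_context = tokens_context[:insertion_point]  # -> [11, 42, 13, 7, 13]
```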
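For the third point, a sketch that keeps a running token count so already-appended text is never tokenized twice. The glob pattern and the tiktoken-style tokenizer are assumptions:

```python
import glob

import tiktoken  # assumption: a tiktoken-style tokenizer with .encode()

def read_context_files(max_context_length: int, pattern: str = "haystack/*.txt") -> str:
    enc = tiktoken.get_encoding("cl100k_base")  # hypothetical choice of encoding
    context = ""
    running_tokens = 0  # running total of tokens appended so far
    for path in sorted(glob.glob(pattern)):
        with open(path, "r") as f:
            text = f.read()
        context += text
        # Tokenize only the newly appended file, never the whole context again.
        running_tokens += len(enc.encode(text))
        if running_tokens >= max_context_length:
            break
    return context
```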
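And for the last point, a sketch of the `asyncio.gather` to `asyncio.TaskGroup` move (`TaskGroup` requires Python 3.11+); the wrapper function and coroutine list are placeholders:

```python
import asyncio

async def run_all(coros):
    # Before: results = await asyncio.gather(*tasks)
    # With TaskGroup, every task is awaited when the block exits, and if any
    # task raises, the remaining tasks are cancelled and an ExceptionGroup
    # propagates, so failures are not silently left behind.
    async with asyncio.TaskGroup() as tg:  # Python 3.11+
        tasks = [tg.create_task(c) for c in coros]
    return [task.result() for task in tasks]
```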