Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimizations and readability improvements #14

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

LazaroHurtado
Copy link
Contributor

@LazaroHurtado LazaroHurtado commented Mar 5, 2024

This PR introduces several small optimizations and changes as described below:

  • added some helper methods to llm_needle_haystack_tester to increase readability
  • switched from asyncio.gather(*tasks) to using async with asyncio.TaskGroup() as tg as suggested here
  • small optimizations when it comes to determining if a result exists, find the most recent '.' token from an insertion point, and loading context files

This PR also targets this issue: #20

cc: @pavelkraleu , @kedarchandrayan
sorry for the tag but I can't assign reviewers so just trying to increase visibility :)

Comment on lines 230 to 234
filename = self.result_file_format.format(model_name=self.model_name,
context_length=context_length,
depth_percent=depth_percent)
file_path = os.path.join(self.results_dir, f'{filename}.json')
if not os.path.exists(file_path):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We know the exact format of the filename, so instead of looping over every file lets see if the specific file exists, if it does then check the version number else return false.

@kedarchandrayan
Copy link
Collaborator

Hi @LazaroHurtado, We'll address the points here separately and raise individual issues for each relevant one. We'll send these issues to you, allowing you to select the changes and create separate pull requests (PRs) accordingly. This approach ensures that each PR serves a single purpose.

@LazaroHurtado
Copy link
Contributor Author

@kedarchandrayan I have created issues and broken this PR up. This PR now only includes work for optimizations and readability (readability doesn't deserve an issue ticket)

@kedarchandrayan
Copy link
Collaborator

@LazaroHurtado , I said we will raise the issues and let you know 😄

I like the enthusiasm but I am strongly opposed to suggesting opinionated changes. Every dev will have their own opinion and if we allow such changes, they will keep coming endlessly. For example, there is not much difference between following versions of code:

Version 1:

context += f.read()

Version 2:

file_content = f.read()

context += file_content

I would suggest that you go through your proposed changes again and remove such proposals. Let the changes be ones which add definite value. Please let me know if you agree with the point.

@kedarchandrayan kedarchandrayan changed the title small optimizations Optimizations and readability improvements Mar 6, 2024
@LazaroHurtado
Copy link
Contributor Author

LazaroHurtado commented Mar 6, 2024

@LazaroHurtado , I said we will raise the issues and let you know 😄

I like the enthusiasm but I am strongly opposed to suggesting opinionated changes. Every dev will have their own opinion and if we allow such changes, they will keep coming endlessly. For example, there is not much difference between following versions of code:

Version 1:

context += f.read()

Version 2:

file_content = f.read()

context += file_content

I would suggest that you go through your proposed changes again and remove such proposals. Let the changes be ones which add definite value. Please let me know if you agree with the point.

Issues are normally opened by users of the open source project so that didnt seem too far fetched. Some changes in this PR do include readability improves which is subjective but im always open to discuss them and the value they bring. Most changes include optimizations for speed or updating how a library is used to follow the recommended docs, both provide definite and measurable value. For the example you pointed out, file_content was used to not have to call f.read() twice. I understand somethings might come as opinionated but code cleanliness and removing code smells must be taken into account to obtain a code base that is maintainable in the long run

@kedarchandrayan
Copy link
Collaborator

Sure @LazaroHurtado! I never meant that all the changes are opinionated. I am sorry if my comment meant that in anyways.

And I am also 100% up for code cleanliness.

Also, I missed the way you have used file_content. I will go through the PR in detail again.

@kedarchandrayan
Copy link
Collaborator

Hi @LazaroHurtado, a big change was made today (#33) that has made this pull request outdated. I apologize for the inconvenience. If it is easier, please feel free to open a new pull request with the changes using the latest code.

@LazaroHurtado
Copy link
Contributor Author

Hi @LazaroHurtado, a big change was made today (#33) that has made this pull request outdated. I apologize for the inconvenience. If it is easier, please feel free to open a new pull request with the changes using the latest code.

No worries, I have update this PR to have no conflicts with the recent changes. Awesome work on #33 btw!

@pavelkraleu pavelkraleu added the enhancement New feature or request label Mar 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants