Validation performance extremely poor #818
Seems similar to #790.
Hi @billie-alsup, thanks for using spdx-tools extensively enough to notice such issues! As @jspeed-meyers pointed out, this has already been raised but not yet fixed. There are currently two open PRs (#792 and #800) that could benefit from your input.
I am currently hot-patching the code with new functions that build a couple of indices (Python dictionaries) at the start of validation and use them to check for existence, rather than scanning arrays sequentially. I don't know if this is the best approach, as opposed to maintaining the dictionaries automatically, but it seemed like the safest one. I can look into contributing such changes, but I will need formal approval from our company's legal team first. I should be able to comment on PRs before approval, though.
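A minimal sketch of the "build indices at the start of validation" approach, assuming a heavily simplified document model. The `Package`, `File`, `Relationship`, and `Document` classes and the `build_indices` / `validate_relationships` functions are hypothetical stand-ins, not the spdx-tools API; the point is only that each existence check becomes a dictionary lookup instead of a scan over a list.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Package:
    spdx_id: str
    name: str


@dataclass
class File:
    spdx_id: str
    name: str


@dataclass
class Relationship:
    spdx_element_id: str
    related_spdx_element_id: str


@dataclass
class Document:
    # Hypothetical simplified document; element order lives in these lists.
    packages: List[Package] = field(default_factory=list)
    files: List[File] = field(default_factory=list)
    relationships: List[Relationship] = field(default_factory=list)


def build_indices(document: Document) -> Tuple[Dict[str, Package], Dict[str, File]]:
    # One O(n) pass per list at the start of validation; the lists are not modified.
    packages_by_id = {package.spdx_id: package for package in document.packages}
    files_by_id = {file.spdx_id: file for file in document.files}
    return packages_by_id, files_by_id


def validate_relationships(document: Document) -> List[str]:
    packages_by_id, files_by_id = build_indices(document)
    messages = []
    for relationship in document.relationships:
        for spdx_id in (relationship.spdx_element_id, relationship.related_spdx_element_id):
            # Average-case O(1) dict lookups replace sequential scans of the lists.
            if spdx_id not in packages_by_id and spdx_id not in files_by_id:
                messages.append(f"referenced SPDX ID {spdx_id} not found in the document")
    return messages
```

Because the indices are rebuilt for each validation run, they cannot go stale if the document is edited in between, which is the safety argument made in the comment.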
I can start contributing now. I don't seem to have permission to assign issues to myself, though. I thought I would start with a really simple change to get familiar with the workflow; #819, for example, should be quite simple.
Hi @billie-alsup, glad to hear that! :)
I am finding that validation is extremely slow (taking 3+ hours to validate a document that took a fraction of the time to create). A sample cProfile run shows
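A profile of this kind can be gathered with the standard-library `cProfile` and `pstats` modules. The sketch below is a minimal example, assuming a hypothetical `slow_validate` function in place of the real validation entry point; it deliberately rebuilds an ID list on every lookup so that the hot spot stands out in the report.

```python
import cProfile
import io
import pstats


def all_ids(items):
    # Rebuilds the full ID list on every call, mimicking the reported pattern.
    return [item for item in items]


def slow_validate(items, references):
    missing = []
    for ref in references:
        if ref not in all_ids(items):  # O(n) rebuild plus O(n) scan per reference
            missing.append(ref)
    return missing


def profile_call(func, *args):
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(*args)
    profiler.disable()
    buffer = io.StringIO()
    # Sort by cumulative time so the expensive call chain is listed first.
    pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(10)
    return result, buffer.getvalue()


if __name__ == "__main__":
    ids = [f"SPDXRef-{i}" for i in range(2000)]
    refs = ids[::10] + ["SPDXRef-missing"]
    missing, report = profile_call(slow_validate, ids, refs)
    print(missing)
    print(report)
```

For a whole script, `python -m cProfile -s cumulative script.py` gives a similar report without code changes.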
The implementation seems to always use sequential scans over arrays when performing validation, when it might make more sense to create an index for a particular array and use that during validation. Is there any plan to improve performance, at least for validation? The indices needed for the speed-up could be created dynamically and used during validation. Having such indices available while building the document would be fine as well, although that would put some restrictions on how the document is updated. I realize that the arrays provide a strict ordering that should probably be maintained, so it is not as simple as replacing the arrays with dictionaries.
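The other option mentioned, keeping an index in sync while the document is built, could look like the following minimal sketch. `IndexedList` is a hypothetical helper, not part of spdx-tools: it keeps the ordered list (so element order is preserved) alongside a dictionary from SPDX ID to list position, and assumes elements expose an `spdx_id` attribute.

```python
from typing import Dict, Generic, Iterator, List, TypeVar

T = TypeVar("T")


class IndexedList(Generic[T]):
    """Ordered list of elements plus an SPDX ID -> position index kept in sync."""

    def __init__(self, key_attr: str = "spdx_id") -> None:
        self._key_attr = key_attr
        self._items: List[T] = []
        self._positions: Dict[str, int] = {}

    def append(self, item: T) -> None:
        key = getattr(item, self._key_attr)
        if key in self._positions:
            raise ValueError(f"duplicate SPDX ID: {key}")
        self._positions[key] = len(self._items)
        self._items.append(item)

    def remove(self, key: str) -> None:
        position = self._positions.pop(key)
        del self._items[position]
        # Elements after the removed one shift left by one, so removal stays O(n).
        for later_key, later_pos in list(self._positions.items()):
            if later_pos > position:
                self._positions[later_key] = later_pos - 1

    def __contains__(self, key: str) -> bool:
        # Average-case O(1) membership test by SPDX ID.
        return key in self._positions

    def get(self, key: str) -> T:
        return self._items[self._positions[key]]

    def __iter__(self) -> Iterator[T]:
        return iter(self._items)

    def __len__(self) -> int:
        return len(self._items)
```

This makes the restriction explicit: every update has to go through these methods, because mutating the underlying list directly would leave the index stale.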
In the example profile, the function get_list_of_all_spdx_ids is called every time is_spdx_id_present_in_document is invoked. This is a prime example where generating an index at the start of validation and reusing it throughout would be a win. Perhaps we could add some private members to the Document class to hold the index in this case?
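A minimal sketch of computing the ID set once per validation run and reusing it for every presence check. The functions below only mirror the names `get_list_of_all_spdx_ids` and `is_spdx_id_present_in_document`; their bodies, the optional `known_ids` parameter, the `find_missing_references` helper, and the simplified `Document` are assumptions for illustration, not the spdx-tools implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Element:
    spdx_id: str


@dataclass
class Document:
    # Hypothetical simplified document with ordered element lists.
    document_spdx_id: str
    packages: List[Element] = field(default_factory=list)
    files: List[Element] = field(default_factory=list)
    snippets: List[Element] = field(default_factory=list)


def get_list_of_all_spdx_ids(document: Document) -> List[str]:
    # O(n): walks every element list each time it is called.
    ids = [document.document_spdx_id]
    for elements in (document.packages, document.files, document.snippets):
        ids.extend(element.spdx_id for element in elements)
    return ids


def is_spdx_id_present_in_document(
    spdx_id: str, document: Document, known_ids: Optional[Set[str]] = None
) -> bool:
    if known_ids is None:
        # Fallback to the slow path: rebuild the ID list and scan it.
        return spdx_id in get_list_of_all_spdx_ids(document)
    return spdx_id in known_ids  # average-case O(1) set lookup


def find_missing_references(document: Document, references: List[str]) -> List[str]:
    # Build the index once at the start of validation...
    known_ids = set(get_list_of_all_spdx_ids(document))
    # ...then reuse it for every check instead of rebuilding it per call.
    return [ref for ref in references
            if not is_spdx_id_present_in_document(ref, document, known_ids)]
```

Passing the set explicitly, rather than caching it in a private member of the document, trades a slightly wider signature for never having to invalidate a cache when the document changes.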