Interning strings #139
Comments
That's a helpful idea, thanks!
How do you choose which strings get interned?
I was thinking users would choose depending on the kind of data they're loading. Basically, P.S. I have no immediate need for this feature and don't plan to work on it myself in the foreseeable future; I only opened this issue to share the idea.
I've been reading up on interning here: https://stackabuse.com/guide-to-string-interning-in-python/ and the disadvantages might slow down the decoder. It depends very much on your data. If there are multiple dicts with duplicate keys and you need key lookup to be fast, then it makes sense to do this. Otherwise, storing dict keys in the I'll have to do some testing. Thanks for bringing this up @Changaco, it was a good excuse for me to learn more about CPython internals.
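To make the trade-off above concrete, here is a minimal sketch of what interning does to two equal strings built at runtime, the way a decoder would build them. It only uses sys.intern from the standard library; the helper name make_key is made up for illustration, and the "distinct before interning" observation is CPython behaviour rather than a language guarantee.

```python
import sys


def make_key(parts):
    # Hypothetical helper: build the string at runtime, as a decoder would,
    # so it is not one of the literals the compiler may intern on its own.
    return "".join(parts)


a = make_key(["user", "_", "id"])
b = make_key(["user", "_", "id"])

# Equal contents, but (on CPython) typically two separate objects in memory.
equal_before = a == b

# sys.intern returns the canonical copy from the interpreter's intern table.
a = sys.intern(a)
b = sys.intern(b)

# Both names now refer to the same object, so the duplicate can be freed and
# dict lookups can succeed on the identity check before comparing contents.
shared_after = a is b
```

The cost is one intern-table lookup per string, which is why this only pays off when the same keys really do repeat.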
The possibilities of significant slowdowns I can think of are:
Of course these caveats can be documented so that users can choose the best options for their use cases.
It would be nice if CBORDecoder had an option to intern decoded strings, especially dict keys, since they're the most likely to appear multiple times both within the CBOR data and in the Python code exploiting that data. The C version of the module would provide a small performance boost by calling PyUnicode_InternInPlace instead of the Python function.
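Until such an option exists, the effect can be approximated after decoding. The following is only a sketch under assumptions: intern_keys is a hypothetical helper, not part of the library, and it walks an already-decoded structure of dicts and lists rather than hooking into the decoder, so it pays for a second pass over the data.

```python
import sys


def intern_keys(obj):
    """Return a copy of a decoded structure with all str dict keys interned.

    Hypothetical post-processing helper; only dicts and lists are traversed.
    """
    if isinstance(obj, dict):
        return {
            # sys.intern only accepts exact str instances, not subclasses.
            (sys.intern(key) if type(key) is str else key): intern_keys(value)
            for key, value in obj.items()
        }
    if isinstance(obj, list):
        return [intern_keys(item) for item in obj]
    return obj


# Simulate decoded records whose keys were created separately at runtime.
records = [{"".join(["na", "me"]): i, "".join(["ta", "gs"]): []} for i in range(3)]
interned = intern_keys(records)
```

After this pass, every record's "name" key is the same string object, which is the memory saving the issue is after. A decoder-level option would avoid the extra traversal and the temporary duplicate strings.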