Convert UTF8toUTF16 to TryUTF8toUTF16 #1024

Open
paulirwin opened this issue Nov 15, 2024 · 1 comment · May be fixed by #1057
Labels
is:enhancement New feature or request pri:normal

Comments

@paulirwin
Contributor

While searching for IndexOutOfRangeException, I also noticed that UTF8toUTF16 can throw it. This is an unusual exception to have to handle, so we should change this method to TryUTF8toUTF16 and eliminate an exception that is clearly meant only for control flow when the UTF-8 input is invalid. It is caught in several places to perform a fallback, and we should fix those call sites. Note that we will most likely convert the byte[] overload to use ReadOnlySpan<byte> and drop the offset and length parameters.

Originally posted by @NightOwl888 in #1018 (comment)
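
To make the proposal concrete, here is a minimal sketch of the Try pattern described above. It is not the Lucene.NET implementation: the class name `Utf8Sketch`, the `Span<char>` destination parameter, and the hex-dump fallback in `FormatTerm` are all assumptions made for illustration, and the decoding is delegated to the BCL's `System.Text.Unicode.Utf8.ToUtf16` (available in .NET Core 3.0 and later) rather than to Lucene's own decoder.

```csharp
using System;
using System.Buffers;
using System.Text.Unicode;

public static class Utf8Sketch
{
    // Hypothetical signature: takes a span instead of (byte[], offset, length)
    // and reports invalid UTF-8 through the return value instead of an exception.
    public static bool TryUTF8toUTF16(ReadOnlySpan<byte> utf8, Span<char> destination, out int charsWritten)
    {
        // replaceInvalidSequences: false makes invalid input surface as
        // OperationStatus.InvalidData rather than being replaced with U+FFFD.
        OperationStatus status = Utf8.ToUtf16(
            utf8, destination, out _, out charsWritten,
            replaceInvalidSequences: false, isFinalBlock: true);

        if (status != OperationStatus.Done)
        {
            charsWritten = 0;
            return false;
        }
        return true;
    }

    // Hypothetical caller showing the fallback without a try/catch.
    // The hex dump is an assumed fallback, not what Lucene.NET does.
    public static string FormatTerm(ReadOnlySpan<byte> utf8)
    {
        // A UTF-16 result never needs more chars than the UTF-8 input has bytes.
        char[] buffer = new char[utf8.Length];
        return TryUTF8toUTF16(utf8, buffer, out int charsWritten)
            ? new string(buffer, 0, charsWritten)
            : BitConverter.ToString(utf8.ToArray());
    }
}
```

With this shape, a call site that used to catch IndexOutOfRangeException to trigger a fallback becomes an ordinary branch on the boolean result.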

@paulirwin paulirwin changed the title from "I was looking at BitVector to see whether it is sensible to make these into this[int] { get; set; } instead of (or in addition to) Get(int) and Set(int). Being that there is a GetAndSet() method (much like the atomic classes), I am thinking not." to "Convert UTF8toUTF16 to TryUTF8toUTF16" Nov 15, 2024
@paulirwin paulirwin mentioned this issue Nov 15, 2024
4 tasks
@paulirwin paulirwin added this to the 4.8.0-beta00018 milestone Nov 18, 2024
@paulirwin paulirwin added is:enhancement New feature or request pri:normal labels Nov 21, 2024
paulirwin added a commit to paulirwin/lucene.net that referenced this issue Dec 4, 2024
@paulirwin
Contributor Author

@NightOwl888 Can you point me to the places you meant by "It is caught in several places to do a fallback"? I reviewed all usages of UTF8toUTF16(byte[], int, int, CharsRef), of UTF8toUTF16(BytesRef, CharsRef), which calls it, and of BytesRef.Utf8ToString(), which calls it as well. The only case I found is BlockTreeTermsWriter.PendingBlock.PendingBlocksFormatter, which catches IndexOutOfRangeException if it is thrown by PendingBlock.ToString(). I have fixed that case, but since you mentioned "several", I want to make sure I'm not missing any.

paulirwin added a commit to paulirwin/lucene.net that referenced this issue Dec 17, 2024
paulirwin added a commit to paulirwin/lucene.net that referenced this issue Dec 17, 2024