Optimized UTF-8 #1051

Open: bettio wants to merge 4 commits into main

bettio (Collaborator) commented Feb 18, 2024

These changes are made under both the "Apache 2.0" and the "GNU Lesser General
Public License 2.1 or later" license terms (dual license).

SPDX-License-Identifier: Apache-2.0 OR LGPL-2.1-or-later

bettio changed the base branch from main to release-0.6 on February 18, 2024 00:04
bettio changed the title from "Optimized utf8" to "WIP Optimized utf8" on Feb 18, 2024
bettio changed the base branch from release-0.6 to main on January 22, 2025 18:11
bettio changed the title from "WIP Optimized utf8" to "Optimized UTF-8" on Jan 22, 2025
bettio force-pushed the optimized-utf8 branch 2 times, most recently from 6982bf3 to 1193de2, on January 22, 2025 18:26

Add a UTF-8 decoder taken from
http://bjoern.hoehrmann.de/utf-8/decoder/dfa/ by Björn Höhrmann, under
the MIT license.

Signed-off-by: Davide Bettio <davide@uninstall.it>
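For readers unfamiliar with the linked decoder, here is a minimal C sketch of the DFA idea, written in the style of its original 16-column layout (9 states, `UTF8_REJECT == 1`, next state found at `state * 16 + class`). It is not the code in this PR. To keep it short, the 256-entry byte-to-class half of Höhrmann's table is written as the helper `utf8_class()`. All names (`utf8_class`, `utf8_trans`, `utf8_decode_step`, `utf8_decode_all`) are illustrative assumptions.

```c
#include <stddef.h>
#include <stdint.h>

#define UTF8_ACCEPT 0
#define UTF8_REJECT 1

/* Byte -> character class (the 256-entry half of the table, as code). */
static uint32_t utf8_class(uint8_t b)
{
    if (b < 0x80) return 0;   /* ASCII */
    if (b < 0x90) return 1;   /* continuation 80..8F */
    if (b < 0xA0) return 9;   /* continuation 90..9F */
    if (b < 0xC0) return 7;   /* continuation A0..BF */
    if (b < 0xC2) return 8;   /* C0, C1: always overlong */
    if (b < 0xE0) return 2;   /* 2-byte lead */
    if (b == 0xE0) return 10;
    if (b == 0xED) return 4;  /* lead that could reach surrogates */
    if (b < 0xF0) return 3;   /* other 3-byte leads */
    if (b == 0xF0) return 11;
    if (b < 0xF4) return 6;   /* F1..F3 */
    if (b == 0xF4) return 5;
    return 8;                 /* F5..FF: never valid */
}

/* Index state * 16 + class -> next state (columns 12..15 are padding). */
static const uint8_t utf8_trans[9 * 16] = {
    0, 1, 2, 3, 5, 8, 7, 1, 1, 1, 4, 6, 1, 1, 1, 1, /* s0: accept */
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, /* s1: reject (sink) */
    1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, /* s2: 1 byte left */
    1, 2, 1, 1, 1, 1, 1, 2, 1, 2, 1, 1, 1, 1, 1, 1, /* s3: 2 bytes left */
    1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, /* s4: after E0, A0..BF */
    1, 2, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, /* s5: after ED, 80..9F */
    1, 1, 1, 1, 1, 1, 1, 3, 1, 3, 1, 1, 1, 1, 1, 1, /* s6: after F0, 90..BF */
    1, 3, 1, 1, 1, 1, 1, 3, 1, 3, 1, 1, 1, 1, 1, 1, /* s7: after F1..F3 */
    1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, /* s8: after F4, 80..8F */
};

/* One DFA step: feeds `byte` and accumulates the code point in *codep. */
static uint32_t utf8_decode_step(uint32_t *state, uint32_t *codep, uint8_t byte)
{
    uint32_t type = utf8_class(byte);
    *codep = (*state != UTF8_ACCEPT) ? (byte & 0x3Fu) | (*codep << 6)
                                     : (0xFFu >> type) & byte;
    *state = utf8_trans[*state * 16 + type]; /* "* 16" is a shift by 4 */
    return *state;
}

/* Decodes buf into cps (needs room for len entries). Returns the number
 * of code points, or -1 on invalid or truncated input. */
static long utf8_decode_all(const uint8_t *buf, size_t len, uint32_t *cps)
{
    uint32_t state = UTF8_ACCEPT;
    uint32_t codep = 0;
    long n = 0;
    for (size_t i = 0; i < len; i++) {
        if (utf8_decode_step(&state, &codep, buf[i]) == UTF8_REJECT) {
            return -1;
        }
        if (state == UTF8_ACCEPT) {
            cps[n++] = codep;
        }
    }
    return (state == UTF8_ACCEPT) ? n : -1;
}
```

The reject state is a sink, so a caller can stop at the first rejection. Input that ends in any state other than `UTF8_ACCEPT` is a truncated sequence.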

This change removes a shift operation and shortens the table.

Signed-off-by: Davide Bettio <davide@uninstall.it>
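A common refinement of Höhrmann's decoder pre-multiplies every state number by 12, the number of character classes. The next-state lookup then becomes a plain addition, and each transition row shrinks from 16 to 12 entries (108 bytes instead of 144). We assume, though the commit message does not confirm it, that this is the change meant here. The sketch below illustrates it. The names `utf8_class`, `utf8_trans12` and `utf8_decode_step12` are illustrative, not taken from the PR.

```c
#include <stddef.h>
#include <stdint.h>

/* States are pre-multiplied by 12: ACCEPT stays 0, REJECT becomes 12. */
#define UTF8_ACCEPT 0
#define UTF8_REJECT 12

/* Byte -> character class (same classes as Höhrmann's first table). */
static uint32_t utf8_class(uint8_t b)
{
    if (b < 0x80) return 0;
    if (b < 0x90) return 1;
    if (b < 0xA0) return 9;
    if (b < 0xC0) return 7;
    if (b < 0xC2) return 8;
    if (b < 0xE0) return 2;
    if (b == 0xE0) return 10;
    if (b == 0xED) return 4;
    if (b < 0xF0) return 3;
    if (b == 0xF0) return 11;
    if (b < 0xF4) return 6;
    if (b == 0xF4) return 5;
    return 8;
}

/* Index state + class -> next state: 9 rows x 12 columns = 108 bytes. */
static const uint8_t utf8_trans12[9 * 12] = {
     0, 12, 24, 36, 60, 96, 84, 12, 12, 12, 48, 72, /* accept */
    12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, /* reject (sink) */
    12,  0, 12, 12, 12, 12, 12,  0, 12,  0, 12, 12, /* 1 byte left */
    12, 24, 12, 12, 12, 12, 12, 24, 12, 24, 12, 12, /* 2 bytes left */
    12, 12, 12, 12, 12, 12, 12, 24, 12, 12, 12, 12, /* after E0 */
    12, 24, 12, 12, 12, 12, 12, 12, 12, 24, 12, 12, /* after ED */
    12, 12, 12, 12, 12, 12, 12, 36, 12, 36, 12, 12, /* after F0 */
    12, 36, 12, 12, 12, 12, 12, 36, 12, 36, 12, 12, /* after F1..F3 */
    12, 36, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, /* after F4 */
};

/* One DFA step, identical in behavior to the 16-column version. */
static uint32_t utf8_decode_step12(uint32_t *state, uint32_t *codep, uint8_t byte)
{
    uint32_t type = utf8_class(byte);
    *codep = (*state != UTF8_ACCEPT) ? (byte & 0x3Fu) | (*codep << 6)
                                     : (0xFFu >> type) & byte;
    *state = utf8_trans12[*state + type]; /* plain addition: no multiply, no shift */
    return *state;
}
```

The trade-off is that state values are no longer small consecutive integers. Callers must therefore compare against `UTF8_ACCEPT` and `UTF8_REJECT` rather than against literal 0 and 1.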

Some code for validating UTF-8 strings was duplicated, and it even
performed a full decode. Use the new `unicode_is_valid_utf8_buf` function
instead, which is also based on highly optimized code.

Signed-off-by: Davide Bettio <davide@uninstall.it>
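Validation does not need the accumulated code point. It only has to walk the automaton, make sure it never enters the reject state, and check that it finishes in the accept state. A minimal sketch of that idea follows, assuming the pre-multiplied 12-column table. `utf8_is_valid_buf_sketch` is a hypothetical name and signature. It is not the actual declaration of `unicode_is_valid_utf8_buf`, which this page does not show.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define UTF8_ACCEPT 0
#define UTF8_REJECT 12

/* Byte -> character class (Höhrmann's classes). */
static uint32_t utf8_class(uint8_t b)
{
    if (b < 0x80) return 0;
    if (b < 0x90) return 1;
    if (b < 0xA0) return 9;
    if (b < 0xC0) return 7;
    if (b < 0xC2) return 8;
    if (b < 0xE0) return 2;
    if (b == 0xE0) return 10;
    if (b == 0xED) return 4;
    if (b < 0xF0) return 3;
    if (b == 0xF0) return 11;
    if (b < 0xF4) return 6;
    if (b == 0xF4) return 5;
    return 8;
}

/* Pre-multiplied transition table: index state + class -> next state. */
static const uint8_t utf8_trans12[9 * 12] = {
     0, 12, 24, 36, 60, 96, 84, 12, 12, 12, 48, 72,
    12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12,
    12,  0, 12, 12, 12, 12, 12,  0, 12,  0, 12, 12,
    12, 24, 12, 12, 12, 12, 12, 24, 12, 24, 12, 12,
    12, 12, 12, 12, 12, 12, 12, 24, 12, 12, 12, 12,
    12, 24, 12, 12, 12, 12, 12, 12, 12, 24, 12, 12,
    12, 12, 12, 12, 12, 12, 12, 36, 12, 36, 12, 12,
    12, 36, 12, 12, 12, 12, 12, 36, 12, 36, 12, 12,
    12, 36, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12,
};

/* Hypothetical validation-only helper: no code point is ever built. */
static bool utf8_is_valid_buf_sketch(const uint8_t *buf, size_t len)
{
    uint32_t state = UTF8_ACCEPT;
    for (size_t i = 0; i < len; i++) {
        state = utf8_trans12[state + utf8_class(buf[i])];
        if (state == UTF8_REJECT) {
            return false; /* reject is a sink state, so stop early */
        }
    }
    return state == UTF8_ACCEPT; /* any other state means truncated input */
}
```

Compared with decoding every character and discarding the result, the loop body is just two table lookups per byte. That is the point of the deduplication described above.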

Use it as a drop-in replacement for `bitstring_utf8_decode`.

This function is based on the highly optimized UTF-8 decoder found here:
http://bjoern.hoehrmann.de/utf-8/decoder/dfa/

Signed-off-by: Davide Bettio <davide@uninstall.it>
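A per-character decoder built on the DFA steps the automaton until it returns to the accept state, then reports the code point and how many bytes it consumed. The sketch below assumes a signature of that general shape: buffer, length, output code point, output byte count, and a `bool` result. `utf8_decode_one_sketch` is a hypothetical name. This is not the real prototype of `bitstring_utf8_decode` or of its replacement.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define UTF8_ACCEPT 0
#define UTF8_REJECT 12

/* Byte -> character class (Höhrmann's classes). */
static uint32_t utf8_class(uint8_t b)
{
    if (b < 0x80) return 0;
    if (b < 0x90) return 1;
    if (b < 0xA0) return 9;
    if (b < 0xC0) return 7;
    if (b < 0xC2) return 8;
    if (b < 0xE0) return 2;
    if (b == 0xE0) return 10;
    if (b == 0xED) return 4;
    if (b < 0xF0) return 3;
    if (b == 0xF0) return 11;
    if (b < 0xF4) return 6;
    if (b == 0xF4) return 5;
    return 8;
}

/* Pre-multiplied transition table: index state + class -> next state. */
static const uint8_t utf8_trans12[9 * 12] = {
     0, 12, 24, 36, 60, 96, 84, 12, 12, 12, 48, 72,
    12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12,
    12,  0, 12, 12, 12, 12, 12,  0, 12,  0, 12, 12,
    12, 24, 12, 12, 12, 12, 12, 24, 12, 24, 12, 12,
    12, 12, 12, 12, 12, 12, 12, 24, 12, 12, 12, 12,
    12, 24, 12, 12, 12, 12, 12, 12, 12, 24, 12, 12,
    12, 12, 12, 12, 12, 12, 12, 36, 12, 36, 12, 12,
    12, 36, 12, 12, 12, 12, 12, 36, 12, 36, 12, 12,
    12, 36, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12,
};

/* Hypothetical: decodes the first code point of buf. On success stores
 * it in *cp and its length in bytes in *consumed, then returns true.
 * The automaton accepts or rejects within 4 bytes, so the loop is short. */
static bool utf8_decode_one_sketch(const uint8_t *buf, size_t len,
                                   uint32_t *cp, size_t *consumed)
{
    uint32_t state = UTF8_ACCEPT;
    uint32_t codep = 0;
    for (size_t i = 0; i < len; i++) {
        uint32_t type = utf8_class(buf[i]);
        codep = (state != UTF8_ACCEPT) ? (buf[i] & 0x3Fu) | (codep << 6)
                                       : (0xFFu >> type) & buf[i];
        state = utf8_trans12[state + type];
        if (state == UTF8_REJECT) {
            return false;
        }
        if (state == UTF8_ACCEPT) {
            *cp = codep;
            *consumed = i + 1;
            return true;
        }
    }
    return false; /* empty or truncated input */
}
```

A caller decodes a whole buffer by advancing `buf` by `*consumed` and shrinking `len` accordingly, until `len` reaches zero or the function returns `false`.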