Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

optimise some of the bounds checks #34

Merged
merged 6 commits into from
Jan 11, 2024

Conversation

alindima
Copy link
Contributor

@alindima alindima commented Dec 18, 2023

This brings a performance improvement of 20-30%.

Where possible, compiler is aided to optimise away the bounds checks without any unsafe code. No unsafe code was used.

This PR does not touch AVX code, because when testing, I did not see a noticeable improvement for that case.

Numbers before:

~~~ [ Benchmark case: 1000000 bytes ] ~~~
Encode RUST (10 cycles): 397.182 ms
Decode RUST (10 cycles): 961.006 ms
Encode C++ (10 cycles): 221.003 ms
Decode C++ (10 cycles): 572.489 ms

~~~ [ Benchmark case: 2500000 bytes ] ~~~
Encode RUST (10 cycles): 1018.27 ms
Decode RUST (10 cycles): 2424.6 ms
Encode C++ (10 cycles): 602.386 ms
Decode C++ (10 cycles): 1459.45 ms

~~~ [ Benchmark case: 5000000 bytes ] ~~~
Encode RUST (10 cycles): 2043.65 ms
Decode RUST (10 cycles): 4813.27 ms
Encode C++ (10 cycles): 1208.36 ms
Decode C++ (10 cycles): 2892.14 ms

~~~ [ Benchmark case: 10000000 bytes ] ~~~
Encode RUST (10 cycles): 4095.26 ms
Decode RUST (10 cycles): 9.61965 s
Encode C++ (10 cycles): 2416.67 ms
Decode C++ (10 cycles): 5.76792 s

Numbers now:

~~~ [ Benchmark case: 1000000 bytes ] ~~~
Encode RUST (10 cycles): 335.291 ms -> 18.5% better than master
Decode RUST (10 cycles): 739.528 ms -> 30% better than master
Encode C++ (10 cycles): 211.608 ms
Decode C++ (10 cycles): 562.939 ms

~~~ [ Benchmark case: 2500000 bytes ] ~~~
Encode RUST (10 cycles): 855.59 ms -> 19% better than master
Decode RUST (10 cycles): 1830.64 ms -> 32% better than master
Encode C++ (10 cycles): 559.997 ms
Decode C++ (10 cycles): 1434.1 ms

~~~ [ Benchmark case: 5000000 bytes ] ~~~
Encode RUST (10 cycles): 1730.34 ms -> 18% better than master
Decode RUST (10 cycles): 3633.53 ms -> 32% better than master
Encode C++ (10 cycles): 1177.38 ms
Decode C++ (10 cycles): 2863.14 ms

~~~ [ Benchmark case: 10000000 bytes ] ~~~
Encode RUST (10 cycles): 3475.36 ms -> 17.8% better than master
Decode RUST (10 cycles): 7.25712 s -> 32.5% better than master
Encode C++ (10 cycles): 2372.91 ms
Decode C++ (10 cycles): 5.7262 s

The only thing preventing from this implementation being as fast as kagome's C++ impl are the few lines annotated with:

// TODO: Optimising bounds checks on this line will yield a great performance improvement.

I couldn't yet manage to get the compiler to ellide the bounds checks in those cases. Another way of achieving this would be to add a bit of unsafe code.

This brings a performance improvement of 40-100%,
making this implementation as fast as the C++ alternative in kagome.

Where possible, compiler is aided to optimise away the bounds checks without
any unsafe code. However, a fair amount of unsafe code was needed,
but it doesn't lower the security posture as the needed assertions
were already being made.

Signed-off-by: alindima <alin@parity.io>
@alindima alindima requested a review from ordian December 18, 2023 13:49
@alindima alindima changed the title optimise bounds checks optimise some of the bounds checks Jan 11, 2024
@ordian ordian merged commit 886be0e into master Jan 11, 2024
9 checks passed
@ordian ordian deleted the alindima/bounds-checks-optimisation branch January 11, 2024 10:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants