Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add AVX2 simd support on x86 #328

Merged
merged 1 commit into from
Nov 12, 2023
Merged

Add AVX2 simd support on x86 #328

merged 1 commit into from
Nov 12, 2023

Conversation

zherczeg
Copy link
Collaborator

The downside is SSE4.1 is the minimum for simd support. I don't see much perf difference with AVX2, but maybe it is just my machine.

@zherczeg
Copy link
Collaborator Author

Btw adding the following a the beginning of pcre2_jit_simd_inc.h can disable avx2 support:

#undef SLJIT_HAS_AVX2
#define SLJIT_HAS_AVX2 -1

This crude hack can be used for benchmarking the difference.

The downside is SSE4.1 is the minimum for simd support
@zherczeg zherczeg merged commit fbe045b into master Nov 12, 2023
12 checks passed
@zherczeg zherczeg deleted the avx2 branch November 12, 2023 08:03
@carenas
Copy link
Contributor

carenas commented Nov 12, 2023

So SSE2 -> AVX2 didn't show any performance improvements but will get the SIMD optimization disabled in CPUs that don't support SSE4.1.

How is this an improvement?. I am observing about 5% performance degradation in a system running 32bit Linux with AMD Athlon(tm) II X4 640, after I had to tweak the code so it will build.

I also presume that there might be power/cooling differences so at least updating the ChangeLog would be required to ensure that the impact is known and accounted for by the end users.

@zherczeg
Copy link
Collaborator Author

Good question. The code maintainability is definitely improved (supporting more cpus should be easier). Newer cpus should benefit from this (at some point at least). I don't think old systems are used for high perf computing.

@carenas
Copy link
Contributor

carenas commented Nov 12, 2023

Don't get me wrong, I can see how using SLJIT SIMD support would improve the code as you mention, what I am concerned about, is replacing working code with a copy of that, specially as doing so introduces performance regressions.

While we have data that shows the expected performance degradation in old CPUs because SIMD support cant be enabled there anymore, there is no data that shows any benefit in newer CPUs from this code, so wouldn't it be safer to keep using the old code for all of them?

Ideally, once SLJIT SIMD support is complete, then a version of the code that uses it instead could be produced and enabled in the newer CPUs that had the required hardware to use it, and mature that way, and at that point the old code could be safely retired, even if it makes old CPUs slower, since as you mention no one that cares about performance would be using those by then.

@zherczeg
Copy link
Collaborator Author

The key point is maintainability, and not just x86. The majority of the code is generic sljit instructions now, so ARM32/64 comes free, and s390x, longsoon can follow this method as well. The next step is supporting logical instructions in sljit, and then only a few instructions remain, which can be done by some platform specific helper. Then, a large amount of duplicated code can be dropped as well. I hope in a few years the simd part will be significantly simpler.

@carenas
Copy link
Contributor

carenas commented Nov 12, 2023

Indeed, I'll be far more comfortable replacing the arm64 code that is known to have issues and where there is a clear improvement.

LoongArch will be the next best option, if only because it will probably also encourage us to build a sound fallback that should allow evolving the sljit native code and validating it against a known working version. LoongArch has indeed 2 vector implementations similar to SSE2 vs AVX2, with the bigger one requiring an extension.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants