-
Notifications
You must be signed in to change notification settings - Fork 199
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add AVX2 simd support on x86 #328
Conversation
Btw adding the following a the beginning of pcre2_jit_simd_inc.h can disable avx2 support:
This crude hack can be used for benchmarking the difference. |
The downside is SSE4.1 is the minimum for simd support
So SSE2 -> AVX2 didn't show any performance improvements but will get the SIMD optimization disabled in CPUs that don't support SSE4.1. How is this an improvement?. I am observing about 5% performance degradation in a system running 32bit Linux with AMD Athlon(tm) II X4 640, after I had to tweak the code so it will build. I also presume that there might be power/cooling differences so at least updating the ChangeLog would be required to ensure that the impact is known and accounted for by the end users. |
Good question. The code maintainability is definitely improved (supporting more cpus should be easier). Newer cpus should benefit from this (at some point at least). I don't think old systems are used for high perf computing. |
Don't get me wrong, I can see how using SLJIT SIMD support would improve the code as you mention, what I am concerned about, is replacing working code with a copy of that, specially as doing so introduces performance regressions. While we have data that shows the expected performance degradation in old CPUs because SIMD support cant be enabled there anymore, there is no data that shows any benefit in newer CPUs from this code, so wouldn't it be safer to keep using the old code for all of them? Ideally, once SLJIT SIMD support is complete, then a version of the code that uses it instead could be produced and enabled in the newer CPUs that had the required hardware to use it, and mature that way, and at that point the old code could be safely retired, even if it makes old CPUs slower, since as you mention no one that cares about performance would be using those by then. |
The key point is maintainability, and not just x86. The majority of the code is generic sljit instructions now, so ARM32/64 comes free, and s390x, longsoon can follow this method as well. The next step is supporting logical instructions in sljit, and then only a few instructions remain, which can be done by some platform specific helper. Then, a large amount of duplicated code can be dropped as well. I hope in a few years the simd part will be significantly simpler. |
Indeed, I'll be far more comfortable replacing the arm64 code that is known to have issues and where there is a clear improvement. LoongArch will be the next best option, if only because it will probably also encourage us to build a sound fallback that should allow evolving the sljit native code and validating it against a known working version. LoongArch has indeed 2 vector implementations similar to SSE2 vs AVX2, with the bigger one requiring an extension. |
The downside is SSE4.1 is the minimum for simd support. I don't see much perf difference with AVX2, but maybe it is just my machine.