[Windows 10] [C LIBRARY] : PNG decoding is slower than OpenCV #72
Comments
Are you configuring Visual Studio with |
If that doesn't help, can you attach the |
Oh, also, for MSVC, make sure that you're compiling an optimized build, not a debug build. I think this is the |
Thanks, I'm trying to configure Visual Studio with avx2. And maybe clang is indispensable. |
I configure Visual Studio with (/arch:AVX2), but it doesn't work. |
Does "it doesn't work" mean that it didn't get faster, or does it mean that you got a compiler error message, or does it mean something else? If it's an error message, can you copy/paste it here? |
It didn't get faster. |
OK. Does Is clang faster or is it also as slow? |
With clang it is 1.2x faster than OpenCV.
Hi. I tried it on large data. The program has some internal overhead, but anyway ... First dataset: 1984x1984x1540, 16-bit grayscale (all times include overhead, series of 1540 images): Second dataset: 2048x2048x2048, 8-bit grayscale synthetic data, each PNG roughly 14 kB, basically repeating b&w patterns. About /arch:AVX ... it may do something, but MSVC is very good at finding reasons not to optimize loops. The reasons can be printed using the /Qvec-report:2 option under C++/All Options/Additional Options. The bottleneck is obviously
And from https://docs.microsoft.com/en-us/cpp/error-messages/tool-errors/vectorizer-and-parallelizer-messages?view=msvc-170#BKMK_ReasonCode130x , 1301 = Loop stride isn't +1. Example of code which it can optimize (if outputtype is shorter or the same, otherwise it fails with code 1203, but code logic chooses outputtype that won't overflow)
Top-down function times for a realistic dataset: https://i.imgur.com/UD5a7MF.jpg compiled with /O2 /arch:AVX, and a comparison with other decoders: https://imgur.com/a/ZEtojo9 TL;DR: either write/generate code using AVX intrinsics or don't pre-optimize it for MSVC. Windows Imaging Component seems fastest, but it works only on Windows (since Vista, Windows 7 ... idk) |
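For anyone who wants to try the vectorizer report themselves, here is a minimal sketch of a loop with a +1 stride whose output type is narrower than its input. It is not the code from the comment above and not Wuffs code; `narrow_u16_to_u8` is a made-up helper. Whether MSVC actually vectorizes it is something to check in the report rather than take for granted.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper (not part of Wuffs): keeps the high byte of each
// 16-bit sample. The loop index advances by exactly 1 per iteration and
// the output type (uint8_t) is narrower than the input type (uint16_t).
//
// With MSVC, something like
//   cl /O2 /arch:AVX2 /Qvec-report:2 this_file.c
// should print, per loop, either that it was vectorized or a reason code.
void narrow_u16_to_u8(uint8_t* dst, const uint16_t* src, size_t n) {
  for (size_t i = 0; i < n; i++) {
    dst[i] = (uint8_t)(src[i] >> 8);
  }
}
```

Rewriting the same loop with `*dst++ = ...` repeated three times per iteration would be the kind of change to compare against in the report output.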
FWIW, this patch:
diff --git a/release/c/wuffs-unsupported-snapshot.c b/release/c/wuffs-unsupported-snapshot.c
index 717414f8..ef2105cb 100644
--- a/release/c/wuffs-unsupported-snapshot.c
+++ b/release/c/wuffs-unsupported-snapshot.c
@@ -11743,13 +11743,8 @@ wuffs_base__io_writer__limited_copy_u32_from_history_fast(uint8_t** ptr_iop_w,
     uint32_t distance) {
   uint8_t* p = *ptr_iop_w;
   uint8_t* q = p - distance;
-  uint32_t n = length;
-  for (; n >= 3; n -= 3) {
-    *p++ = *q++;
-    *p++ = *q++;
-    *p++ = *q++;
-  }
-  for (; n; n--) {
+  size_t n = length;
+  for (size_t i = 0; i < n; i++) {
     *p++ = *q++;
   }
   *ptr_iop_w = p;
looks like your
In any case, I'm not sure if AVX-ness (or not) would really help here. The destination and source byte slices can overlap, often by only a few bytes, in which case you can't just do a simple memcpy 32 bytes at a time. |
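To illustrate the overlap problem, here is a small sketch, assuming a simplified signature rather than the real Wuffs one. `copy_from_history` mirrors the byte-at-a-time loop from the patch. `copy_from_history_chunked` is a hypothetical variant, not anything Wuffs does, that uses memcpy but has to cap each chunk at `distance` bytes so that source and destination never overlap. Both assume `distance` is at least 1 and that the buffer has room for `length` more bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Byte-at-a-time copy from `distance` bytes behind the write position.
// When distance < length, bytes written earlier in this same call are
// read again later, so the last `distance` bytes repeat as a pattern.
static uint8_t* copy_from_history(uint8_t* p, uint32_t length,
                                  uint32_t distance) {
  const uint8_t* q = p - distance;
  for (size_t i = 0; i < length; i++) {
    *p++ = *q++;
  }
  return p;
}

// Hypothetical chunked variant: each memcpy moves at most `distance`
// bytes, so its source [q, q+n) ends at or before its destination p.
// With a small distance the chunks are tiny, which is why a fixed
// 32-byte wide copy is not directly usable here.
static uint8_t* copy_from_history_chunked(uint8_t* p, uint32_t length,
                                          uint32_t distance) {
  const uint8_t* q = p - distance;
  while (length > 0) {
    uint32_t n = (length < distance) ? length : distance;
    memcpy(p, q, n);
    p += n;
    q += n;
    length -= n;
  }
  return p;
}
```

For example, with a buffer starting with "ab", copying length 6 at distance 2 from offset 2 yields "abababab" with either function; a single 6-byte memcpy over those overlapping ranges would be undefined behavior.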
Wuffs should be able to decode to |
I don't have a Windows machine readily available, but according to https://godbolt.org/z/q4MfjzTPh and the https://imgur.com/UD5a7MF profile mentioned in #72, this could improve inner loop performance. Updates #72
I don't have MSVC myself, but for those who do, I'm curious if commit c226ed6 noticeably improves PNG decode speed. |
I'm sorry, I'm a little busy this week; hopefully I will get to this issue next week. |
@pavel-perina any news? |
Hello, thanks for helping:
I am trying to use Wuffs to open PNG files within a C++ project. I compile the code with VS2017, but PNG decoding is slower than with OpenCV.
OpenCV: 65ms
wuffs: 93ms