-
Notifications
You must be signed in to change notification settings - Fork 379
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RVV] add qs8-dwconv support for risc-v #7638
base: master
Are you sure you want to change the base?
Conversation
ken-unger
commented
Jan 3, 2025
- Added support for qs8-dwconv and qs8-qc8w-dwconv for RVV
- Generator script can support qu8 but I've left those kernels out given past comments on qu8 being deprecated.
- Tested on qemu and Spacemit K1.
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request. |
src/qs8-dwconv/unipass-rvv.c.in
Outdated
vint32m${LMUL}_t vacc = __riscv_vle32_v_i32m${LMUL}(w, vl); | ||
w = (const void*) ((uintptr_t) w + vlmax * sizeof(int32_t)); | ||
|
||
for (int k=0; k<${KERNEL_TILE}; k++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
code style... should have space around binary operators. check all operators have a space around them
for (int k = 0; k < ${KERNEL_TILE}; k++) {
typically we use size_t for most integers... I'd check if other kernels use size_t or int
personal nit - ++k preferred
for (size_t k = 0; k < ${KERNEL_TILE}; ++k) {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good, will do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
XNNPACK has a .clang-format
file, would it suffice just to follow its suggestions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
overall ok. can you quote a performance number compared to scalar?
would prefer to see variations for LMUL so we can test against a few variations, depending on the exact RISCV code and choose the best.
Although qu8 is being discouraged, it hasnt gone away.
In general QD8 is a preferred format... but we dont have a dwconv for that.
5x5 seems to be rare... only on mobilenet v3. 3x3 is far more common.
On many cpus 5x5 would be an issue for register pressure.
In terms of performance are you referring to the qs8-dwconv-bench results? Below is a snippet comparing the mobilenet v1 subset on a K1 (vlen=256). Around 10x faster for this subset. Running ./qs8-dwconv-bench
|
In general my experience has been that a higher LMUL is always preferred. I wasn't completely sure about the philosophy in adding the non-production kernels, so committed just the production ones. I've also been running within tflite (benchmark_model) to see the net result. For mobilenet V1, and on the same K1 platform, running single threaded, we were previously at 105ms and now at 10ms. [Node type] [count] [avg ms] |
@fbarchard I'll do the coding style fix and also add qu8 support to this PR since that touches mostly the same files. |
New commit added;
|