
x64: utils: jit_io_helper: fix xf16 store from Xmm (fixes MFDNN-12635) #2509

Open · wants to merge 2 commits into base: main
Conversation

tczeszun
Contributor

This is a short fix for MFDNN-12635.

The function jit_io_helper_t<Vmm>::store_bf16 doesn't support Xmm, which is why it crashed when using bf16 with a blocked format where channels are blocked by 4.
I changed it to use jit_io_helper_t<Vmm>::store_byte_by_byte instead.
That said, I have another idea: store_bf16 might still work if we introduced an additional Opmask to filter out the lower half of the Xmm. That would require some changes to the jit_io_helper_t class, so I prefer not to submit it now as a bugfix but later (it needs more analysis); that's why I left a TODO comment.

I'm also adding a reorder regression input to benchdnn, as this case was likely not covered on the CPU side.

@tczeszun tczeszun requested review from a team as code owners January 24, 2025 18:33
@github-actions github-actions bot added platform:cpu-x64 Intel64/AMD64 processors. Codeowner: @oneapi-src/onednn-cpu-x64 component:tests Codeowner: @oneapi-src/onednn-arch labels Jan 24, 2025
@tczeszun tczeszun added the bug A confirmed library bug label Jan 24, 2025
@tczeszun
Contributor Author

make test
enable benchdnn_nightly
disable test_device_gpu

-    host_->vcvtneps2bf16(cvt_lower_vmm, vmm, Xbyak::VexEncoding);
+    host_->vcvtneps2bf16(cvt_lower_vmm, vmm,
+            mayiuse(avx512_core) ? Xbyak::EvexEncoding
+                                 : Xbyak::VexEncoding);
Contributor

How come an instruction that supports only Evex encoding has an option for Vex encoding?

Contributor Author


Good question, but according to xbyak the mnemonic takes an encoding parameter:
https://github.com/oneapi-src/oneDNN/blob/main/third_party/xbyak/xbyak_mnemonic.h#L1260
And this is not the only place where Vex encoding is used; for example, pooling:
https://github.com/oneapi-src/oneDNN/blob/main/src/cpu/x64/jit_uni_pool_kernel.cpp#L931

Contributor


I think we should limit that instruction to Evex only, regardless of whether xbyak provides an encoding argument.

Contributor Author


I'll open a separate PR for this, as it occurs in many files.


# Test bf16 with aBcde4b format
--reset
--skip-impl=ref,simple # ! test jit version only
Contributor


Suggested change:
-    --skip-impl=ref,simple # ! test jit version only
+    --skip-impl=simple # skip non-jit version

Contributor Author


done

@tczeszun tczeszun force-pushed the tczeszun/fix_io_helper_xf16_xmm branch from c1c72ef to 3d10aca Compare January 24, 2025 19:45
@tczeszun
Contributor Author

make test
enable benchdnn_nightly
disable test_device_gpu
