Fix issues with the amdkfd driver used by ROCm on the 6.6 kernel #1641

Open
xmzzz opened this issue Jan 22, 2025 · 1 comment

Comments

@xmzzz
Contributor

xmzzz commented Jan 22, 2025

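Description

Background:

The driver lives in the kernel tree under drivers/gpu/drm/amd/amdkfd. Testing ROCm with a 7900 XTX GPU on the openEuler 24.03 SP1 image triggers a kernel-mode error, so the driver's gaps in RISC-V support need to be investigated and fixed.

Required skills:

Other: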

@xmzzz
Contributor Author

xmzzz commented Jan 22, 2025

LOONG:

munmap(0x7ffff3805000, 4096)            = 0
munmap(0x7ffff3808000, 8192)            = 0
ioctl(3, AMDKFD_IOC_ALLOC_MEMORY_OF_GPU, 0x7ffffbc0d9f8) = 0
mmap(0x7ffff3806000, 8192, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_FIXED, 3 /dev/kfd, 0xecae800000000000) = 0x7ffff3806000
ioctl(3, AMDKFD_IOC_MAP_MEMORY_TO_GPU, 0x7ffffbc0da40) = 0
ioctl(3, AMDKFD_IOC_CREATE_EVENT, 0x7ffffbc0de00) = 0
ioctl(3, AMDKFD_IOC_CREATE_EVENT, 0x7ffffbc0de00) = 0
ioctl(3, AMDKFD_IOC_GET_CLOCK_COUNTERS, 0x7ffffbc0e180) = 0


RV:

munmap(0x3fb36c1000, 4096)              = 0
munmap(0x3fb36c4000, 8192)              = 0
ioctl(3, AMDKFD_IOC_ALLOC_MEMORY_OF_GPU, 0x3ff04314d0) = 0
mmap(0x3fb36c2000, 8192, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_FIXED, 3 /dev/kfd, 0xef18800000000000) = -1 EOVERFLOW (Value too large for defined data type)
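
The two traces diverge only at the mmap on /dev/kfd: it succeeds on LoongArch and returns EOVERFLOW on RISC-V. Both offsets have bit 63 set, so one hypothesis worth checking (it is not confirmed anywhere in this issue) is that some step on the RISC-V path converts the byte offset into a page offset with a signed shift, which would sign-extend the value beyond what an mmap range check accepts. The Python sketch below models only that arithmetic in userspace. `PAGE_SHIFT = 12` is inferred from the 4096-byte munmap calls in both traces, and `range_check_ok` is a simplified, hypothetical stand-in for the kernel's check, not a copy of it.

```python
PAGE_SHIFT = 12            # assumption: 4 KiB pages, inferred from the traces
ULONG_MAX = (1 << 64) - 1  # largest 64-bit unsigned value


def pgoff_unsigned(offset: int, page_shift: int = PAGE_SHIFT) -> int:
    """Page offset when the 64-bit byte offset is shifted as an unsigned value."""
    return (offset & ULONG_MAX) >> page_shift


def pgoff_signed(offset: int, page_shift: int = PAGE_SHIFT) -> int:
    """Page offset when the same 64 bits are shifted as a signed value."""
    value = offset & ULONG_MAX
    if value >> 63:
        value -= 1 << 64                      # reinterpret as two's complement
    return (value >> page_shift) & ULONG_MAX  # arithmetic shift, wrapped to 64 bits


def range_check_ok(pgoff: int, length: int, page_shift: int = PAGE_SHIFT) -> bool:
    """Hypothetical model: the mapping must fit below a 64-bit byte limit."""
    max_size = ULONG_MAX - length
    return pgoff <= (max_size >> page_shift)


if __name__ == "__main__":
    rv_offset = 0xEF18800000000000  # offset from the failing RISC-V mmap
    for name, fn in (("unsigned", pgoff_unsigned), ("signed", pgoff_signed)):
        pgoff = fn(rv_offset)
        print(f"{name:8s} pgoff={pgoff:#018x} ok={range_check_ok(pgoff, 8192)}")
```

In this model the unsigned shift gives a page offset of 0x000ef18800000000, which passes the check, while the signed shift gives 0xfffef18800000000, which fails it. Whether the 6.6 RISC-V mmap path actually behaves this way still has to be verified against the kernel source.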



Relevant areas in the Linux kernel:
mm/mmap.c
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c:kfd_mmap (KFD_MMAP_TYPE_DOORBELL)
drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c:kfd_doorbell_mmap
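
To confirm that the failing mmap really targets the doorbell path listed above, the offset from the traces can be decoded by hand. The sketch below assumes the offset layout I recall from kfd_priv.h (mapping type in bits 63:62, a 16-bit GPU ID starting at bit 46); the constants are copied from memory and should be checked against the tree being debugged. `decode_kfd_mmap_offset` is a hypothetical helper name, not a kernel function.

```python
# Assumed layout, to be verified against drivers/gpu/drm/amd/amdkfd/kfd_priv.h.
KFD_MMAP_TYPE_SHIFT = 62
KFD_MMAP_GPU_ID_SHIFT = 46
KFD_GPU_ID_HASH_WIDTH = 16

# Assumed mapping of the 2-bit type field to the KFD_MMAP_TYPE_* names.
MMAP_TYPES = {0: "MMIO", 1: "RESERVED_MEM", 2: "EVENTS", 3: "DOORBELL"}


def decode_kfd_mmap_offset(offset: int) -> tuple[str, int]:
    """Split a /dev/kfd mmap offset into (mapping type name, GPU ID)."""
    mmap_type = (offset >> KFD_MMAP_TYPE_SHIFT) & 0x3
    gpu_id = (offset >> KFD_MMAP_GPU_ID_SHIFT) & ((1 << KFD_GPU_ID_HASH_WIDTH) - 1)
    return MMAP_TYPES[mmap_type], gpu_id


if __name__ == "__main__":
    for label, offset in (("LOONG", 0xECAE800000000000), ("RV", 0xEF18800000000000)):
        kind, gpu_id = decode_kfd_mmap_offset(offset)
        print(f"{label}: type={kind} gpu_id={gpu_id:#x}")
```

Under these assumptions both offsets decode to the DOORBELL type (GPU ID 0xb2ba on LoongArch, 0xbc62 on RISC-V), so the two machines are requesting the same kind of mapping and differ only in the result.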

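The above is what is known so far; more details are in the discussion in the Discord group.
If you are familiar with these drivers, look closely at the wptr-related code. The pointer below was shared by a contributor from the Loongson community: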
https://elixir.bootlin.com/linux/v6.6.67/source/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c#L344

You can also contact Liu Xin (刘鑫) to request a remote debugging environment.
