Also use the upper counter for more accurate benchmarking #22

Open
giordano opened this issue Jul 21, 2023 · 0 comments
Labels
code generation Related to GPUCompiler code generation infrastructure


@giordano
Collaborator

We can use __builtin_ipu_get_scount_u together with __builtin_ipu_get_scount_l to get the full 64-bit cycle counter. The challenge is that, between two consecutive calls to the builtins (ideally the upper one first, then the lower one), the lower counter may overflow and increment the upper one. We therefore need to handle manually the case where the lower counter is less than 12 (or 6?), i.e. where it may have wrapped around after the upper counter was read.

giordano added the code generation label on Jul 21, 2023