[stdlib] Refactor memset() to be generic #3577
base: nightly
…eneric memset and adding helpers to DType Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>
String __add__ with StringSlice and refactor List.resize()
String __add__ with StringSlice and refactor memset() to be generic
Thanks for the contribution and congrats on the performance improvements! However, I do wonder if
Hi @soraros, the implementation in

    alias simd_width = simdwidthof[Scalar[type]]()
    var vector_end = _align_down(count, simd_width)
    for i in range(0, vector_end, simd_width):
        ptr.store(i, SIMD[type, simd_width](value))
    for i in range(vector_end, count):
        ptr.store(i, value)

And the original signature of

    fn memset[
        type: AnyType, address_space: AddressSpace
    ](ptr: UnsafePointer[type, address_space], value: UInt8, count: Int):
        ...

the only thing this new version assumes differently than the original is that you are able to copy a value of the same type as the pointer into that address range. IMO these two branches are doing the same thing, copy a thing

    if dt is not DType.invalid:
        var p = ptr.bitcast[Scalar[dt]]()
        _memset_impl[dt](p, rebind[Scalar[dt]](value), count)
    else:
        for i in range(count):
            (ptr + i).init_pointee_copy(value)

Even

    # Copy in 32-byte chunks.
    alias chunk_size = 32
    var vector_end = _align_down(n, chunk_size)
    for i in range(0, vector_end, chunk_size):
        dest_ptr.store(i, src_ptr.load[width=chunk_size](i))
    for i in range(vector_end, n):
        dest_ptr.store(i, src_ptr.load(i))
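The loop shape quoted in this comment (a main loop over full-width chunks up to an aligned-down bound, followed by a scalar tail loop) can be sketched in plain Python. This is only an illustration of the pattern, not the Mojo stdlib code: `align_down` and `fill_chunked` are hypothetical names, the default width of 4 is arbitrary, and a slice assignment stands in for a SIMD store.

```python
def align_down(value: int, alignment: int) -> int:
    """Round value down to the nearest multiple of alignment."""
    return (value // alignment) * alignment


def fill_chunked(buf: list, value, count: int, width: int = 4) -> None:
    """Fill buf[0:count] with value: full chunks first, then the tail."""
    vector_end = align_down(count, width)
    # Main loop: one "wide store" per full chunk (stand-in for a SIMD store).
    for i in range(0, vector_end, width):
        buf[i:i + width] = [value] * width
    # Tail loop: the remaining count - vector_end elements, one at a time.
    for i in range(vector_end, count):
        buf[i] = value


buf = [0] * 12
fill_chunked(buf, 7, count=10, width=4)
print(buf)  # [7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 0, 0]
```

With `count=10` and `width=4`, the main loop covers indices 0 to 7 and the tail loop covers 8 and 9; elements past `count` are left untouched.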
…jo into add-string-add-stringslice
String __add__ with StringSlice and refactor memset() to be generic
memset() to be generic
Signed-off-by: martinvuyk <110240700+martinvuyk@users.noreply.github.com>
!sync
FYI, we'll want to benchmark this change internally (both compile time and runtime), which is why it has not been merged internally yet. Thanks for your patience and contribution! 🔥
@jackos, to reduce the scope of this PR, I reverted the changes that made memset generic over memory-only types (I have a better idea that avoids bloating compile times). I also removed the
!sync
Refactor memset to be generic over scalars and trivial types to be filled with scalar values of the same bitwidth.
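As a rough illustration of filling storage with a scalar of the same bit width, the Python sketch below reinterprets a byte buffer as 32-bit unsigned slots, writes one 32-bit pattern into each slot, and then reads the same bytes back as 32-bit floats. It is an assumption-laden analogy, not the PR's implementation: `fill_as_u32` is a hypothetical helper, and the example assumes the platform's native `I` and `f` formats are both 4 bytes wide.

```python
import struct


def fill_as_u32(buf: bytearray, pattern: int, count: int) -> None:
    """Write `pattern` into the first `count` 32-bit slots of buf."""
    # Reinterpret the bytes as native 32-bit unsigned integers.
    view = memoryview(buf).cast("I")
    for i in range(count):
        view[i] = pattern


# Bit pattern of the 32-bit float 1.0, viewed as an unsigned integer.
one_bits = struct.unpack("I", struct.pack("f", 1.0))[0]

buf = bytearray(16)  # room for four 32-bit elements
fill_as_u32(buf, one_bits, 3)
print(list(memoryview(buf).cast("f")))  # [1.0, 1.0, 1.0, 0.0]
```

The fill itself never needs to know that the storage holds floats; it only relies on the element and the fill value having the same width.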