Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[stdlib] Refactor memset() to be generic #3577

Open
wants to merge 28 commits into
base: nightly
Choose a base branch
from

Conversation

martinvuyk
Copy link
Contributor

@martinvuyk martinvuyk commented Sep 30, 2024

Refactor memset to be generic over scalars and trivial types to be filled with scalar values of the same bitwidth.

…eneric memset and adding helpers to DType

Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>
Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>
Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>
Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>
@martinvuyk martinvuyk changed the title [stdlib] Add String __add__ with StringSlice and refactor List.resize() [stdlib] Add String __add__ with StringSlice and refactor memset() to be generic Sep 30, 2024
@martinvuyk martinvuyk marked this pull request as ready for review September 30, 2024 19:33
@martinvuyk martinvuyk requested a review from a team as a code owner September 30, 2024 19:33
Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>
Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>
martinvuyk added a commit to martinvuyk/mojo that referenced this pull request Sep 30, 2024
Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>
@soraros
Copy link
Contributor

soraros commented Sep 30, 2024

Thanks for the contribution and congrats on the performance improvements! However, I do wonder if memset is right place to add such features. It has the semantics of copying raw memory, and init_pointee_copy is not that.

@martinvuyk
Copy link
Contributor Author

Hi @soraros the implementation in _memset_impl also just copies a value on a vector and then stores it in the memory address range:

alias simd_width = simdwidthof[Scalar[type]]()
var vector_end = _align_down(count, simd_width)

for i in range(0, vector_end, simd_width):
    ptr.store(i, SIMD[type, simd_width](value))

for i in range(vector_end, count):
    ptr.store(i, value)

And the original signature of memset also made me consider doing this as I interpreted it like it is supposed to be generic:

fn memset[
    type: AnyType, address_space: AddressSpace
](ptr: UnsafePointer[type, address_space], value: UInt8, count: Int):
  ...

the only thing this new version assumes differently than the original is that you are able to copy a value of the same type as the pointer into that address range.

IMO these two branches are doing the same thing, copy a thing count times from the pointer start

if dt is not DType.invalid:
    var p = ptr.bitcast[Scalar[dt]]()
    _memset_impl[dt](p, rebind[Scalar[dt]](value), count)
else:
    for i in range(count):
        (ptr + i).init_pointee_copy(value)

Even memcpy also does copies onto vectors before storing in an address range, so there is no "raw" bit-mapping on an electrical level either:

# Copy in 32-byte chunks.
alias chunk_size = 32
var vector_end = _align_down(n, chunk_size)
for i in range(0, vector_end, chunk_size):
    dest_ptr.store(i, src_ptr.load[width=chunk_size](i))
for i in range(vector_end, n):
    dest_ptr.store(i, src_ptr.load(i))

Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>
@martinvuyk martinvuyk changed the title [stdlib] Add String __add__ with StringSlice and refactor memset() to be generic [stdlib] Refactor memset() to be generic Oct 2, 2024
martinvuyk and others added 3 commits October 2, 2024 09:34
Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>
Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>
Signed-off-by: martinvuyk <110240700+martinvuyk@users.noreply.github.com>
@JoeLoser
Copy link
Collaborator

!sync

@modularbot modularbot added the imported-internally Signals that a given pull request has been imported internally. label Oct 14, 2024
@JoeLoser
Copy link
Collaborator

FYI we'll want to do some benchmarking here (both in comp and runtime) with this change internally, so that's why it has not been merged internally yet. Thanks for your patience and contribution! 🔥

Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>
Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>
Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>
@martinvuyk
Copy link
Contributor Author

@jackos to reduce the scope of this PR I reverted the changes to make memset generic over memory only types (I have a better idea to not bloat compile times). I also removed the DType.get_dtype() to its own PR #3810. This PR now only makes memset generic over scalars and trivial types to be filled with scalar values of the same bitwidth.

Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>
Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>
@JoeLoser
Copy link
Collaborator

JoeLoser commented Dec 4, 2024

!sync

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
imported-internally Signals that a given pull request has been imported internally.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants