
Bit-packed GF2 #583

Draft · wants to merge 20 commits into base: release/0.4.x

Conversation


@amirebrahimi amirebrahimi commented Dec 12, 2024

Initially, I started with an approach that defined bit-packing at ArrayMeta, so it could be used in Array and FieldArray. Then, I would handle implicit bit packing and unpacking as necessary depending on the operation. This soon turned into a game of whack-a-mole where I needed to override a lot of NumPy functions on top of ndarray methods.

This version is much simpler, with a smaller scope and limited functionality: basically arithmetic operations and matrix-vector/matrix-matrix multiplication. I figure we can start with this and then expand the scope as needed.

Rather than introduce fields higher up in the class/meta hierarchy I've opted for overriding specific functionality in a custom GF2BP class. There are simple conversions for going between GF2 and GF2BP via .astype, which implies that you will receive a new array.

The NumPy v2 and Numba v0.60 bump was necessary so that I could make use of np.bitwise_count. This allows a roughly 4x speed-up in matrix-matrix multiplication.

I started to work on tests and tried to introduce GF2BP under the FIELDS fixture construction, but there won't be an easy way to xfail the tests I know would fail. Ideally, I'd like to piggyback on the tests in test_arithmetic.py and test_linalg.py, but only in comparison to GF2 after unpacking. So, let me know if you have any good ideas for this without having to duplicate tests.

@amirebrahimi amirebrahimi changed the base branch from main to release/0.4.x December 12, 2024 02:15
@mhostetter
Owner

Thanks for working on / thinking through this. Here are some initial thoughts.

I think a good API would be to have GF2 and GF2BP conversions happen through np.packbits() and np.unpackbits(). Doing this would avoid the need for the .astype() method. For example:

    x = galois.GF2([1, 0, 1, 1])
    print(repr(x))
    # GF([1, 0, 1, 1], order=2)
    x_bp = np.packbits(x)
    print(repr(x_bp))
    # GF([176], order=2, bitpacked=True)

Whereas, creating a GF2BP assumes the input is already bit-packed. For example:

    x_bp = galois.GF2BP([176])
    print(repr(x_bp))
    # GF([176], order=2, bitpacked=True)
    x = np.unpackbits(x_bp)
    print(repr(x))
    # GF([1, 0, 1, 1, 0, 0, 0, 0], order=2)

It's an open question whether to use only np.uint8, as np.packbits() does, or support larger dtypes like np.uint64. For the moment, let's assume the former.

If we use np.packbits(), I don't think it's necessary to keep track of the unpacked shape, do you? Also, in the code snippet below, I don't believe the output unpacked shape equals the first input's unpacked shape. This is because of NumPy's broadcasting rules.

    def __call__(self, ufunc, method, inputs, kwargs, meta):
        output = super().__call__(ufunc, method, inputs, kwargs, meta)
        output._unpacked_shape = inputs[0]._unpacked_shape
        return output

Looking at the ufuncs for GF2, most of them use bitwise arithmetic (XOR, AND, etc).

    class UFuncMixin_2_1(UFuncMixin):
        """
        A mixin class that provides explicit calculation arithmetic for GF(2).
        """

        def __init_subclass__(cls) -> None:
            super().__init_subclass__()
            cls._add = add_ufunc(cls, override=np.bitwise_xor)
            cls._negative = negative_ufunc(cls, override=np.positive)
            cls._subtract = subtract_ufunc(cls, override=np.bitwise_xor)
            cls._multiply = multiply_ufunc(cls, override=np.bitwise_and)
            cls._reciprocal = reciprocal(cls)
            cls._divide = divide(cls)
            cls._power = power(cls)
            cls._log = log(cls)
            cls._sqrt = sqrt(cls)

We can probably make all of the ufuncs for GF2 (reciprocal, divide, etc.) also work for bit-packed GF2 arrays. If that's the case, maybe a more elegant solution is not to have a separate GF2BP class, but to have a bitpacked property on GF2 itself.

I'm uncomfortable bumping the minimum NumPy version to 2.0. It's too minimally adopted, and too much infrastructure still relies on v1. A compromise could be having v1 and v2 implementations of matmul for GF2BP arrays: if the installed NumPy version is 2.0 or greater, we can use np.bitwise_count().
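For illustration, a version-dependent popcount-based GF(2) matrix product might look like the sketch below. This is not the PR's implementation: gf2_matmul_packed and its lookup-table fallback are hypothetical names, and it assumes the right operand is supplied as its packed transpose, so each output bit is the parity of the popcount of an AND of two packed vectors.

```python
import numpy as np

def gf2_matmul_packed(A, B_cols):
    """GF(2) matrix product over bit-packed operands (illustrative sketch).

    A:      (m, k8) uint8, rows of the left matrix packed along the last axis.
    B_cols: (n, k8) uint8, COLUMNS of the right matrix packed the same way
            (i.e. the packed transpose).
    Returns an (m, n) uint8 array of 0/1 entries (unpacked, for clarity).
    """
    # AND the packed operands pairwise, then reduce each product with a
    # population count; the GF(2) dot product is that count mod 2.
    prod = A[:, None, :] & B_cols[None, :, :]  # shape (m, n, k8)
    if hasattr(np, "bitwise_count"):           # NumPy >= 2.0
        counts = np.bitwise_count(prod)
    else:                                      # NumPy 1.x fallback: byte LUT
        table = np.array([bin(i).count("1") for i in range(256)], dtype=np.uint8)
        counts = table[prod]
    return (counts.sum(axis=-1) & 1).astype(np.uint8)
```

The packed zero-padding on the last axis is harmless here, since padded bits AND to zero and contribute nothing to the popcount.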

@amirebrahimi
Author

amirebrahimi commented Dec 12, 2024

Thanks, @mhostetter. I'll work on incorporating this feedback. One thing I wanted to clarify, since there seems to be a misunderstanding: GF2BP does the bit-packing for you, so it doesn't require your matrices to already be bit-packed. With that information, does that change any of your feedback?

@mhostetter
Owner

One thing I wanted to clarify, since there seems to be a misunderstanding: GF2BP does the bit-packing for you, so it doesn't require your matrices to already be bit-packed. With that information, does that change any of your feedback?

That is my understanding of your current implementation, but it's also something I'm suggesting we change. To me, having an internal data conversion seems dubious. What is the motivation for doing the conversions internally? In my proposal, if a user wanted to start with an unpacked array, they would call x = galois.GF2(unpacked_array) and then x_bp = np.packbits(x), which returns a GF2BP. If they want to start with a bit-packed array, they would call x_bp = galois.GF2BP(packed_array).

My issue with the internal (mandatory) data conversion is that it is not idempotent. For instance, calling x_bp = galois.GF2BP(x_bp) repeatedly could change the value of x_bp. This isn't true of any of the other Galois field arrays. Here's an example:

    x = np.packbits([0, 0, 0, 1])
    print(x)
    # [16]
    x = np.packbits(x)
    print(x)
    # [128]
    x = np.packbits(x)
    print(x)
    # [128]

@mhostetter
Owner

Also, I'll try to think about an easy way to plug into the test infrastructure.

@amirebrahimi
Author

To your question about whether it's necessary to track the unpacked shape: I think so, because otherwise when you unpack you won't be able to supply the count argument and will wind up with extra padded zeros.
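The padding behavior is easy to demonstrate: np.packbits pads the last axis out to a whole number of bytes, and without count, np.unpackbits returns those padded zeros too.

```python
import numpy as np

x = np.array([1, 0, 1], dtype=np.uint8)
packed = np.packbits(x)                  # 3 bits padded out to one full byte
print(np.unpackbits(packed))             # [1 0 1 0 0 0 0 0] -- five padded zeros
print(np.unpackbits(packed, count=3))    # [1 0 1] -- original shape recovered
```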

@mhostetter
Owner

To your question about whether it's necessary to track the unpacked shape: I think so, because otherwise when you unpack you won't be able to supply the count argument and will wind up with extra padded zeros.

Yes, I agree with that. I was thinking we could populate a property like ._count (or whatever name makes sense) when the array is created inside the override to np.packbits().

So the user has x = galois.GF2.Random((100, 6)) and then calls x_bp = np.packbits(x). Inside the intercepted call to packbits(), we can call x_bp = np.packbits(x), then x_bp = x_bp.view(galois.GF2BP), and then set x_bp._count = -2.

Later when the user wants to unpack it, calling y = np.unpackbits(x_bp), we can use the internal ._count property if count=None. This way the unpacked array has the correct original shape. So rather than tracking the shape of the unpacked array, we just need to know how many zeros were appended on the last axis when the array was packed.

I can help you implement the interception of packbits() and unpackbits(), if you'd like. It might be a bit tricky.
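The bookkeeping being proposed could be sketched roughly as below. The real library would intercept np.packbits() and np.unpackbits() through NumPy's function-override machinery rather than use free-standing wrappers, and GF2BP here is a bare stand-in, not the PR's class; this only illustrates recording a count at pack time and consuming it at unpack time (tracking the axis element count, one of the two options discussed).

```python
import numpy as np

class GF2BP(np.ndarray):
    """Minimal stand-in for a bit-packed GF(2) array (illustrative only)."""
    _count = None  # number of valid bits on the last axis

def packbits(x, axis=-1):
    """Pack a 0/1 array and remember the original bit count on `axis`."""
    arr = np.asarray(x, dtype=np.uint8)
    packed = np.packbits(arr, axis=axis).view(GF2BP)
    packed._count = arr.shape[axis]  # alternatively, store -pad_len as above
    return packed

def unpackbits(x_bp, axis=-1, count=None):
    """Unpack, defaulting to the count recorded at pack time."""
    if count is None:
        count = x_bp._count
    return np.unpackbits(x_bp.view(np.ndarray), axis=axis, count=count)
```

With this, unpackbits(packbits(x)) round-trips to the original shape without the caller passing count explicitly.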

@amirebrahimi
Author

Yes, I agree with that. I was thinking we could populate a property like ._count (or whatever name makes sense) when the array is created inside the override to np.packbits().

I've implemented packbits and unpackbits.

So the user has x = galois.GF2.Random((100, 6)) and then calls x_bp = np.packbits(x). Inside the intercepted call to packbits(), we can call x_bp = np.packbits(x), then x_bp = x_bp.view(galois.GF2BP), and then set x_bp._count = -2.

Later when the user wants to unpack it, calling y = np.unpackbits(x_bp), we can use the internal ._count property if count=None. This way the unpacked array has the correct original shape. So rather than tracking the shape of the unpacked array, we just need to know how many zeros were appended on the last axis when the array was packed.

I'm now tracking the axis element count because it seemed more straightforward. Is there a specific reason why you wanted to track the appended zero count?

@amirebrahimi
Author

amirebrahimi commented Dec 14, 2024

If we use np.packbits(), I don't think it's necessary to keep track of the unpacked shape, do you? Also, in the code snippet below, I don't believe the output unpacked shape equals the first input's unpacked shape. This is because of NumPy's broadcasting rules.

    def __call__(self, ufunc, method, inputs, kwargs, meta):
        output = super().__call__(ufunc, method, inputs, kwargs, meta)
        output._unpacked_shape = inputs[0]._unpacked_shape
        return output

Coming back to this. I was trying to find a case of broadcasting where the output shape would not be the same as the input shape, and I couldn't. NumPy will complain about the broadcasting if the shapes aren't compatible, in both the unpacked and packed cases.

    a = GF2.Random((10, 10))
    b = GF2.Random((1, 10))
    x = np.packbits(a)
    y = np.packbits(b)
    print((a + b).shape)
    print((x + y).shape)

If the shapes aren't broadcastable, then NumPy throws a ValueError:

    a = GF2.Random((10, 10))
    b = GF2.Random((2, 10))
    x = np.packbits(a)
    y = np.packbits(b)
    print((b + a).shape)
    print((y + x).shape)

Was your concern with how I'm choosing the first input, which would cause an issue if the operands were switched (as in the last example)? If so, then I think a max over the inputs' ._axis_count would work.

@mhostetter
Owner

Was your concern with how I'm choosing the first input, which would cause an issue if the operands were switched (as in the last example)? If so, then I think a max over the inputs' ._axis_count would work.

Yes, that was a concern.

Coming back to this. I was trying to find a case of broadcasting where the output shape would not be the same as the input shape, and I couldn't. NumPy will complain about the broadcasting if the shapes aren't compatible, in both the unpacked and packed cases.

Using ufunc methods, like outer or accumulate, can change the broadcasted shape.

    import numpy as np

    a = np.random.randint(0, 2, 10)
    b = np.random.randint(0, 2, 10)
    x = np.packbits(a)
    y = np.packbits(b)
    print((a * b).shape)
    print((x * y).shape)
    print(np.multiply.outer(a, b).shape)
    print(np.multiply.outer(x, y).shape)
    # (10,)
    # (2,)
    # (10, 10)
    # (2, 2)

Also, here's a crazy example of broadcasting over multiple axes.

    import numpy as np

    a = np.random.randint(0, 2, (1, 2, 3))
    b = np.random.randint(0, 2, (2, 2, 1))
    x = np.packbits(a)
    y = np.packbits(b)
    print((a * b).shape)
    print((x * y).shape)
    print(np.multiply.outer(a, b).shape)
    print(np.multiply.outer(x, y).shape)
    # (2, 2, 3)
    # (1,)
    # (1, 2, 3, 2, 2, 1)
    # (1, 1)

I think the question is, can we track the axis elements (or appended zeros) through all of the broadcasting and manipulation?

@amirebrahimi
Author

I think the question is, can we track the axis elements (or appended zeros) through all of the broadcasting and manipulation?

Yes, that is a good question. I'll explore that soon.

Just wanted to give you an update on where I'm at. I started looking at making a bit-packed version of inv, which then had me look at concatenate, which has ultimately led me to consider indexing for both getting and setting items. At first I started with an explicit version (i.e., separate methods that would need to be called), leaving the default __getitem__ and __setitem__ intact, but now I've decided to see how things work if I move back towards implicit unpacking.

Currently, I've got index update rules to handle indexing without defaulting to unpacking the whole matrix. I think I can simplify these rules if I convert them into a canonical form first. There's a test file I had ChatGPT generate to provide examples of all the ways one can index.

My thought in going down this path is that I don't see much utility in the class if at least some of the more common NumPy functions aren't supported in bit-packed form. For us, matrix inverses are one of those.
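The core idea of indexing without unpacking the whole matrix can be sketched for the simplest case, a single row of a row-wise packed matrix. These helper names (get_row, set_row) are hypothetical, not the PR's __getitem__/__setitem__ rules, which handle far more index forms.

```python
import numpy as np

def get_row(packed, i, n_cols):
    """Unpack only row i of a matrix packed with np.packbits(..., axis=-1).

    packed: (M, ceil(n_cols/8)) uint8. Avoids unpacking all M rows
    just to read one.
    """
    return np.unpackbits(packed[i], count=n_cols)

def set_row(packed, i, row_bits):
    """Repack a single unpacked row into the packed matrix in place."""
    packed[i] = np.packbits(np.asarray(row_bits, dtype=np.uint8))
```

Per-row unpacking like this is what makes the >100x single-row speedup below possible relative to unpacking the full matrix.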

@amirebrahimi
Author

amirebrahimi commented Dec 24, 2024

Finished a bit-packed implementation of np.linalg.inv:

    # %%
    a = GF2.Random((1024, 1024), seed=4)
    x = np.packbits(a)
    %timeit np.linalg.inv(a)  # ~3 s
    %timeit np.linalg.inv(x)  # ~734 ms
    assert np.array_equal(a, np.unpackbits(x))

So, there's a 4.5x speedup with 1024×1024 matrices.

Also, I did some tests in general to see if unpacking portions of matrices was worthwhile over unpacking the full matrix (to validate having a custom __getitem__ and __setitem__):

    # %%
    M, N = 1800, 1000
    a = np.random.randint(0, 2, size=(M, N), dtype=np.uint8)
    A = np.packbits(a, axis=-1)  # shape: (M, ceil(N/8))

    # ~434 us
    %timeit np.unpackbits(A, axis=-1, count=N)

    # %%
    # ~3 us
    %timeit np.unpackbits(A[np.random.randint(0, M)], axis=-1, count=N)

So, a >100x speedup -- I think it's worthwhile to keep this approach around. If one wants to get at the raw data, they can just perform a .view(np.ndarray).

I'll plan to look into broadcasting and clean up / normalize the indexing code when I'm back on Thursday.

    # 9. Using np.newaxis (reshaped array assignment)
    arr = GF([1, 0, 1, 1])
    arr = np.packbits(arr)
    reshaped = arr[:, np.newaxis]  # should this use arr's data (as it would without packbits) or a new array?
Author

@amirebrahimi amirebrahimi Dec 24, 2024


Still not sure what to do with this case.

@amirebrahimi
Author

I've cleaned up the indexing routines and have started to look at broadcasting. I found a bug in my inv_bitpacked implementation that I need to fix first though.
