Release the constraint to enable e8mf8 in EEW=32 #1613

Open
wants to merge 3 commits into
base: main


@sequencer sequencer commented Aug 24, 2024

T1 has implemented LMUL=1/8 for the EEW=32 case, so we submit this PR for consideration.
It allows the e8mf8 type when VLEN>=64 and EEW=32; the common case is VLEN>>EEW, e.g. VLEN=64/128/256 with EEW=32.
On the other hand, e8mf8 doesn't make much sense in the architecture design, because the triple (e8mf8, e16mf4, e32mf2) can always be replaced by (e8mf4, e16mf2, e32m1).
However, if the specification allows SEW_min=4 in the future, e4mf8 might find its place in some edge AI scenarios.
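
To make the replacement argument concrete, here is a minimal Python sketch, not part of the PR. It assumes the usual relation VLMAX = LMUL * VLEN / SEW; the helper names `vlmax` and `sew_lmul_ratio` are made up for illustration.

```python
from fractions import Fraction

def vlmax(vlen, sew, lmul):
    """Whole elements per register group: floor(LMUL * VLEN / SEW)."""
    return int(Fraction(vlen) * lmul // sew)

def sew_lmul_ratio(sew, lmul):
    """SEW/LMUL ratio; types sharing a ratio share the same VLMAX."""
    return Fraction(sew) / lmul

# (SEW, LMUL) pairs for the two triples mentioned above.
MF8_TRIPLE = [(8, Fraction(1, 8)), (16, Fraction(1, 4)), (32, Fraction(1, 2))]
MF4_TRIPLE = [(8, Fraction(1, 4)), (16, Fraction(1, 2)), (32, Fraction(1, 1))]

if __name__ == "__main__":
    for vlen in (32, 64, 128, 256):
        narrow = [vlmax(vlen, sew, lmul) for sew, lmul in MF8_TRIPLE]
        wide = [vlmax(vlen, sew, lmul) for sew, lmul in MF4_TRIPLE]
        print(f"VLEN={vlen}: mf8 triple VLMAX={narrow}, mf4 triple VLMAX={wide}")
```

Each triple has a single SEW/LMUL ratio (64 and 32 respectively), so the second triple always offers twice the VLMAX of the first. Code written for the first triple could therefore run on the second with `vl` capped accordingly.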

But at the very least, even if we don't change the specification (to disallow e8mf8 in EEW=32), I think we still need to change the reason the specification gives for e8mf8 not being allowed.

that of the vector register width. In general, the requirement is to
support LMUL {ge} SEW~MIN~/VLEN, where SEW~MIN~ is the narrowest supported
SEW value and VLEN is the length of vector register. In the standard
extensions, SEW~MIN~=8. For standard vector extensions with VLEN=32,
Collaborator

This changed text expresses different requirements than the original text and, in essence, is a material change to the ratified spec. For example, it no longer expresses a requirement for implementations with VLEN>64 to support fractional LMULs.

At the same time, this change seems unnecessary since an implementation with VLEN>64 is already free to support e8mf8. The requirement expressed in the ELEN=32 example does not imply a limit on mf8 support when at least one full 8-bit element can be supported (i.e. with VLEN>32). It is just expressing minimum requirements.

And, in fact, the initial statement "Implementations must provide ..." effectively expresses a requirement that e8mf8 must be supported in VLEN>32 implementations, since such implementations can hold at least one 8-bit element in a vector register. The second sentence expresses a general minimum requirement, which the further sentences expand on with concrete examples.

Now I would agree that the first sentence could be clarified a little bit. For example, saying "Implementations must provide fractional LMUL settings that allow at least one element of the narrowest supported type to occupy a fraction of a vector register corresponding to the ratio of the narrowest supported type's width to that of the largest supported type's width."
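
A small Python sketch of the minimum-requirement reading may help here. It assumes the ratified bound LMUL >= SEW_MIN/ELEN with SEW_MIN=8; the function names are hypothetical, and this illustrates the argument rather than quoting the spec.

```python
from fractions import Fraction

FRACTIONAL_LMULS = (Fraction(1, 2), Fraction(1, 4), Fraction(1, 8))

def required_fractional_lmuls(elen, sew_min=8):
    """Fractional LMULs an implementation must support: LMUL >= SEW_MIN/ELEN."""
    bound = Fraction(sew_min, elen)
    return [lmul for lmul in FRACTIONAL_LMULS if lmul >= bound]

def fits_one_element(vlen, sew, lmul):
    """True if the register fraction VLEN*LMUL can hold a whole SEW-bit element."""
    return vlen * lmul >= sew

if __name__ == "__main__":
    for elen in (32, 64):
        names = [f"mf{lmul.denominator}" for lmul in required_fractional_lmuls(elen)]
        print(f"ELEN={elen}: minimum required fractional LMULs = {names}")
    for vlen in (32, 64, 128):
        print(f"VLEN={vlen}: e8mf8 holds an element? {fits_one_element(vlen, 8, Fraction(1, 8))}")
```

Under these assumptions, mf8 falls below the required minimum for ELEN=32, yet e8mf8 still holds a whole element once VLEN reaches 64. That is the sense in which the existing text sets a floor rather than a ceiling.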

Member

@aswaterman aswaterman left a comment

I'd echo what Greg said.

I do foresee the addition of 4-bit types, but that support would come in the form of new ISA extensions. Those ISA extensions might impose additional constraints on what values vtype must support. It's neither necessary nor appropriate to impose those stricter requirements on all implementations.

Some other thoughts. Generic support for 4-bit types is probably not what's needed. Often we either want to perform mixed-precision arithmetic (e.g. 16b += 4b x 8b), which doesn't cleanly fit into this framework. Or we want to perform dot products, which do not rely on narrower SEW/LMUL. (Consider the vqdot proposal, which has both input EEW and output EEW the same; we perform 4-input dot products of 8b numbers into 32b accumulators using SEW=32.) In neither case is it clear that the proposed change is helpful.
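
For readers unfamiliar with the dot-product shape described above, here is a rough Python model of one SEW=32 element. It assumes a signed-by-signed variant with 32-bit wrap-around accumulation; the helper names are invented for this sketch, and the exact semantics of the vqdot proposal may differ.

```python
def to_int8(byte):
    """Interpret an 8-bit value (0..255) as a signed int8."""
    return byte - 256 if byte >= 128 else byte

def pack4(lanes):
    """Pack four int8 values into one 32-bit word, lane 0 in the low byte."""
    word = 0
    for j, value in enumerate(lanes):
        word |= (value & 0xFF) << (8 * j)
    return word

def vqdot_element(acc, a, b):
    """Accumulate a 4-way int8 dot product of words a and b into a 32-bit acc."""
    total = acc
    for j in range(4):
        x = to_int8((a >> (8 * j)) & 0xFF)
        y = to_int8((b >> (8 * j)) & 0xFF)
        total += x * y
    total &= 0xFFFFFFFF  # wrap to 32 bits (assumed non-saturating)
    return total - (1 << 32) if total >= (1 << 31) else total

if __name__ == "__main__":
    a = pack4([1, 2, 3, 4])
    b = pack4([5, 6, 7, 8])
    print(vqdot_element(10, a, b))
```

Both inputs and the accumulator are 32-bit elements, so nothing in this model needs a SEW narrower than 32 or a fractional LMUL.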

@sequencer
Author

I agree that 16b(e16mf2) += 4b(e4mf8) x 8b(e8mf4) is a good point. The source of the problem is that the vector datatype is encoded in the instruction opcodes + vcsr, rather than in a tag attached to each element (which is expensive). Adding such an instruction and CSR might resolve this issue, but that's a totally different ISA.
I'm also not sure how to add int4 support in our lane-based design. We cannot implement vqdot because it complicates the lane-based datapath (huge wire congestion...).

I updated this PR and ask for another round of review. The update clarifies the constraints: it does not add a requirement, but allows implementations to support mf8 in EEW=32 when VLEN>=64, and it adds the reason why e8mf8 is not useful in EEW=32.

e8mf8 type is allowed when VLEN>=64 and EEW=32, the common case of VLEN is VLEN>EEW, e.g. VLEN=64/128/256, and EEW=32.

Signed-off-by: Jiuyang Liu <liu@jiuyang.me>

3 participants