Merge pull request #8 from rjiejie/master
xtheadvdot: Add new extension
Showing 10 changed files with 384 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,39 @@ | ||
[#xtheadvdot] | ||
== Vector four-way 8-bit multiply and 32-bit accumulate instructions | ||
|
||
[NOTE,caption=Frozen] | ||
The `XTheadVdot` extension is `stable`. | ||
|
||
The `XTheadVdot` ISA extension provides vector integer instructions that multiply four pairs of 8-bit values and add the products to a 32-bit element. | ||
|
||
This extension depends on the availability of the `V` (vector) ISA extension. | ||
|
||
The table below gives an overview of the instructions: | ||
|
||
[cols="^3,^3,12,18",stripes=even,options="header"] | ||
|=== | ||
| RV32 | RV64 | Mnemonic | Instruction | ||
| Y | Y | th.vmaqa.vv _vd_, _vs1_, _vs2_ | <<#xtheadvdot-insns-vmaqa-vv>> | ||
| Y | Y | th.vmaqa.vx _vd_, _rs1_, _vs2_ | <<#xtheadvdot-insns-vmaqa-vx>> | ||
| Y | Y | th.vmaqau.vv _vd_, _vs1_, _vs2_ | <<#xtheadvdot-insns-vmaqau-vv>> | ||
| Y | Y | th.vmaqau.vx _vd_, _rs1_, _vs2_ | <<#xtheadvdot-insns-vmaqau-vx>> | ||
| Y | Y | th.vmaqasu.vv _vd_, _vs1_, _vs2_ | <<#xtheadvdot-insns-vmaqasu-vv>> | ||
| Y | Y | th.vmaqasu.vx _vd_, _rs1_, _vs2_ | <<#xtheadvdot-insns-vmaqasu-vx>> | ||
| Y | Y | th.vmaqaus.vx _vd_, _rs1_, _vs2_ | <<#xtheadvdot-insns-vmaqaus-vx>> | ||
|=== | ||
|
||
[#xtheadvdot-insns,reftext="Instructions"] | ||
=== Instructions | ||
include::xtheadvdot/vmaqa_vv.adoc[] | ||
<<< | ||
include::xtheadvdot/vmaqa_vx.adoc[] | ||
<<< | ||
include::xtheadvdot/vmaqau_vv.adoc[] | ||
<<< | ||
include::xtheadvdot/vmaqau_vx.adoc[] | ||
<<< | ||
include::xtheadvdot/vmaqasu_vv.adoc[] | ||
<<< | ||
include::xtheadvdot/vmaqasu_vx.adoc[] | ||
<<< | ||
include::xtheadvdot/vmaqaus_vx.adoc[] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,49 @@ | ||
[#xtheadvdot-insns-vmaqa-vv,reftext=Four signed 8-bit multiply with 32-bit add (vector-vector)] | ||
==== th.vmaqa.vv | ||
|
||
Synopsis:: | ||
Four signed 8-bit multiply with 32-bit add. | ||
|
||
Mnemonic:: | ||
th.vmaqa.vv _vd_, _vs1_, _vs2_ | ||
|
||
Encoding:: | ||
[wavedrom, , svg] | ||
.... | ||
{reg:[ | ||
{ bits: 7, name: 0xb, attr: ['custom-0, 32 bit'] }, | ||
{ bits: 5, name: 'vd' }, | ||
{ bits: 3, name: 0x6, attr: ['vmaqa'] }, | ||
{ bits: 5, name: 'vs1' }, | ||
{ bits: 5, name: 'vs2' }, | ||
{ bits: 1, name: 'vm' }, | ||
{ bits: 6, name: '0x20' }, | ||
]} | ||
.... | ||
|
||
Description:: | ||
|
||
The four signed 8-bit values in each 32-bit element of vs1 are multiplied with the four signed 8-bit values in the corresponding 32-bit element of vs2, and the four products are added to the corresponding 32-bit element of vd. This instruction is based on the vector extension. Vector masking applies to the source operands, which have an 8-bit element size. If vm=1, the instruction is unmasked and is written `th.vmaqa.vv vd, vs1, vs2`. If vm=0, the instruction is masked and is written `th.vmaqa.vv vd, vs1, vs2, v0.t`. When v0.mask[i] is 1, the product of vs1[(i+1)*8-1:i*8] and vs2[(i+1)*8-1:i*8] is added to vd. The vector length (vl) applies to the destination operand, which has a 32-bit element size. | ||
Operation:: | ||
[source,sail] | ||
-- | ||
vd[i] = vd[i] + vs1[i][7:0] * vs2[i][7:0] | ||
+ vs1[i][15:8] * vs2[i][15:8] | ||
+ vs1[i][23:16] * vs2[i][23:16] | ||
+ vs1[i][31:24] * vs2[i][31:24] | ||
-- | ||
Permission:: | ||
This instruction can be executed in all privilege levels. | ||
Exceptions:: | ||
This instruction triggers the same exceptions that a `vmacc.vv` instruction would trigger, except that the value of vsew[2:0] must be 3'b010 (SEW=32). | ||
Included in:: | ||
[%header] | ||
|=== | ||
|Extension | ||
|XTheadVdot (<<#xtheadvdot>>) | ||
|=== | ||
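
As an illustration of the Operation block above, the following Python sketch models a single 32-bit destination element of `th.vmaqa.vv`. The helper names `sext8` and `vmaqa_lane` are hypothetical, and the wrap to 32 bits on overflow is an assumption, since the description does not state the overflow behaviour.

```python
def sext8(byte: int) -> int:
    """Interpret the low 8 bits of `byte` as a signed two's-complement value."""
    byte &= 0xFF
    return byte - 0x100 if byte & 0x80 else byte


def vmaqa_lane(vd: int, vs1: int, vs2: int) -> int:
    """Model one 32-bit element: vd plus the sum of four signed 8-bit products.

    Assumption: the accumulator wraps modulo 2**32 (not stated in the text).
    """
    acc = vd
    for k in range(4):
        a = sext8(vs1 >> (8 * k))  # byte k of the vs1 element, signed
        b = sext8(vs2 >> (8 * k))  # byte k of the vs2 element, signed
        acc += a * b
    return acc & 0xFFFFFFFF


# Bytes of vs1 (low to high): 3, 2, -1, 1; bytes of vs2: 7, 6, 5, 4.
# 3*7 + 2*6 + (-1)*5 + 1*4 = 32, added to vd = 10.
print(vmaqa_lane(10, 0x01FF0203, 0x04050607))  # 42
```

The sketch returns the element as an unsigned 32-bit pattern, so a negative sum appears in two's-complement form.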
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,49 @@ | ||
[#xtheadvdot-insns-vmaqa-vx,reftext=Four signed 8-bit multiply with 32-bit add (vector-scalar)] | ||
==== th.vmaqa.vx | ||
|
||
Synopsis:: | ||
Four signed 8-bit multiply with 32-bit add. | ||
|
||
Mnemonic:: | ||
th.vmaqa.vx _vd_, _rs1_, _vs2_ | ||
|
||
Encoding:: | ||
[wavedrom, , svg] | ||
.... | ||
{reg:[ | ||
{ bits: 7, name: 0xb, attr: ['custom-0, 32 bit'] }, | ||
{ bits: 5, name: 'vd' }, | ||
{ bits: 3, name: 0x6, attr: ['vmaqa'] }, | ||
{ bits: 5, name: 'rs1' }, | ||
{ bits: 5, name: 'vs2' }, | ||
{ bits: 1, name: 'vm' }, | ||
{ bits: 6, name: '0x21' }, | ||
]} | ||
.... | ||
|
||
Description:: | ||
|
||
The four signed 8-bit values in the lower 32 bits of rs1 are multiplied with the four signed 8-bit values in each 32-bit element of vs2, and the four products are added to the corresponding 32-bit element of vd. This instruction is based on the vector extension. Vector masking applies to the source operands, which have an 8-bit element size. If vm=1, the instruction is unmasked and is written `th.vmaqa.vx vd, rs1, vs2`. If vm=0, the instruction is masked and is written `th.vmaqa.vx vd, rs1, vs2, v0.t`. When v0.mask[i] is 1, the product of rs1[(i+1)*8-1:i*8] and vs2[(i+1)*8-1:i*8] is added to vd. The vector length (vl) applies to the destination operand, which has a 32-bit element size. | ||
Operation:: | ||
[source,sail] | ||
-- | ||
vd[i] = vd[i] + rs1[7:0] * vs2[i][7:0] | ||
+ rs1[15:8] * vs2[i][15:8] | ||
+ rs1[23:16] * vs2[i][23:16] | ||
+ rs1[31:24] * vs2[i][31:24] | ||
-- | ||
Permission:: | ||
This instruction can be executed in all privilege levels. | ||
Exceptions:: | ||
This instruction triggers the same exceptions that a `vmacc.vv` instruction would trigger, except that the value of vsew[2:0] must be 3'b010 (SEW=32). | ||
Included in:: | ||
[%header] | ||
|=== | ||
|Extension | ||
|XTheadVdot (<<#xtheadvdot>>) | ||
|=== | ||
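
The vector-scalar form reuses the same four bytes of rs1 for every destination element. The Python sketch below models the unmasked case (vm=1) over the first vl elements. The function names are hypothetical, and two points are assumptions rather than statements from the description: the accumulator wraps modulo 2**32, and elements at or beyond vl are left unchanged.

```python
from typing import List


def sext8(byte: int) -> int:
    """Interpret the low 8 bits of `byte` as a signed two's-complement value."""
    byte &= 0xFF
    return byte - 0x100 if byte & 0x80 else byte


def vmaqa_vx(vd: List[int], rs1: int, vs2: List[int], vl: int) -> List[int]:
    """Unmasked model of th.vmaqa.vx over 32-bit elements.

    Only the lower 32 bits of rs1 are used, so the result is the same
    whether rs1 comes from an RV32 or an RV64 register.
    """
    scalar = [sext8(rs1 >> (8 * k)) for k in range(4)]
    out = list(vd)
    for i in range(vl):
        acc = out[i]
        for k in range(4):
            acc += scalar[k] * sext8(vs2[i] >> (8 * k))
        out[i] = acc & 0xFFFFFFFF  # assumed wrap-around
    return out


# With rs1 = 0x01010101 each element accumulates the sum of its signed bytes.
print(vmaqa_vx([0, 4, 0], 0x01010101, [0x01020304, 0xFFFFFFFF, 0x7F7F7F7F], vl=2))
# [10, 0, 0]
```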
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,49 @@ | ||
[#xtheadvdot-insns-vmaqasu-vv,reftext=Four signed-unsigned 8-bit multiply with 32-bit add (vector-vector)] | ||
==== th.vmaqasu.vv | ||
|
||
Synopsis:: | ||
Four signed-unsigned 8-bit multiply with 32-bit add. | ||
|
||
Mnemonic:: | ||
th.vmaqasu.vv _vd_, _vs1_, _vs2_ | ||
|
||
Encoding:: | ||
[wavedrom, , svg] | ||
.... | ||
{reg:[ | ||
{ bits: 7, name: 0xb, attr: ['custom-0, 32 bit'] }, | ||
{ bits: 5, name: 'vd' }, | ||
{ bits: 3, name: 0x6, attr: ['vmaqa'] }, | ||
{ bits: 5, name: 'vs1' }, | ||
{ bits: 5, name: 'vs2' }, | ||
{ bits: 1, name: 'vm' }, | ||
{ bits: 6, name: '0x24' }, | ||
]} | ||
.... | ||
|
||
Description:: | ||
|
||
The four signed 8-bit values in each 32-bit element of vs1 are multiplied with the four unsigned 8-bit values in the corresponding 32-bit element of vs2, and the four products are added to the corresponding 32-bit element of vd. This instruction is based on the vector extension. Vector masking applies to the source operands, which have an 8-bit element size. If vm=1, the instruction is unmasked and is written `th.vmaqasu.vv vd, vs1, vs2`. If vm=0, the instruction is masked and is written `th.vmaqasu.vv vd, vs1, vs2, v0.t`. When v0.mask[i] is 1, the product of vs1[(i+1)*8-1:i*8] and vs2[(i+1)*8-1:i*8] is added to vd. The vector length (vl) applies to the destination operand, which has a 32-bit element size. | ||
Operation:: | ||
[source,sail] | ||
-- | ||
vd[i] = vd[i] + vs1[i][7:0] * vs2[i][7:0] | ||
+ vs1[i][15:8] * vs2[i][15:8] | ||
+ vs1[i][23:16] * vs2[i][23:16] | ||
+ vs1[i][31:24] * vs2[i][31:24] | ||
-- | ||
Permission:: | ||
This instruction can be executed in all privilege levels. | ||
Exceptions:: | ||
This instruction triggers the same exceptions that a `vmacc.vv` instruction would trigger, except that the value of vsew[2:0] must be 3'b010 (SEW=32). | ||
Included in:: | ||
[%header] | ||
|=== | ||
|Extension | ||
|XTheadVdot (<<#xtheadvdot>>) | ||
|=== | ||
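
Because the Operation block does not spell out signedness, a small Python sketch may help show how the mixed interpretation affects the result: the bytes of vs1 are sign-extended while the bytes of vs2 are zero-extended. The names `sext8`, `zext8`, and `vmaqasu_lane` are hypothetical, and the 32-bit wrap-around is an assumption.

```python
def sext8(byte: int) -> int:
    """Low 8 bits as a signed value (-128..127)."""
    byte &= 0xFF
    return byte - 0x100 if byte & 0x80 else byte


def zext8(byte: int) -> int:
    """Low 8 bits as an unsigned value (0..255)."""
    return byte & 0xFF


def vmaqasu_lane(vd: int, vs1: int, vs2: int) -> int:
    """One 32-bit element: signed bytes of vs1 times unsigned bytes of vs2."""
    acc = vd
    for k in range(4):
        acc += sext8(vs1 >> (8 * k)) * zext8(vs2 >> (8 * k))
    return acc & 0xFFFFFFFF  # assumed wrap-around


# The byte 0xFF means -1 in vs1 but 255 in vs2, so operand order matters.
print(vmaqasu_lane(2, 0xFF, 0x02))  # 2 + (-1 * 2) = 0
print(vmaqasu_lane(0, 0x02, 0xFF))  # 2 * 255 = 510
```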
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,49 @@ | ||
[#xtheadvdot-insns-vmaqasu-vx,reftext=Four signed-unsigned 8-bit multiply with 32-bit add (vector-scalar)] | ||
==== th.vmaqasu.vx | ||
|
||
Synopsis:: | ||
Four signed-unsigned 8-bit multiply with 32-bit add. | ||
|
||
Mnemonic:: | ||
th.vmaqasu.vx _vd_, _rs1_, _vs2_ | ||
|
||
Encoding:: | ||
[wavedrom, , svg] | ||
.... | ||
{reg:[ | ||
{ bits: 7, name: 0xb, attr: ['custom-0, 32 bit'] }, | ||
{ bits: 5, name: 'vd' }, | ||
{ bits: 3, name: 0x6, attr: ['vmaqa'] }, | ||
{ bits: 5, name: 'rs1' }, | ||
{ bits: 5, name: 'vs2' }, | ||
{ bits: 1, name: 'vm' }, | ||
{ bits: 6, name: '0x25' }, | ||
]} | ||
.... | ||
|
||
Description:: | ||
|
||
The four signed 8-bit values in the lower 32 bits of rs1 are multiplied with the four unsigned 8-bit values in each 32-bit element of vs2, and the four products are added to the corresponding 32-bit element of vd. This instruction is based on the vector extension. Vector masking applies to the source operands, which have an 8-bit element size. If vm=1, the instruction is unmasked and is written `th.vmaqasu.vx vd, rs1, vs2`. If vm=0, the instruction is masked and is written `th.vmaqasu.vx vd, rs1, vs2, v0.t`. When v0.mask[i] is 1, the product of rs1[(i+1)*8-1:i*8] and vs2[(i+1)*8-1:i*8] is added to vd. The vector length (vl) applies to the destination operand, which has a 32-bit element size. | ||
Operation:: | ||
[source,sail] | ||
-- | ||
vd[i] = vd[i] + rs1[7:0] * vs2[i][7:0] | ||
+ rs1[15:8] * vs2[i][15:8] | ||
+ rs1[23:16] * vs2[i][23:16] | ||
+ rs1[31:24] * vs2[i][31:24] | ||
-- | ||
Permission:: | ||
This instruction can be executed in all privilege levels. | ||
Exceptions:: | ||
This instruction triggers the same exceptions that a `vmacc.vv` instruction would trigger, except that the value of vsew[2:0] must be 3'b010 (SEW=32). | ||
Included in:: | ||
[%header] | ||
|=== | ||
|Extension | ||
|XTheadVdot (<<#xtheadvdot>>) | ||
|=== | ||
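
The Encoding diagrams list the instruction fields from the least-significant bit upward: a 7-bit opcode (0xb), vd, a 3-bit field fixed at 0x6, vs1 or rs1, vs2, vm, and a 6-bit function code. The Python sketch below packs these fields into a 32-bit word. The name `encode` and the `FUNCT6` table are hypothetical helpers. The function codes are copied from the diagrams for the six instructions shown, and `th.vmaqaus.vx` is left out because its encoding does not appear here.

```python
# Function codes (bits 31:26) as given in the Encoding diagrams.
FUNCT6 = {
    "th.vmaqa.vv": 0x20,
    "th.vmaqa.vx": 0x21,
    "th.vmaqau.vv": 0x22,
    "th.vmaqau.vx": 0x23,
    "th.vmaqasu.vv": 0x24,
    "th.vmaqasu.vx": 0x25,
}


def encode(mnemonic: str, vd: int, src1: int, vs2: int, vm: int = 1) -> int:
    """Pack the fields into a 32-bit instruction word.

    `src1` is the vs1 register number for .vv forms and rs1 for .vx forms.
    """
    for name, reg in (("vd", vd), ("src1", src1), ("vs2", vs2)):
        if not 0 <= reg < 32:
            raise ValueError(f"{name} must be a 5-bit register number")
    if vm not in (0, 1):
        raise ValueError("vm must be 0 (masked) or 1 (unmasked)")
    return (
        0x0B                       # bits 6:0, custom-0 opcode
        | vd << 7                  # bits 11:7
        | 0x6 << 12                # bits 14:12
        | src1 << 15               # bits 19:15
        | vs2 << 20                # bits 24:20
        | vm << 25                 # bit 25
        | FUNCT6[mnemonic] << 26   # bits 31:26
    )


print(hex(encode("th.vmaqasu.vx", vd=1, src1=2, vs2=3)))  # 0x9631608b
```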
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,49 @@ | ||
[#xtheadvdot-insns-vmaqau-vv,reftext=Four unsigned 8-bit multiply with 32-bit add (vector-vector)] | ||
==== th.vmaqau.vv | ||
|
||
Synopsis:: | ||
Four unsigned 8-bit multiply with 32-bit add. | ||
|
||
Mnemonic:: | ||
th.vmaqau.vv _vd_, _vs1_, _vs2_ | ||
|
||
Encoding:: | ||
[wavedrom, , svg] | ||
.... | ||
{reg:[ | ||
{ bits: 7, name: 0xb, attr: ['custom-0, 32 bit'] }, | ||
{ bits: 5, name: 'vd' }, | ||
{ bits: 3, name: 0x6, attr: ['vmaqa'] }, | ||
{ bits: 5, name: 'vs1' }, | ||
{ bits: 5, name: 'vs2' }, | ||
{ bits: 1, name: 'vm' }, | ||
{ bits: 6, name: '0x22' }, | ||
]} | ||
.... | ||
|
||
Description:: | ||
|
||
The four unsigned 8-bit values in each 32-bit element of vs1 are multiplied with the four unsigned 8-bit values in the corresponding 32-bit element of vs2, and the four products are added to the corresponding 32-bit element of vd. This instruction is based on the vector extension. Vector masking applies to the source operands, which have an 8-bit element size. If vm=1, the instruction is unmasked and is written `th.vmaqau.vv vd, vs1, vs2`. If vm=0, the instruction is masked and is written `th.vmaqau.vv vd, vs1, vs2, v0.t`. When v0.mask[i] is 1, the product of vs1[(i+1)*8-1:i*8] and vs2[(i+1)*8-1:i*8] is added to vd. The vector length (vl) applies to the destination operand, which has a 32-bit element size. | ||
Operation:: | ||
[source,sail] | ||
-- | ||
vd[i] = vd[i] + vs1[i][7:0] * vs2[i][7:0] | ||
+ vs1[i][15:8] * vs2[i][15:8] | ||
+ vs1[i][23:16] * vs2[i][23:16] | ||
+ vs1[i][31:24] * vs2[i][31:24] | ||
-- | ||
Permission:: | ||
This instruction can be executed in all privilege levels. | ||
Exceptions:: | ||
This instruction triggers the same exceptions that a `vmacc.vv` instruction would trigger, except that the value of vsew[2:0] must be 3'b010 (SEW=32). | ||
Included in:: | ||
[%header] | ||
|=== | ||
|Extension | ||
|XTheadVdot (<<#xtheadvdot>>) | ||
|=== | ||
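
A short worked calculation shows how much headroom the 32-bit accumulator offers in the unsigned case. The Python sketch below assumes vd is read as an unsigned 32-bit value that starts at zero. The names are hypothetical, and the masking with `0xFFFFFFFF` reflects an assumed wrap-around that the description does not confirm.

```python
MAX_PRODUCT = 255 * 255                # largest product of two unsigned bytes
MAX_STEP = 4 * MAX_PRODUCT             # most one instruction can add to an element
SAFE_STEPS = (2**32 - 1) // MAX_STEP   # worst-case accumulations that still fit


def vmaqau_lane(vd: int, vs1: int, vs2: int) -> int:
    """One 32-bit element: vd plus the sum of four unsigned 8-bit products."""
    acc = vd
    for k in range(4):
        acc += ((vs1 >> (8 * k)) & 0xFF) * ((vs2 >> (8 * k)) & 0xFF)
    return acc & 0xFFFFFFFF  # assumed wrap-around


print(MAX_STEP)    # 260100
print(SAFE_STEPS)  # 16512

# Accumulating the worst case SAFE_STEPS times stays below 2**32.
acc = 0
for _ in range(SAFE_STEPS):
    acc = vmaqau_lane(acc, 0xFFFFFFFF, 0xFFFFFFFF)
print(acc == SAFE_STEPS * MAX_STEP)  # True
```

Under these assumptions, a zero-initialised element can absorb 16512 worst-case instructions before the sum could exceed 32 bits.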
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,49 @@ | ||
[#xtheadvdot-insns-vmaqau-vx,reftext=Four unsigned 8-bit multiply with 32-bit add (vector-scalar)] | ||
==== th.vmaqau.vx | ||
|
||
Synopsis:: | ||
Four unsigned 8-bit multiply with 32-bit add. | ||
|
||
Mnemonic:: | ||
th.vmaqau.vx _vd_, _rs1_, _vs2_ | ||
|
||
Encoding:: | ||
[wavedrom, , svg] | ||
.... | ||
{reg:[ | ||
{ bits: 7, name: 0xb, attr: ['custom-0, 32 bit'] }, | ||
{ bits: 5, name: 'vd' }, | ||
{ bits: 3, name: 0x6, attr: ['vmaqa'] }, | ||
{ bits: 5, name: 'rs1' }, | ||
{ bits: 5, name: 'vs2' }, | ||
{ bits: 1, name: 'vm' }, | ||
{ bits: 6, name: '0x23' }, | ||
]} | ||
.... | ||
|
||
Description:: | ||
|
||
The four unsigned 8-bit values in the lower 32 bits of rs1 are multiplied with the four unsigned 8-bit values in each 32-bit element of vs2, and the four products are added to the corresponding 32-bit element of vd. This instruction is based on the vector extension. Vector masking applies to the source operands, which have an 8-bit element size. If vm=1, the instruction is unmasked and is written `th.vmaqau.vx vd, rs1, vs2`. If vm=0, the instruction is masked and is written `th.vmaqau.vx vd, rs1, vs2, v0.t`. When v0.mask[i] is 1, the product of rs1[(i+1)*8-1:i*8] and vs2[(i+1)*8-1:i*8] is added to vd. The vector length (vl) applies to the destination operand, which has a 32-bit element size. | ||
Operation:: | ||
[source,sail] | ||
-- | ||
vd[i] = vd[i] + rs1[7:0] * vs2[i][7:0] | ||
+ rs1[15:8] * vs2[i][15:8] | ||
+ rs1[23:16] * vs2[i][23:16] | ||
+ rs1[31:24] * vs2[i][31:24] | ||
-- | ||
Permission:: | ||
This instruction can be executed in all privilege levels. | ||
Exceptions:: | ||
This instruction triggers the same exceptions that a `vmacc.vv` instruction would trigger, except that the value of vsew[2:0] must be 3'b010 (SEW=32). | ||
Included in:: | ||
[%header] | ||
|=== | ||
|Extension | ||
|XTheadVdot (<<#xtheadvdot>>) | ||
|=== | ||
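
The Exceptions clause requires vsew[2:0] to be 3'b010. As a minimal sketch, the Python helpers below decode that field with the usual vector-extension relation SEW = 8 << vsew and check the requirement. Both function names are hypothetical, and the sketch deliberately does not model what the hardware does when the check fails, since that is not described here.

```python
def sew_bits(vsew: int) -> int:
    """Element width in bits for a 3-bit vsew field (SEW = 8 << vsew)."""
    return 8 << (vsew & 0b111)


def vdot_vsew_ok(vsew: int) -> bool:
    """True when vsew[2:0] is 0b010, the value the Exceptions clause requires."""
    return (vsew & 0b111) == 0b010


for field in (0b000, 0b001, 0b010, 0b011):
    print(f"vsew={field:03b} SEW={sew_bits(field)} ok={vdot_vsew_ok(field)}")
```

Only the third line of the loop reports `ok=True`, matching the 32-bit destination element size used throughout the descriptions.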