Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CRC big endian support #97

Merged
merged 25 commits into from
Oct 29, 2024
Merged
Show file tree
Hide file tree
Changes from 22 commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
25 changes: 25 additions & 0 deletions .github/workflows/ci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -232,6 +232,7 @@ jobs:
id: test
uses: cross-platform-actions/action@v0.24.0
with:
#note: sanitizer seems to be broken on s390x (fails trying to allocate several tbs of mem) and package manager works slightly differently that on regular ubuntu
operating_system: freebsd
architecture: x86-64
version: '14.0'
Expand All @@ -243,6 +244,30 @@ jobs:
chmod a+x builder
./builder build -p ${{ env.PACKAGE_NAME }}

s390x: #big-endian
runs-on: ubuntu-24.04
steps:
- uses: actions/checkout@v4
- uses: uraimo/run-on-arch-action@v2
name: Run commands
id: runcmd
with:
arch: s390x
distro: ubuntu22.04
install: |
apt-get update -q -y
apt-get -y install sudo
apt-get -y install cmake
apt-get -y install make
apt-get -y install g++
apt-get -y install python3
apt-get -y install git
run: |
lscpu | grep Endian
python3 -c "from urllib.request import urlretrieve; urlretrieve('${{ env.BUILDER_HOST }}/${{ env.BUILDER_SOURCE }}/${{ env.BUILDER_VERSION }}/builder.pyz?run=${{ env.RUN }}', 'builder')"
chmod a+x builder
./builder build -p ${{ env.PACKAGE_NAME }} --variant s390x --cmake-extra=-DENABLE_SANITIZERS=OFF

openbsd:
runs-on: ubuntu-24.04 # latest
strategy:
Expand Down
12 changes: 11 additions & 1 deletion builder.json
Original file line number Diff line number Diff line change
Expand Up @@ -10,5 +10,15 @@
"test_steps": [
"test",
"{install_dir}/bin/checksum-profile{exe}"
]
],

"variants": {
"s390x": {
"hosts": {
"ubuntu": {
"!pkg_setup": []
}
}
}
}
}
29 changes: 29 additions & 0 deletions include/aws/checksums/private/crc_util.h
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@
* SPDX-License-Identifier: Apache-2.0.
*/

#include <aws/common/byte_order.h>
#include <aws/common/stdint.h>
#include <limits.h>

Expand All @@ -21,4 +22,32 @@
return val; \
}

/* helper function to reverse byte order on big-endian platforms*/
static inline uint32_t aws_swap_bytes_if_needed_32(uint32_t x) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trivial: maybe I'm just not familiar with the literature, but to me "swap_bytes_if_needed" doesn't indicate which way they would swap, or what would be needed to make them swap

Suggested change
static inline uint32_t aws_swap_bytes_if_needed_32(uint32_t x) {
static inline uint32_t aws_swap_bytes_to_le_if_needed_32(uint32_t x) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renamed to aws_bswap64/32_if_be.
swap bytes or byte swap (bswap is common intrinsic name) is a fairly typical name for reversing order of all bytes.

if (!aws_is_big_endian()) {
return x;
}

uint8_t c1 = x & 0xFF;
uint8_t c2 = (x >> 8) & 0xFF;
uint8_t c3 = (x >> 16) & 0xFF;
uint8_t c4 = (x >> 24) & 0xFF;

return ((uint32_t)c1 << 24) + ((uint32_t)c2 << 16) + ((uint32_t)c3 << 8) + c4;
DmitriyMusatkin marked this conversation as resolved.
Show resolved Hide resolved
}

/* Reverse the bytes in a 64-bit word. */
static inline uint64_t aws_swap_bytes_if_needed_64(uint64_t x) {
if (!aws_is_big_endian()) {
return x;
}

uint64_t m;
m = 0xff00ff00ff00ff;
DmitriyMusatkin marked this conversation as resolved.
Show resolved Hide resolved
x = ((x >> 8) & m) | (x & m) << 8;
m = 0xffff0000ffff;
DmitriyMusatkin marked this conversation as resolved.
Show resolved Hide resolved
x = ((x >> 16) & m) | (x & m) << 16;
return x >> 32 | x << 32;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this algorithm came from somewhere, include a link

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I worked it out on paper just now ... 😮‍💨 seems right

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if no link, add comments explaining like
// swap every other byte, now every 2-byte chunk is reversed
...
// swap every other 2-byte chunk, now every 4-byte chunk is reversed
...
// swap the two 4-byte chunks, now all 8 bytes are reversed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bit twiddling trick is pretty popular in fast crc implementations in the wild. Don't know where it originated, but most libs i looked at included a variant of this.

Poking around more modern hash implementations, like xxHash, that trick seems to have fallen out of favor. Replaced it with compiler built ins and a typical posix byte swap implementation as fallback.

}

#endif /* AWS_CHECKSUMS_PRIVATE_CRC_UTIL_H */
588 changes: 576 additions & 12 deletions source/crc64_sw.c

Large diffs are not rendered by default.

15 changes: 8 additions & 7 deletions source/crc_sw.c
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@
* SPDX-License-Identifier: Apache-2.0.
*/
#include <aws/checksums/private/crc_priv.h>
#include <aws/checksums/private/crc_util.h>
#include <stddef.h>

/* The Ethernet, gzip, et.al CRC32 polynomial (reverse of 0x04C11DB7) */
Expand Down Expand Up @@ -1149,7 +1150,7 @@ static uint32_t s_crc_generic_sb4(const uint8_t *input, int length, uint32_t crc
uint32_t(*table)[16][256] = (uint32_t(*)[16][256])table_ptr;

while (remaining >= 4) {
crc ^= *current++;
crc ^= aws_swap_bytes_if_needed_32(*current++);
crc = (*table)[3][crc & 0xff] ^ (*table)[2][(crc >> 8) & 0xff] ^ (*table)[1][(crc >> 16) & 0xff] ^
(*table)[0][crc >> 24];
remaining -= 4;
Expand All @@ -1165,8 +1166,8 @@ static uint32_t s_crc_generic_sb8(const uint8_t *input, int length, uint32_t crc
uint32_t(*table)[16][256] = (uint32_t(*)[16][256])table_ptr;

while (remaining >= 8) {
uint32_t c1 = *current++ ^ crc;
uint32_t c2 = *current++;
uint32_t c1 = aws_swap_bytes_if_needed_32(*current++) ^ crc;
uint32_t c2 = aws_swap_bytes_if_needed_32(*current++);
uint32_t t1 = (*table)[7][c1 & 0xff] ^ (*table)[6][(c1 >> 8) & 0xff] ^ (*table)[5][(c1 >> 16) & 0xff] ^
(*table)[4][(c1 >> 24) & 0xff];
uint32_t t2 = (*table)[3][c2 & 0xff] ^ (*table)[2][(c2 >> 8) & 0xff] ^ (*table)[1][(c2 >> 16) & 0xff] ^
Expand All @@ -1185,10 +1186,10 @@ static uint32_t s_crc_generic_sb16(const uint8_t *input, int length, uint32_t cr
uint32_t(*table)[16][256] = (uint32_t(*)[16][256])table_ptr;

while (remaining >= 16) {
uint32_t c1 = *current++ ^ crc;
uint32_t c2 = *current++;
uint32_t c3 = *current++;
uint32_t c4 = *current++;
uint32_t c1 = aws_swap_bytes_if_needed_32(*current++) ^ crc;
uint32_t c2 = aws_swap_bytes_if_needed_32(*current++);
uint32_t c3 = aws_swap_bytes_if_needed_32(*current++);
uint32_t c4 = aws_swap_bytes_if_needed_32(*current++);
uint32_t t1 = (*table)[15][c1 & 0xff] ^ (*table)[14][(c1 >> 8) & 0xff] ^ (*table)[13][(c1 >> 16) & 0xff] ^
(*table)[12][(c1 >> 24) & 0xff];
uint32_t t2 = (*table)[11][c2 & 0xff] ^ (*table)[10][(c2 >> 8) & 0xff] ^ (*table)[9][(c2 >> 16) & 0xff] ^
Expand Down
5 changes: 4 additions & 1 deletion tests/crc64_test.c
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,8 @@

#include <aws/checksums/crc.h>
#include <aws/checksums/private/crc64_priv.h>
#include <aws/checksums/private/crc_util.h>
#include <aws/common/encoding.h>
#include <aws/testing/aws_test_harness.h>

// The polynomial used for CRC64NVME (in bit-reflected form)
Expand Down Expand Up @@ -54,7 +56,8 @@ static int s_test_known_crc(
ASSERT_HEX_EQUALS(expected_crc, result, "%s(%s)", func_name, data_name);

// Compute the residue of the buffer (the CRC of the buffer plus its CRC) - will always be a constant value
uint64_t residue = func((const uint8_t *)&result, 8, result); // assuming little endian
uint64_t result_swapped = aws_swap_bytes_if_needed_64(result);
uint64_t residue = func((const uint8_t *)&result_swapped, 8, result); // assuming little endian
ASSERT_HEX_EQUALS(expected_residue, residue, "len %d residue %s(%s)", length, func_name, data_name);

// chain the crc computation so 2 calls each operate on about 1/2 of the buffer
Expand Down
5 changes: 3 additions & 2 deletions tests/crc_test.c
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@

#include <aws/checksums/crc.h>
#include <aws/checksums/private/crc_priv.h>
#include <aws/checksums/private/crc_util.h>

#include <aws/common/device_random.h>

Expand Down Expand Up @@ -39,7 +40,6 @@ typedef uint32_t(crc_fn)(const uint8_t *input, int length, uint32_t previousCrc3

// Slow reference implementation that computes a 32-bit bit-reflected/bit-inverted CRC using the provided polynomial.
static uint32_t s_crc_32_reference(const uint8_t *input, int length, const uint32_t previousCrc, uint32_t polynomial) {

uint32_t crc = ~previousCrc;
while (length-- > 0) {
crc ^= *input++;
Expand Down Expand Up @@ -73,8 +73,9 @@ static int s_test_known_crc_32(
uint32_t result = func(input, (int)length, 0);
ASSERT_HEX_EQUALS(expected_crc, result, "%s(%s)", func_name, data_name);

uint32_t result_swapped = aws_swap_bytes_if_needed_32(result);
DmitriyMusatkin marked this conversation as resolved.
Show resolved Hide resolved
// Compute the residue of the buffer (the CRC of the buffer plus its CRC) - will always be a constant value
uint32_t residue = (uint32_t)func((const uint8_t *)&result, 4, result); // assuming little endian
uint32_t residue = (uint32_t)func((const uint8_t *)&result_swapped, 4, result); // assuming little endian
ASSERT_HEX_EQUALS(expected_residue, residue, "len %d residue %s(%s)", length, func_name, data_name);

// chain the crc computation so 2 calls each operate on about 1/2 of the buffer
Expand Down