improve compile times for unit tests
Jeffrey Hurchalla authored and Jeffrey Hurchalla committed Dec 6, 2024
1 parent fa90f20 commit 8b59368
Showing 14 changed files with 1,561 additions and 123 deletions.
12 changes: 4 additions & 8 deletions README.md
@@ -2,11 +2,11 @@

![Alt text](images/clockxtrasmall_border2.jpg?raw=true "Clock Gears, photo by Krzysztof Golik, licensed CC BY-SA 4.0")

Clockwork is a high performance, easy to use Modular Arithmetic (header-only) library for C++ for up to 128 bit integer types, with extensive support for Montgomery arithmetic. If you want or need Montgomery arithmetic in this range, or general modular arithmetic functions, Clockwork is almost certainly the fastest and easiest library you could use (*for best performance just make sure you define the standard C++ macro NDEBUG*).
Clockwork is a high performance, easy to use Modular Arithmetic (header-only) library for C++ for up to 128 bit integer types, with extensive support for Montgomery arithmetic. If you want or need Montgomery arithmetic in this range, or general modular arithmetic functions, Clockwork is almost certainly the fastest and easiest library you could use. For best performance make sure you define the standard C++ macro NDEBUG.

## Design goals

The goal for Clockwork was to create a flexible and bulletproof library with the best performance yet achieved for modular arithmetic of native (on the CPU) integer types. For integer types that are double the native bit width (e.g. 128 bit), performance should still be close to ideal, though not as completely optimized as for native types. Larger than 128 bit types are permissible; however a library like GMP is likely to be a better choice for such sizes.
Clockwork is designed to be a flexible and bulletproof library with the best performance achievable for modular arithmetic of native (on the CPU) integer types. For integer types that are double the native bit width (e.g. 128 bit), performance should still be close to ideal, though not as completely optimized as for native types. Larger than 128 bit types are permissible; however a library like GMP is likely to be a better choice for such sizes.

## Status

@@ -48,7 +48,7 @@ If you prefer, for the last command you could instead use CMake's default instal
This will copy all the header files needed for this modular arithmetic library to an "include" subfolder in the installation folder of your choosing.
When compiling your project, you'll of course need to ensure that you have that include subfolder as part of your include path.

For good performance you *must* ensure that the standard macro NDEBUG (see <cassert>) is defined when compiling. You can generally do this by adding the option flag -DNDEBUG to your compile command.
For best performance you *must* ensure that the standard macro NDEBUG (see <cassert>) is defined when compiling. You can generally do this by adding the option flag -DNDEBUG to your compile command.
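As a rough illustration of why NDEBUG matters, the sketch below writes a precondition check with the standard `assert` from `<cassert>`. The macro `SKETCH_PRECONDITION` and the function `sketch_modadd` are hypothetical and not part of Clockwork. The assumption, suggested by the README's pointer to `<cassert>`, is that the library's internal checks are likewise compiled out when NDEBUG is defined.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical precondition macro (not Clockwork's). With -DNDEBUG, the standard
// assert() expands to ((void)0), so the check below disappears from the binary.
#define SKETCH_PRECONDITION(cond) assert(cond)

// Returns (a + b) mod m, assuming a and b are already reduced modulo m.
inline std::uint32_t sketch_modadd(std::uint32_t a, std::uint32_t b,
                                   std::uint32_t m)
{
    SKETCH_PRECONDITION(m > 0 && a < m && b < m);
    std::uint32_t t = m - b;          // no underflow, since b < m
    return (a >= t) ? a - t : a + b;  // a - t == a + b - m; never computes an overflowing a + b
}
```

Compiling with, for example, `g++ -O2 -DNDEBUG` removes the check entirely. Without NDEBUG, every call pays for it.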

It may help to see a simple [example](examples/example_without_cmake).

@@ -65,7 +65,7 @@ From the modular_arithmetic group, the files *absolute_value_difference.h*, *mod
*hurchalla::modular_multiplicative_inverse(T a, T modulus)*. Returns the multiplicative inverse of a if it exists, and otherwise returns 0.
*hurchalla::modular_pow(T base, T exponent, T modulus)*. Returns the modular exponentiation of base^exponent (mod modulus).
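If these two operations are unfamiliar, the following sketch shows standard algorithms for them: square-and-multiply exponentiation and the extended Euclidean algorithm. It is restricted to 32-bit unsigned values so that 64-bit intermediates cannot overflow. The names `sketch_modular_pow` and `sketch_modular_inverse` are hypothetical, and this is not Clockwork's implementation, which is generic over T and heavily optimized. The inverse sketch does follow the documented convention of returning 0 when no inverse exists.

```cpp
#include <cstdint>

// Square-and-multiply: computes base^exponent (mod modulus). Assumes modulus > 0.
inline std::uint32_t sketch_modular_pow(std::uint32_t base,
                                        std::uint32_t exponent,
                                        std::uint32_t modulus)
{
    std::uint64_t result = 1 % modulus;  // handles modulus == 1
    std::uint64_t b = base % modulus;
    while (exponent > 0) {
        if (exponent & 1u)
            result = (result * b) % modulus;  // both factors < 2^32: no overflow
        b = (b * b) % modulus;
        exponent >>= 1;
    }
    return static_cast<std::uint32_t>(result);
}

// Extended Euclidean algorithm: returns the inverse of a (mod modulus),
// or 0 if gcd(a, modulus) != 1. Assumes modulus > 0.
inline std::uint32_t sketch_modular_inverse(std::uint32_t a,
                                            std::uint32_t modulus)
{
    // Invariant: t*a == r (mod modulus), and new_t*a == new_r (mod modulus).
    std::int64_t t = 0, new_t = 1;
    std::int64_t r = modulus, new_r = a % modulus;
    while (new_r != 0) {
        std::int64_t q = r / new_r;
        std::int64_t tmp = t - q * new_t;
        t = new_t;
        new_t = tmp;
        tmp = r - q * new_r;
        r = new_r;
        new_r = tmp;
    }
    if (r != 1)
        return 0;       // r is gcd(a, modulus); an inverse exists only if it is 1
    if (t < 0)
        t += modulus;   // |t| < modulus here, so one addition suffices
    return static_cast<std::uint32_t>(t);
}
```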

From the montgomery_arithmetic group, the file *MontgomeryForm.h* provides the easy to use (and zero cost abstraction) class *hurchalla::MontgomeryForm*, which has member functions for effortlessly performing operations in the Montgomery domain. These operations include converting to/from Montgomery domain, add, subtract, multiply, square, fused-multiply-add/sub, pow, gcd, and more. For improved performance in some situations, the file *montgomery_form_aliases.h* provides simple aliases for faster (with limitations on allowed modulus) instantiations of the class MontgomeryForm.
From the montgomery_arithmetic group, the file *MontgomeryForm.h* provides the easy to use (and zero cost abstraction) class *hurchalla::MontgomeryForm*, which has member functions for effortlessly performing operations in the Montgomery domain. These operations include converting to/from Montgomery domain, add, subtract, multiply, square, [fused-multiply-add/sub](https://jeffhurchalla.com/2022/05/01/the-montgomery-multiply-accumulate), pow, gcd, and more. For improved performance in some situations, the file *montgomery_form_aliases.h* provides simple aliases for faster (with limitations on allowed modulus) instantiations of the class MontgomeryForm.
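To give a feel for what MontgomeryForm abstracts away, below is a minimal sketch of Montgomery multiplication (REDC) for an odd 32-bit modulus, with R = 2^32 and 64-bit intermediates. The class `MontgomerySketch32` and its members are hypothetical and do not reflect MontgomeryForm's actual interface or internals. It uses a REDC variant based on the positive inverse n^-1 mod R followed by a subtraction. This avoids the possible carry out of 64 bits in the classic t + m*n form.

```cpp
#include <cstdint>

// Hypothetical sketch (not Clockwork's API): Montgomery arithmetic modulo an
// odd 32-bit n, with R = 2^32. A value x is represented as x*R mod n.
class MontgomerySketch32 {
    std::uint32_t n_;      // odd modulus
    std::uint32_t n_inv_;  // n^-1 mod 2^32
    std::uint32_t r2_;     // R^2 mod n, used to convert into Montgomery form

    // REDC: for t < n*R, returns t * R^-1 mod n, in the range [0, n).
    std::uint32_t redc(std::uint64_t t) const {
        std::uint32_t m = static_cast<std::uint32_t>(t) * n_inv_;  // m*n == t (mod R)
        std::uint64_t mn = static_cast<std::uint64_t>(m) * n_;
        // The low 32 bits of t and mn are equal, so (t - mn) / R is just the
        // difference of the high halves, which lies in (-n, n).
        std::uint32_t t_hi = static_cast<std::uint32_t>(t >> 32);
        std::uint32_t mn_hi = static_cast<std::uint32_t>(mn >> 32);
        std::uint32_t res = t_hi - mn_hi;
        if (t_hi < mn_hi)
            res += n_;  // wrap a negative difference back into [0, n)
        return res;
    }

public:
    explicit MontgomerySketch32(std::uint32_t n) : n_(n) {
        // Newton's iteration: x = n is correct to 3 bits for odd n, and each
        // step doubles the number of correct low bits (3, 6, 12, 24, 48).
        std::uint32_t x = n;
        for (int i = 0; i < 4; ++i)
            x *= 2 - n * x;
        n_inv_ = x;
        std::uint64_t r_mod_n = (std::uint64_t{1} << 32) % n;
        r2_ = static_cast<std::uint32_t>((r_mod_n * r_mod_n) % n);
    }

    std::uint32_t to_monty(std::uint32_t a) const {
        return redc(static_cast<std::uint64_t>(a % n_) * r2_);
    }
    std::uint32_t from_monty(std::uint32_t x) const { return redc(x); }
    std::uint32_t multiply(std::uint32_t x, std::uint32_t y) const {
        return redc(static_cast<std::uint64_t>(x) * y);
    }
};
```

Each conversion into or out of Montgomery form costs an extra REDC. That is why the representation pays off mainly when many multiplications are chained, as in modular exponentiation.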

For an easy demonstration of MontgomeryForm, you can see one of the [examples](examples/example_without_cmake).

@@ -74,7 +74,3 @@ If you prefer not to use the high level interface of MontgomeryForm, and instead
## Performance Notes

If you're interested in experimenting, predefining certain macros when compiling might improve performance - see [macros_for_performance.md](macros_for_performance.md).

## TODO

For the unit tests, solve the long compile time and high memory use (during compile) of the files test_MontgomeryForm.cpp and test_montgomery_pow.cpp.
84 changes: 63 additions & 21 deletions build_tests.sh
@@ -1,4 +1,4 @@
#!/bin/bash
#!/usr/bin/env bash

# Copyright (c) 2020-2022 Jeffrey Hurchalla.
# This Source Code Form is subject to the terms of the Mozilla Public
@@ -172,6 +172,11 @@
# update-alternatives for both icc and icpc. ]


if [ "${BASH_VERSINFO:-0}" -lt 4 ]; then
>&2 echo "This script requires some version of bash >= 4.0. Bash 3.2.57 is known to fail, but the exact minimum required version is unknown"
exit 1
fi


while getopts ":m:l:c:j:h-:raus" opt; do
case $opt in
@@ -239,58 +244,72 @@ if [ "${compiler,,}" = "gcc" ] || [ "${compiler,,}" = "g++" ]; then
cmake_cpp_compiler=-DCMAKE_CXX_COMPILER=g++
cmake_c_compiler=-DCMAKE_C_COMPILER=gcc
compiler_name=gcc
compiler_version=0
elif [ "${compiler,,}" = "gcc-7" ] || [ "${compiler,,}" = "g++-7" ] ||
[ "${compiler,,}" = "gcc7" ] || [ "${compiler,,}" = "g++7" ]; then
cmake_cpp_compiler=-DCMAKE_CXX_COMPILER=g++-7
cmake_c_compiler=-DCMAKE_C_COMPILER=gcc-7
compiler_name=gcc7
compiler_name=gcc
compiler_version=7
elif [ "${compiler,,}" = "gcc-10" ] || [ "${compiler,,}" = "g++-10" ] ||
[ "${compiler,,}" = "gcc10" ] || [ "${compiler,,}" = "g++10" ]; then
cmake_cpp_compiler=-DCMAKE_CXX_COMPILER=g++-10
cmake_c_compiler=-DCMAKE_C_COMPILER=gcc-10
compiler_name=gcc10
compiler_name=gcc
compiler_version=10
elif [ "${compiler,,}" = "gcc-13" ] || [ "${compiler,,}" = "g++-13" ] ||
[ "${compiler,,}" = "gcc13" ] || [ "${compiler,,}" = "g++13" ]; then
cmake_cpp_compiler=-DCMAKE_CXX_COMPILER=g++-13
cmake_c_compiler=-DCMAKE_C_COMPILER=gcc-13
compiler_name=gcc13
compiler_name=gcc
compiler_version=13
elif [ "${compiler,,}" = "clang" ] || [ "${compiler,,}" = "clang++" ]; then
cmake_cpp_compiler=-DCMAKE_CXX_COMPILER=clang++
cmake_c_compiler=-DCMAKE_C_COMPILER=clang
compiler_name=clang
compiler_version=0
elif [ "${compiler,,}" = "clang-3" ] || [ "${compiler,,}" = "clang++-3" ] ||
[ "${compiler,,}" = "clang3" ] || [ "${compiler,,}" = "clang++3" ]; then
cmake_cpp_compiler=-DCMAKE_CXX_COMPILER=clang++-3.9
cmake_c_compiler=-DCMAKE_C_COMPILER=clang-3.9
compiler_name=clang3
compiler_name=clang
compiler_version=3
elif [ "${compiler,,}" = "clang-6" ] || [ "${compiler,,}" = "clang++-6" ] ||
[ "${compiler,,}" = "clang6" ] || [ "${compiler,,}" = "clang++6" ]; then
cmake_cpp_compiler=-DCMAKE_CXX_COMPILER=clang++-6.0
cmake_c_compiler=-DCMAKE_C_COMPILER=clang-6.0
compiler_name=clang6
compiler_name=clang
compiler_version=6
elif [ "${compiler,,}" = "clang-10" ] || [ "${compiler,,}" = "clang++-10" ] ||
[ "${compiler,,}" = "clang10" ] || [ "${compiler,,}" = "clang++10" ]; then
cmake_cpp_compiler=-DCMAKE_CXX_COMPILER=clang++-10
cmake_c_compiler=-DCMAKE_C_COMPILER=clang-10
compiler_name=clang10
compiler_name=clang
compiler_version=10
elif [ "${compiler,,}" = "clang-18" ] || [ "${compiler,,}" = "clang++-18" ] ||
[ "${compiler,,}" = "clang18" ] || [ "${compiler,,}" = "clang++18" ]; then
cmake_cpp_compiler=-DCMAKE_CXX_COMPILER=clang++-18
cmake_c_compiler=-DCMAKE_C_COMPILER=clang-18
compiler_name=clang18
compiler_name=clang
compiler_version=18
elif [ "${compiler,,}" = "icc" ] || [ "${compiler,,}" = "icpc" ]; then
cmake_cpp_compiler=-DCMAKE_CXX_COMPILER=icpc
cmake_c_compiler=-DCMAKE_C_COMPILER=icc
compiler_name=icc
compiler_version=0
source /opt/intel/bin/compilervars.sh intel64
elif [ -n "$compiler" ]; then
echo "Invalid argument for option -c: $compiler"
exit 1
fi

echo Using compiler $compiler_name ...
echo Using build mode $mode ...

if [ "$compiler_version" = "0" ]; then
echo Using compiler $compiler_name \(default version\) ...
else
echo Using compiler $compiler_name v$compiler_version ...
fi
echo Using build mode $mode ...


cpp_standard="-std=c++11"
@@ -369,23 +388,36 @@ if [ "$compiler_name" = "gcc" ]; then

: # do nothing, at least for now

# note that clang-tidy includes the clang static analyzer
elif [ "$compiler_name" = "clang" ]; then
# clang_static_analysis=(-DCMAKE_CXX_CLANG_TIDY="clang-tidy;-checks=-*,clang-analyzer-*")
# clang_static_analysis=(-DCMAKE_CXX_CLANG_TIDY="clang-tidy;-extra-arg=-Wno-unknown-warning-option;-checks=*,clang-analyzer-*")
: # do nothing, at least for now
fi
#-analyzer-checker=core
#-analyzer-checker=cpp
#-analyzer-checker=unix
#-analyzer-checker=deadcode


#undefined behavior sanitizers
#-----
#-----------------------------
# note according to https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html:
# "[The] UndefinedBehaviorSanitizer ... test suite is integrated into the CMake
# build and can be run with check-ubsan command."
if [ "$compiler_name" = "gcc" ]; then
gcc_ubsan="-fsanitize=undefined -fno-sanitize-recover \
-fsanitize=float-divide-by-zero -fsanitize=float-cast-overflow"

elif [ "$compiler_name" = "clang" ]; then
# My installed version of clang doesn't support -fsanitize=implicit-conversion
clang_ubsan="-fsanitize=undefined -fsanitize=nullability -fsanitize=bounds \
-fsanitize=float-divide-by-zero"

elif [ "$compiler_name" = "clang" ] && [[ $compiler_version -ge 6 ]]; then
# clang6 doesn't support -fsanitize=implicit-conversion. Clang10 does support
# it. I don't know if clang7,8,9 support it.
if [[ $compiler_version -ge 10 ]]; then
clang_ubsan="-fsanitize=undefined -fsanitize=nullability -fsanitize=bounds \
-fsanitize=float-divide-by-zero -fsanitize=implicit-conversion"
else
clang_ubsan="-fsanitize=undefined -fsanitize=nullability -fsanitize=bounds \
-fsanitize=float-divide-by-zero"
fi
# The next line in a perfect world wouldn't be needed, but for some versions
# of clang (clang 10 for me), the linker doesn't find __muloti4 when using the
# undefined behavior sanitizers. __muloti4 is defined in compiler-rt.
@@ -395,7 +427,7 @@ fi


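The clang-only `-fsanitize=implicit-conversion` flag above reports implicit integer conversions that change a value. Such conversions are well-defined C++, which is why `-fsanitize=undefined` alone does not catch them. According to clang's UndefinedBehaviorSanitizer documentation, the `undefined` group also excludes `float-divide-by-zero` and the `nullability` checks, which presumably explains why those are listed separately. The function names in this minimal sketch are illustrative only.

```cpp
#include <cstdint>

// Well-defined: the result is x mod 256. Under clang's
// -fsanitize=implicit-conversion, a call such as narrow_implicitly(300)
// would be reported at run time, because the implicit conversion loses data.
std::uint8_t narrow_implicitly(std::uint32_t x) { return x; }

// An explicit cast states that the truncation is intended. The
// implicit-conversion checks instrument only implicit conversions,
// so this version is not reported.
std::uint8_t narrow_explicitly(std::uint32_t x) { return static_cast<std::uint8_t>(x); }
```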
#address sanitizers
#-----
#------------------
clang_asan=""
gcc_asan="-fsanitize=address"
# clang -fsanitize=address -O1 -fno-omit-frame-pointer -g tests/use-after-free.c
@@ -442,6 +474,9 @@ gcc_asan="-fsanitize=address"
# endif()


#LeakSanitizer (LSan)
#ThreadSanitizer (TSan)
#MemorySanitizer (MSan)

#modes
# 1. Asan+UBsan+Lsan
@@ -451,7 +486,7 @@ gcc_asan="-fsanitize=address"

# a run of "splint" and/or cppcheck
# cpplint
# include what you use (iwyu), and lwyu
# include what you use (iwyu), and lwyu (link what you use)
# Clang-Tidy
# CppCoreCheck

@@ -463,6 +498,11 @@ gcc_asan="-fsanitize=address"
# <LANG>_INCLUDE_WHAT_YOU_USE
# LINK_WHAT_YOU_USE

# fuzz testing

# valgrind/purity

# code coverage tools - gcov



@@ -476,9 +516,11 @@ exit_on_failure () {
# https://stackoverflow.com/questions/59895/how-to-get-the-source-directory-of-a-bash-script-from-within-the-script-itself
script_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"



if [ "${mode,,}" = "release" ]; then
pushd "$script_dir" > /dev/null 2>&1
build_dir=build/release_$compiler_name
build_dir=build/release_$compiler_name$compiler_version
mkdir -p $build_dir
cmake -S. -B./$build_dir -DTEST_HURCHALLA_LIBS=ON \
-DCMAKE_BUILD_TYPE=Release \
@@ -493,7 +535,7 @@ if [ "${mode,,}" = "release" ]; then
popd > /dev/null 2>&1
elif [ "${mode,,}" = "debug" ]; then
pushd "$script_dir" > /dev/null 2>&1
build_dir=build/debug_$compiler_name
build_dir=build/debug_$compiler_name$compiler_version
mkdir -p $build_dir
cmake -S. -B./$build_dir -DTEST_HURCHALLA_LIBS=ON \
-DCMAKE_BUILD_TYPE=Debug \
2 changes: 1 addition & 1 deletion examples/example_with_cmake/example.sh
@@ -1,4 +1,4 @@
#!/bin/bash
#!/usr/bin/env bash

# Copyright (c) 2020-2022 Jeffrey Hurchalla.
# This Source Code Form is subject to the terms of the Mozilla Public
2 changes: 1 addition & 1 deletion examples/example_without_cmake/example.sh
@@ -1,4 +1,4 @@
#!/bin/bash
#!/usr/bin/env bash

# Copyright (c) 2020-2022 Jeffrey Hurchalla.
# This Source Code Form is subject to the terms of the Mozilla Public
@@ -492,7 +492,7 @@ class MontgomeryForm final {
// If you have already instantiated this MontgomeryForm, then calling
// remainder() should be faster than directly computing a % modulus,
// even if your CPU has extremely fast division (like many new CPUs).
T remainder(T a) const
HURCHALLA_FORCE_INLINE T remainder(T a) const
{
HPBC_PRECONDITION(a >= 0);
return static_cast<T>(impl.remainder(static_cast<U>(a)));
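The only change to this file marks `remainder()` with `HURCHALLA_FORCE_INLINE`, whose definition is not shown in this commit. Below is a minimal sketch of how a force-inline macro is commonly defined with compiler-specific attributes. It also includes an opt-out switch of the kind that could let unit tests avoid forced inlining; compare NoForceInlineMontgomeryForm, which this commit's README text credits with improving unit-test compile times. All names here (`SKETCH_FORCE_INLINE`, `SKETCH_DISABLE_FORCE_INLINE`, `RemainderSketch`) are hypothetical.

```cpp
#include <cstdint>

// Hypothetical macros, not Clockwork's actual definitions.
#if defined(SKETCH_DISABLE_FORCE_INLINE)
#  define SKETCH_FORCE_INLINE inline          // let the compiler decide
#elif defined(_MSC_VER)
#  define SKETCH_FORCE_INLINE __forceinline
#elif defined(__GNUC__)                       // GCC, and clang (which also defines __GNUC__)
#  define SKETCH_FORCE_INLINE inline __attribute__((always_inline))
#else
#  define SKETCH_FORCE_INLINE inline
#endif

template <typename T>
class RemainderSketch {
    T modulus_;
public:
    explicit RemainderSketch(T modulus) : modulus_(modulus) {}
    // Forced inlining removes call overhead in hot loops, at the cost of more
    // code for the optimizer to process at every call site.
    SKETCH_FORCE_INLINE T remainder(T a) const {
        return static_cast<T>(a % modulus_);
    }
};
```

Building tests with `-DSKETCH_DISABLE_FORCE_INLINE` would then trade some run-time speed for potentially shorter compiles. Whether NoForceInlineMontgomeryForm works this way is not shown in this commit.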
@@ -4,3 +4,5 @@ MontyFullRangeMasked.h:
The class MontyFullRangeMasked is usable in the same situations and in the same way as MontyFullRange; i.e. any odd value is permissible as the constructor's modulus. It uses some interesting and unusual optimizations to the Montgomery arithmetic algorithms, in order to (in theory) perform faster multiply and square and fused-multiply/square-add/sub operations. The speedup comes at the cost of slightly slower simple add and subtract operations. The speedup also applies only to certain sizes of T. For a type T that is the same size as the CPU integer registers (e.g. uint64_t on a 64-bit computer) or a type T that is smaller than the register size, there is a decent chance that MontyFullRangeMasked<T> will perform better overall than MontyFullRange<T>, when both are given the same modulus. This is due to the improved multiply, square, and fused-multiply/square-add/sub functions. However, the plain add() and subtract() functions in MontyFullRangeMasked<T> will usually be slower than those in MontyFullRange<T>. For a type T that is larger than the CPU integer register size, you can usually expect MontyFullRangeMasked<T> to perform worse overall than MontyFullRange<T>, providing little or no benefit. If your modulus is small enough to allow use of MontyQuarterRange<T> or MontyHalfRange<T>, you can usually expect those classes to perform better than either MontyFullRange<T> or MontyFullRangeMasked<T>, regardless of the size of T.
To use MontyFullRangeMasked, you would ordinarily declare a variable (using an unsigned integral type T) as follows:
MontgomeryForm<T, MontyFullRangeMasked<T>> mf;

The unit_testing_helpers subdirectory contains classes that provide a run-time polymorphic version of MontgomeryForm for potentially much faster compile times during unit testing. These classes of course have a run-time performance penalty, so they're intended for use only in unit testing. At the moment, the class NoForceInlineMontgomeryForm (in the main test folder) seems to improve the compile times for the unit tests sufficiently, and so these extra classes remain here as experimental. Nevertheless, these extra classes compile correctly for me with clang16 (on macOS) and pass their tests in test_MontgomeryForm_extra.cpp.
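The run-time polymorphic approach described above can be sketched as follows. This is a guess at the general idea, not at the helpers' actual design. Test code is written once against an abstract interface, so it is compiled once rather than once per template instantiation. Each concrete type then only instantiates a thin wrapper. The names `AbstractModMul`, `ModMulImpl`, and `check_mul` are hypothetical and much simpler than the classes in unit_testing_helpers. The virtual calls are the run-time penalty the text mentions.

```cpp
#include <cstdint>

// Abstract interface using one fixed wide integer type in its signatures.
class AbstractModMul {
public:
    virtual ~AbstractModMul() = default;
    virtual std::uint64_t modulus() const = 0;
    virtual std::uint64_t multiply(std::uint64_t a, std::uint64_t b) const = 0;
};

// Thin templated wrapper: the only code instantiated per type T.
// Assumes T is an unsigned type of at most 32 bits, so a*b fits in 64 bits.
template <typename T>
class ModMulImpl final : public AbstractModMul {
    T modulus_;
public:
    explicit ModMulImpl(T modulus) : modulus_(modulus) {}
    std::uint64_t modulus() const override { return modulus_; }
    std::uint64_t multiply(std::uint64_t a, std::uint64_t b) const override {
        std::uint64_t x = static_cast<T>(a), y = static_cast<T>(b);
        return (x * y) % modulus_;
    }
};

// Test logic compiled once, no matter how many types T are exercised.
bool check_mul(const AbstractModMul& mm) {
    std::uint64_t m = mm.modulus();
    for (std::uint64_t a = 0; a < 20; ++a)
        for (std::uint64_t b = 0; b < 20; ++b)
            if (mm.multiply(a % m, b % m) != (a * b) % m)
                return false;
    return true;
}
```

In a real test suite the interface would need to mirror much more of the MontgomeryForm API. The design trades virtual-call overhead for compiling the large test bodies only once.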