improve compile times for unit tests

hurchalla · Dec 6, 2024 · 8b59368
1 parent fa90f20
commit 8b59368
Showing 14 changed files with 1,561 additions and 123 deletions.
diff --git a/README.md b/README.md
@@ -2,11 +2,11 @@
 
 ![Alt text](images/clockxtrasmall_border2.jpg?raw=true "Clock Gears, photo by Krzysztof Golik, licensed CC BY-SA 4.0")
 
-Clockwork is a high performance, easy to use Modular Arithmetic (header-only) library for C++ for up to 128 bit integer types, with extensive support for Montgomery arithmetic.  If you want or need Montgomery arithmetic in this range, or general modular arithmetic functions, Clockwork is almost certainly the fastest and easiest library you could use (*for best performance just make sure you define the standard C++ macro NDEBUG*).
+Clockwork is a high performance, easy to use Modular Arithmetic (header-only) library for C++ for up to 128 bit integer types, with extensive support for Montgomery arithmetic.  If you want or need Montgomery arithmetic in this range, or general modular arithmetic functions, Clockwork is almost certainly the fastest and easiest library you could use.  For best performance make sure you define the standard C++ macro NDEBUG.
 
 ## Design goals
 
-The goal for Clockwork was to create a flexible and bulletproof library with the best performance yet achieved for modular arithmetic of native (on the CPU) integer types.  For integer types that are double the native bit width (e.g. 128 bit), performance should still be close to ideal, though not as completely optimized as for native types.  Larger than 128 bit types are permissible; however a library like GMP is likely to be a better choice for such sizes.
+Clockwork is designed to be a flexible and bulletproof library with the best performance achievable for modular arithmetic of native (on the CPU) integer types.  For integer types that are double the native bit width (e.g. 128 bit), performance should still be close to ideal, though not as completely optimized as for native types.  Larger than 128 bit types are permissible; however a library like GMP is likely to be a better choice for such sizes.
 
 ## Status
 
@@ -48,7 +48,7 @@ If you prefer, for the last command you could instead use CMake's default instal
 This will copy all the header files needed for this modular arithmetic library to an "include" subfolder in the installation folder of your choosing.
 When compiling your project, you'll of course need to ensure that you have that include subfolder as part of your include path.  
 
-For good performance you *must* ensure that the standard macro NDEBUG (see &lt;cassert&gt;) is defined when compiling.  You can generally do this by adding the option flag -DNDEBUG to your compile command.  
+For best performance you *must* ensure that the standard macro NDEBUG (see &lt;cassert&gt;) is defined when compiling.  You can generally do this by adding the option flag -DNDEBUG to your compile command.  
 
 It may help to see a simple [example](examples/example_without_cmake).
 
@@ -65,7 +65,7 @@ From the modular_arithmetic group, the files *absolute_value_difference.h*, *mod
 *hurchalla::modular_multiplicative_inverse(T a, T modulus)*.  Returns the multiplicative inverse of a if it exists, and otherwise returns 0.  
 *hurchalla::modular_pow(T base, T exponent, T modulus)*.  Returns the modular exponentiation of base^exponent (mod modulus).  
 
-From the montgomery_arithmetic group, the file *MontgomeryForm.h* provides the easy to use (and zero cost abstraction) class *hurchalla::MontgomeryForm*, which has member functions for effortlessly performing operations in the Montgomery domain.  These operations include converting to/from Montgomery domain, add, subtract, multiply, square, fused-multiply-add/sub, pow, gcd, and more.  For improved performance in some situations, the file *montgomery_form_aliases.h* provides simple aliases for faster (with limitations on allowed modulus) instantiations of the class MontgomeryForm.
+From the montgomery_arithmetic group, the file *MontgomeryForm.h* provides the easy to use (and zero cost abstraction) class *hurchalla::MontgomeryForm*, which has member functions for effortlessly performing operations in the Montgomery domain.  These operations include converting to/from Montgomery domain, add, subtract, multiply, square, [fused-multiply-add/sub](https://jeffhurchalla.com/2022/05/01/the-montgomery-multiply-accumulate), pow, gcd, and more.  For improved performance in some situations, the file *montgomery_form_aliases.h* provides simple aliases for faster (with limitations on allowed modulus) instantiations of the class MontgomeryForm.
 
 For an easy demonstration of MontgomeryForm, you can see one of the [examples](examples/example_without_cmake).
 
@@ -74,7 +74,3 @@ If you prefer not to use the high level interface of MontgomeryForm, and instead
 ## Performance Notes
 
 If you're interested in experimenting, predefining certain macros when compiling might improve performance - see [macros_for_performance.md](macros_for_performance.md).
-
-## TODO
-
-For the unit tests, solve the long compile time and high memory use (during compile) of the files test_MontgomeryForm.cpp and test_montgomery_pow.cpp.
diff --git a/build_tests.sh b/build_tests.sh
@@ -1,4 +1,4 @@
-#!/bin/bash
+#!/usr/bin/env bash
 
 # Copyright (c) 2020-2022 Jeffrey Hurchalla.
 # This Source Code Form is subject to the terms of the Mozilla Public
@@ -172,6 +172,11 @@
 #   update-alternatives for both icc and icpc. ]
 
 
+if [ "${BASH_VERSINFO:-0}" -lt 4 ]; then
+   >&2 echo "This script requires some version of bash >= 4.0.  Bash 3.2.57 is known to fail, but the minimum required version is unknown"
+   exit 1
+fi
+
 
 while getopts ":m:l:c:j:h-:raus" opt; do
   case $opt in
@@ -239,58 +244,72 @@ if [ "${compiler,,}" = "gcc" ] || [ "${compiler,,}" = "g++" ]; then
   cmake_cpp_compiler=-DCMAKE_CXX_COMPILER=g++
   cmake_c_compiler=-DCMAKE_C_COMPILER=gcc
   compiler_name=gcc
+  compiler_version=0
 elif [ "${compiler,,}" = "gcc-7" ] || [ "${compiler,,}" = "g++-7" ] ||
      [ "${compiler,,}" = "gcc7" ] || [ "${compiler,,}" = "g++7" ]; then
   cmake_cpp_compiler=-DCMAKE_CXX_COMPILER=g++-7
   cmake_c_compiler=-DCMAKE_C_COMPILER=gcc-7
-  compiler_name=gcc7
+  compiler_name=gcc
+  compiler_version=7
 elif [ "${compiler,,}" = "gcc-10" ] || [ "${compiler,,}" = "g++-10" ] ||
      [ "${compiler,,}" = "gcc10" ] || [ "${compiler,,}" = "g++10" ]; then
   cmake_cpp_compiler=-DCMAKE_CXX_COMPILER=g++-10
   cmake_c_compiler=-DCMAKE_C_COMPILER=gcc-10
-  compiler_name=gcc10
+  compiler_name=gcc
+  compiler_version=10
 elif [ "${compiler,,}" = "gcc-13" ] || [ "${compiler,,}" = "g++-13" ] ||
      [ "${compiler,,}" = "gcc13" ] || [ "${compiler,,}" = "g++13" ]; then
   cmake_cpp_compiler=-DCMAKE_CXX_COMPILER=g++-13
   cmake_c_compiler=-DCMAKE_C_COMPILER=gcc-13
-  compiler_name=gcc13
+  compiler_name=gcc
+  compiler_version=13
 elif [ "${compiler,,}" = "clang" ] || [ "${compiler,,}" = "clang++" ]; then
   cmake_cpp_compiler=-DCMAKE_CXX_COMPILER=clang++
   cmake_c_compiler=-DCMAKE_C_COMPILER=clang
   compiler_name=clang
+  compiler_version=0
 elif [ "${compiler,,}" = "clang-3" ] || [ "${compiler,,}" = "clang++-3" ] ||
      [ "${compiler,,}" = "clang3" ] || [ "${compiler,,}" = "clang++3" ]; then
   cmake_cpp_compiler=-DCMAKE_CXX_COMPILER=clang++-3.9
   cmake_c_compiler=-DCMAKE_C_COMPILER=clang-3.9
-  compiler_name=clang3
+  compiler_name=clang
+  compiler_version=3
 elif [ "${compiler,,}" = "clang-6" ] || [ "${compiler,,}" = "clang++-6" ] ||
      [ "${compiler,,}" = "clang6" ] || [ "${compiler,,}" = "clang++6" ]; then
   cmake_cpp_compiler=-DCMAKE_CXX_COMPILER=clang++-6.0
   cmake_c_compiler=-DCMAKE_C_COMPILER=clang-6.0
-  compiler_name=clang6
+  compiler_name=clang
+  compiler_version=6
 elif [ "${compiler,,}" = "clang-10" ] || [ "${compiler,,}" = "clang++-10" ] ||
      [ "${compiler,,}" = "clang10" ] || [ "${compiler,,}" = "clang++10" ]; then
   cmake_cpp_compiler=-DCMAKE_CXX_COMPILER=clang++-10
   cmake_c_compiler=-DCMAKE_C_COMPILER=clang-10
-  compiler_name=clang10
+  compiler_name=clang
+  compiler_version=10
 elif [ "${compiler,,}" = "clang-18" ] || [ "${compiler,,}" = "clang++-18" ] ||
      [ "${compiler,,}" = "clang18" ] || [ "${compiler,,}" = "clang++18" ]; then
   cmake_cpp_compiler=-DCMAKE_CXX_COMPILER=clang++-18
   cmake_c_compiler=-DCMAKE_C_COMPILER=clang-18
-  compiler_name=clang18
+  compiler_name=clang
+  compiler_version=18
 elif [ "${compiler,,}" = "icc" ] || [ "${compiler,,}" = "icpc" ]; then
   cmake_cpp_compiler=-DCMAKE_CXX_COMPILER=icpc
   cmake_c_compiler=-DCMAKE_C_COMPILER=icc
   compiler_name=icc
+  compiler_version=0
   source /opt/intel/bin/compilervars.sh intel64
 elif [ -n "$compiler" ]; then
   echo "Invalid argument for option -c: $compiler"
   exit 1
 fi
 
-echo Using compiler $compiler_name ...
-echo Using build mode $mode ...
 
+if [ "$compiler_version" = "0" ]; then
+  echo Using compiler $compiler_name \(default version\) ...
+else
+  echo Using compiler $compiler_name v$compiler_version ...
+fi
+echo Using build mode $mode ...
 
 
 cpp_standard="-std=c++11"
@@ -369,23 +388,36 @@ if [ "$compiler_name" = "gcc" ]; then
 
   : # do nothing, at least for now
 
+# note that clang-tidy includes the clang static analyzer
 elif [ "$compiler_name" = "clang" ]; then
-#  clang_static_analysis=(-DCMAKE_CXX_CLANG_TIDY="clang-tidy;-checks=-*,clang-analyzer-*")
+#  clang_static_analysis=(-DCMAKE_CXX_CLANG_TIDY="clang-tidy;-extra-arg=-Wno-unknown-warning-option;-checks=*,clang-analyzer-*")
   : # do nothing, at least for now
 fi
+#-analyzer-checker=core
+#-analyzer-checker=cpp
+#-analyzer-checker=unix
+#-analyzer-checker=deadcode
 
 
 #undefined behavior sanitizers
-#-----
+#-----------------------------
+# note according to https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html:
+# "[The] UndefinedBehaviorSanitizer ... test suite is integrated into the CMake
+# build and can be run with check-ubsan command."
 if [ "$compiler_name" = "gcc" ]; then
   gcc_ubsan="-fsanitize=undefined -fno-sanitize-recover \
            -fsanitize=float-divide-by-zero -fsanitize=float-cast-overflow"
 
-elif [ "$compiler_name" = "clang" ]; then
-  # My installed version of clang doesn't support -fsanitize=implicit-conversion
-  clang_ubsan="-fsanitize=undefined -fsanitize=nullability -fsanitize=bounds \
-             -fsanitize=float-divide-by-zero"
-
+elif [ "$compiler_name" = "clang" ] && [[ $compiler_version -ge 6 ]]; then
+  # clang6 doesn't support -fsanitize=implicit-conversion.  Clang10 does support
+  # it.  I don't know if clang7,8,9 support it.
+  if [[ $compiler_version -ge 10 ]]; then
+    clang_ubsan="-fsanitize=undefined -fsanitize=nullability -fsanitize=bounds \
+                 -fsanitize=float-divide-by-zero -fsanitize=implicit-conversion"
+  else
+    clang_ubsan="-fsanitize=undefined -fsanitize=nullability -fsanitize=bounds \
+                 -fsanitize=float-divide-by-zero"
+  fi
   # The next line in a perfect world wouldn't be needed, but for some versions
   # of clang (clang 10 for me), the linker doesn't find __muloti4 when using the
   # undefined behavior sanitizers.  __muloti4 is defined in compiler-rt.
@@ -395,7 +427,7 @@ fi
 
 
 #address sanitizers
-#-----
+#------------------
 clang_asan=""
 gcc_asan="-fsanitize=address"
 # clang -fsanitize=address -O1 -fno-omit-frame-pointer -g   tests/use-after-free.c
@@ -442,6 +474,9 @@ gcc_asan="-fsanitize=address"
 # endif()
 
 
+#LeakSanitizer (LSan)
+#ThreadSanitizer (TSan)
+#MemorySanitizer (MSan)
 
 #modes
 # 1. Asan+UBsan+Lsan
@@ -451,7 +486,7 @@ gcc_asan="-fsanitize=address"
 
 # a run of "splint" and/or cppcheck
 # cpplint
-# include what you use (iwyu), and lwyu
+# include what you use (iwyu), and lwyu (link what you use)
 # Clang-Tidy
 # CppCoreCheck
 
@@ -463,6 +498,11 @@ gcc_asan="-fsanitize=address"
 # <LANG>_INCLUDE_WHAT_YOU_USE
 # LINK_WHAT_YOU_USE
 
+# fuzz testing
+
+# valgrind/purity
+
+# code coverage tools - gcov
 
 
 
@@ -476,9 +516,11 @@ exit_on_failure () {
 # https://stackoverflow.com/questions/59895/how-to-get-the-source-directory-of-a-bash-script-from-within-the-script-itself
 script_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
 
+
+
 if [ "${mode,,}" = "release" ]; then
     pushd script_dir > /dev/null 2>&1
-    build_dir=build/release_$compiler_name
+    build_dir=build/release_$compiler_name$compiler_version
     mkdir -p $build_dir
     cmake -S. -B./$build_dir -DTEST_HURCHALLA_LIBS=ON \
             -DCMAKE_BUILD_TYPE=Release \
@@ -493,7 +535,7 @@ if [ "${mode,,}" = "release" ]; then
     popd > /dev/null 2>&1
 elif [ "${mode,,}" = "debug" ]; then
     pushd script_dir > /dev/null 2>&1
-    build_dir=build/debug_$compiler_name
+    build_dir=build/debug_$compiler_name$compiler_version
     mkdir -p $build_dir
     cmake -S. -B./$build_dir -DTEST_HURCHALLA_LIBS=ON \
             -DCMAKE_BUILD_TYPE=Debug \

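The version-gated UBSan selection that build_tests.sh now performs can be sketched in isolation. The function name `select_clang_ubsan` is hypothetical and not part of the script; the flag sets and the version thresholds (6 and 10) are taken from the diff above.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of build_tests.sh: prints the clang UBSan
# flags for a given major version, using the thresholds from the diff above.
select_clang_ubsan () {
  local version=$1
  local flags=""
  if [ "$version" -ge 6 ]; then
    flags="-fsanitize=undefined -fsanitize=nullability -fsanitize=bounds"
    flags="$flags -fsanitize=float-divide-by-zero"
    # Per the script's comment, clang 10 supports implicit-conversion and
    # clang 6 does not; versions 7-9 are untested and treated like clang 6.
    if [ "$version" -ge 10 ]; then
      flags="$flags -fsanitize=implicit-conversion"
    fi
  fi
  printf '%s\n' "$flags"
}

select_clang_ubsan 18   # includes -fsanitize=implicit-conversion
select_clang_ubsan 6    # same base flags, without implicit-conversion
select_clang_ubsan 0    # prints an empty line: 0 means "default version"
```

Under this gating, an unversioned `-c clang` (recorded as version 0) selects no UBSan flags in the lines shown, since 0 fails the `-ge 6` test.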
diff --git a/examples/example_with_cmake/example.sh b/examples/example_with_cmake/example.sh
@@ -1,4 +1,4 @@
-#!/bin/bash
+#!/usr/bin/env bash
 
 # Copyright (c) 2020-2022 Jeffrey Hurchalla.
 # This Source Code Form is subject to the terms of the Mozilla Public

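The `#!/usr/bin/env bash` shebang runs whichever bash comes first on PATH rather than the one at /bin/bash. build_tests.sh lowercases its `-c` argument with `${compiler,,}`, a case-modification expansion introduced in bash 4.0, which is consistent with its new guard rejecting older shells such as bash 3.2.57. For a script that had to run on older bash, a portable alternative could look like the sketch below; `to_lower` is a hypothetical helper and is not what this commit does.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of this commit: lowercase a string without
# the bash 4 expansion ${var,,}, so it also works on bash 3.2 and POSIX sh.
to_lower () {
  printf '%s' "$1" | tr '[:upper:]' '[:lower:]'
}

compiler="Clang++-18"
if [ "$(to_lower "$compiler")" = "clang++-18" ]; then
  echo "matched clang 18"
fi
```

The trade-off is one extra `tr` process per comparison, which is negligible for a build script but noisier to read than `${compiler,,}`.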
diff --git a/examples/example_without_cmake/example.sh b/examples/example_without_cmake/example.sh
@@ -1,4 +1,4 @@
-#!/bin/bash
+#!/usr/bin/env bash
 
 # Copyright (c) 2020-2022 Jeffrey Hurchalla.
 # This Source Code Form is subject to the terms of the Mozilla Public

diff --git a/montgomery_arithmetic/include/hurchalla/montgomery_arithmetic/MontgomeryForm.h b/montgomery_arithmetic/include/hurchalla/montgomery_arithmetic/MontgomeryForm.h
@@ -492,7 +492,7 @@ class MontgomeryForm final {
     // If you have already instantiated this MontgomeryForm, then calling
     // remainder() should be faster than directly computing  a % modulus,
     // even if your CPU has extremely fast division (like many new CPUs).
-    T remainder(T a) const
+    HURCHALLA_FORCE_INLINE T remainder(T a) const
     {
         HPBC_PRECONDITION(a >= 0);
         return static_cast<T>(impl.remainder(static_cast<U>(a)));

diff --git a/...rithmetic/include/hurchalla/montgomery_arithmetic/detail/experimental/README.md b/...rithmetic/include/hurchalla/montgomery_arithmetic/detail/experimental/README.md
@@ -4,3 +4,5 @@ MontyFullRangeMasked.h:
 The class MontyFullRangeMasked is usable in the same situations and in the same way as MontyFullRange; i.e. any odd value is permissible for the modulus of the constructor.  It uses some interesting and unusual optimizations to the Montgomery arithmetic algorithms, in order to (in theory) perform faster multiply and square and fused-multiply/square-add/sub operations.  The speedup comes at the cost of slightly slower simple add and subtract operations.  The speedup also applies only to certain sizes of T.  For a type T that is the same size as the CPU integer registers (e.g. uint64_t on a 64 bit computer) or a type T that is smaller than the register size, there is a decent chance that MontyFullRangeMasked<T> will perform better overall than MontyFullRange<T>, when both are given the same modulus.  This is due to the improved multiply, square, and fused-multiply/square-add/sub functions.  However, the plain add() and subtract() functions in MontyFullRangeMasked<T> will usually be slower than those in MontyFullRange<T>.  For a type T that is larger than the CPU integer register size, you can usually expect MontyFullRangeMasked<T> to perform worse overall than MontyFullRange<T>, and to provide little or no benefit.  If your modulus is small enough to allow use of MontyQuarterRange<T> or MontyHalfRange<T>, you can usually expect those classes to perform better than either MontyFullRange<T> or MontyFullRangeMasked<T>, regardless of the size of T.
 To use MontyFullRangeMasked, you would ordinarily declare a variable (using an unsigned integral type T) as follows:
 MontgomeryForm<T, MontyFullRangeMasked<T>> mf;
+
+The unit_testing_helpers subdirectory contains classes that provide a run-time polymorphic version of MontgomeryForm for potentially much faster compile times during unit testing.  These classes of course have a run-time performance penalty, so they're intended for use only in unit testing.  At the moment, the class NoForceInlineMontgomeryForm (in the main test folder) seems to improve the compile times for the unit tests sufficiently, and so these extra classes remain here as experimental.  Nevertheless, these extra classes compile correctly for me with clang16 (on macOS) and pass their tests in test_MontgomeryForm_extra.cpp.