Address code review comments from MF

north-numerical-computing · May 28, 2024 · 2292c0c
1 parent 408e85c
Showing 7 changed files with 225 additions and 219 deletions.
diff --git a/README.md b/README.md
@@ -7,14 +7,17 @@
 
 CPFloat is a C library for simulating low-precision floating-point arithmetics. CPFloat provides efficient routines for rounding, performing arithmetic operations, evaluating  mathematical functions, and querying properties of the simulated low-precision format. Internally, numbers are stored in `float` or `double` arrays. The low-precision format (target format) follows an extension of the IEEE 754 standard and it is entirely specified by four parameters:
 * a positive integer *p*, which represents the number of digits of precision;
-* a positive integer *e*<sub>max</sub>, which represents the maximum supported exponent;
-* a positive integer *e*<sub>min</sub>, which represents the minimum supported exponent; and
+* an integer *e*<sub>min</sub>, which represents the minimum supported exponent;
+* a positive integer *e*<sub>max</sub>, which represents the maximum supported exponent; and
 * a Boolean variable σ, set to **true** if subnormal are supported and to **false** otherwise.
 
-The largest values of *p* and *e*<sub>max</sub>, and the smallest value of *e*<sub>min</sub> that can be used depend on the format in which the converted numbers are to be stored (storage format). A more extensive description of the characteristics of the low-precision formats that can be used, together with more details on the choice of the admissible values of *p*, *e*<sub>max</sub>, and *σ* can be found in [[1]](#ref1).
+Valid choices of *p*, *e*<sub>min</sub>, and *e*<sub>max</sub> depend on the format in which the converted numbers are to be stored (storage format). A more extensive description of the characteristics of the low-precision formats that can be used, together with more details on admissible values for *p*, *e*<sub>min</sub>, *e*<sub>max</sub>, and *σ* can be found in [[1]](#ref1).
 
 The library was originally intended as a faster version of the MATLAB function `chop` [[2]](#ref2), which is [available on GitHub](https://github.com/higham/chop).
-The latest versions of the library have a variety of subtle differences compared with `chop`.
+The latest versions of the library have a variety of subtle differences compared with `chop`:
+* since June 14, 2022 `chop` supports specifying the function for generating random numbers. The MEX interface of CPFloat does not offer this capability;
+* since v0.6.0, CPFloat allows *e*<sub>min</sub> and *e*<sub>max</sub> to be specified separately, whereas earlier versions took only *e*<sub>max</sub> and enforced *e*<sub>min</sub>=1-*e*<sub>max</sub>;
+* since v0.6.0 the default 8-bit format E4M3 has *e*<sub>max</sub>=8 (in `chop` it is set to 7).
 
 The code to reproduce the results of the tests in [[1]](#ref1) is [available on GitHub](https://github.com/north-numerical-computing/cpfloat_experiments).
 
@@ -23,11 +26,13 @@ The code to reproduce the results of the tests in [[1]](#ref1) is [available on
 
 The only (optional) dependency of CPFloat is the [C implementation](https://github.com/imneme/pcg-c) of the [PCG Library](https://www.pcg-random.org), which provides a variety of high-quality pseudo-random number generators. For an in-depth discussion of the algorithms underlying the PCG Library, we recommend the [paper](https://www.pcg-random.org/paper.html) by [Melissa O'Neill](https://www.cs.hmc.edu/~oneill) [[3]](#ref3). If the header file `pcg_variants.h` in `include/pcg-c/include/pcg_variants.h` is not included at compile-time with the `--include` option, then CPFloat relies on the default C pseudo-random number generator.
 
-The PCG Library is free software (see the [Licensing information](#licensing-information) below), and its generators are more efficient, reliable, and flexible than any combination of the functions `srand`, `rand`, and `rand_r` from the C standard  library. A warning is issued at compile time if the location of `pcg_variant.h` is not specified correctly.
+The PCG Library is free software (see the [Licensing information](#licensing-information) below), and its generators are more efficient, reliable, and flexible than any combination of the functions `srand`, `rand`, and `rand_r` from the C standard library. A warning is issued at compile time if the location of `pcg_variants.h` is not specified correctly.
+
+Compiling the MEX interface requires a reasonably recent version of MATLAB or Octave.
 
 # Developer dependencies
 
-Compiling the MEX interface requires a reasonably recent version of MATLAB or Octave, and testing the interface requires the function `float_params`, which is [available on GitHub](https://github.com/higham/float_params). The unit tests for the C implementation in `test/cpfloat_test.ts` require the [check unit testing framework for C](https://libcheck.github.io/check) and the [subunit protocol](https://github.com/testing-cabal/subunit).
+Testing the MEX interface requires the function `float_params`, which is [available on GitHub](https://github.com/higham/float_params). The unit tests for the C implementation in `test/cpfloat_test.ts` require the [check unit testing framework for C](https://libcheck.github.io/check) and the [subunit protocol](https://github.com/testing-cabal/subunit).
 
 # Installation
 
@@ -56,7 +61,7 @@ make mexoct # Compile MEX interface for Octave.
 ```
 These two commands compile and autotune the MEX interface in MATLAB and Octave, respectively, by using the functions `mex/cpfloat_compile.m` and `mex/cpfloat_autotune.m`. To use the interface, the `bin/` folder must be in MATLAB's search path.
 
-On a system where the `make` build automation tool is not available, we recommend building the MEX interface by running the script `cpfloat_compile_nomake.m` in the `mex/` folder. The script attempts to compile and auto-tune the MEX interface using the default C compiler. The following code will download the repository as a ZIP file, inflate it, and try to compile it:
+On a system where the `make` build automation tool is not available, we recommend building the MEX interface by running the script `cpfloat_compile_nomake.m` in the `mex/` folder. The script attempts to compile and autotune the MEX interface using the default C compiler. The following code will download the repository as a ZIP file, inflate it, and try to compile it:
 
 ```matlab
 zip_url = 'https://codeload.github.com/north-numerical-computing/cpfloat/zip/refs/heads/main';

diff --git a/mex/cpfloat.c b/mex/cpfloat.c
@@ -39,8 +39,8 @@ void mexFunction(int nlhs,
 
     strcpy(fpopts->format, "h");
     fpopts->precision = 11;
-    fpopts->emax = 15;
     fpopts->emin = -14;
+    fpopts->emax = 15;
     fpopts->subnormal = CPFLOAT_SUBN_USE;
     fpopts->explim = CPFLOAT_EXPRANGE_TARG;
     fpopts->round = CPFLOAT_RND_NE;
@@ -78,54 +78,54 @@ void mexFunction(int nlhs,
            !strcmp(fpopts->format, "fp8-e4m3") ||
                  !strcmp(fpopts->format, "E4M3")) {
         fpopts->precision = 4;
-        fpopts->emax = 8;
         fpopts->emin = -6;
+        fpopts->emax = 8;
       } else if (!strcmp(fpopts->format, "q52") ||
                  !strcmp(fpopts->format, "fp8-e5m2") ||
                  !strcmp(fpopts->format, "E5M2")) {
         fpopts->precision = 3;
-        fpopts->emax = 15;
         fpopts->emin = -14;
+        fpopts->emax = 15;
       } else if (!strcmp(fpopts->format, "b") ||
           !strcmp(fpopts->format, "bfloat16") ||
           !strcmp(fpopts->format, "bf16")) {
         fpopts->precision = 8;
-        fpopts->emax = 127;
         fpopts->emin = -126;
+        fpopts->emax = 127;
         is_subn_rnd_default = true;
       } else if (!strcmp(fpopts->format, "h") ||
                  !strcmp(fpopts->format, "half") ||
                  !strcmp(fpopts->format, "binary16") ||
                  !strcmp(fpopts->format, "fp16")) {
         fpopts->precision = 11;
-        fpopts->emax = 15;
         fpopts->emin = -14;
+        fpopts->emax = 15;
       } else if (!strcmp(fpopts->format, "t") ||
                  !strcmp(fpopts->format, "TensorFloat-32") ||
                  !strcmp(fpopts->format, "tf32")) {
         fpopts->precision = 11;
-        fpopts->emax = 127;
         fpopts->emin = -126;
+        fpopts->emax = 127;
       } else if (!strcmp(fpopts->format, "s") ||
                  !strcmp(fpopts->format, "single") ||
                  !strcmp(fpopts->format, "binary32") ||
                  !strcmp(fpopts->format, "fp32")) {
         fpopts->precision =  24;
-        fpopts->emax = 127;
         fpopts->emin = -126;
+        fpopts->emax = 127;
       } else if (!strcmp(fpopts->format, "d") ||
                  !strcmp(fpopts->format, "double") ||
                  !strcmp(fpopts->format, "binary64") ||
                  !strcmp(fpopts->format, "fp64")) {
         fpopts->precision =   53;
-        fpopts->emax = 1023;
         fpopts->emin = -1022;
+        fpopts->emax = 1023;
       } else if (!strcmp(fpopts->format, "c") ||
                  !strcmp(fpopts->format, "custom")) {
         if ((tmp != NULL) && (mxGetClassID(tmp) == mxDOUBLE_CLASS)) {
           fpopts->precision = ((double *)mxGetData(tmp))[0];
-          fpopts->emax = ((double *)mxGetData(tmp))[1];
-          fpopts->emin = ((double *)mxGetData(tmp))[2];
+          fpopts->emin = ((double *)mxGetData(tmp))[1];
+          fpopts->emax = ((double *)mxGetData(tmp))[2];
         } else {
           mexErrMsgIdAndTxt("cpfloat:invalidparams",
                             "Invalid floating-point parameters specified.");
@@ -223,8 +223,8 @@ void mexFunction(int nlhs,
       maxebits = 1023;
       minebits = -1022;
     }
-    if (fpopts->precision > maxfbits || fpopts->emax > maxebits
-        || fpopts->emin < minebits)
+    if (fpopts->precision > maxfbits || fpopts->emin < minebits
+        || fpopts->emax > maxebits)
       if (!strcmp(fpopts->format, "c") || !strcmp(fpopts->format, "custom"))
         mexErrMsgIdAndTxt("cpfloat:invalidparams",
                           "Invalid floating-point parameters selected.");
@@ -297,8 +297,8 @@ void mexFunction(int nlhs,
     mxArray *outparams = mxCreateDoubleMatrix(1,3,mxREAL);
     double *outparamsptr = mxGetData(outparams);
     outparamsptr[0] = fpopts->precision;
-    outparamsptr[1] = fpopts->emax;
-    outparamsptr[2] = fpopts->emin;
+    outparamsptr[1] = fpopts->emin;
+    outparamsptr[2] = fpopts->emax;
     mxSetFieldByNumber(plhs[1], 0, 1, outparams);
 
     mxArray *outsubnormal = mxCreateDoubleMatrix(1,1,mxREAL);

diff --git a/mex/cpfloat.m b/mex/cpfloat.m
@@ -33,11 +33,11 @@
 %
 %   * The three-element vector FPOPTS.params specifies the parameters of the
 %     target floating-point format, and is ignored unless FPOPTS.format is set
-%     to either 'c' or 'custom'. The vector has the form [PRECISION,EMAX,EMIN],
-%     where PRECISION, EMAX and EMIN are positive integers representing
+%     to either 'c' or 'custom'. The vector has the form [PRECISION,EMIN,EMAX],
+%     where PRECISION, EMIN and EMAX are positive integers representing
 %     the number of binary digits in the fraction and the maximum exponent of
 %     the target format, respectively. The default value of this field is
-%     the vector [11,15,-14].
+%     the vector [11,-14,15].
 %
 %   * The scalar FPOPTS.subnormal specifies the support for subnormal numbers.
 %     The target floating-point format will not support subnormal numbers if
@@ -80,9 +80,10 @@
 %     probability, that is, a real number in the interval [0,1]. The default
 %     value for this field is 0.5.
 %
-%   The interface of CPFLOAT is partly compatible with that of the MATLAB
-%   function CHOP available at https://github.com/higham/chop. The main
-%   difference is that CPFLOAT requires EMIN specified in FPOPTS.params.
+%   The interface of CPFLOAT is mostly compatible with that of the MATLAB
+%   function CHOP available at https://github.com/higham/chop. See
+%   https://github.com/north-numerical-computing/cpfloat/blob/main/README.md
+%   for an up-to-date list of differences.
 
 % SPDX-FileCopyrightText: 2020 Massimiliano Fasi and Mantas Mikaitis
 % SPDX-License-Identifier: LGPL-2.1-or-later