Address code review comments from MF
mmikaitis committed May 29, 2024
1 parent 7a9fa31 commit 653d1eb
Showing 7 changed files with 225 additions and 219 deletions.
19 changes: 12 additions & 7 deletions README.md
@@ -7,14 +7,17 @@

CPFloat is a C library for simulating low-precision floating-point arithmetics. CPFloat provides efficient routines for rounding, performing arithmetic operations, evaluating mathematical functions, and querying properties of the simulated low-precision format. Internally, numbers are stored in `float` or `double` arrays. The low-precision format (target format) follows an extension of the IEEE 754 standard and it is entirely specified by four parameters:
* a positive integer *p*, which represents the number of digits of precision;
-* a positive integer *e*<sub>max</sub>, which represents the maximum supported exponent;
-* a positive integer *e*<sub>min</sub>, which represents the minimum supported exponent; and
+* a positive integer *e*<sub>min</sub>, which represents the minimum supported exponent;
+* a positive integer *e*<sub>max</sub>, which represents the maximum supported exponent; and
* a Boolean variable σ, set to **true** if subnormal are supported and to **false** otherwise.

-The largest values of *p* and *e*<sub>max</sub>, and the smallest value of *e*<sub>min</sub> that can be used depend on the format in which the converted numbers are to be stored (storage format). A more extensive description of the characteristics of the low-precision formats that can be used, together with more details on the choice of the admissible values of *p*, *e*<sub>max</sub>, and *σ* can be found in [[1]](#ref1).
+Valid choices of *p*, *e*<sub>min</sub>, and *e*<sub>max</sub> depend on the format in which the converted numbers are to be stored (storage format). A more extensive description of the characteristics of the low-precision formats that can be used, together with more details on admissible values for *p*, *e*<sub>min</sub>, *e*<sub>max</sub>, and *σ* can be found in [[1]](#ref1).

The library was originally intended as a faster version of the MATLAB function `chop` [[2]](#ref2), which is [available on GitHub](https://github.com/higham/chop).
-The latest versions of the library have a variety of subtle differences compared with `chop`.
+The latest versions of the library have a variety of subtle differences compared with `chop`:
+* since June 14, 2022 `chop` supports specifying the function for generating random numbers. The MEX interface of CPFloat does not offer this capability;
+* since v0.6.0 CPFloat allows to specify *e*<sub>min</sub> and *e*<sub>max</sub> instead of the previous strategy of specifying *e*<sub>max</sub> and enforcing *e*<sub>min</sub>=1-*e*<sub>max</sub>;
+* since v0.6.0 the default 8-bit format E4M3 has *e*<sub>max</sub>=8 (in `chop` it is set to 7).

The code to reproduce the results of the tests in [[1]](#ref1) is [available on GitHub](https://github.com/north-numerical-computing/cpfloat_experiments).

@@ -23,11 +26,13 @@ The code to reproduce the results of the tests in [[1]](#ref1) is [available on

The only (optional) dependency of CPFloat is the [C implementation](https://github.com/imneme/pcg-c) of the [PCG Library](https://www.pcg-random.org), which provides a variety of high-quality pseudo-random number generators. For an in-depth discussion of the algorithms underlying the PCG Library, we recommend the [paper](https://www.pcg-random.org/paper.html) by [Melissa O'Neill](https://www.cs.hmc.edu/~oneill) [[3]](#ref3). If the header file `pcg_variants.h` in `include/pcg-c/include/pcg_variants.h` is not included at compile-time with the `--include` option, then CPFloat relies on the default C pseudo-random number generator.

-The PCG Library is free software (see the [Licensing information](#licensing-information) below), and its generators are more efficient, reliable, and flexible than any combination of the functions `srand`, `rand`, and `rand_r` from the C standard library. A warning is issued at compile time if the location of `pcg_variant.h` is not specified correctly.
+The PCG Library is free software (see the [Licensing information](#licensing-information) below), and its generators are more efficient, reliable, and flexible than any combination of the functions `srand`, `rand`, and `rand_r` from the C standard library. A warning is issued at compile time if the location of `pcg_variant.h` is not specified correctly.

+Compiling the MEX interface requires a reasonably recent version of MATLAB or Octave.
+
# Developer dependencies

-Compiling the MEX interface requires a reasonably recent version of MATLAB or Octave, and testing the interface requires the function `float_params`, which is [available on GitHub](https://github.com/higham/float_params). The unit tests for the C implementation in `test/cpfloat_test.ts` require the [check unit testing framework for C](https://libcheck.github.io/check) and the [subunit protocol](https://github.com/testing-cabal/subunit).
+Testing the MEX interface requires the function `float_params`, which is [available on GitHub](https://github.com/higham/float_params). The unit tests for the C implementation in `test/cpfloat_test.ts` require the [check unit testing framework for C](https://libcheck.github.io/check) and the [subunit protocol](https://github.com/testing-cabal/subunit).

# Installation

@@ -56,7 +61,7 @@ make mexoct # Compile MEX interface for Octave.
```
These two commands compile and autotune the MEX interface in MATLAB and Octave, respectively, by using the functions `mex/cpfloat_compile.m` and `mex/cpfloat_autotune.m`. To use the interface, the `bin/` folder must be in MATLAB's search path.

-On a system where the `make` build automation tool is not available, we recommend building the MEX interface by running the script `cpfloat_compile_nomake.m` in the `mex/` folder. The script attempts to compile and auto-tune the MEX interface using the default C compiler. The following code will download the repository as a ZIP file, inflate it, and try to compile it:
+On a system where the `make` build automation tool is not available, we recommend building the MEX interface by running the script `cpfloat_compile_nomake.m` in the `mex/` folder. The script attempts to compile and autotune the MEX interface using the default C compiler. The following code will download the repository as a ZIP file, inflate it, and try to compile it:

```matlab
zip_url = 'https://codeload.github.com/north-numerical-computing/cpfloat/zip/refs/heads/main';
28 changes: 14 additions & 14 deletions mex/cpfloat.c
@@ -39,8 +39,8 @@ void mexFunction(int nlhs,

strcpy(fpopts->format, "h");
fpopts->precision = 11;
-fpopts->emax = 15;
fpopts->emin = -14;
+fpopts->emax = 15;
fpopts->subnormal = CPFLOAT_SUBN_USE;
fpopts->explim = CPFLOAT_EXPRANGE_TARG;
fpopts->round = CPFLOAT_RND_NE;
@@ -78,54 +78,54 @@ void mexFunction(int nlhs,
!strcmp(fpopts->format, "fp8-e4m3") ||
!strcmp(fpopts->format, "E4M3")) {
fpopts->precision = 4;
-fpopts->emax = 8;
fpopts->emin = -6;
+fpopts->emax = 8;
} else if (!strcmp(fpopts->format, "q52") ||
!strcmp(fpopts->format, "fp8-e5m2") ||
!strcmp(fpopts->format, "E5M2")) {
fpopts->precision = 3;
-fpopts->emax = 15;
fpopts->emin = -14;
+fpopts->emax = 15;
} else if (!strcmp(fpopts->format, "b") ||
!strcmp(fpopts->format, "bfloat16") ||
!strcmp(fpopts->format, "bf16")) {
fpopts->precision = 8;
-fpopts->emax = 127;
fpopts->emin = -126;
+fpopts->emax = 127;
is_subn_rnd_default = true;
} else if (!strcmp(fpopts->format, "h") ||
!strcmp(fpopts->format, "half") ||
!strcmp(fpopts->format, "binary16") ||
!strcmp(fpopts->format, "fp16")) {
fpopts->precision = 11;
-fpopts->emax = 15;
fpopts->emin = -14;
+fpopts->emax = 15;
} else if (!strcmp(fpopts->format, "t") ||
!strcmp(fpopts->format, "TensorFloat-32") ||
!strcmp(fpopts->format, "tf32")) {
fpopts->precision = 11;
-fpopts->emax = 127;
fpopts->emin = -126;
+fpopts->emax = 127;
} else if (!strcmp(fpopts->format, "s") ||
!strcmp(fpopts->format, "single") ||
!strcmp(fpopts->format, "binary32") ||
!strcmp(fpopts->format, "fp32")) {
fpopts->precision = 24;
-fpopts->emax = 127;
fpopts->emin = -126;
+fpopts->emax = 127;
} else if (!strcmp(fpopts->format, "d") ||
!strcmp(fpopts->format, "double") ||
!strcmp(fpopts->format, "binary64") ||
!strcmp(fpopts->format, "fp64")) {
fpopts->precision = 53;
-fpopts->emax = 1023;
fpopts->emin = -1022;
+fpopts->emax = 1023;
} else if (!strcmp(fpopts->format, "c") ||
!strcmp(fpopts->format, "custom")) {
if ((tmp != NULL) && (mxGetClassID(tmp) == mxDOUBLE_CLASS)) {
fpopts->precision = ((double *)mxGetData(tmp))[0];
-fpopts->emax = ((double *)mxGetData(tmp))[1];
-fpopts->emin = ((double *)mxGetData(tmp))[2];
+fpopts->emin = ((double *)mxGetData(tmp))[1];
+fpopts->emax = ((double *)mxGetData(tmp))[2];
} else {
mexErrMsgIdAndTxt("cpfloat:invalidparams",
"Invalid floating-point parameters specified.");
@@ -223,8 +223,8 @@ void mexFunction(int nlhs,
maxebits = 1023;
minebits = -1022;
}
-if (fpopts->precision > maxfbits || fpopts->emax > maxebits
-|| fpopts->emin < minebits)
+if (fpopts->precision > maxfbits || fpopts->emin < minebits
+||fpopts->emax > maxebits)
if (!strcmp(fpopts->format, "c") || !strcmp(fpopts->format, "custom"))
mexErrMsgIdAndTxt("cpfloat:invalidparams",
"Invalid floating-point parameters selected.");
@@ -297,8 +297,8 @@ void mexFunction(int nlhs,
mxArray *outparams = mxCreateDoubleMatrix(1,3,mxREAL);
double *outparamsptr = mxGetData(outparams);
outparamsptr[0] = fpopts->precision;
-outparamsptr[1] = fpopts->emax;
-outparamsptr[2] = fpopts->emin;
+outparamsptr[1] = fpopts->emin;
+outparamsptr[2] = fpopts->emax;
mxSetFieldByNumber(plhs[1], 0, 1, outparams);

mxArray *outsubnormal = mxCreateDoubleMatrix(1,1,mxREAL);
13 changes: 7 additions & 6 deletions mex/cpfloat.m
Original file line number Diff line number Diff line change
Expand Up @@ -33,11 +33,11 @@
%
% * The three-element vector FPOPTS.params specifies the parameters of the
% target floating-point format, and is ignored unless FPOPTS.format is set
-% to either 'c' or 'custom'. The vector has the form [PRECISION,EMAX,EMIN],
-% where PRECISION, EMAX and EMIN are positive integers representing
+% to either 'c' or 'custom'. The vector has the form [PRECISION,EMIN,EMAX],
+% where PRECISION, EMIN and EMAX are positive integers representing
% the number of binary digits in the fraction and the maximum exponent of
% the target format, respectively. The default value of this field is
-% the vector [11,15,-14].
+% the vector [11,-14,15].
%
% * The scalar FPOPTS.subnormal specifies the support for subnormal numbers.
% The target floating-point format will not support subnormal numbers if
@@ -80,9 +80,10 @@
% probability, that is, a real number in the interval [0,1]. The default
% value for this field is 0.5.
%
-% The interface of CPFLOAT is partly compatible with that of the MATLAB
-% function CHOP available at https://github.com/higham/chop. The main
-% difference is that CPFLOAT requires EMIN specified in FPOPTS.params.
+% The interface of CPFLOAT is mostly compatible with that of the MATLAB
+% function CHOP available at https://github.com/higham/chop. See
+% https://github.com/north-numerical-computing/cpfloat/blob/main/README.md
+% for an up-to-date list of differences.

% SPDX-FileCopyrightText: 2020 Massimiliano Fasi and Mantas Mikaitis
% SPDX-License-Identifier: LGPL-2.1-or-later
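
As a rough illustration of what the parameter reordering in this commit means for calling code, the following C sketch converts a triple in the old `[PRECISION,EMAX,EMIN]` order to the new `[PRECISION,EMIN,EMAX]` order and repeats the range check from `mex/cpfloat.c`. The type `format_params` and the functions `params_from_legacy` and `params_fit_storage` are hypothetical names introduced here for illustration; they are not part of CPFloat, and the storage limits are supplied by the caller rather than taken from the library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical container (not a CPFloat type) for the numeric format
 * parameters, in the order used after this commit. */
typedef struct {
  int precision;
  int emin;
  int emax;
} format_params;

/* Convert a legacy [precision, emax, emin] triple to the new
 * [precision, emin, emax] order. */
format_params params_from_legacy(const double legacy[3]) {
  assert(legacy != NULL);
  format_params p;
  p.precision = (int)legacy[0];
  p.emin = (int)legacy[2]; /* emin was the third entry in the old order */
  p.emax = (int)legacy[1]; /* emax was the second entry in the old order */
  return p;
}

/* Same condition as the check in the diff, negated: returns true when the
 * target format fits within the limits of the storage format. */
bool params_fit_storage(format_params p, int maxfbits,
                        int minebits, int maxebits) {
  return !(p.precision > maxfbits || p.emin < minebits || p.emax > maxebits);
}
```

With the binary16 defaults, the legacy vector `{11, 15, -14}` maps to precision 11, emin -14, and emax 15, which passes the check against the exponent limits -1022 and 1023 shown in the diff for any precision limit of at least 11.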