diff --git a/README.md b/README.md index e8ed7e2..81ac538 100644 --- a/README.md +++ b/README.md @@ -7,14 +7,17 @@ CPFloat is a C library for simulating low-precision floating-point arithmetics. CPFloat provides efficient routines for rounding, performing arithmetic operations, evaluating mathematical functions, and querying properties of the simulated low-precision format. Internally, numbers are stored in `float` or `double` arrays. The low-precision format (target format) follows an extension of the IEEE 754 standard and it is entirely specified by four parameters: * a positive integer *p*, which represents the number of digits of precision; -* a positive integer *e*max, which represents the maximum supported exponent; -* a positive integer *e*min, which represents the minimum supported exponent; and +* an integer *e*min, which represents the minimum supported exponent; +* a positive integer *e*max, which represents the maximum supported exponent; and * a Boolean variable σ, set to **true** if subnormal are supported and to **false** otherwise. -The largest values of *p* and *e*max, and the smallest value of *e*min that can be used depend on the format in which the converted numbers are to be stored (storage format). A more extensive description of the characteristics of the low-precision formats that can be used, together with more details on the choice of the admissible values of *p*, *e*max, and *σ* can be found in [[1]](#ref1). +Valid choices of *p*, *e*min, and *e*max depend on the format in which the converted numbers are to be stored (storage format). A more extensive description of the characteristics of the low-precision formats that can be used, together with more details on admissible values for *p*, *e*min, *e*max, and *σ* can be found in [[1]](#ref1). The library was originally intended as a faster version of the MATLAB function `chop` [[2]](#ref2), which is [available on GitHub](https://github.com/higham/chop). 
-The latest versions of the library have a variety of subtle differences compared with `chop`. +The latest versions of the library have a variety of subtle differences compared with `chop`: +* since June 14, 2022, `chop` supports specifying the function for generating random numbers. The MEX interface of CPFloat does not offer this capability; +* since v0.6.0 CPFloat allows specifying both *e*min and *e*max, instead of specifying only *e*max and enforcing *e*min=1-*e*max; +* since v0.6.0 the default 8-bit format E4M3 has *e*max=8 (in `chop` it is set to 7). The code to reproduce the results of the tests in [[1]](#ref1) is [available on GitHub](https://github.com/north-numerical-computing/cpfloat_experiments). @@ -23,11 +26,13 @@ The code to reproduce the results of the tests in [[1]](#ref1) is [available on The only (optional) dependency of CPFloat is the [C implementation](https://github.com/imneme/pcg-c) of the [PCG Library](https://www.pcg-random.org), which provides a variety of high-quality pseudo-random number generators. For an in-depth discussion of the algorithms underlying the PCG Library, we recommend the [paper](https://www.pcg-random.org/paper.html) by [Melissa O'Neill](https://www.cs.hmc.edu/~oneill) [[3]](#ref3). If the header file `pcg_variants.h` in `include/pcg-c/include/pcg_variants.h` is not included at compile-time with the `--include` option, then CPFloat relies on the default C pseudo-random number generator. -The PCG Library is free software (see the [Licensing information](#licensing-information) below), and its generators are more efficient, reliable, and flexible than any combination of the functions `srand`, `rand`, and `rand_r` from the C standard library. A warning is issued at compile time if the location of `pcg_variant.h` is not specified correctly. 
+The PCG Library is free software (see the [Licensing information](#licensing-information) below), and its generators are more efficient, reliable, and flexible than any combination of the functions `srand`, `rand`, and `rand_r` from the C standard library. A warning is issued at compile time if the location of `pcg_variants.h` is not specified correctly. + +Compiling the MEX interface requires a reasonably recent version of MATLAB or Octave. # Developer dependencies -Compiling the MEX interface requires a reasonably recent version of MATLAB or Octave, and testing the interface requires the function `float_params`, which is [available on GitHub](https://github.com/higham/float_params). The unit tests for the C implementation in `test/cpfloat_test.ts` require the [check unit testing framework for C](https://libcheck.github.io/check) and the [subunit protocol](https://github.com/testing-cabal/subunit). +Testing the MEX interface requires the function `float_params`, which is [available on GitHub](https://github.com/higham/float_params). The unit tests for the C implementation in `test/cpfloat_test.ts` require the [check unit testing framework for C](https://libcheck.github.io/check) and the [subunit protocol](https://github.com/testing-cabal/subunit). # Installation @@ -56,7 +61,7 @@ make mexoct # Compile MEX interface for Octave. ``` These two commands compile and autotune the MEX interface in MATLAB and Octave, respectively, by using the functions `mex/cpfloat_compile.m` and `mex/cpfloat_autotune.m`. To use the interface, the `bin/` folder must be in MATLAB's search path. -On a system where the `make` build automation tool is not available, we recommend building the MEX interface by running the script `cpfloat_compile_nomake.m` in the `mex/` folder. The script attempts to compile and auto-tune the MEX interface using the default C compiler. 
The following code will download the repository as a ZIP file, inflate it, and try to compile it: +On a system where the `make` build automation tool is not available, we recommend building the MEX interface by running the script `cpfloat_compile_nomake.m` in the `mex/` folder. The script attempts to compile and autotune the MEX interface using the default C compiler. The following code will download the repository as a ZIP file, inflate it, and try to compile it: ```matlab zip_url = 'https://codeload.github.com/north-numerical-computing/cpfloat/zip/refs/heads/main'; diff --git a/mex/cpfloat.c b/mex/cpfloat.c index 49abbd7..d4c04ea 100644 --- a/mex/cpfloat.c +++ b/mex/cpfloat.c @@ -39,8 +39,8 @@ void mexFunction(int nlhs, strcpy(fpopts->format, "h"); fpopts->precision = 11; - fpopts->emax = 15; fpopts->emin = -14; + fpopts->emax = 15; fpopts->subnormal = CPFLOAT_SUBN_USE; fpopts->explim = CPFLOAT_EXPRANGE_TARG; fpopts->round = CPFLOAT_RND_NE; @@ -78,54 +78,54 @@ void mexFunction(int nlhs, !strcmp(fpopts->format, "fp8-e4m3") || !strcmp(fpopts->format, "E4M3")) { fpopts->precision = 4; - fpopts->emax = 8; fpopts->emin = -6; + fpopts->emax = 8; } else if (!strcmp(fpopts->format, "q52") || !strcmp(fpopts->format, "fp8-e5m2") || !strcmp(fpopts->format, "E5M2")) { fpopts->precision = 3; - fpopts->emax = 15; fpopts->emin = -14; + fpopts->emax = 15; } else if (!strcmp(fpopts->format, "b") || !strcmp(fpopts->format, "bfloat16") || !strcmp(fpopts->format, "bf16")) { fpopts->precision = 8; - fpopts->emax = 127; fpopts->emin = -126; + fpopts->emax = 127; is_subn_rnd_default = true; } else if (!strcmp(fpopts->format, "h") || !strcmp(fpopts->format, "half") || !strcmp(fpopts->format, "binary16") || !strcmp(fpopts->format, "fp16")) { fpopts->precision = 11; - fpopts->emax = 15; fpopts->emin = -14; + fpopts->emax = 15; } else if (!strcmp(fpopts->format, "t") || !strcmp(fpopts->format, "TensorFloat-32") || !strcmp(fpopts->format, "tf32")) { fpopts->precision = 11; - fpopts->emax = 
127; fpopts->emin = -126; + fpopts->emax = 127; } else if (!strcmp(fpopts->format, "s") || !strcmp(fpopts->format, "single") || !strcmp(fpopts->format, "binary32") || !strcmp(fpopts->format, "fp32")) { fpopts->precision = 24; - fpopts->emax = 127; fpopts->emin = -126; + fpopts->emax = 127; } else if (!strcmp(fpopts->format, "d") || !strcmp(fpopts->format, "double") || !strcmp(fpopts->format, "binary64") || !strcmp(fpopts->format, "fp64")) { fpopts->precision = 53; - fpopts->emax = 1023; fpopts->emin = -1022; + fpopts->emax = 1023; } else if (!strcmp(fpopts->format, "c") || !strcmp(fpopts->format, "custom")) { if ((tmp != NULL) && (mxGetClassID(tmp) == mxDOUBLE_CLASS)) { fpopts->precision = ((double *)mxGetData(tmp))[0]; - fpopts->emax = ((double *)mxGetData(tmp))[1]; - fpopts->emin = ((double *)mxGetData(tmp))[2]; + fpopts->emin = ((double *)mxGetData(tmp))[1]; + fpopts->emax = ((double *)mxGetData(tmp))[2]; } else { mexErrMsgIdAndTxt("cpfloat:invalidparams", "Invalid floating-point parameters specified."); @@ -223,8 +223,8 @@ void mexFunction(int nlhs, maxebits = 1023; minebits = -1022; } - if (fpopts->precision > maxfbits || fpopts->emax > maxebits - || fpopts->emin < minebits) + if (fpopts->precision > maxfbits || fpopts->emin < minebits + || fpopts->emax > maxebits) if (!strcmp(fpopts->format, "c") || !strcmp(fpopts->format, "custom")) mexErrMsgIdAndTxt("cpfloat:invalidparams", "Invalid floating-point parameters selected."); @@ -297,8 +297,8 @@ void mexFunction(int nlhs, mxArray *outparams = mxCreateDoubleMatrix(1,3,mxREAL); double *outparamsptr = mxGetData(outparams); outparamsptr[0] = fpopts->precision; - outparamsptr[1] = fpopts->emax; - outparamsptr[2] = fpopts->emin; + outparamsptr[1] = fpopts->emin; + outparamsptr[2] = fpopts->emax; mxSetFieldByNumber(plhs[1], 0, 1, outparams); mxArray *outsubnormal = mxCreateDoubleMatrix(1,1,mxREAL); diff --git a/mex/cpfloat.m b/mex/cpfloat.m index 7938f75..4f1dfa3 100644 --- a/mex/cpfloat.m +++ b/mex/cpfloat.m @@ -33,11 
+33,11 @@ % % * The three-element vector FPOPTS.params specifies the parameters of the % target floating-point format, and is ignored unless FPOPTS.format is set -% to either 'c' or 'custom'. The vector has the form [PRECISION,EMAX,EMIN], -% where PRECISION, EMAX and EMIN are positive integers representing +% to either 'c' or 'custom'. The vector has the form [PRECISION,EMIN,EMAX], +% where PRECISION, EMIN and EMAX are integers representing -% the number of binary digits in the fraction and the maximum exponent of -% the target format, respectively. The default value of this field is +% the number of binary digits in the fraction, the minimum exponent, and the +% maximum exponent of the target format, respectively. The default value is -% the vector [11,15,-14]. +% the vector [11,-14,15]. % % * The scalar FPOPTS.subnormal specifies the support for subnormal numbers. % The target floating-point format will not support subnormal numbers if @@ -80,9 +80,10 @@ % probability, that is, a real number in the interval [0,1]. The default % value for this field is 0.5. % -% The interface of CPFLOAT is partly compatible with that of the MATLAB -% function CHOP available at https://github.com/higham/chop. The main -% difference is that CPFLOAT requires EMIN specified in FPOPTS.params. +% The interface of CPFLOAT is mostly compatible with that of the MATLAB +% function CHOP available at https://github.com/higham/chop. See +% https://github.com/north-numerical-computing/cpfloat/blob/main/README.md +% for an up-to-date list of differences. % SPDX-FileCopyrightText: 2020 Massimiliano Fasi and Mantas Mikaitis % SPDX-License-Identifier: LGPL-2.1-or-later diff --git a/src/cpfloat_binary32.h b/src/cpfloat_binary32.h index d3bdc44..3ecbe83 100644 --- a/src/cpfloat_binary32.h +++ b/src/cpfloat_binary32.h @@ -13,211 +13,211 @@ #include "cpfloat_docmacros.h" /* Validation of floating-point parameters. */ -doc_cpfloat_validate_optstruct(double, 12, 24, 127, -126) +doc_cpfloat_validate_optstruct(double, 12, 24, -126, 127) static inline int cpfloat_validate_optstructf(const optstruct *fpopts); /* Rounding functions. 
*/ -doc_cpfloat(float, 24, 127, -126) +doc_cpfloat(float, 24, -126, 127) static inline int cpfloatf(float *X, const float *A, const size_t numelem, optstruct *fpopts); -doc_cpfloat(float, 24, 127, -126) +doc_cpfloat(float, 24, -126, 127) static inline int cpf_fproundf(float *X, const float *A, const size_t numelem, optstruct *fpopts); /* Elementary arithmetic operations. */ -doc_cpf_bivariate(sum, \f$ X_i = A_i + B_i \f$, 24, 127, -126) +doc_cpf_bivariate(sum, \f$ X_i = A_i + B_i \f$, 24, -126, 127) static inline int cpf_addf(float *X, const float *A, const float *B, const size_t numelem, optstruct *fpopts); -doc_cpf_bivariate(difference, \f$ X_i = A_i - B_i \f$, 24, 127, -126) +doc_cpf_bivariate(difference, \f$ X_i = A_i - B_i \f$, 24, -126, 127) static inline int cpf_subf(float *X, const float *A, const float *B, const size_t numelem, optstruct *fpopts); -doc_cpf_bivariate(product, \f$ X_i = A_i \times B_i \f$, 24, 127, -126) +doc_cpf_bivariate(product, \f$ X_i = A_i \times B_i \f$, 24, -126, 127) static inline int cpf_mulf(float *X, const float *A, const float *B, const size_t numelem, optstruct *fpopts); -doc_cpf_bivariate(ratio, \f$ X_i = A_i / B_i \f$, 24, 127, -126) +doc_cpf_bivariate(ratio, \f$ X_i = A_i / B_i \f$, 24, -126, 127) static inline int cpf_divf(float *X, const float *A, const float *B, const size_t numelem, optstruct *fpopts); /* Trigonometric functions. 
*/ -doc_cpf_univariate(trigonometric cosine, \f$ X_i = \cos(A_i) \f$, 24, 127, -126) +doc_cpf_univariate(trigonometric cosine, \f$ X_i = \cos(A_i) \f$, 24, -126, 127) static inline int cpf_cosf(float *X, const float *A, const size_t numelem, optstruct *fpopts); -doc_cpf_univariate(trigonometric sine, \f$ X_i = \sin(A_i) \f$, 24, 127, -126) +doc_cpf_univariate(trigonometric sine, \f$ X_i = \sin(A_i) \f$, 24, -126, 127) static inline int cpf_sinf(float *X, const float *A, const size_t numelem, optstruct *fpopts); -doc_cpf_univariate(trigonometric tangent, \f$ X_i = \tan(A_i) \f$, 24, 127, -126) +doc_cpf_univariate(trigonometric tangent, \f$ X_i = \tan(A_i) \f$, 24, -126, 127) static inline int cpf_tanf(float *X, const float *A, const size_t numelem, optstruct *fpopts); doc_cpf_univariate(inverse trigonometric cosine, - \f$ X_i = \mathrm{acos}(A_i) \f$, 24, 127, -126) + \f$ X_i = \mathrm{acos}(A_i) \f$, 24, -126, 127) static inline int cpf_acosf(float *X, const float *A, const size_t numelem, optstruct *fpopts); doc_cpf_univariate(inverse trigonometric sine, - \f$ X_i = \mathrm{asin}(A_i) \f$, 24, 127, -126) + \f$ X_i = \mathrm{asin}(A_i) \f$, 24, -126, 127) static inline int cpf_asinf(float *X, const float *A, const size_t numelem, optstruct *fpopts); doc_cpf_univariate(inverse trigonometric tangent, - \f$ X_i = \mathrm{atan}(A_i) \f$, 24, 127, -126) + \f$ X_i = \mathrm{atan}(A_i) \f$, 24, -126, 127) static inline int cpf_atanf(float *X, const float *A, const size_t numelem, optstruct *fpopts); doc_cpf_bivariate(2-argument arctangent, - \f$ X_i = \mathrm{atan} (B_i / A_i) \f$, 24, 127, -126) + \f$ X_i = \mathrm{atan} (B_i / A_i) \f$, 24, -126, 127) static inline int cpf_atan2f(float *X, const float *A, const float *B, const size_t numelem, optstruct *fpopts); /* Hyperbolic functions. 
*/ -doc_cpf_univariate(hyperbolic cosine, \f$ X_i = \mathrm{cosh}(A_i) \f$, 24, 127, -126) +doc_cpf_univariate(hyperbolic cosine, \f$ X_i = \mathrm{cosh}(A_i) \f$, 24, -126, 127) static inline int cpf_coshf(float *X, const float *A, const size_t numelem, optstruct *fpopts); -doc_cpf_univariate(hyperbolic sine, \f$ X_i = \mathrm{sinh}(A_i) \f$, 24, 127, -126) +doc_cpf_univariate(hyperbolic sine, \f$ X_i = \mathrm{sinh}(A_i) \f$, 24, -126, 127) static inline int cpf_sinhf(float *X, const float *A, const size_t numelem, optstruct *fpopts); -doc_cpf_univariate(hyperbolic tangent , \f$ X_i = \mathrm{tanh}(A_i) \f$, 24, 127, -126) +doc_cpf_univariate(hyperbolic tangent , \f$ X_i = \mathrm{tanh}(A_i) \f$, 24, -126, 127) static inline int cpf_tanhf(float *X, const float *A, const size_t numelem, optstruct *fpopts); doc_cpf_univariate(inverse hyperbolic cosine, - \f$ X_i = \mathrm{arcosh}(A_i) \f$, 24, 127, -126) + \f$ X_i = \mathrm{arcosh}(A_i) \f$, 24, -126, 127) static inline int cpf_acoshf(float *X, const float *A, const size_t numelem, optstruct *fpopts); doc_cpf_univariate(inverse hyperbolic sine, - \f$ X_i = \mathrm{arsinh}(A_i) \f$, 24, 127, -126) + \f$ X_i = \mathrm{arsinh}(A_i) \f$, 24, -126, 127) static inline int cpf_asinhf(float *X, const float *A, const size_t numelem, optstruct *fpopts); doc_cpf_univariate(inverse hyperbolic tangent, - \f$ X_i = \mathrm{artanh}(A_i) \f$, 24, 127, -126) + \f$ X_i = \mathrm{artanh}(A_i) \f$, 24, -126, 127) static inline int cpf_atanhf(float *X, const float *A, const size_t numelem, optstruct *fpopts); /* Exponentiation and logarithmic functions. 
*/ -doc_cpf_univariate(exponential, \f$ X_i = \exp(A_i) \f$, 24, 127, -126) +doc_cpf_univariate(exponential, \f$ X_i = \exp(A_i) \f$, 24, -126, 127) static inline int cpf_expf(float *X, const float *A, const size_t numelem, optstruct *fpopts); -doc_cpf_frexp(24, 127, -126) +doc_cpf_frexp(24, -126, 127) static inline int cpf_frexpf(float *X, int *exp, const float *A, const size_t numelem, optstruct *fpopts); -doc_cpf_scaling(2, 24, 127, -126) +doc_cpf_scaling(2, 24, -126, 127) static inline int cpf_ldexpf(float *X, const float *A, const int *exp, const size_t numelem, optstruct *fpopts); -doc_cpf_univariate(natural logarithm, \f$ X_i = \log(A_i) \f$, 24, 127, -126) +doc_cpf_univariate(natural logarithm, \f$ X_i = \log(A_i) \f$, 24, -126, 127) static inline int cpf_logf(float *X, const float *A, const size_t numelem, optstruct *fpopts); -doc_cpf_univariate(base-10 logarithm, \f$ X_i = \log_{10}(A_i) \f$, 24, 127, -126) +doc_cpf_univariate(base-10 logarithm, \f$ X_i = \log_{10}(A_i) \f$, 24, -126, 127) static inline int cpf_log10f(float *X, const float *A, const size_t numelem, optstruct *fpopts); -doc_cpf_modf(24, 127, -126) +doc_cpf_modf(24, -126, 127) static inline int cpf_modff(float *X, float *intpart, const float *A, const size_t numelem, optstruct *fpopts); -doc_cpf_univariate(base-2 exponential, \f$ X_i = 2^{A_i} \f$, 24, 127, -126) +doc_cpf_univariate(base-2 exponential, \f$ X_i = 2^{A_i} \f$, 24, -126, 127) static inline int cpf_exp2f(float *X, const float *A, const size_t numelem, optstruct *fpopts); -doc_cpf_univariate(exp(x) - 1, \f$ X_i = \exp(A_i) - 1 \f$, 24, 127, -126) +doc_cpf_univariate(exp(x) - 1, \f$ X_i = \exp(A_i) - 1 \f$, 24, -126, 127) static inline int cpf_expm1f(float *X, const float *A, const size_t numelem, optstruct *fpopts); -doc_cpf_ilogb(24, 127, -126) +doc_cpf_ilogb(24, -126, 127) static inline int cpf_ilogbf(int *exp, const float *A, const size_t numelem, optstruct *fpopts); doc_cpf_univariate(natural logarithm of number shifted by 
one, - \f$ X_i = \log(1+A_i) \f$, 24, 127, -126) + \f$ X_i = \log(1+A_i) \f$, 24, -126, 127) static inline int cpf_log1pf(float *X, const float *A, size_t numelem, optstruct *fpopts); -doc_cpf_univariate(base-2 logarithm, \f$ X_i = \log_2(A_i) \f$, 24, 127, -126) +doc_cpf_univariate(base-2 logarithm, \f$ X_i = \log_2(A_i) \f$, 24, -126, 127) static inline int cpf_log2f(float *X, const float *A, const size_t numelem, optstruct *fpopts); doc_cpf_univariate(base-FLT_RADIX logarithm of absolute value, - \f$ X_i = \log(\lvert A_i \rvert) \f$, 24, 127, -126) + \f$ X_i = \log(\lvert A_i \rvert) \f$, 24, -126, 127) static inline int cpf_logbf(float *X, const float *A, const size_t numelem, optstruct *fpopts); -doc_cpf_scaling(FLT\_RADIX, 24, 127, -126) +doc_cpf_scaling(FLT\_RADIX, 24, -126, 127) static inline int cpf_scalbnf(float *X, const float *A, const int *exp, const size_t numelem, optstruct *fpopts); -doc_cpf_scaling(FLT\_RADIX, 24, 127, -126) +doc_cpf_scaling(FLT\_RADIX, 24, -126, 127) static inline int cpf_scalblnf(float *X, const float *A, const long int *exp, const size_t numelem, optstruct *fpopts); /* Power functions. 
*/ -doc_cpf_bivariate(real powers, \f$ X_i = A_i^{B_i} \f$, 24, 127, -126) +doc_cpf_bivariate(real powers, \f$ X_i = A_i^{B_i} \f$, 24, -126, 127) static inline int cpf_powf(float *X, const float *A, const float *B, const size_t numelem, optstruct *fpopts); -doc_cpf_univariate(square root, \f$ X_i = \sqrt{A_i} \f$, 24, 127, -126) +doc_cpf_univariate(square root, \f$ X_i = \sqrt{A_i} \f$, 24, -126, 127) static inline int cpf_sqrtf(float *X, const float *A, const size_t numelem, optstruct *fpopts); -doc_cpf_univariate(cube root, \f$ X_i = \sqrt[3]{A_i} \f$, 24, 127, -126) +doc_cpf_univariate(cube root, \f$ X_i = \sqrt[3]{A_i} \f$, 24, -126, 127) static inline int cpf_cbrtf(float *X, const float *A, const size_t numelem, optstruct *fpopts); doc_cpf_bivariate(hypotenuse of a right-angle triangle, - \f$ X_i = \sqrt{A_i^2 + B_i^2} \f$, 24, 127, -126) + \f$ X_i = \sqrt{A_i^2 + B_i^2} \f$, 24, -126, 127) static inline int cpf_hypotf(float *X, const float *A, const float *B, const size_t numelem, optstruct *fpopts); /* Error and gamma functions. 
*/ -doc_cpf_univariate(error function, \f$ X_i = \mathrm{erf}(A_i) \f$, 24, 127, -126) +doc_cpf_univariate(error function, \f$ X_i = \mathrm{erf}(A_i) \f$, 24, -126, 127) static inline int cpf_erff(float *X, const float *A, const size_t numelem, optstruct *fpopts); doc_cpf_univariate(complementary error function, - \f$ X_i = \mathrm{erfc}(A_i) \f$, 24, 127, -126) + \f$ X_i = \mathrm{erfc}(A_i) \f$, 24, -126, 127) static inline int cpf_erfcf(float *X, const float *A, const size_t numelem, optstruct *fpopts); -doc_cpf_univariate(gamma function, \f$ X_i = \Gamma(A_i) \f$, 24, 127, -126) +doc_cpf_univariate(gamma function, \f$ X_i = \Gamma(A_i) \f$, 24, -126, 127) static inline int cpf_tgammaf(float *X, const float *A, const size_t numelem, optstruct *fpopts); doc_cpf_univariate(natural logarithm of absolute value of gamma function, - \f$ X_i = \log(\lvert \Gamma(A_i) \rvert) \f$, 24, 127, -126) + \f$ X_i = \log(\lvert \Gamma(A_i) \rvert) \f$, 24, -126, 127) static inline int cpf_lgammaf(float *X, const float *A, const size_t numelem, optstruct *fpopts); /* Rounding and remainder functions. 
*/ -doc_cpf_univariate(ceiling function, \f$ X_i = \lceil A_i \rceil \f$, 24, 127, -126) +doc_cpf_univariate(ceiling function, \f$ X_i = \lceil A_i \rceil \f$, 24, -126, 127) static inline int cpf_ceilf(float *X, const float *A, const size_t numelem, optstruct *fpopts); -doc_cpf_univariate(floor function, \f$ X_i = \lfloor A_i \rfloor \f$, 24, 127, -126) +doc_cpf_univariate(floor function, \f$ X_i = \lfloor A_i \rfloor \f$, 24, -126, 127) static inline int cpf_floorf(float *X, const float *A, const size_t numelem, optstruct *fpopts); doc_cpf_bivariate(floating-point remainder of division, - \f$ X_i = A_i \;\mathrm{mod}\; B_i \f$, 24, 127, -126) + \f$ X_i = A_i \;\mathrm{mod}\; B_i \f$, 24, -126, 127) static inline int cpf_fmodf(float *X, const float *A, const float *B, const size_t numelem, optstruct *fpopts); -doc_cpf_univariate(integer truncation, \f$ X_i = \mathrm{trunc}(A_i) \f$, 24, 127, -126) +doc_cpf_univariate(integer truncation, \f$ X_i = \mathrm{trunc}(A_i) \f$, 24, -126, 127) static inline int cpf_truncf(float *X, const float *A, const size_t numelem, optstruct *fpopts); doc_cpf_univariate(closest integer (with round-to-nearest), - \f$ X_i = \mathrm{round}(A_i) \f$, 24, 127, -126) + \f$ X_i = \mathrm{round}(A_i) \f$, 24, -126, 127) static inline int cpf_roundf(float *X, const float *A, const size_t numelem, optstruct *fpopts); doc_cpf_univariate(closest integer (with round-to-nearest), - \f$ X_i = \mathrm{round}(A_i) \f$, 24, 127, -126) + \f$ X_i = \mathrm{round}(A_i) \f$, 24, -126, 127) static inline int cpf_lroundf(long *X, const float *A, const size_t numelem, optstruct *fpopts); doc_cpf_univariate_nobitflip(closest integer (with round-to-nearest), - \f$ X_i = \mathrm{round}(A_i) \f$, 24, 127, -126) + \f$ X_i = \mathrm{round}(A_i) \f$, 24, -126, 127) static inline int cpf_llroundf(long long *X, const float *A, const size_t numelem, optstruct *fpopts); -doc_cpf_rint(PMAX, 127, -126) +doc_cpf_rint(PMAX, -126, 127) static inline int cpf_rintf(float *X, 
int *exception, const float *A, const size_t numelem, optstruct *fpopts); -doc_cpf_rint(24, 127, -126) +doc_cpf_rint(24, -126, 127) static inline int cpf_lrintf(long *X, int *exception, const float *A, const size_t numelem, optstruct *fpopts); -doc_cpf_rint(24, 127, -126) +doc_cpf_rint(24, -126, 127) static inline int cpf_llrintf(long long *X, int *exception, const float *A, const size_t numelem, optstruct *fpopts); -doc_cpf_nearbyint(24, 127, -126) +doc_cpf_nearbyint(24, -126, 127) static inline int cpf_nearbyintf(float *X, const float *A, const size_t numelem, optstruct *fpopts); doc_cpf_bivariate(remainder of the floating point division, \f$ X_i = A_i^2 - k \times B_i \f$ for largest \f$ k \f$ such that \f$ k \times B_i < A_i \f$, - 24, 127, -126) + 24, -126, 127) static inline int cpf_remainderf(float *X, const float *A, const float *B, const size_t numelem, optstruct *fpopts); -doc_cpf_remquo(24, 127, -126) +doc_cpf_remquo(24, -126, 127) static inline int cpf_remquof(float *X, int *quot, const float *A, const float *B, const size_t numelem, optstruct *fpopts); @@ -225,17 +225,17 @@ static inline int cpf_remquof(float *X, int *quot, /* Floating-point manipulation functions. 
*/ doc_cpf_bivariate(number from magnitude and sign, \f$ X_i = \mathrm{sign}(A_i) \times \lvert B_i \rvert \f$, - 24, 127, -126) + 24, -126, 127) static inline int cpf_copysignf(float *X, const float *A, const float *B, const size_t numelem, optstruct *fpopts); doc_cpf_bivariate(next floating-point number in specified direction, the floating-point number closest to \f$ A_i \f$ in the - direction of \f$ B_i \f$, 24, 127, -126) + direction of \f$ B_i \f$, 24, -126, 127) static inline int cpf_nextafterf(float *X, const float *A, const float *B, const size_t numelem, optstruct *fpopts); doc_cpf_bivariate(next floating-point number in specified direction, the floating-point number closest to \f$ A_i \f$ in the - direction of \f$ B_i \f$, 24, 127, -126) + direction of \f$ B_i \f$, 24, -126, 127) static inline int cpf_nexttowardf(float *X, const float *A, const long double *B, const size_t numelem, @@ -243,41 +243,41 @@ static inline int cpf_nexttowardf(float *X, const float *A, /* Minimum, maximum, difference functions. */ doc_cpf_bivariate(positive difference, \f$ X_i = \lvert A_i - B_i \rvert \f$, - 24, 127, -126) + 24, -126, 127) static inline int cpf_fdimf(float *X, const float *A, const float *B, const size_t numelem, optstruct *fpopts); doc_cpf_bivariate(element-wise maximum, \f$ X_i = \mathrm{max}(A_i, B_i) \f$, - 24, 127, -126) + 24, -126, 127) static inline int cpf_fmaxf(float *X, const float *A, const float *B, const size_t numelem, optstruct *fpopts); doc_cpf_bivariate(element-wise minimum, \f$ X_i = \mathrm{min}(A_i, B_i) \f$, - 24, 127, -126) + 24, -126, 127) static inline int cpf_fminf(float *X, const float *A, const float *B, const size_t numelem, optstruct *fpopts); /* Classification. 
*/ -doc_cpf_fpclassify(24, 127, -126) +doc_cpf_fpclassify(24, -126, 127) static inline int cpf_fpclassifyf(int *r, const float *A, const size_t numelem, optstruct *fpopts); -doc_cpf_isfun(finite, 24, 127, -126) +doc_cpf_isfun(finite, 24, -126, 127) static inline int cpf_isfinitef(int *r, const float *A, const size_t numelem, optstruct *fpopts); -doc_cpf_isfun(infinite, 24, 127, -126) +doc_cpf_isfun(infinite, 24, -126, 127) static inline int cpf_isinff(int *r, const float *A, const size_t numelem, optstruct *fpopts); -doc_cpf_isfun(not a number, 24, 127, -126) +doc_cpf_isfun(not a number, 24, -126, 127) static inline int cpf_isnanf(int *r, const float *A, const size_t numelem, optstruct *fpopts); -doc_cpf_isfun(normal, 24, 127, -126) +doc_cpf_isfun(normal, 24, -126, 127) static inline int cpf_isnormalf(int *r, const float *A, const size_t numelem, optstruct *fpopts); /* Other functions. */ -doc_cpf_univariate(absolute value, \f$ X_i = \lvert A_i \rvert \f$, 24, 127, -126) +doc_cpf_univariate(absolute value, \f$ X_i = \lvert A_i \rvert \f$, 24, -126, 127) static inline int cpf_fabsf(float *X, const float *A, const size_t numelem, optstruct *fpopts); doc_cpf_trivariate(fused multiply-add , \f$ X_i = A_i \times B_i + C_i \f$, - 24, 127, -126) + 24, -126, 127) static inline int cpf_fmaf(float *X, const float *A, const float *B, const float *C, const size_t numelem, optstruct *fpopts); diff --git a/src/cpfloat_binary64.h b/src/cpfloat_binary64.h index 0651060..5df0a6b 100644 --- a/src/cpfloat_binary64.h +++ b/src/cpfloat_binary64.h @@ -13,272 +13,272 @@ #include "cpfloat_docmacros.h" /* Validation of floating-point parameters. */ -doc_cpfloat_validate_optstruct(double, 26, 53, 1023, -1022) +doc_cpfloat_validate_optstruct(double, 26, 53, -1022, 1023) static inline int cpfloat_validate_optstruct(const optstruct *fpopts); /* Rounding functions. 
*/ -doc_cpfloat(double, 53, 1023, -1022) +doc_cpfloat(double, 53, -1022, 1023) static inline int cpfloat(double *X, const double *A, const size_t numelem, optstruct *fpopts); -doc_cpfloat(double, 53, 1023, -1022) +doc_cpfloat(double, 53, -1022, 1023) static inline int cpf_fpround(double *X, const double *A, const size_t numelem, optstruct *fpopts); /* Elementary arithmetic operations. */ -doc_cpf_bivariate(sum, \f$ X_i = A_i + B_i \f$, 53, 1023, -1022) +doc_cpf_bivariate(sum, \f$ X_i = A_i + B_i \f$, 53, -1022, 1023) static inline int cpf_add(double *X, const double *A, const double *B, const size_t numelem, optstruct *fpopts); -doc_cpf_bivariate(difference, \f$ X_i = A_i - B_i \f$, 53, 1023, -1022) +doc_cpf_bivariate(difference, \f$ X_i = A_i - B_i \f$, 53, -1022, 1023) static inline int cpf_sub(double *X, const double *A, const double *B, const size_t numelem, optstruct *fpopts); -doc_cpf_bivariate(product, \f$ X_i = A_i \times B_i \f$, 53, 1023, -1022) +doc_cpf_bivariate(product, \f$ X_i = A_i \times B_i \f$, 53, -1022, 1023) static inline int cpf_mul(double *X, const double *A, const double *B, const size_t numelem, optstruct *fpopts); -doc_cpf_bivariate(ratio, \f$ X_i = A_i / B_i \f$, 53, 1023, -1022) +doc_cpf_bivariate(ratio, \f$ X_i = A_i / B_i \f$, 53, -1022, 1023) static inline int cpf_div(double *X, const double *A, const double *B, const size_t numelem, optstruct *fpopts); /* Trigonometric functions. 
*/ -doc_cpf_univariate(trigonometric cosine, \f$ X_i = \cos(A_i) \f$, 53, 1023, -1022) +doc_cpf_univariate(trigonometric cosine, \f$ X_i = \cos(A_i) \f$, 53, -1022, 1023) static inline int cpf_cos(double *X, const double *A, const size_t numelem, optstruct *fpopts); -doc_cpf_univariate(trigonometric sine, \f$ X_i = \sin(A_i) \f$, 53, 1023, -1022) +doc_cpf_univariate(trigonometric sine, \f$ X_i = \sin(A_i) \f$, 53, -1022, 1023) static inline int cpf_sin(double *X, const double *A, const size_t numelem, optstruct *fpopts); -doc_cpf_univariate(trigonometric tangent, \f$ X_i = \tan(A_i) \f$, 53, 1023, -1022) +doc_cpf_univariate(trigonometric tangent, \f$ X_i = \tan(A_i) \f$, 53, -1022, 1023) static inline int cpf_tan(double *X, const double *A, const size_t numelem, optstruct *fpopts); doc_cpf_univariate(inverse trigonometric cosine, - \f$ X_i = \mathrm{acos(A_i)} \f$, 53, 1023, -1022) + \f$ X_i = \mathrm{acos}(A_i) \f$, 53, -1022, 1023) static inline int cpf_acos(double *X, const double *A, const size_t numelem, optstruct *fpopts); doc_cpf_univariate(inverse trigonometric sine, - \f$ X_i = \mathrm{asin}(A_i) \f$, 53, 1023, -1022) + \f$ X_i = \mathrm{asin}(A_i) \f$, 53, -1022, 1023) static inline int cpf_asin(double *X, const double *A, const size_t numelem, optstruct *fpopts); doc_cpf_univariate(inverse trigonometric tangent, - \f$ X_i = \mathrm{atan}(A_i) \f$, 53, 1023, -1022) + \f$ X_i = \mathrm{atan}(A_i) \f$, 53, -1022, 1023) static inline int cpf_atan(double *X, const double *A, const size_t numelem, optstruct *fpopts); doc_cpf_bivariate(2-argument arctangent, - \f$ X_i = \mathrm{atan}(B_i / A_i) \f$, 53, 1023, -1022) + \f$ X_i = \mathrm{atan}(B_i / A_i) \f$, 53, -1022, 1023) static inline int cpf_atan2(double *X, const double *A, const double *B, const size_t numelem, optstruct *fpopts); /* Hyperbolic functions. 
*/ doc_cpf_univariate(hyperbolic cosine, \f$ X_i = \mathrm{cosh}(A_i) \f$, - 53, 1023, -1022) + 53, -1022, 1023) static inline int cpf_cosh(double *X, const double *A, const size_t numelem, optstruct *fpopts); -doc_cpf_univariate(hyperbolic sine, \f$ X_i = \mathrm{sinh}(A_i) \f$, 53, 1023, -1022) +doc_cpf_univariate(hyperbolic sine, \f$ X_i = \mathrm{sinh}(A_i) \f$, 53, -1022, 1023) static inline int cpf_sinh(double *X, const double *A, const size_t numelem, optstruct *fpopts); doc_cpf_univariate(hyperbolic tangent , \f$ X_i = \mathrm{tanh}(A_i) \f$, - 53, 1023, -1022) + 53, -1022, 1023) static inline int cpf_tanh(double *X, const double *A, const size_t numelem, optstruct *fpopts); doc_cpf_univariate(inverse hyperbolic cosine, - \f$ X_i = \mathrm{arcosh}(A_i) \f$, 53, 1023, -1022) + \f$ X_i = \mathrm{arcosh}(A_i) \f$, 53, -1022, 1023) static inline int cpf_acosh(double *X, const double *A, const size_t numelem, optstruct *fpopts); doc_cpf_univariate(inverse hyperbolic sine, - \f$ X_i = \mathrm{arsinh}(A_i) \f$, 53, 1023, -1022) + \f$ X_i = \mathrm{arsinh}(A_i) \f$, 53, -1022, 1023) static inline int cpf_asinh(double *X, const double *A, const size_t numelem, optstruct *fpopts); doc_cpf_univariate(inverse hyperbolic tangent, - \f$ X_i = \mathrm{artanh}(A_i) \f$, 53, 1023, -1022) + \f$ X_i = \mathrm{artanh}(A_i) \f$, 53, -1022, 1023) static inline int cpf_atanh(double *X, const double *A, const size_t numelem, optstruct *fpopts); /* Exponentiation and logarithmic functions. 
*/ -doc_cpf_univariate(exponential, \f$ X_i = \exp(A_i) \f$, 53, 1023, -1022) +doc_cpf_univariate(exponential, \f$ X_i = \exp(A_i) \f$, 53, -1022, 1023) static inline int cpf_exp(double *X, const double *A, const size_t numelem, optstruct *fpopts); -doc_cpf_frexp(53, 1023, -1022) +doc_cpf_frexp(53, -1022, 1023) static inline int cpf_frexp(double *X, int *exp, const double *A, const size_t numelem, optstruct *fpopts); -doc_cpf_scaling(2, 53, 1023, -1022) +doc_cpf_scaling(2, 53, -1022, 1023) static inline int cpf_ldexp(double *X, const double *A, const int *exp, const size_t numelem, optstruct *fpopts); -doc_cpf_univariate(natural logarithm, \f$ X_i = \log(A_i) \f$, 53, 1023, -1022) +doc_cpf_univariate(natural logarithm, \f$ X_i = \log(A_i) \f$, 53, -1022, 1023) static inline int cpf_log(double *X, const double *A, const size_t numelem, optstruct *fpopts); -doc_cpf_univariate(base - 10 logarithm, \f$ X_i = \log_{10}(A_i) \f$, 53, 1023, -1022) +doc_cpf_univariate(base - 10 logarithm, \f$ X_i = \log_{10}(A_i) \f$, 53, -1022, 1023) static inline int cpf_log10(double *X, const double *A, const size_t numelem, optstruct *fpopts); -doc_cpf_modf(53, 1023, -1022) +doc_cpf_modf(53, -1022, 1023) static inline int cpf_modf(double *X, double *intpart, const double *A, const size_t numelem, optstruct *fpopts); -doc_cpf_univariate(base-2 exponential, \f$ X_i = 2^{A_i} \f$, 53, 1023, -1022) +doc_cpf_univariate(base-2 exponential, \f$ X_i = 2^{A_i} \f$, 53, -1022, 1023) static inline int cpf_exp2(double *X, const double *A, const size_t numelem, optstruct *fpopts); -doc_cpf_univariate(exp(x) - 1, \f$ X_i = \exp(A_i) - 1 \f$, 53, 1023, -1022) +doc_cpf_univariate(exp(x) - 1, \f$ X_i = \exp(A_i) - 1 \f$, 53, -1022, 1023) static inline int cpf_expm1(double *X, const double *A, const size_t numelem, optstruct *fpopts); -doc_cpf_ilogb(53, 1023, -1022) +doc_cpf_ilogb(53, -1022, 1023) static inline int cpf_ilogb(int *exp, const double *A, const size_t numelem, optstruct *fpopts); 
doc_cpf_univariate(natural logarithm of number shifted by one, - \f$ X_i = \log(1+A_i) \f$, 53, 1023, -1022) + \f$ X_i = \log(1+A_i) \f$, 53, -1022, 1023) static inline int cpf_log1p(double *X, const double *A, size_t numelem, optstruct *fpopts); -doc_cpf_univariate(base-2 logarithm, \f$ X_i = \log_2(A_i) \f$, 53, 1023, -1022) +doc_cpf_univariate(base-2 logarithm, \f$ X_i = \log_2(A_i) \f$, 53, -1022, 1023) static inline int cpf_log2(double *X, const double *A, const size_t numelem, optstruct *fpopts); doc_cpf_univariate(base-FLT_RADIX logarithm of absolute value, - \f$ X_i = \log(\lvert A_i \rvert) \f$, 53, 1023, -1022) + \f$ X_i = \log(\lvert A_i \rvert) \f$, 53, -1022, 1023) static inline int cpf_logb(double *X, const double *A, const size_t numelem, optstruct *fpopts); -doc_cpf_scaling(FLT\_RADIX, 53, 1023, -1022) +doc_cpf_scaling(FLT\_RADIX, 53, -1022, 1023) static inline int cpf_scalbn(double *X, const double *A, const int *exp, const size_t numelem, optstruct *fpopts); -doc_cpf_scaling(FLT\_RADIX, 53, 1023, -1022) +doc_cpf_scaling(FLT\_RADIX, 53, -1022, 1023) static inline int cpf_scalbln(double *X, const double *A, const long int *exp, const size_t numelem, optstruct *fpopts); /* Power functions. 
*/ -doc_cpf_bivariate(real powers, \f$ X_i = A_i^{B_i} \f$, 53, 1023, -1022) +doc_cpf_bivariate(real powers, \f$ X_i = A_i^{B_i} \f$, 53, -1022, 1023) static inline int cpf_pow(double *X, const double *A, const double *B, const size_t numelem, optstruct *fpopts); -doc_cpf_univariate(square root, \f$ X_i = \sqrt{A_i} \f$, 53, 1023, -1022) +doc_cpf_univariate(square root, \f$ X_i = \sqrt{A_i} \f$, 53, -1022, 1023) static inline int cpf_sqrt(double *X, const double *A, const size_t numelem, optstruct *fpopts); -doc_cpf_univariate(cube root, \f$ X_i = \sqrt[3]{A_i} \f$, 53, 1023, -1022) +doc_cpf_univariate(cube root, \f$ X_i = \sqrt[3]{A_i} \f$, 53, -1022, 1023) static inline int cpf_cbrt(double *X, const double *A, const size_t numelem, optstruct *fpopts); doc_cpf_bivariate(hypotenuse of a right-angle triangle, - \f$ X_i = \sqrt{A_i^2 + B_i^2} \f$, 53, 1023, -1022) + \f$ X_i = \sqrt{A_i^2 + B_i^2} \f$, 53, -1022, 1023) static inline int cpf_hypot(double *X, const double *A, const double *B, const size_t numelem, optstruct *fpopts); /* Error and gamma functions. 
*/ -doc_cpf_univariate(error function, \f$ X_i = \mathrm{erf}(A_i) \f$, 53, 1023, -1022) +doc_cpf_univariate(error function, \f$ X_i = \mathrm{erf}(A_i) \f$, 53, -1022, 1023) static inline int cpf_erf(double *X, const double *A, const size_t numelem, optstruct *fpopts); doc_cpf_univariate(complementary error function, - \f$ X_i = \mathrm{erfc}(A_i) \f$, 53, 1023, -1022) + \f$ X_i = \mathrm{erfc}(A_i) \f$, 53, -1022, 1023) static inline int cpf_erfc(double *X, const double *A, const size_t numelem, optstruct *fpopts); -doc_cpf_univariate(gamma function, \f$ X_i = \Gamma(A_i) \f$, 53, 1023, -1022) +doc_cpf_univariate(gamma function, \f$ X_i = \Gamma(A_i) \f$, 53, -1022, 1023) static inline int cpf_tgamma(double *X, const double *A, const size_t numelem, optstruct *fpopts); doc_cpf_univariate(natural logarithm of absolute value of gamma function, - \f$ X_i = \log(\lvert \Gamma(A_i) \rvert) \f$, 53, 1023, -1022) + \f$ X_i = \log(\lvert \Gamma(A_i) \rvert) \f$, 53, -1022, 1023) static inline int cpf_lgamma(double *X, const double *A, const size_t numelem, optstruct *fpopts); /* Rounding and remainder functions. 
*/ -doc_cpf_univariate(ceiling function, \f$ X_i = \lceil A_i \rceil \f$, 53, 1023, -1022) +doc_cpf_univariate(ceiling function, \f$ X_i = \lceil A_i \rceil \f$, 53, -1022, 1023) static inline int cpf_ceil(double *X, const double *A, const size_t numelem, optstruct *fpopts); -doc_cpf_univariate(floor function, \f$ X_i = \lfloor A_i \rfloor \f$, 53, 1023, -1022) +doc_cpf_univariate(floor function, \f$ X_i = \lfloor A_i \rfloor \f$, 53, -1022, 1023) static inline int cpf_floor(double *X, const double *A, const size_t numelem, optstruct *fpopts); doc_cpf_bivariate(floating-point remainder of division, - \f$ X_i = A_i \;\mathrm{mod}\; B_i \f$, 53, 1023, -1022) + \f$ X_i = A_i \;\mathrm{mod}\; B_i \f$, 53, -1022, 1023) static inline int cpf_fmod(double *X, const double *A, const double *B, const size_t numelem, optstruct *fpopts); doc_cpf_univariate(integer truncation, \f$ X_i = \mathrm{trunc}(A_i) \f$, - 53, 1023, -1022) + 53, -1022, 1023) static inline int cpf_trunc(double *X, const double *A, const size_t numelem, optstruct *fpopts); doc_cpf_univariate(closest integer (with round-to-nearest), - \f$ X_i = \mathrm{round}(A_i) \f$, 53, 1023, -1022) + \f$ X_i = \mathrm{round}(A_i) \f$, 53, -1022, 1023) static inline int cpf_round(double *X, const double *A, const size_t numelem, optstruct *fpopts); doc_cpf_univariate(closest integer (with round-to-nearest), - \f$ X_i = \mathrm{round}(A_i) \f$, 53, 1023, -1022) + \f$ X_i = \mathrm{round}(A_i) \f$, 53, -1022, 1023) static inline int cpf_lround(long *X, const double *A, const size_t numelem, optstruct *fpopts); doc_cpf_univariate_nobitflip(closest integer (with round-to-nearest), - \f$ X_i = \mathrm{round}(A_i) \f$, 53, 1023, -1022) + \f$ X_i = \mathrm{round}(A_i) \f$, 53, -1022, 1023) static inline int cpf_llround(long long *X, const double *A, const size_t numelem, optstruct *fpopts); -doc_cpf_rint(53, 1023, -1022) +doc_cpf_rint(53, -1022, 1023) static inline int cpf_rint(double *X, int *exception, const double *A, const 
size_t numelem, optstruct *fpopts); -doc_cpf_rint(53, 1023, -1022) +doc_cpf_rint(53, -1022, 1023) static inline int cpf_lrint(long *X, int *exception, const double *A, const size_t numelem, optstruct *fpopts); -doc_cpf_rint(53, 1023, -1022) +doc_cpf_rint(53, -1022, 1023) static inline int cpf_llrint(long long *X, int *exception, const double *A, const size_t numelem, optstruct *fpopts); -doc_cpf_nearbyint(53, 1023, -1022) +doc_cpf_nearbyint(53, -1022, 1023) static inline int cpf_nearbyint(double *X, const double *A, const size_t numelem, optstruct *fpopts); doc_cpf_bivariate(remainder of the floating point division, \f$ X_i = A_i^2 - k \times B_i \f$ for largest \f$ k \f$ such that \f$ k \times B_i < A_i \f$, - 53, 1023, -1022) + 53, -1022, 1023) static inline int cpf_remainder(double *X, const double *A, const double *B, const size_t numelem, optstruct *fpopts); -doc_cpf_remquo(53, 1023, -1022) +doc_cpf_remquo(53, -1022, 1023) static inline int cpf_remquo(double *X, int *quot, const double *A, const double *B, const size_t numelem, optstruct *fpopts); /* Floating-point manipulation functions. 
*/ doc_cpf_bivariate(number from magnitude and sign, - \f$ X_i = \mathrm{sign}(A_i) * abs(B_i) \f$, 53, 1023, -1022) + \f$ X_i = \mathrm{sign}(A_i) * abs(B_i) \f$, 53, -1022, 1023) static inline int cpf_copysign(double *X, const double *A, const double *B, const size_t numelem, optstruct *fpopts); doc_cpf_bivariate(next floating-point number in specified direction, the floating-point number closest to \f$ A_i \f$ in the - direction of \f$ B_i \f$, 53, 1023, -1022) + direction of \f$ B_i \f$, 53, -1022, 1023) static inline int cpf_nextafter(double *X, const double *A, const double *B, const size_t numelem, optstruct *fpopts); doc_cpf_bivariate(next floating-point number in specified direction, the floating-point number closest to \f$ A_i \f$ in the - direction of \f$ B_i \f$, 53, 1023, -1022) + direction of \f$ B_i \f$, 53, -1022, 1023) static inline int cpf_nexttoward(double *X, const double *A, const long double *B, const size_t numelem, optstruct *fpopts); /* Minimum, maximum, difference functions. */ doc_cpf_bivariate(positive difference, \f$ X_i = \lvert A_i \rvert - B_i \f$, - 53, 1023, -1022) + 53, -1022, 1023) static inline int cpf_fdim(double *X, const double *A, const double *B, const size_t numelem, optstruct *fpopts); doc_cpf_bivariate(element-wise maximum, \f$ X_i = \mathrm{max}(A_i, B_i) \f$, - 53, 1023, -1022) + 53, -1022, 1023) static inline int cpf_fmax(double *X, const double *A, const double *B, const size_t numelem, optstruct *fpopts); doc_cpf_bivariate(element-wise minimum, \f$ X_i = \mathrm{min}(A_i, B_i) \f$, - 53, 1023, -1022) + 53, -1022, 1023) static inline int cpf_fmin(double *X, const double *A, const double *B, const size_t numelem, optstruct *fpopts); /* Classification. 
*/ -doc_cpf_fpclassify(53, 1023, -1022) +doc_cpf_fpclassify(53, -1022, 1023) static inline int cpf_fpclassify(int *r, const double *A, const size_t numelem, optstruct *fpopts); -doc_cpf_isfun(finite, 53, 1023, -1022) +doc_cpf_isfun(finite, 53, -1022, 1023) static inline int cpf_isfinite(int *r, const double *A, const size_t numelem, optstruct *fpopts); -doc_cpf_isfun(infinite, 53, 1023, -1022) +doc_cpf_isfun(infinite, 53, -1022, 1023) static inline int cpf_isinf(int *r, const double *A, const size_t numelem, optstruct *fpopts); -doc_cpf_isfun(not a number, 53, 1023, -1022) +doc_cpf_isfun(not a number, 53, -1022, 1023) static inline int cpf_isnan(int *r, const double *A, const size_t numelem, optstruct *fpopts); -doc_cpf_isfun(normal, 53, 1023, -1022) +doc_cpf_isfun(normal, 53, -1022, 1023) static inline int cpf_isnormal(int *r, const double *A, const size_t numelem, optstruct *fpopts); /* Other functions. */ -doc_cpf_univariate(absolute value, \f$ X_i = \lvert A_i \rvert \f$, 53, 1023, -1022) +doc_cpf_univariate(absolute value, \f$ X_i = \lvert A_i \rvert \f$, 53, -1022, 1023) static inline int cpf_fabs(double *X, const double *A, const size_t numelem, optstruct *fpopts); doc_cpf_trivariate(fused multiply-add , \f$ X_i = A_i \times B_i + C_i \f$, - 53, 1023, -1022) + 53, -1022, 1023) static inline int cpf_fma(double *X, const double *A, const double *B, const double *C, const size_t numelem, optstruct *fpopts); diff --git a/src/cpfloat_docmacros.h b/src/cpfloat_docmacros.h index 7104e1f..7c8212a 100644 --- a/src/cpfloat_docmacros.h +++ b/src/cpfloat_docmacros.h @@ -4,7 +4,7 @@ #ifndef _CPFLOAT_DOCMACROS_ #define _CPFLOAT_DOCMACROS_ -#define doc_cpfloat_validate_optstruct(FPTYPE, PMIN, PMAX, EMAX, EMIN) \ +#define doc_cpfloat_validate_optstruct(FPTYPE, PMIN, PMAX, EMIN, EMAX) \ /** \ @brief Validate fields of @ref optstruct struct for `FPTYPE` storage format. 
\ \ @@ -23,8 +23,6 @@ Possible return values are: \ \li @b -4 The rounding mode specified in @p fpopts->round does not correspond \ to a valid choice, thus no rounding will be performed. \ - \li @b -3 The required minimum exponent in @p fpopts->emin is larger than \ - EMIN, the largest possible exponent for a variable of type `FPTYPE`. \ \li @b -2 The required number of digits in @p fpopts->precision is between \ PMIN and PMAX inclusive, which might cause double rounding if round-to-\ nearest is used. \ @@ -33,7 +31,9 @@ \li @b 0 All the parameters in @p fpopts are valid. \ \li @b 2 The required number of digits in @p fpopts->precision is larger \ than PMAX, the number of significant digits in a variable of type `FPTYPE`. \ - \li @b 3 The required maximum exponent in @p fpopts->emax is larger than \ + \li @b 3 The required minimum exponent in @p fpopts->emin is smaller than \ + EMIN, the smallest possible exponent for a variable of type `FPTYPE`, or \ + the required maximum exponent in @p fpopts->emax is larger than \ EMAX, the largest possible exponent for a variable of type `FPTYPE`. \ \li @b 5 The value of @p fpopts->flip indicates that soft errors should be \ introduced, but @p fpopts->p is not a real number between 0 and 1 and thus \ @@ -45,7 +45,7 @@ given in the list above. \ */ -#define doc_cpfloat(FPTYPE, PMAX, EMAX, EMIN) \ +#define doc_cpfloat(FPTYPE, PMAX, EMIN, EMAX) \ /** \ @brief Round `FPTYPE` array to lower precision. \ \ @@ -65,11 +65,11 @@ and the probability of soft errors striking the rounded values. \ \ @return The function returns @b 1 if @p fpopts->precision is larger than \ - PMAX, @b 2 if @p fpopts->emax is larger than EMAX or fptops->emin is smaller - than EMIN, and @b 0 otherwise. \ + PMAX, @b 2 if @p fpopts->emin is smaller than EMIN or fpopts->emax is larger \ + than EMAX, and @b 0 otherwise. 
\ */ -#define doc_cpf_univariate(MATHFUN, FUNSTRING, PMAX, EMAX, EMIN) \ +#define doc_cpf_univariate(MATHFUN, FUNSTRING, PMAX, EMIN, EMAX) \ /** \ @brief Compute MATHFUN rounded to lower precision. \ \ @@ -89,11 +89,11 @@ and the probability of soft errors striking the rounded values. \ \ @return The function returns @b 1 if @p fpopts->precision is larger than \ - PMAX, @b 2 if @p fpopts->emax is larger than EMAX or fptops->emin is smaller - than EMIN, and @b 0 otherwise. \ + PMAX, @b 2 if @p fpopts->emin is smaller than EMIN or fpopts->emax is larger \ + than EMAX, and @b 0 otherwise. \ */ -#define doc_cpf_univariate_nobitflip(MATHFUN, FUNSTRING, PMAX, EMAX, EMIN) \ +#define doc_cpf_univariate_nobitflip(MATHFUN, FUNSTRING, PMAX, EMIN, EMAX) \ /** \ @brief Compute MATHFUN in lower precision. \ \ @@ -112,11 +112,11 @@ and the probability of soft errors striking the rounded values. \ \ @return The function returns @b 1 if @p fpopts->precision is larger than \ - PMAX, @b 2 if @p fpopts->emax is larger than EMAX or fptops->emin is smaller - than EMIN, and @b 0 otherwise. \ + PMAX, @b 2 if @p fpopts->emin is smaller than EMIN or fpopts->emax is larger \ + than EMAX, and @b 0 otherwise. \ */ -#define doc_cpf_bivariate(MATHFUN, FUNSTRING, PMAX, EMAX, EMIN) \ +#define doc_cpf_bivariate(MATHFUN, FUNSTRING, PMAX, EMIN, EMAX) \ /** \ @brief Compute MATHFUN in lower precision. \ \ @@ -137,11 +137,11 @@ and the probability of soft errors striking the rounded values. \ \ @return The function returns @b 1 if @p fpopts->precision is larger than \ - PMAX, @b 2 if @p fpopts->emax is larger than EMAX or fptops->emin is smaller - than EMIN, and @b 0 otherwise. \ + PMAX, @b 2 if @p fpopts->emin is smaller than EMIN or fpopts->emax is larger \ + than EMAX, and @b 0 otherwise. \ */ -#define doc_cpf_trivariate(MATHFUN, FUNSTRING, PMAX, EMAX, EMIN) \ +#define doc_cpf_trivariate(MATHFUN, FUNSTRING, PMAX, EMIN, EMAX) \ /** \ @brief Compute MATHFUN in lower precision. 
\ \ @@ -163,11 +163,11 @@ and the probability of soft errors striking the rounded values. \ \ @return The function returns @b 1 if @p fpopts->precision is larger than \ - PMAX, @b 2 if @p fpopts->emax is larger than EMAX or fptops->emin is smaller - than EMIN, and @b 0 otherwise. \ + PMAX, @b 2 if @p fpopts->emin is smaller than EMIN or fpopts->emax is larger \ + than EMAX, and @b 0 otherwise. \ */ -#define doc_cpf_frexp(PMAX, EMAX, EMIN) \ +#define doc_cpf_frexp(PMAX, EMIN, EMAX) \ /** \ @brief Exponent and normalized fraction of rounded floating-point number. \ \ @@ -195,11 +195,11 @@ and the probability of soft errors striking the rounded values. \ \ @return The function returns @b 1 if @p fpopts->precision is larger than \ - PMAX, @b 2 if @p fpopts->emax is larger than EMAX or fptops->emin is smaller - than EMIN, and @b 0 otherwise. \ + PMAX, @b 2 if @p fpopts->emin is smaller than EMIN or fpopts->emax is larger \ + than EMAX, and @b 0 otherwise. \ */ -#define doc_cpf_scaling(BASE, PMAX, EMAX, EMIN) \ +#define doc_cpf_scaling(BASE, PMAX, EMIN, EMAX) \ /** \ @brief Scale number by power of BASE in lower precision. \ \ @@ -220,11 +220,11 @@ and the probability of soft errors striking the rounded values. \ \ @return The function returns @b 1 if @p fpopts->precision is larger than \ - PMAX, @b 2 if @p fpopts->emax is larger than EMAX or fptops->emin is smaller - than EMIN, and @b 0 otherwise. \ + PMAX, @b 2 if @p fpopts->emin is smaller than EMIN or fpopts->emax is larger \ + than EMAX, and @b 0 otherwise. \ */ -#define doc_cpf_modf(PMAX, EMAX, EMIN) \ +#define doc_cpf_modf(PMAX, EMIN, EMAX) \ /** \ @brief Compute integral and fractional part. \ \ @@ -246,11 +246,11 @@ and the probability of soft errors striking the rounded values. \ \ @return The function returns @b 1 if @p fpopts->precision is larger than \ - PMAX, @b 2 if @p fpopts->emax is larger than EMAX or fptops->emin is smaller - than EMIN, and @b 0 otherwise. 
\ + PMAX, @b 2 if @p fpopts->emin is smaller than EMIN or fpopts->emax is larger \ + than EMAX, and @b 0 otherwise. \ */ -#define doc_cpf_ilogb(PMAX, EMAX, EMIN) \ +#define doc_cpf_ilogb(PMAX, EMIN, EMAX) \ /** \ @brief Compute integral part of the logarithm of the absolute value. \ \ @@ -271,11 +271,11 @@ and the probability of soft errors striking the rounded values. \ \ @return The function returns @b 1 if @p fpopts->precision is larger than \ - PMAX, @b 2 if @p fpopts->emax is larger than EMAX or fptops->emin is smaller - than EMIN, and @b 0 otherwise. \ + PMAX, @b 2 if @p fpopts->emin is smaller than EMIN or fpopts->emax is larger \ + than EMAX, and @b 0 otherwise. \ */ -#define doc_cpf_rint(PMAX, EMAX, EMIN) \ +#define doc_cpf_rint(PMAX, EMIN, EMAX) \ /** \ @brief Compute the closest integer with specified rounding mode. \ \ @@ -297,11 +297,11 @@ and the probability of soft errors striking the rounded values. \ \ @return The function returns @b 1 if @p fpopts->precision is larger than \ - PMAX, @b 2 if @p fpopts->emax is larger than EMAX or fptops->emin is smaller - than EMIN, and @b 0 otherwise. \ + PMAX, @b 2 if @p fpopts->emin is smaller than EMIN or fpopts->emax is larger \ + than EMAX, and @b 0 otherwise. \ */ -#define doc_cpf_nearbyint(PMAX, EMAX, EMIN) \ +#define doc_cpf_nearbyint(PMAX, EMIN, EMAX) \ /** \ @brief Compute the closest integer with specified rounding mode. \ \ @@ -321,12 +321,12 @@ and the probability of soft errors striking the rounded values. \ \ @return The function returns @b 1 if @p fpopts->precision is larger than \ - PMAX, @b 2 if @p fpopts->emax is larger than EMAX or fptops->emin is smaller - than EMIN, and @b 0 otherwise. \ + PMAX, @b 2 if @p fpopts->emin is smaller than EMIN or fpopts->emax is larger \ + than EMAX, and @b 0 otherwise. \ */ -#define doc_cpf_remquo(PMAX, EMAX, EMIN) \ +#define doc_cpf_remquo(PMAX, EMIN, EMAX) \ /** \ @brief Compute reminder and quotient of rounded numbers. 
\ \ @@ -349,11 +349,11 @@ and the probability of soft errors striking the rounded values. \ \ @return The function returns @b 1 if @p fpopts->precision is larger than \ - PMAX, @b 2 if @p fpopts->emax is larger than EMAX or fptops->emin is smaller - than EMIN, and @b 0 otherwise. \ + PMAX, @b 2 if @p fpopts->emin is smaller than EMIN or fpopts->emax is larger \ + than EMAX, and @b 0 otherwise. \ */ -#define doc_cpf_fpclassify(PMAX, EMAX, EMIN) \ +#define doc_cpf_fpclassify(PMAX, EMIN, EMAX) \ /** \ @brief Categorize floating-point values. \ \ @@ -376,11 +376,11 @@ and the probability of soft errors striking the rounded values. \ \ @return The function returns @b 1 if @p fpopts->precision is larger than \ - PMAX, @b 2 if @p fpopts->emax is larger than EMAX or fptops->emin is smaller - than EMIN, and @b 0 otherwise. \ + PMAX, @b 2 if @p fpopts->emin is smaller than EMIN or fpopts->emax is larger \ + than EMAX, and @b 0 otherwise. \ */ -#define doc_cpf_isfun(STRING, PMAX, EMAX, EMIN) \ +#define doc_cpf_isfun(STRING, PMAX, EMIN, EMAX) \ /** \ @brief Check whether value is STRING in lower precision target format. \ \ @@ -399,8 +399,8 @@ and the probability of soft errors striking the rounded values. \ \ @return The function returns @b 1 if @p fpopts->precision is larger than \ - PMAX, @b 2 if @p fpopts->emax is larger than EMAX or fptops->emin is smaller - than EMIN, and @b 0 otherwise. \ + PMAX, @b 2 if @p fpopts->emin is smaller than EMIN or fpopts->emax is larger \ + than EMAX, and @b 0 otherwise. 
\ */ #endif /* #ifndef _CPFLOAT_DOCMACROS_ */ diff --git a/test/cpfloat_test.m b/test/cpfloat_test.m index beef168..abef1f8 100644 --- a/test/cpfloat_test.m +++ b/test/cpfloat_test.m @@ -94,21 +94,21 @@ [c,options] = cpfloat(pi,fp); assert_eq(options.format,'d') assert_eq(options.subnormal,1) - assert_eq(options.params, [53 1023 -1022]) + assert_eq(options.params, [53 -1022 1023]) [~,fp] = cpfloat; assert_eq(fp.format,'d') assert_eq(fp.subnormal,1) - assert_eq(fp.params, [53 1023 -1022]) + assert_eq(fp.params, [53 -1022 1023]) clear fp fp.format = 'bfloat16'; [c,options] = cpfloat(pi,fp); assert_eq(options.format,'bfloat16') assert_eq(options.subnormal,0) - assert_eq(options.params, [8 127 -126]) + assert_eq(options.params, [8 -126 127]) [~,fp] = cpfloat; assert_eq(fp.format,'bfloat16') assert_eq(fp.subnormal,0) - assert_eq(fp.params, [8 127 -126]) + assert_eq(fp.params, [8 -126 127]) clear cpfloat [~,fp] = cpfloat; @@ -137,7 +137,7 @@ A2 = hilb(6); C = cpfloat(A2); options.format = 'c'; - options.params = [8 127 -126]; % bfloat16 + options.params = [8 -126 127]; % bfloat16 C1 = cpfloat(A,options); assert_eq(A,C1); C2 = cpfloat(B,options); @@ -146,7 +146,7 @@ clear options options.format = 'c'; - options.params = [11 15 -14]; % h + options.params = [11 -14 15]; % h options2.format = 'h'; A = hilb(6); [X1,opt] = cpfloat(A,options); @@ -191,8 +191,8 @@ [u,xmins,xmin,xmax,p,emins,emin,emax] = float_params('q43'); options.format = 'E4M3'; % Modification for OCP compliant q43. - emax = 8; % Previously thought to be 7 emin = -6; % Previously thought to be 1-emax=-7. + emax = 8; % Previously thought to be 7 emins = emin + 1 - p; % Exponent of smallest subnormal number. 
xmins = 2^emins; xmin = 2^emin; @@ -571,7 +571,7 @@ assert_eq(cd,double(cs)); options.format = 'c'; - options.params = [11 5 -4]; + options.params = [11 -4 5]; temp1 = cpfloat(single(pi),options); options.format = 'h'; options = rmfield(options, 'params'); @@ -602,14 +602,14 @@ temp = 0; try options.format = 'c'; - options.params = [12 5 -4]; + options.params = [12 -4 5]; temp = cpfloat(single(pi),options); % Error - double rounding! catch end assert_eq(temp,0) try options.format = 'c'; - options.params = [26 9 -8]; + options.params = [26 -8 9]; temp = cpfloat(pi,options); % Error - double rounding! catch end