ACP: Add floating point representation conversions #501
Comments
Having some sort of way to deconstruct and reconstruct floats would be a nice convenience for soft float work. I'm not positive that mixing classification and de/reconstruction is better than Making If there is a need for For naming I'd prefer
What exactly would this expand to? If it casts the mantissa to the float and then applies the exponent, it seems better to just provide a |
Classification into categories is part of deconstruction though. Would this method work on infinity, giving the max exponent and zero mantissa? The user would have to know to use another method to check for infinity first, or that max exponent and zero mantissa encodes infinity, either way in effect doing part of the deconstruction manually.
Yes I think an unsigned
My thinking was that the "signaling" bit would not be included in
Yes that makes sense. More useful would be to add to the exponent (rather than setting it). The function can then be called There can still be a |
This seems like a great application for arbitrary-bitwidth integers. Then the extracted mantissa and exponent can be exactly the precision encoded in the float. |
I am currently working on this as I re-implement floating |
I'm not sure we should have this level of complexity in the standard library, combining the classification and the to/from-parts operations into one with an enum like this. For the purposes many people might want this for, I'd be inclined to have a simpler As noted earlier in this thread, arbitrary-width integers would make this much nicer. But in the absence of that, we can pick an appropriate wider type, and truncate input values (or assert in debug mode). cc @BartMassey |
Do keep in mind that something usually has to renormalize denormalized numbers and adjust the exponent accordingly, and also add the implicit one bit to normalized numbers. Most things working with these numbers will want to do that. So some amount of classification will be done internally anyhow, even if it is then thrown away. The
fn integer_decode_f32(f: f32) -> (u64, i16, i8) {
let bits: u32 = f.to_bits();
let sign: i8 = if bits >> 31 == 0 { 1 } else { -1 };
let mut exponent: i16 = ((bits >> 23) & 0xff) as i16;
let mantissa = if exponent == 0 {
(bits & 0x7fffff) << 1
} else {
(bits & 0x7fffff) | 0x800000
};
// Exponent bias + mantissa shift
exponent -= 127 + 23;
(mantissa as u64, exponent, sign)
}
I'd probably go a little farther and actually normalize denorms there. I could live with this interface, though I would mildly prefer something a little more Rustic. Thoughts? |
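As a sanity check on the decode function quoted above, here is a minimal round-trip sketch. `integer_encode_f32` is a hypothetical helper written for this illustration, not an existing or proposed std API, and it assumes a finite input. The scale factor is built with `f64::from_bits` rather than `powi` so that every step is exact.

```rust
/// The decode function as quoted in the comment above.
fn integer_decode_f32(f: f32) -> (u64, i16, i8) {
    let bits: u32 = f.to_bits();
    let sign: i8 = if bits >> 31 == 0 { 1 } else { -1 };
    let mut exponent: i16 = ((bits >> 23) & 0xff) as i16;
    let mantissa = if exponent == 0 {
        (bits & 0x7fffff) << 1
    } else {
        (bits & 0x7fffff) | 0x800000
    };
    // Exponent bias + mantissa shift
    exponent -= 127 + 23;
    (mantissa as u64, exponent, sign)
}

/// Hypothetical inverse (an assumption of this sketch): rebuilds
/// sign * mantissa * 2^exponent for parts decoded from a finite f32.
fn integer_encode_f32(mantissa: u64, exponent: i16, sign: i8) -> f32 {
    // Decoded exponents lie in -150..=104, so 2^exponent is a normal f64
    // and can be built directly from its biased exponent field.
    let scale = f64::from_bits(((exponent as i64 + 1023) as u64) << 52);
    // The mantissa has at most 24 bits, so the product is exact in f64,
    // and the result is exactly representable as an f32.
    (sign as f64 * (mantissa as f64 * scale)) as f32
}
```

For example, `integer_decode_f32(1.0)` yields `(8388608, -23, 1)`, that is 2^23 * 2^-23, and feeding those parts to the hypothetical inverse gives back `1.0`.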
I'd also probably left-justify the significand. It's normally what you want, I think? Also, recognizing NaN and ±Inf is not really a thing here because you have to know the exponent adjustment: that's going to have to be done outside, I think. |
Also, this design pretty much locks you into about |
Yes :) |
Also that sign calculation could be done without the test.
let sign = 1 - ((bits >> 30) & 2) as i8;
Might be slower, but would conserve a branch predictor. |
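A small sketch comparing the two sign computations from this thread. The function names `sign_branching` and `sign_branchless` are made up for this illustration. It only shows that the two forms agree, and makes no claim about which is faster.

```rust
/// Sign as computed in the decode function earlier in the thread.
fn sign_branching(bits: u32) -> i8 {
    if bits >> 31 == 0 { 1 } else { -1 }
}

/// Branch-free variant: shifting by 30 moves the sign bit into bit 1,
/// so the masked value is 0 for positive and 2 for negative inputs.
fn sign_branchless(bits: u32) -> i8 {
    1 - ((bits >> 30) & 2) as i8
}
```

Both return `1` when bit 31 is clear and `-1` when it is set, including for zeros, infinities and NaNs, since only the sign bit is inspected.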
Alright. Give me a couple of hours, and I'll try to propose something everyone can live with and we can go from there. |
Here's what I've got so far. Still needs some work. https://github.com/BartMassey/float-parts My plan is to generalize the I think we should also provide an |
Ok, I've checked in a version that does both https://github.com/BartMassey/float-parts I gave up on the default function because the lack of concrete types in the trait definition was too painful: used a macro instead. I'll wait on review, bikeshedding, and response to my suggestions about adding the inverse function. |
Of course, this could just be concrete functions or methods on each of the |
This could be the /// Works for positive and negative integers.
impl Shl<i32> for f32 |
@BartMassey The implementation looks good, however in the libs-api meeting we thought that the API would be better as separate methods to extract the mantissa and exponent rather than a single method that returns all 3 parts (we already have ways to extract the sign bit). This would make it easier to document the behavior of each of these methods. |
@Amanieu Sounds good! Thanks much for the review. (Out of curiosity, what's the official way to collect the sign bit currently?) |
|
To be honest, I think the convenience of something like |
What would the use case for that be? I think most times you just want to check whether you should branch based on the sign, or you want a signed |
If you're going on to reconstruct the float somehow it can be helpful. But yeah, we'll leave it for now. |
Proposal
Problem statement
There is currently no easy way to:
Even creating a power of 2 in floating point is not easy, even though it is a reasonably useful operation that can be computed exactly. `powi` can't be used because it has unspecified precision. The only way is to directly deal with the internal bit representation and use `to_bits` and `from_bits` to convert.
Motivating examples or use cases
This would essentially be a replacement for `f32::classify` that keeps all the information about the number. `f32::classify` can be implemented in terms of `to_repr`.
Converting floating point to and from bignums (e.g. `UBig::to_f32`) could use this.
Implementing `f32::div_euclid` in `std` using integral division could use this.
Users could use this to inspect their floating point numbers, understand their behavior, debug their code, etc.
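To make the problem statement concrete, here is a rough sketch of what building an exact power of 2 looks like today with `from_bits`. `pow2_f32` is a hypothetical helper name used only for this illustration. It is not part of the proposal, and the constants are the standard IEEE 754 binary32 layout (8 exponent bits, 23 explicit mantissa bits, bias 127).

```rust
/// Returns 2^n as an f32 when that value is exactly representable,
/// and None otherwise. Hypothetical helper, not an existing std API.
fn pow2_f32(n: i32) -> Option<f32> {
    const EXP_BIAS: i32 = 127;
    const FRAC_BITS: i32 = 23; // explicitly stored mantissa bits
    const MIN_NORMAL_EXP: i32 = -126;
    const MIN_SUBNORMAL_EXP: i32 = MIN_NORMAL_EXP - FRAC_BITS; // -149

    if (MIN_NORMAL_EXP..=EXP_BIAS).contains(&n) {
        // Normal number: zero fraction, biased exponent n + 127.
        Some(f32::from_bits(((n + EXP_BIAS) as u32) << FRAC_BITS))
    } else if (MIN_SUBNORMAL_EXP..MIN_NORMAL_EXP).contains(&n) {
        // Subnormal number: exponent field 0, a single fraction bit set.
        Some(f32::from_bits(1u32 << (n - MIN_SUBNORMAL_EXP)))
    } else {
        // Too large or too small to be represented exactly.
        None
    }
}
```

The caller has to know the bias, the field widths and the subnormal range just to compute something as simple as 2^n, which is the kind of manual bit handling the proposal aims to remove.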
Solution sketch
Alternatives
Existing API (`to_bits`, `from_bits`) can be used to have this implemented as an external crate. It seems like it belongs to `core` because it deals with the essence of how floating point numbers are represented.
`impl From<f32> for FpRepresentation` could be implemented instead of `to_repr`.
`impl TryFrom<FpRepresentation<i32>> for f32` could be implemented instead of `from_repr_exact`.
An equivalent of C functions `ldexp` and `frexp` could be implemented instead. The advantages of the proposed solution over these:
`frexp` returns mantissa as a floating point number but you typically want to deal with it as an integer (otherwise why convert?), which requires another conversion step.
For `ldexp` the extra conversion step (integer -> fp) is less of a problem because it might be necessary anyway when the mantissa has more bits than can be represented. For example if you want to convert a 64-bit mantissa + exponent into `f32` you would have to first convert the 64-bit number into `f32` either way.
There could be a separate enum variant for subnormal numbers. This is an unnecessary complication. Subnormal numbers can be distinguished with the proposed API by the fact that they have a small mantissa (`mantissa.abs() < 1 << (MANTISSA_DIGITS - 1)`) and often don't need separate logic.
Links and related work
internals.rust-lang.org thread