-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement floats #65
Draft
ruuda
wants to merge
29
commits into
master
Choose a base branch
from
type-float
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Draft
Implement floats #65
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
The one thing that prevents that right now is floats, and the fuzzer discovered it within a few seconds: ╭──────╴ Opcode (hex) │ ╭───╴ Argument (hex) │ │ ╭╴ Operation, argument (decimal) 26 03 ExprPushInput, 3 take_str, 3 → "4e2" e6 01 ModeJsonSuperset, 1 EvalJsonSuperset --> 4e2
It doesn't add functionality to deal with exponents or decimal points, it only moves stuff into a function. The test currently has the wrong expectation with an error in there.
Let's finally add the float type. After some deliberation, I think I want to represent them as decimals in scientific notation internally, unless you do a division at which point they will turn into a rational. I can turn the rational back into a decimal (which may be lossy) in order to format it as a string; the rational itself is an implementation detail, but it does allow us to get certain computations exactly right where a float would not.
It contains the Decimal type after all, and right now I don't have anything named "float" either way.
Why the naming discrepancy? My thinking is that I want both Decimal and Rational as internal representation, so Value can be decimal or rational. But what is a good name for "decimal or rational"? I can think of "number", but I want to reserve that for the supertype of int and this one. "Rational" would be technically correct because decimals are also rationals, just with a power of 10 denominator. But then what if I put a sqrt function in there, or an approximation of pi? Sure technically they are approximations, but I think this would _also_ be weird. So let's go with float ... Also, this is what Cue calls them so there is precedent.
RCL rejects 9223372036854775807.576460752303423487 with an overflow error, but I think that is fine, I don't want to lose precision on inputs.
This adds the type and the relations, but it doesn't add all the tests, fuzzer dictionary, documentation, etc. I'll do all of that later as part of this same feature branch.
This should get us one step closer to json compatibility, maybe even the final step.
The choice I went with is to have a 16-bit exponent, which gives RCL's float/decimal type more range than a regular f64. Now the fuzzer can generate an input with a large exponent, and RCL will happily echo it, and it's technically syntactically valid json, but Serde rejects it with "number out of range" (in the same way that RCL rejects some numbers as overflow). So add an exception for this mismatch.
Now that Rustfmt formatted them tall anyway, it's probably best to keep them sorted.
This removes one case of incompatibility with Serde. If you write a float literal that is too precise to be represented exactly, then we now silently round it rather than treating it as an overflow error. I think this is acceptable because if you are in the case where you care about numbers to 19 significant digits then probably RCL is not the best tool for what you are doing, but the case where we encounter some arbitrary json that we want to query with "rcl jq" and it happens to have some humongous float in it, that is probably more likely. Python handles float literals in this way too so I think it's okay.
This is tricky stuff. Better write a lot of fuzz tests for this later.
Ugh this is such a rabbit hole, and it looks like I am putting it full of special cases, I have a feeling it could be way more elegant.
This adds back the exception that was removed by allowing float parsing imprecision, though in a more limited form initially because it only affected exponents. But after running the fuzzer for a bit longer, it also affects large integers, so we are back to the start, overflow is just an intentional incompatibility.
This was a to do, the fuzzer found it pretty quickly. The exact case it found that triggered it was this: {5.001,5.001e97,}
RCL can handle larger exponents on floats, we have to admit that then.
Surrogate pairs are not supported by RCL on purpose, so when that can be parsed by Serde but is rejected by RCL, we shouldn't fail the fuzzer on it.
This overflow was discovered by the fuzz_source fuzzer. The test currently fails.
See also the parent commit that adds the test. The test now passes.
The golden test doesn't hurt, but this is one of the few places where we can do a reasonable unit test as well.
This is only the start, but let's verify Decimal::cmp against f64::cmp. It instantly finds an input where they disagree: Compare { a: NormalF64( -0.16406250000007813, ), b: NormalF64( 0.0, ), }
It turns out that the case that the fuzzer found was already one that my past self marked as to do.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This is a work in progress, I’ll update the description later.
Float
,Num
)Open questions
How should (in)equality between numbers interact with their identity? If we allow
10 < 11.0
to typecheck and produce the result you’d expect, then we should also allow10 == 10.0
, and{10, 10.0}
But what should they produce? Numerically they are equal, but asruntime::Value
, they are different values. I see two possible ways forward:Int
andFloat
further. Possibly only keep aNum
type after all and everything is a float.10
and10.0
are equal,{10, 10.0} == {10}
. This probably means that we print all numbers that are ints without suffix, which means that processing json is no longer “lossless” in the sense that we'd convert[10.0, 10]
into[10, 10]
. That sounds like something that will cause problems, hmmm ...Int
andFloat
to mix so implicitly.10 < 11.0
is a type error.10 == 10.0
is false for the same reason that10 == "10"
is false. (Which makes sense and is consistent in one way, but may be surprising in another.){10, 10.0}
is a set with two elements.I’m leaning towards the second one, but on the other hand, am I making my own life difficult? Why have both
Int
andFloat
? I think primarily to add additional type-safety in schemas (for things like ports, or counts, non-integers make no sense). But if that’s the argument, then should there also beUInt
?Uint16
?Fin n
? ...What does Python do here? It has different runtime representations, but considers the values equal, and for sets the first value is preserved:
Cue treats
10.0
as not an int, even though it is equal to the integer10
:Thinking out loud, a possible solution may be:
Int
andFloat
(or maybeFloat
should be calledNum
), andInt
is a subtype of the more general one.Decimal
.Decimal
should preserve some presentation information that does not affect its identity, in particular the number of decimals. This way10.0
and10
can both be represented, and preserved exactly, for lossless json processing.Num
and expect anInt
, we need to do more than just a type check; we should also change the presentation attribute to render it with zero decimals. This feels ad-hoc though. An alternative is to do what Cue does and verify the presentation attribute as part of the type check.