-
Notifications
You must be signed in to change notification settings - Fork 227
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sort extensions during JSON serialization #1117
base: main
Are you sure you want to change the base?
Conversation
v2/event/event_marshal.go
Outdated
for k, v := range ext { | ||
extensionKeyValues = append(extensionKeyValues, keyValue{k, v}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor thing, but this makes a copy of "v", which if large, could be an issue.
What do you think about just putting the ext names into a slice, sort it, then then iterate over the slice and grab the values from the ext map? Then the only copying is the ext key names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion, that actually simplified the code nicely since I didn't need this keyValue
pair type when I'm just working with the extension names.
I've updated the code accordingly
This matches Go's behavior for built-in types, which uses deterministic ordering (alphabetical for maps and by order defined in file for structs). Signed-off-by: Joseph Jon Booker <joe@neoturbine.net>
4c40d06
to
098365f
Compare
@sargas I'm not an expert with Restate, but looking at the Restate SDK/APIs why could one not wrap the creation/serialization of a CloudEvent into a I'm a bit wary introducing changes to our core marshaling logic since I don't yet understand 1/ the impact it has to other users (perhaps relying on the current non-deterministic marshaling behavior) and 2/ the performance impact of the proposed code change. |
The last part is the correct - Restate has an implicit assumption that the serialization (using
import (
"context"
"fmt"
cloudevents "github.com/cloudevents/sdk-go/v2"
"github.com/google/uuid"
"log"
"math/rand/v2"
"time"
restate "github.com/restatedev/sdk-go"
"github.com/restatedev/sdk-go/server"
)
func main() {
if err := server.NewRestate().
Bind(restate.Reflect(FirstService{})).
Bind(restate.Reflect(SecondService{})).
Start(context.Background(), ":9080"); err != nil {
log.Fatal(err)
}
}
type SecondService struct{}
func (SecondService) Process(_ restate.Context, _ cloudevents.Event) error {
return nil
}
type FirstService struct{}
func (FirstService) Handle(ctx restate.Context) error {
event, _ := restate.Run(ctx, func(ctx restate.RunContext) (cloudevents.Event, error) {
event := cloudevents.NewEvent()
event.SetID(uuid.NewString())
event.SetSource("example/uri")
event.SetType("example.type")
event.SetTime(time.Now())
event.SetExtension("a", "b")
event.SetExtension("c", "d")
event.SetExtension("e", "f")
return event, nil
})
_, _ = restate.Service[restate.Void](ctx, "SecondService", "Process").Request(event)
if rand.Float32() < 0.5 {
return nil
} else {
return fmt.Errorf("failing with purposeful error")
}
} Requests to trigger FirstService/Handle will create the The issue here is that since
Base64 decoding the "parameter" in the actual and expected journal messages shows they only differ in the order of the extensions in the JSON-serialized form of the event.
I'm completely sympathetic to being cautious here - that's part of why I worded this PR as a proposal. I don't think there's anything wrong with the existing code (a Marshaler has to create valid JSON, but it isn't documented that it needs to be deterministic). My example above can have workarounds added, like dealing with strings or []byte's instead of At the same time, something is off about
I'm not sure anyone would expect this, let alone rely on it. I'd expect the opposite is more common (people or software like Restate that assumes it would be deterministic).
I don't have an answer to this, but I'll note that this PR still keeps the marshaling simpler then the code used in |
Thx for the example, I understand the ask better now.
I disagree, for 1/ the JSON spec is explicit about the unordered behavior of
I don't want to be pedantic, but not only have I been struck by Hyrum's Law couple times, but I see it in my daily job as well. That's why I'm cautious about these changes - not to block them with this excuse, but to raise awareness of the implications if we introduce them because not everyone might be aware that this can be considered a breaking change for (some) users. Also, as per Go's behavior, developers often "exploit" such randomness e.g., in tests (slightly different scenario, but inherent to our discussion here).
Will review your code again now that I have a better understanding of the issue. I'd thought that introducing ordered behavior requires additional components, such as slices/arrays which allocate and therefore natively have performance implications. tl;dr I absolutely agree that the current standard behavior in how Go/CE marshals to JSON is an issue, perhaps a wider issue as users might require more deterministic behavior. I just want to understand and discuss the implication if we'd change this behavior for all users. An alternative might be to give users control over the marshaller :) WDYT? cc/ @duglin |
My perspective...
Now... if someone were to suggest that we demand incoming CE's be sorted.... then that's a different story, obviously. |
I will add though... I think Restate is wrong in its processing and in MOST other cases I would tell people to yell at them to fix their code because as @embano1 said... the JSON spec doesn't mandate that things be ordered. I'm just willing to let this ordering happen because I see a potential benefit and no harm to it. |
Yeah, I don't think restate is right here either, although I understand their logic in just comparing the opaque serialized output - my companion issue on their end asks about adding some kind of "mime type" so these journal entries can be compared in a more sophisticated way then just byte-by-byte.
If clients are depending on extensions being randomly ordered when JSON-serialized despite no documentation saying to expect that, that sounds terrifying to make any changes 😨 I'm not aware of a scenario where a developer would depend on this, but can't rule it out.
It does, sorry for the confusion - I was comparing to
So this PR does have some kind of performance cost for sorting extension names and memory cost (currently 16kb per extension, plus overhead for GC?). My point was that the impact is less than what work is normally done performing JSON serialization for maps.
Like some kind of static global? That might mess up users' code and tests more. The |
Before you wrote this, I also dug into the current implementation and I thought for extensions we'd use I ran the existing benchmarks to get a better understanding on the impact: goos: darwin
goarch: arm64
pkg: github.com/cloudevents/sdk-go/v2/event
cpu: Apple M1 Pro
│ current.txt │ new.txt │
│ sec/op │ sec/op vs base │
Marshal/struct_data_v0.3-10 3.553µ ± 4% 3.611µ ± 7% ~ (p=0.247 n=10)
Marshal/nil_data_v0.3-10 3.308µ ± 7% 3.126µ ± 6% -5.52% (p=0.003 n=10)
Marshal/string_data_v0.3-10 3.544µ ± 4% 3.659µ ± 7% ~ (p=0.210 n=10)
Marshal/struct_data_v1.0-10 3.553µ ± 4% 3.557µ ± 4% ~ (p=0.853 n=10)
Marshal/nil_data_v1.0-10 3.345µ ± 15% 3.588µ ± 11% ~ (p=0.143 n=10)
Marshal/string_data_v1.0-10 3.406µ ± 2% 3.603µ ± 6% +5.77% (p=0.019 n=10)
Marshal/base64_json_encoded_data_v1.0-10 1.895µ ± 3% 1.850µ ± 6% ~ (p=0.119 n=10)
Marshal/number_data_v1.0-10 1.960µ ± 8% 1.870µ ± 1% -4.57% (p=0.000 n=10)
Unmarshal/data,_attributes_and_extensions_and_specversion_with_struct_data_v0.3-10 4.359µ ± 5% 4.366µ ± 2% ~ (p=0.837 n=10)
Unmarshal/string_data_v1.0-10 4.202µ ± 1% 4.363µ ± 14% +3.83% (p=0.000 n=10)
Unmarshal/nil_data_v1.0-10 4.058µ ± 4% 4.067µ ± 10% ~ (p=0.353 n=10)
Unmarshal/struct_data_v0.3-10 4.343µ ± 15% 4.368µ ± 9% ~ (p=0.631 n=10)
Unmarshal/nil_data_v0.3-10 3.812µ ± 4% 3.884µ ± 2% ~ (p=0.109 n=10)
Unmarshal/data,_attributes_and_extensions_and_specversion_with_struct_data_v1.0-10 4.537µ ± 9% 4.609µ ± 5% ~ (p=0.469 n=10)
Unmarshal/base64_json_encoded_data_v1.0-10 2.725µ ± 4% 2.695µ ± 2% ~ (p=0.325 n=10)
Unmarshal/base64_xml_encoded_data_v1.0-10 2.837µ ± 4% 2.794µ ± 1% -1.52% (p=0.003 n=10)
Unmarshal/string_data_v0.3-10 4.203µ ± 5% 4.127µ ± 4% ~ (p=0.143 n=10)
Unmarshal/struct_data_v1.0-10 4.600µ ± 12% 4.338µ ± 3% -5.70% (p=0.014 n=10)
geomean 3.466µ 3.471µ +0.15%
│ current.txt │ new.txt │
│ B/op │ B/op vs base │
Marshal/struct_data_v0.3-10 2.228Ki ± 0% 2.465Ki ± 0% +10.65% (p=0.000 n=10)
Marshal/nil_data_v0.3-10 2.246Ki ± 0% 1.995Ki ± 0% -11.15% (p=0.000 n=10)
Marshal/string_data_v0.3-10 2.162Ki ± 0% 2.375Ki ± 0% +9.85% (p=0.000 n=10)
Marshal/struct_data_v1.0-10 2.255Ki ± 0% 2.465Ki ± 0% +9.31% (p=0.000 n=10)
Marshal/nil_data_v1.0-10 2.247Ki ± 0% 1.995Ki ± 0% -11.21% (p=0.000 n=10)
Marshal/string_data_v1.0-10 2.216Ki ± 0% 2.418Ki ± 0% +9.12% (p=0.000 n=10)
Marshal/base64_json_encoded_data_v1.0-10 801.0 ± 0% 801.0 ± 0% ~ (p=1.000 n=10) ¹
Marshal/number_data_v1.0-10 1.563Ki ± 0% 1.563Ki ± 0% ~ (p=1.000 n=10)
Unmarshal/data,_attributes_and_extensions_and_specversion_with_struct_data_v0.3-10 1.758Ki ± 0% 1.758Ki ± 0% ~ (p=1.000 n=10) ¹
Unmarshal/string_data_v1.0-10 1.719Ki ± 0% 1.719Ki ± 0% ~ (p=1.000 n=10) ¹
Unmarshal/nil_data_v1.0-10 1.680Ki ± 0% 1.680Ki ± 0% ~ (p=1.000 n=10) ¹
Unmarshal/struct_data_v0.3-10 1.734Ki ± 0% 1.734Ki ± 0% ~ (p=1.000 n=10) ¹
Unmarshal/nil_data_v0.3-10 1.680Ki ± 0% 1.680Ki ± 0% ~ (p=1.000 n=10) ¹
Unmarshal/data,_attributes_and_extensions_and_specversion_with_struct_data_v1.0-10 1.758Ki ± 0% 1.758Ki ± 0% ~ (p=1.000 n=10) ¹
Unmarshal/base64_json_encoded_data_v1.0-10 1.203Ki ± 0% 1.203Ki ± 0% ~ (p=1.000 n=10) ¹
Unmarshal/base64_xml_encoded_data_v1.0-10 1.227Ki ± 0% 1.227Ki ± 0% ~ (p=1.000 n=10) ¹
Unmarshal/string_data_v0.3-10 1.719Ki ± 0% 1.719Ki ± 0% ~ (p=1.000 n=10) ¹
Unmarshal/struct_data_v1.0-10 1.734Ki ± 0% 1.734Ki ± 0% ~ (p=1.000 n=10) ¹
geomean 1.718Ki 1.731Ki +0.75%
¹ all samples are equal
│ current.txt │ new.txt │
│ allocs/op │ allocs/op vs base │
Marshal/struct_data_v0.3-10 19.00 ± 0% 21.00 ± 0% +10.53% (p=0.000 n=10)
Marshal/nil_data_v0.3-10 17.00 ± 0% 18.00 ± 0% +5.88% (p=0.000 n=10)
Marshal/string_data_v0.3-10 20.00 ± 0% 23.00 ± 0% +15.00% (p=0.000 n=10)
Marshal/struct_data_v1.0-10 19.00 ± 0% 21.00 ± 0% +10.53% (p=0.000 n=10)
Marshal/nil_data_v1.0-10 17.00 ± 0% 18.00 ± 0% +5.88% (p=0.000 n=10)
Marshal/string_data_v1.0-10 19.00 ± 0% 21.00 ± 0% +10.53% (p=0.000 n=10)
Marshal/base64_json_encoded_data_v1.0-10 9.000 ± 0% 9.000 ± 0% ~ (p=1.000 n=10) ¹
Marshal/number_data_v1.0-10 11.00 ± 0% 11.00 ± 0% ~ (p=1.000 n=10) ¹
Unmarshal/data,_attributes_and_extensions_and_specversion_with_struct_data_v0.3-10 44.00 ± 0% 44.00 ± 0% ~ (p=1.000 n=10) ¹
Unmarshal/string_data_v1.0-10 42.00 ± 0% 42.00 ± 0% ~ (p=1.000 n=10) ¹
Unmarshal/nil_data_v1.0-10 40.00 ± 0% 40.00 ± 0% ~ (p=1.000 n=10) ¹
Unmarshal/struct_data_v0.3-10 43.00 ± 0% 43.00 ± 0% ~ (p=1.000 n=10) ¹
Unmarshal/nil_data_v0.3-10 40.00 ± 0% 40.00 ± 0% ~ (p=1.000 n=10) ¹
Unmarshal/data,_attributes_and_extensions_and_specversion_with_struct_data_v1.0-10 44.00 ± 0% 44.00 ± 0% ~ (p=1.000 n=10) ¹
Unmarshal/base64_json_encoded_data_v1.0-10 25.00 ± 0% 25.00 ± 0% ~ (p=1.000 n=10) ¹
Unmarshal/base64_xml_encoded_data_v1.0-10 25.00 ± 0% 25.00 ± 0% ~ (p=1.000 n=10) ¹
Unmarshal/string_data_v0.3-10 42.00 ± 0% 42.00 ± 0% ~ (p=1.000 n=10) ¹
Unmarshal/struct_data_v1.0-10 43.00 ± 0% 43.00 ± 0% ~ (p=1.000 n=10) ¹
geomean 25.75 26.56 +3.13%
¹ all samples are equal
|
This PR proposes to make JSON serialization deterministic, i.e., always produce the same result for the same input. This matches Go's behavior for built-in types, which notably will explicitly sort fields within a map when serializing.
v2/event/event_marshal.go's
WriteJson
's only non-deterministic behavior is an iteration over a map of extensions, which is performed in random order. This PR takes a similar approach asencoding/json
, without their logic for "resolving" key names (since extension names are always strings), and without the use ofreflect.Value
.The motivation behind this change is a subtle dependence that the Restate framework has on JSON ordering being deterministic. This framework allows for code blocks that are non-deterministic (e.g., creating a CloudEvent
Event
with the current time), but then expects when retrying that the journal is "consistent" on replays before any errors. In affect, this means that normal code should be deterministic. This breaks down when a CloudEvent is passed as input to a service and serialized as JSON by default: even when theEvent
hasn't changed, the serialization will produce different bytes, causing the framework to believe the code is non-deterministic when everything was the same until the final serialization of a CloudEvent.For this framework, it is easy to work around - we need to manually convert to JSON and normalize it (e.g., by converting to a map and then back to JSON, to make use of golang's deterministic JSON serialization of maps). It is an unfortunate extra step caused by the CloudEvent SDK's use of a non-deterministic serialization and, at least in this use, unexpected difference given the behavior of the standard library.