Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Play JSON codecs #37

Merged
merged 11 commits into from
Feb 11, 2019
Merged

Play JSON codecs #37

merged 11 commits into from
Feb 11, 2019

Conversation

vil1
Copy link
Member

@vil1 vil1 commented Jan 13, 2019

This PR adds interpreters for (en|de)coders to/from the Play! JSON AST in a playJson module, and is therefore part of #32.

More specifically, it introduces interpreters for play.api.libs.json.Reads and play.api.libs.json.Writes.

JSON having no standard support for mere products (aka tuples), I had to make a decision about how we represent such mere products. At the same time, I had to devise a way to handle differently products that are part of a Record (which must be represented as flat JsObjects) and "mere products" (represented as I arbitrarily decided).

Solving that problem made me introduce HCoalgebra, hyloNT and HEnvT.

  • HCoalgebra is the dual of HAlgebra, ie. a natural transformation F ~> S[F, ?]
  • hyloNT is a recursion scheme that uses both an HCoalgebra and HAlgebra. It builds up a tree (here, some schema flavour) using the HCoalgebra and then collapses if (into a target functor F) using the HAlgebra.
  • HEnvT is a mean for labelling nodes of a tree (here a schema) with values of some given type, while retaining the structure of that tree (schema).

These three constructs are used to derive Reads and Writes from a schema.

First I use an HCoalgebra to "push-down" information (using HEnvT) so that every :*: node "knows" if it's part of a Record or not, and then I use this information in the HAlgebra to decide how to render this particular :*: node.

The :*: nodes that are under a Record node are rendered as a single JsObject: we know that these nodes contain only ProductTerms which are rendered as JsObject with a single field, these JsObjects are then merged to form the record's representation.

Other :*: node are considered as tuples, which are (debatably) rendered as { "_1": "foo", "_2": { "_1": "bar", "_2": ...}}.

Tests ensure that the Reads and Writes derived from a given schema are symmetrical, ie. reads(writes(data)) == data.

Copy link
Collaborator

@GrafBlutwurst GrafBlutwurst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite follow the annotation logic you employ during the coalgebra. you start out with a false, a record causes any further nodes to carry along a true, a :*: passes false to left and carries along on the right and any other just carries the flag along.

but shouldn't a record pass a true to all immediate children and in any other case pass a false?

HEnvT(
x,
:*:(
ascribeWith(false)(left),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you ascribing left always with false?

Writes(
pair =>
(left.writes(pair._1), right.writes(pair._2)) match {
case (l @ JsObject(_), r @ JsObject(_)) => l ++ r
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could this collide if both terms have the same productTermID ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely, and we currently don't ensure that such "absurd" schema (having multiple fields with the same name) cannot be constructed.

@vil1
Copy link
Member Author

vil1 commented Jan 14, 2019

I don't quite follow the annotation logic you employ during the coalgebra. you start out with a false, a record causes any further nodes to carry along a true, a :*: passes false to left and carries along on the right and any other just carries the flag along.

but shouldn't a record pass a true to all immediate children and in any other case pass a false?

This needs clarification indeed.

What I want is to label with true all the "chain" of :*: that are under a Record, but not their children.

So I schematically (no pun intended) end up with something like:

false -> Record(
  true -> :*:(
    false -> x // <- is guarantied to be a ProductTerm,
    true -> :*:(
       false -> y,
       true -> z  // also guarantied to be a ProductTerm
    )
  )
)

The "last child" (here z) ends up labelled with true but it doesn't matter (it is guarantied to be a ProductTerm and I don't use the label for rendering those).

Does that make more sense?

.travis.yml Outdated
@@ -7,5 +7,5 @@ jdk:
before_install:
- export PATH=${PATH}:./vendor/bundle
script:
- sbt ++$TRAVIS_SCALA_VERSION scalafmtCheck test:scalafmtCheck scalafmtSbtCheck "project core" test:run "project scalacheck" test:run
- sbt ++$TRAVIS_SCALA_VERSION scalafmtCheck test:scalafmtCheck scalafmtSbtCheck "project core" test:run "project scalacheck" test:run "project playJson" test:run
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think of defining a root module aggregating all the modules?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I should have thought about it earlier, (I mean, we do have a root module aggregating all other modules already).
Anyway, this will be indirectly fixed when I update this PR to accommodate the changes introduced by #39.

val alg = new (HEnvT[Boolean, RSchema, Writes, ?] ~> Writes) {

def apply[A](env: HEnvT[Boolean, RSchema, Writes, A]): Writes[A] = env.fa match {
case One() => Writes(_ => JsNull)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it would also be possible to not emit optional fields that have the None value?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not completely sure it would be impossible, but I'm unable to devise a way to do so (for now).

That's because of the encoding of options as a mere Iso on top of A \/ Unit. Not emitting fields valued as None would therefore require to find a way to process x :+: y differently wether y is One() or something else.

So I guess your remark makes another case for adding an Optional member to the Schema GADT.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I guess your remark makes another case for adding an Optional member to the Schema GADT.

That’s what I also think.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what I was saying in my PR regarding A \/ Unit can lead to an ugly representation. The Optional member for the GADT should be trivial but we should take care we don't pollute the GADT too much with such special cases. This brings us full circle back to the discussion in gitter about effects in schemas 😁

},
a => JsSuccess(-\/(a))
)
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to directly choose the correct (left or right) decoder based on its label?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, at that exact point it wouldn't, because a :+: node has no clue about what its children are "looking for" (and in the case of a Union, the label is handled by the :+:'s children).
But that's a good lead for future optimisation.

.fold(
el =>
right.reads(json).map(\/-.apply) match {
case JsError(er) => JsError(JsError.merge(el, er))
Copy link
Collaborator

@julienrf julienrf Jan 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why merge the errors of both alternatives?

Given the Schema[Role] defined in the tests, I would expect the following behavior from decoders:

  • { "user": {} } would be decoded as a failure with the following (single) error: “Missing field ’active’” ;
  • { "moderator": {} } would be decoded as a failure with the following error: “Unknown alternative label: 'moderator'. Valid labels are: 'user', 'admin'.” ;
  • {} would be decoded as a failure with the following error: “Missing alternative label”.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that error management isn't completely satisfactory.

This is due to:

  1. the fact that (as you pointed out in another comment) we don't leverage our encoding for unions, which wraps branches in an object with a single field whose name is the branch's label (mostly because we're unable to do so ATM)
  2. the semantics of JsError. Getting a JsError(Seq(x, y, z)) means "My unfulfilled expectations are x and y and z". But in the case of a union, we would need a "or" semantic.

But know that I've written it down, I realise that 2. is a consequence of 1. If we find a way to properly leverage our encoding for branches, we should be able to get better error messages.

I'm not sure we can/want to satisfy your second point though (detecting "unknown branches")

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we can/want to satisfy your second point though (detecting "unknown branches")

At least, just reporting “unknown alternative: 'moderator'”, would be fine.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That might be doable.

case SeqSchema(elem) =>
Reads {
case JsArray(elems) =>
elems.toList.traverse(elem.reads _)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don’t we need to wrap with undefinedAsNull here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyway, I guess that if we add Optional as an AST node we won’t need undefinedAsNull anymore :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kept thinking about it, and I had to sadly remove that GIF, because I think you're wrong after all.

Once you "get" to a SeqSchema, you're necessarily "inside" a field's value, so the udefinedAsNull part will be taken care of by the ProductTerm above.

case One() => (_ => "null")
}
): RInterpreter[Encoder] =
new Interpreter[R.Prim, R.SumTermId, R.ProductTermId, Encoder] {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels to me a bit heavy to always carry these type parameters everywhere. Why not use abstract type members instead? I see below that you defined an RInterpreter type alias to alleviate the syntactic noise but I think it would be simpler to use type members everywhere instead of type parameters.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I intend to get #50 merged before this PR, that should enable me to make type signatures much more concise in this PR.

encloseInBraces.compose(u.choices).compose(u.iso.reverseGet)
case SumTerm(id, base) => makeField(branchLabel(id)).compose(base)
case One() => (_ => "null")
}
Copy link
Collaborator

@julienrf julienrf Jan 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unrelated to this PR but what is the difference between :*:, Record and ProductTerm?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:*:(a, b) represents a (A, B), ProductTerm(id, a) represents a "field" with name id and schema a (it doesn't make much sense outside of a record).
A Record contains a schema that is guarantied to be composed of a "tree of :*:" whose all left children (as well as the right child of the deepest :*:) are ProductTerms.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A Record contains a schema that is guarantied to be composed of a "tree of :*:" whose all left children (as well as the right child of the deepest :*:) are ProductTerms.

Could we see this guarantee in the types? Currently the type of a record’s fields is F[_].

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's doable. We need fields to be a F[_] to use recursion schemes (during a cataNT for example, F is bound to the "target functor" the schema is folded to).

@vil1 vil1 force-pushed the play-json branch 2 times, most recently from eb4ce54 to 892e86f Compare January 14, 2019 22:44
@vil1 vil1 merged commit 9b2eef8 into spartanz:prototyping Feb 11, 2019
@vil1 vil1 deleted the play-json branch February 11, 2019 17:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants