AST format #9
Replies: 12 comments 5 replies
-
This sounds good to me, agreed focusing on semantics is good and makes the number of nodes smaller. What's
Why not a simple
Ah right, you'd swap the if_true/if_false fields then for converting an
I think it'd be nice if the ReadXVariable and WriteXVariable have clearly corresponding names (I'm not a fan of InstVarNode/InstAsgnNode in JRuby or lvar/lvasgn in |
Beta Was this translation helpful? Give feedback.
-
Just adding in what JRuby does for its AST. It is very similar to MRI but a bit more along your suggestions with respect to Nodes. Seems like beyond clarity perhaps alignment should be considered here? If adding an extra boolean makes a particular node too wide to fit into, let's say, a clean arena allocation scheme then perhaps it makes sense to break some of these into distinct nodes.
JRuby has these broken up by type (and so does MRI).
we have an OperatorCallNode used for both uni and bin ops. It just extends CallNode. Not really sure from a semantic standpoint why these are all just not CallNode?
This is what we do.
ditto (FloatNode)
We mix runtime with parser for this and it is a RubySymbol (yeah eager symbols)
This is just IfNode (all conditionals are just IfNode).
same
same
we have OpElementAsgnNode but we create additional types for OpAsgnAndNode and OpAsgnOrNode. Seems this could all just be a single node. JRuby is old and our AST also used to be our interpreter so we still have some specialization not needed.
same as MRI
same
We only use a single DotNode with an exclusive field (Range is a better name and what we use for the operand in our IR).
We have RedoNode,RetryNode, NextNode, and ReturnNode
??? From AST perspective I guess this is ListNode.
This is just IfNode (all conditionals are just IfNode).
This is just IfNode (all conditionals are just IfNode).
We have a single UntilNode
We differentiate variable types with their own nodes (e.g. LocalVariableNode)
We have a single WhileNode |
Beta Was this translation helpful? Give feedback.
-
I guess that goes back to But |
Beta Was this translation helpful? Give feedback.
-
@eregon yeah you're right. I forgot about how those worked. |
Beta Was this translation helpful? Give feedback.
-
@eregon I think we shouldn't do desugaring by default or otherwise all linters will complain by default with suggestions to replace |
Beta Was this translation helpful? Give feedback.
-
@iliabylich That's a good point. Maybe we can do desugaring as an option in the future then (not needed for the first version). It seems quite convenient for Ruby interpreters and potentially for semantic analysis/other usages too (potentially we could even desugar pattern matching which could be a pretty cool feature for Ruby interpreters). |
Beta Was this translation helpful? Give feedback.
-
Thanks friends. This is really good discussion, and very helpful for me to understand all of your various use cases. I'll try to summarize what has been discussed so far so that we can come to the same understanding. Design goalsFirst of all, I think we need to have a discussion around the overarching goal of the design of the tree. Currently, I'm seeing 3 different applications being discussed, and we need to agree on which one we're pushing for first before pushing for all of them at the same time.
All of these trees have their purposes and their tradeoffs. CSTs are going to be necessary for formatters but come at the cost of a larger number of nodes and may require more effort to compile. ASTs are going to be useful for most folks, but risk dropping information that a formatter or linter might be interested in and require them to go back to the source. The higher-level AST risks dropping even more information, but is likely the easiest to compile for language implementers. The direction I'd like to go in is to build as abstract of a tree as possible while retaining all of the necessary information from the source such that consumers of this library don't need to go back into the source to gather more information. (As just an example of that, currently in the Ideally, I'd like to provide APIs that also provide the desugaring that has been mentioned. I think that the more logic we can move into here and out of the various implementations and tools, the easier it's going to be moving forward for all of the tooling. I know @eregon even mentioned at one point desugaring pattern matching like ruby-next does, which could be very interesting. But for the first pass, I honestly don't even want to touch this until we get it completely functional and compatible. Current nodesNodes we're not changingWith all of that in mind, here are the nodes that I believe we're all agreed on should stay as they are:
Nodes we're considering collapsingNext, here are the nodes that you are suggesting we collapse into other nodes:
Other nodesFinally, here are the various other nodes:
@iliabylich I love the layout of that file, it makes a lot of sense. Overall I'm very happy to take inspiration from it. Thanks for the detailed issue. Let me know what you think of the points I've made above. |
Beta Was this translation helpful? Give feedback.
-
Wow, that's a thorough and huge review of everything we wrote above, thank you.
|
Beta Was this translation helpful? Give feedback.
-
Thanks for the summary!
👍 Agreed this seems a good first target. Maybe it's even good enough to not need the concrete syntax tree?
I'm a bit surprised about a
Wouldn't it make sense if it's a single statement/expression to just be that node? And so only a Re Re
There are some
I think these need to be separate nodes, because there are many
Mmh, but one does read or write a variable not "get a variable" (AFAIK), and it ends up reading/writing memory. Also I find it's a feature that it's not a single letter difference between them since they typically have very different logic (the Write part being much more complex often).
I think separate nodes for |
Beta Was this translation helpful? Give feedback.
-
"The direction I'd like to go in is to build as abstract of a tree as possible while retaining all of the necessary information from the source such that consumers of this library don't need to go back into the source to gather more information." Just as a show of support... fwiw more nodes for runtimes will mean more code but not really more complexity. Processing binary in addition to a call will have an extra case stmt which just calls the same compilation that we would do for a 2 argument call. So to support other use cases I do not think this is a big burden (to JRuby at least). If we are literally talking about using switches for this processing even a ternary/if will just be two labels referencing the same code (assuming they have the same field layout). As another dimension of discussion in formulation of nodes... Also when the actual data stored is looked at more I am interested in less allocations to build the tree so I also think that is a orthogonal aspect to what nodes may be present. More nodes but same-sized nodes is a virtue (tree navigation of array). I know the goal is to be approximately the same speed as MRI but I suspect that can be improved upon. Analysis of nodes per source and a heuristic to arena alloc the right amount up front likely will be a win. |
Beta Was this translation helpful? Give feedback.
-
Okay cool, this definitely feels like we're getting somewhere. In terms of direction, it sounds like we're onboard with an AST that maintains just enough information to map back to source but not so much that we explode the number of nodes. With the additional feedback here, I'm going to re-summarize with less detail now.
On the various other questions that folks asked:
With all of this in mind, I'll get cracking on these changes. |
Beta Was this translation helpful? Give feedback.
-
Also, I'm going to convert this into a discussion, since I have a feeling this is going to stay around for a while. |
Beta Was this translation helpful? Give feedback.
-
I propose to take some in inspiration from this config. There are two things that are different from whitequark/parser and right now I think it was my mistake to introduce them:
IfMod
andIfTernary
should go away,If
node is enough to cover all cases.Apart from this it makes a lot of sense to me. Thoughts on current config:
Assignment
- in static analysis from my experience people never look for "an abstract assignment", there's always a need to find a certain type of assignment, like lvar assignment (to track local variables) or ivar assignment. I think it makes sense to introduce separate nodes for all types of assignment like it's done in whitequark/parser, ruby_parser and all parsers derived from them.Binary
- I think it's too generic and complex at the same time.2 + 2
is aBinary
, right? I think it should have the same type asfoo.bar
, because it is a method call. On the other side all types of "real" binary operators need their own node types because they have special behaviour.CharacterLiteral
- it should be just aString
, there's no different between them.FloatLiteral
- let's keep it.Identifier
- this node has no value on its own and it always requires tracking current context to understand the meaning.IfModifier
- like I described above a singleIf
node is enough. If there's a need to track a modifier version source maps can be used, but there's no different in how "normal if" works vs "if modifier". AST should represent meaning, not layout.ImaginaryLiteral
- let's keep itIntegerLiteral
- keepOperatorAssignment
- keep, there's an equivalent in whitequark/parserProgram
- ehh, we can keep it, both whitequark/parser, ruby_parser (and IIRC all parsers except parser.y/Ripper) usebegin
node for it, because there's difference in how top-level block of code is treated comparing to let's say block body orbegin..end
.RationalLiteral
- keepRange
- I think it makes sense to split it to..
and...
rangesRedo
/Retry
- keepStatements
- should be justbegin
I guessTernary
- redundant, can be replaced withIf
UnlessModifier
- redundant, can beif
with "inverted" branchesUntilModifier
- redundant, there can be a singleUntil
nodeVariableReference
- sounds a bit verbose,lvar
orlocalvariable
should be enoughWhileModifier
- redundant, there can be a singleWhile
node@kddnewton PTAL at the document that I linked above and let me know what you think
Beta Was this translation helpful? Give feedback.
All reactions