Ensure complete coverage when cycles exist #328

hendrikvanantwerpen · 2023-10-26T10:52:17Z

WIP: So far this PR only contains a test triggering the problem.

Problem

The resolution behavior w.r.t. cycles is not always preserved when we go through partial paths compared to resolving in the graph directly.

Take the following example:

[ref "foo"] -> [push "bar"] -> [push "bar"] -> [root] -> [scope 1] -> [def "foo"]
                                                           ^   |
                                                           |   v
                                                         [pop "bar"]

If we resolve foo in the graph directly, we get the foo definition as expected. Because we resolve without a precondition variable, following the cycle is safe: we can only follow it as long as there are symbols on the stack to consume.

On the other hand, if we compute partial paths first (using defs, refs, root, and exported scopes as end points), things don't work out. The partial path from foo to root is okay. But the other path(s) would have to be from root to foo, including the cycle. One path excludes the cycle, but is not enough to resolve the foo reference later. Any path involving the cycle will be rejeected, because the partial path does have a precondition variable, so we could follow it arbitrary number of times.

(One special case is supported, when the cycle coincides exactly with an accepted path. So if scope 1 was an exported scope, the cycle would be included as a partial path. But this requires a lot of care when defining the stack graph rules to ensure this is the case.)

Goal

The overall goal of partial paths is that they are purely an optimization. Precomputing partial data should not make some solutions unavailable. This gives us the following two requirements:

If a path is produced in the full graph, it should also be produced by stitching the partial paths. Caveat: Partial path sets sometimes restrict the kinds of start or end nodes of paths, in which case this holds only for paths in the full graph that also follow that restriction.
We must ensure that we do not accidentally introduce multiple ways of stitching the same path. This could cause exponential blowup during path stitching.

Solution

How can we solve this? We'll start with a naive solution that will address the first requirement, but violate the second. Then we describe a modification that fixes that.

The idea is to implicitly split a path involving a cycle in components, and keep those components separately, instead of rejecting their compositions, as we do now. So for our example, we'll encounter the following path during partial path finding:

[root] -> [scope 1] -> [pop "bar"] -> [scope 1]

We'll decompose this into the following paths:

The prefix <%1> ($1) | [root] -> [scope 1] | <%1> ($1)
The cycle <"bar",%1> ($1) | [scope 1] -> [pop "bar"] -> [scope 1] | <%1> ($1)
The seed for possible postfixes <%1> ($1) | [scope 1] | <%1> ($1).

The prefix and the cycle itself are added to the partial path set, but not further extended. The postfix seed is not added to the partial path set itself, but extended and added to the set when it hits an accepted endpoint.

This solution fulfills requirement 1 but violates requirement 2. To see why this is the case, we have to realize that the prefix has been encountered before during path finding, and will be extended to [def "foo"]. Similarly, the new seed will also be extended to [def "foo"]. So there will be two ways to construct the path [root] -> [scope 1] -> [def "foo"]. Once directly, and once by combining the prefix and the extended seed.

To solve this, we need to ensure that the prefix and seed can only be combined, if the cycle was involved in the path as well. We can achieve this by picking two new, unique symbol q. Using these we construct the paths above with modified pre- and postconditions.

The prefix including the first cycle <"bar",%1> ($1) | [root] -> [scope 1] -> [pop "bar"] -> [scope 1] | <%1,q> ($1)
The path for following cycles <q,"bar",%1> ($1) | [scope 1] -> [pop "bar"] -> [scope 1] | <q,%1> ($1)
The seed for possible postfixes <q,%1> ($1) | [scope 1] | <%1> ($1).

As before, the seed is the only path that is further extended. The new symbol work as a guard, ensuring these paths cannot be combined with other paths in the set, but only with each other for the specific case where the cycle is involved. When paths are stitched together, the q symbol will disappear, resulting in the same path as would be produced by resolving directly in the graph.

In this example, it might still work correctly if the q symbol is not used, because [scope 1] is not an endpoint for the partial path set we're computing. However, if the cycle can start and end in a node that is a possible endpoint (let's say it was marked as exported), the q precondition will prevent stitching [ref "foo"] -> [root] -> [scope 1] with extensions of the seed.

github-actions · 2023-10-26T11:01:34Z

Performance Summary

Comparing base d26ad13 with head 7f612c0 on typescript_benchmark benchmark. For details see workflow artifacts. Note that performance is tested on the last commits with changes in stack-graphs, not on every commit.

Before

--------------------------------------------------------------------------------
Command:            base/target/release/tree-sitter-stack-graphs-typescript index -D base.sqlite --max-file-time=30 --hide-error-details -- base/data/typescript_benchmark
Massif arguments:   --massif-out-file=perf.out
ms_print arguments: --x=72 --y=12 base-perf-results/perf.out
--------------------------------------------------------------------------------


    GB
4.581^                      :#                                                
     |                      :#          ::                                    
     |                      :#          :                                     
     |                      :#          :                                     
     |                      :#::::      : :::                                 
     |                      :#: :       : : :                                 
     |                    : :#: :       : : :                                 
     |                    : :#: :       : : :                                 
     |                    :::#: :      :: : :                       :         
     |                    :::#: :      :: : :     :                 :         
     |                    :::#: :    :::: : :   :::                ::         
     |                @ @@:::#: :    : :: : :   : :  ::::         :::  :::::: 
   0 +----------------------------------------------------------------------->Gi
     0                                                                   64.66

After

--------------------------------------------------------------------------------
Command:            head/target/release/tree-sitter-stack-graphs-typescript index -D head.sqlite --max-file-time=30 --hide-error-details -- head/data/typescript_benchmark
Massif arguments:   --massif-out-file=perf.out
ms_print arguments: --x=72 --y=12 head-perf-results/perf.out
--------------------------------------------------------------------------------


    GB
4.581^                        :#                                              
     |                        :#         :                                    
     |                        :#         :                                    
     |                        :#         :                                    
     |                        :#::       ::::                                 
     |                        :#:        :: :                                 
     |                     :  :#:     :  :: :                                 
     |                     :  :#:     :  :: :                                 
     |                     ::::#:     ::::: :                        :        
     |                     :: :#:     :: :: :     ::                 :        
     |                     :: :#:    @:: :: :    ::                :::        
     |                   @@:: :#:    @:: :: :   :::   :::::  @    :: :  ::::: 
   0 +----------------------------------------------------------------------->Gi
     0                                                                   61.41

Ensure complete coverage when cycles exist

7f612c0

hendrikvanantwerpen self-assigned this Oct 26, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Ensure complete coverage when cycles exist #328

Ensure complete coverage when cycles exist #328

hendrikvanantwerpen commented Oct 26, 2023 •

edited

Loading

github-actions bot commented Oct 26, 2023

Ensure complete coverage when cycles exist #328

Are you sure you want to change the base?

Ensure complete coverage when cycles exist #328

Conversation

hendrikvanantwerpen commented Oct 26, 2023 • edited Loading

Problem

Goal

Solution

github-actions bot commented Oct 26, 2023

Performance Summary

hendrikvanantwerpen commented Oct 26, 2023 •

edited

Loading