Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Advanced mode removes logic when an array reference is defined within for loop parameters #4207

Open
DVLP opened this issue Jan 1, 2025 · 19 comments

Comments

@DVLP
Copy link

DVLP commented Jan 1, 2025

Edit: Scroll on for the simplest reproduction examples that don't involve any 3rd party dependencies.

I don't know how to describe what internally is happening but it may be helpful to give this example of an undesirable behaviour
This is createString from flatbuffers method

createString(s) {
        if (s === null || s === undefined) {
            return 0;
        }
        let utf8;
        if (s instanceof Uint8Array) {
            utf8 = s;
        }
        else {
            utf8 = this.text_encoder.encode(s);
        }
        this.addInt8(0);
        this.startVector(1, utf8.length, 1);
        this.bb.setPosition(this.space -= utf8.length);
        for (let i = 0, offset = this.space, bytes = this.bb.bytes(); i < utf8.length; i++) {
            bytes[offset++] = utf8[i];
        }
        return this.endVector();
    }

After CC

createString(d) {
    if (null === d || void 0 === d)
        return 0;
    d = d instanceof Uint8Array ? d : this.za.encode(d);
    this.ba(0);
    this.startVector(1, d.length, 1);
    this.j.setPosition(this.Za -= d.length);
    for (let e = 0, k = this.Za; e < d.length; e++)
        k++;
    return this.endVector()
}

Notice how .bytes() was replaced by this.Za and the logic bytes[offset++] = utf8[i]; was replaced by k++;
bb.bytes() simply returns a property

bytes() {
        return this.bytes_;
    }

adding an extern* helped to keep this function, which in turn fixed the inner logic

for (let e = 0, k = this.cb, m = this.j.bytes(); e < d.length; e++)
                        m[k++] = d[e];

, but the bug remains somwhere

/**
 * @return {!Uint8Array}
 */
ByteBuffer.prototype.bytes = function() {};
@niloc132
Copy link
Contributor

niloc132 commented Jan 1, 2025

Can you share more of your program? Is it possible that for this example, the ByteBuffer's bytes aren't being read anywhere, so the compiler could remove the writes to it?

Marking the method as as extern would indeed prevent this optimization, since it could allow any external code to call that method, so the field could potentially be read anywhere. This (or exporting bytes() is the correct "workaround" for this, since we want the ADVANCED compilation level to remove variables that are written but never read (and many other cases as well), but marking that method as an entrypoint for code the compiler cannot see is the right way to tell it that there are accesses that it doesn't know about, cannot reason about.

@DVLP
Copy link
Author

DVLP commented Jan 1, 2025

It's flatbuffers, javascript flavour(from mjs subfolder when instaling a module with NPM). Here's the typescript source https://github.com/google/flatbuffers/tree/master/ts
https://github.com/google/flatbuffers/blob/master/ts/builder.ts#L527

@DVLP
Copy link
Author

DVLP commented Jan 1, 2025

If the variable is not being read directly it may be due to memory sharing with .subarray so the access may be not obvious. In that case it sounds like a new feature to build in CC, to be able to track which variables are views to the same array buffer.

@niloc132
Copy link
Contributor

niloc132 commented Jan 1, 2025

Yes, I'm quite familiar with flatbuffers (previously used the JS/TS impl, now using the Java impl transpiled to JS). If .subarray() is removed as not being a read, that would be quite a bug indeed, but that seems unlikely given how other types aren't having that issue.

When I suggested sharing more of the program, I mean a (minimal) reproducible test case that demonstrates both creation of the fb byte buffer Builder and reading from it (could included generated types from a .fbs file, up to you). If for example you were just running closure on flatbuffers (and possibly your generated schema files), the compiler would be correct to remove code that doesn't look like it has an accessible entrypoint. If however a minimal program shows shows bb.bytes() and the bb._bytes property being compiled out despite being accessed within code that closure can see, that would point to a bug that could be fixed. Calling Builder.dataBuffer() isn't sufficient, something has to consume the bytes it stores (or otherwise export members that access them, so the compiler can see it must not prune them).

@DVLP
Copy link
Author

DVLP commented Jan 2, 2025

Ok here's a minimal reproduction project. flatbuffers is in flatbuffers_mjs instead of being in a package
https://github.com/DVLP/ccflatbuffers

@niloc132
Copy link
Contributor

niloc132 commented Jan 2, 2025

To confirm (note that I haven't run it yet, and I'm not a member of the project, but interested in both closure-compiler and flatbuffer usage), this test seems to only let the user verify by compiling and seeing if the output looks right - but doesn't actually show a "sorry, no data was written" (or more likely "sorry, there's no array for me to deserialize") error anywhere?

If I'm reading https://github.com/DVLP/ccflatbuffers/blob/fa061dcee830a6c6ffde4c20460dc9b43155673e/index.js right, serializeData ends with

    return builder.asUint8Array()

but is only called once at the end of the file:

serializeData({ uuid: 'TEST' })

which throws away the returned array. Since the return of serializeData is thrown away, the return itself can be removed (as long as asUint8Array has no side effects, etc), which could end up still removing the array entirely?

My point with my other messages is that I think you must actually consume and use this array - log it, or even pass it to the (unused) parseBuffer() method and then log those results, so we can see what happens there too?


With that said, I do see that the externs sort of export the serializeData method, but it exposes it as an object rather than "here's a function that will return a Uint8Array" - perhaps correcting that could resolve this?

@DVLP
Copy link
Author

DVLP commented Jan 2, 2025

serializeData is only in index.js so the output is produced in /dist/bundle.js, but there's no test of the functionality here. Otherwise nothing is generated. This is enough to produce a full output which is the same as when using the result of serializeData in my large project. Adding types to serializeData won't change anything about the flatbuffers' bundle bug, as I tried using AI to build a more comprehensive externs list and that didn't fix the bug. Only adding bytes() fixes the issue. With this setup everything works, encoding and decoding, only strings don't.

@niloc132
Copy link
Contributor

niloc132 commented Jan 2, 2025

Can you confirm which version of flatbuffers you are using? The js/ts files don't specify nor the readme (why not use https://www.npmjs.com/package/flatbuffers?), and while I don't run windows, I wouldn't run a .exe file from someone else's git repo even if I did.

Alternatively, check in the generated types - it is a distasteful way to solve the problem IMHO, but with no public distributions of flatc binaries, it is often the best way to resolve cross platform builds.

@DVLP
Copy link
Author

DVLP commented Jan 2, 2025

It's the latest flatbuffers from npm. Anyway the issue is not about flatbuffers but closure breaking the logic inside the loop. Forget about flatbuffers. flatc binaries are generally available. A person who knows how to debug and fix this problem will be able to take it from here.

@niloc132
Copy link
Contributor

niloc132 commented Jan 2, 2025

Using v24.12.23 I ran npm run generate and npm run build - note that there are 10 warnings when building with closure, both in the externs and the index.js file itself - inaccurate type info (or incorrect usage of types) can result in wrong output.

However, that doesn't seem to be the case here. Here's a minimal test case that demonstrates the issue without the extra build steps and code:

const arr = [];

for (let i = 0, d = arr; i < 5; i++) {
  d[i] = "test";
}

console.log(arr);

Run closure 20250101.0.0-nightly:

$ google-closure-compiler --js='index.js' --js_output_file='dist/bundle.js' --compilation_level=ADVANCED --entry_point='index.js'  --debug --formatting=PRETTY_PRINT 

Expected - the array would be populated in the loop, and then logged at the end.

Actual:

for (let $i$jscomp$3$$ = 0; $i$jscomp$3$$ < 5; $i$jscomp$3$$++) {
}
console.log([]);

Adding --print_source_after_each_pass shows that the removeUnusedCode pass seems to be at fault:

// normalize yields:
// ************************************
const arr = [];
for (let i$jscomp$3 = 0, d = arr; i$jscomp$3 < 5; i$jscomp$3++) {
  d[i$jscomp$3] = "test";
}
console.log(arr);

// ************************************

// removeUnusedCode yields:
// ************************************
const arr = [];
for (let i$jscomp$3 = 0; i$jscomp$3 < 5; i$jscomp$3++) {
}
console.log(arr);

// ************************************

As another workaround, you can edit the loop in builder.js to declare bytes just before the loop, since it is only assigned once before the loop starts.

@DVLP
Copy link
Author

DVLP commented Jan 2, 2025

Great, this is a very good minimal example. Looks like the issue is a much more straightforward bug than subarray memory sharing thing.

@DVLP DVLP changed the title Advanced mode replaced a function call with a property Advanced mode removes logic when an array reference is used Jan 2, 2025
@DVLP
Copy link
Author

DVLP commented Jan 2, 2025

Updated the title. Changing let to var doesn't produce the same bug. Moving i and d declaration(even without definitions) above the loop also makes the bug disappear. It must be something about defining the reference to the array inside the loop parameters
In

const arr = [];
for (var i = 0, d = arr; i < 5; i++) {
  d[i] = "test";
}
console.log(arr);

Out

const $arr$$ = [];
for (var $i$$ = 0; 5 > $i$$; $i$$++) {
  $arr$$[$i$$] = "test";
}
console.log($arr$$);

In

const arr = [];
let i, d, e;
for (i = 0, d = arr; i < 5; i++) {
  d[i] = "test";
}
console.log(arr);

Out

const $arr$$ = [];
let $i$$ = 0, $d$$ = $arr$$;
$i$$ = 0;
for ($d$$ = $arr$$; 5 > $i$$; $i$$++) {
  $d$$[$i$$] = "test";
}
console.log($arr$$);

@DVLP DVLP changed the title Advanced mode removes logic when an array reference is used Advanced mode removes logic when an array reference is defined withing for loop parameters Jan 2, 2025
@DVLP DVLP changed the title Advanced mode removes logic when an array reference is defined withing for loop parameters Advanced mode removes logic when an array reference is defined within for loop parameters Jan 2, 2025
@DVLP
Copy link
Author

DVLP commented Jan 2, 2025

Looks like this problem also occurs when it's an object not an array
In

const obj = {};
for (let i = 0, d = obj; i < 5; i++) {
  d.prop = "test";
}
console.log(obj);

Out

for (let $i$jscomp$3$$ = 0; 5 > $i$jscomp$3$$; $i$jscomp$3$$++) {
}
console.log({});

When a parameter of the for loop is defined, CC must think it's copied by a value, not reference and then assumes that the local scope of the loop(hence it doesn't happen var) has no side effects

@ChadKillingsworth
Copy link
Collaborator

This is by design. There are no accessors to the property so it is removed as dead code. If you log out the actual property, then it shouldn't be removed.

@niloc132
Copy link
Contributor

niloc132 commented Jan 2, 2025

@ChadKillingsworth I can expect that for the field access on the object, but on an array as well?

@ChadKillingsworth
Copy link
Collaborator

You are assigning to the array but never accessing any element. That would be dead code.

There's several techniques that can be used to ensure that the compiler understands that the assignments are needed, but it requires some work to keep it.

@niloc132
Copy link
Contributor

niloc132 commented Jan 2, 2025

Okay, here I'm accessing at least one element:

const arr = [];
for (let i = 0, d = arr; i < 5; i++) {
  d[i] = "test";
}
console.log(arr[0]);

The compiler service isn't dead yet, so here's the compile output:
https://closure-compiler.appspot.com/home#code%3D%252F%252F%2520%253D%253DClosureCompiler%253D%253D%250A%252F%252F%2520%2540output_file_name%2520default.js%250A%252F%252F%2520%2540compilation_level%2520ADVANCED_OPTIMIZATIONS%250A%252F%252F%2520%2540formatting%2520pretty_print%250A%252F%252F%2520%253D%253D%252FClosureCompiler%253D%253D%250A%250Aconst%2520arr%2520%253D%2520%255B%255D%253B%250Afor%2520(let%2520i%2520%253D%25200%252C%2520d%2520%253D%2520arr%253B%2520i%2520%253C%25205%253B%2520i%252B%252B)%2520%257B%250A%2520%2520d%255Bi%255D%2520%253D%2520%2522test%2522%253B%250A%257D%250Aconsole.log(arr%255B0%255D)%253B

'use strict';
for (let a = 0; a < 5; a++) {
}
console.log(void 0);

This roughly corresponds to what is happening in flatbuffers, where the array is aliased in the loop, but writes in the loop are removed as dead code. Later the array is accessed - other writes and reads to the array are not optimized out, only this one let alias within the for gets this treatment.

To be clear, the optimization to console.log() to write undefined is correct if the loop never ran or didn't exist. The bug is that the aliased writes are removed.

@DVLP
Copy link
Author

DVLP commented Jan 3, 2025

@ChadKillingsworth looks like even accessing the property doesn't fix it.
In

const obj = {};
for (let i = 0, d = obj; i < 5; i++) {
    d.prop = "test";
}
console.log(obj.prop);

Out

for (let $i$jscomp$3$$ = 0; 5 > $i$jscomp$3$$; $i$jscomp$3$$++) {
}
console.log({}.$prop$);

@ChadKillingsworth
Copy link
Collaborator

I think this has something to do with scoping. I'm not sure why exactly (nor do I remember the exact rules around for loop declarations) - that may be worth looking into.

const arr = [];
let d = arr;
for (let i = 0; i < 5; i++) {
  d[i] = "test";
}
console.log(arr[0]);

That produces the output you expect.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants