Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make source links relative to OUT_DIR #9580

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

nmattia
Copy link

@nmattia nmattia commented Nov 7, 2024

Fixes #9553

When generating code, cranelift-isle records the path to the source files. These paths were sometimes absolute, meaning it kept a trace of the system it was run on.

These changes here add helper functions in Files to try and strip the value of $OUT_DIR (if set) from the file names.

This also removes a log line (which included the full path of files) from cranelift-codegen-meta.

When generating code, `cranelift-isle` records the path to the source
files. These paths were sometimes absolute, meaning it kept a trace of
the system it was run on.

These changes here add helper functions in `Files` to try and strip the
value of `$OUT_DIR` (if set) from the file names.
@nmattia nmattia requested a review from a team as a code owner November 7, 2024 11:26
@nmattia nmattia requested review from cfallin and removed request for a team November 7, 2024 11:26
@nmattia
Copy link
Author

nmattia commented Nov 7, 2024

@cfallin here are some changes related to our discussion here.

This also removes a log line (which included the full path of files) from cranelift-codegen-meta.

About this ☝️ I'm happy to remove it/submit it separately/make it relative.

@fitzgen you mentioned that, in the past, files were not printed with absolute paths. Is it worth investing some time to ensure this doesn't happen again in the future?

Note: I originally changed the code to strip CARGO_MANIFEST_DIR, but it looks like the files were actually not in there; instead they were in OUT_DIR. This might be an artifact of the build system used in my project; let me know if this makes sense.

Err(_) => file_name,
Ok(root) => file_name.strip_prefix(&root).unwrap_or(file_name).into(),
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if the build script that invokes cranelift-isle can read this instead. Cranelift-isle can technically be used to build arbitrary code. Files::from_paths could get a set of paths to make make input paths relative to, or alternatively a set of paths against which to resolve relative paths and then have the build script only pass the relative paths for individual isle files instead.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, not too familiar with the code/cranelift. Are you suggesting that we should make the paths relative here instead?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. There is already a make_isle_source_path_relative function which gets used, but that only affects isle files that are part of the source rather than those that get generated by the build script itself. You did still have to separately pass a (set of) prefix(s) to join with the relative path to get a path that can be passed to fs::read, but this prefix can be ignored when printing the source locations in the generated file.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only affects isle files that are part of the source rather than those that get generated by the build script itself.

Unrelated to this thread, but that explains why I had to use OUT_DIR instead of CARGO_MANIFEST_DIR !

Yes.

That makes sense. I'll try to find some time later to update the PR.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, didn't have much time to look into this!

After playing with this for a bit, I think it might make sense to update the existing function (make_isle_source_path_relative) to work with paths that are not subpaths of the current directory. Borrowing some code from pathdiff, here's what this would look like:

diff --git a/cranelift/codegen/build.rs b/cranelift/codegen/build.rs
index 2cbf2c9e5..cfe6d37d7 100644
--- a/cranelift/codegen/build.rs
+++ b/cranelift/codegen/build.rs
@@ -140,10 +140,11 @@ fn make_isle_source_path_relative(
     cur_dir: &std::path::Path,
     filename: &std::path::Path,
 ) -> std::path::PathBuf {
-    if let Ok(suffix) = filename.strip_prefix(&cur_dir) {
-        suffix.to_path_buf()
-    } else {
-        filename.to_path_buf()
+
+    match diff_paths(filename, cur_dir) {
+        Some(relative) => relative,
+        None => filename.to_path_buf(),
+
     }
 }
 
@@ -253,3 +254,48 @@ fn rustfmt(code: &str) -> std::io::Result<String> {
 
     Ok(String::from_utf8(data).expect("rustfmt always writes utf-8 to stdout"))
 }
+
+pub fn diff_paths<P, B>(path: P, base: B) -> Option<std::path::PathBuf>
+where
+    P: AsRef<std::path::Path>,
+    B: AsRef<std::path::Path>,
+{
+    let path = path.as_ref();
+    let base = base.as_ref();
+
+    if path.is_absolute() != base.is_absolute() {
+        if path.is_absolute() {
+            Some(std::path::PathBuf::from(path))
+        } else {
+            None
+        }
+    } else {
+        let mut ita = path.components();
+        let mut itb = base.components();
+        let mut comps: Vec<std::path::Component> = vec![];
+        loop {
+            match (ita.next(), itb.next()) {
+                (None, None) => break,
+                (Some(a), None) => {
+                    comps.push(a);
+                    comps.extend(ita.by_ref());
+                    break;
+                }
+                (None, _) => comps.push(std::path::Component::ParentDir),
+                (Some(a), Some(b)) if comps.is_empty() && a == b => (),
+                (Some(a), Some(b)) if b == std::path::Component::CurDir => comps.push(a),
+                (Some(_), Some(b)) if b == std::path::Component::ParentDir => return None,
+                (Some(a), Some(_)) => {
+                    comps.push(std::path::Component::ParentDir);
+                    for _ in itb {
+                        comps.push(std::path::Component::ParentDir);
+                    }
+                    comps.push(a);
+                    comps.extend(ita.by_ref());
+                    break;
+                }
+            }
+        }
+        Some(comps.iter().map(|c| c.as_os_str()).collect())
+    }
+}

Can confirm that this also fixes the issue (no more non-determinism), plus the fix is a bit more self contained. Thoughts?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bjorn3 what do you think?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cfallin or @bjorn3 do you think it makes sense to extend make_isle_source_path_relative as suggested above?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm -- I'll admit I'm having a somewhat visceral reaction to this much complexity in a build.rs script. Yes, it works, and it is conceptually well-defined, but it's just... a lot of logic that seems like it shouldn't be necessary.

Popping up a few levels in the "why are we doing this" hierarchy, another thought occurs to me: we would also have deterministic output from the ISLE compiler if we omitted the comments with filenames in them; those comments are mainly meant for times when humans are trying to debug the generated code; we already have an environment variable to place the ISLE generated code in a non-build-outputs location for a human to be able to browse it (ISLE_SOURCE_DIR); perhaps we could add another environment variable ISLE_COMMENTS such that if set, it adds the comments, and we omit the comments otherwise?

We would need to add an option to CodegenOptions in the ISLE compiler (call it, say, include_comments), then set that option appropriately in this build.rs, and follow it in codegen.rs -- and let's omit all comments except the "this is machine generated" one at the top if unset. How does that sound?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cfallin sounds good to me! Will see when I have free cycles to look into this; let me know if you beat me to it.

@github-actions github-actions bot added cranelift Issues related to the Cranelift code generator cranelift:meta Everything related to the meta-language. isle Related to the ISLE domain-specific language labels Nov 7, 2024
Copy link

github-actions bot commented Nov 7, 2024

Subscribe to Label Action

cc @cfallin, @fitzgen

This issue or pull request has been labeled: "cranelift", "cranelift:meta", "isle"

Thus the following users have been cc'd because of the following labels:

  • cfallin: isle
  • fitzgen: isle

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.

Learn more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cranelift:meta Everything related to the meta-language. cranelift Issues related to the Cranelift code generator isle Related to the ISLE domain-specific language
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Avoid printing full file paths in cranelift generated code
3 participants