Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add 'extends' feature to overlay repos files when importing #148

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

christophebedard
Copy link
Contributor

@christophebedard christophebedard commented Jun 13, 2020

This adds a new extends: root-level key for .repos files, e.g.

# extension.repos
extends: base.repos
repositories:
  a/repo:
    type: git
    url: https://github.com/a/repo.git
    version: my-branch

The base.repos file could look like:

# base.repos
repositories:
  a/repo:
    type: git
    url: https://github.com/a/repo.git
    version:  master
  another/repo:
    type: git
    url: https://github.com/another/repo.git
    version: 1.2.3

Running vcs import --input extension.repos would checkout a/repo @ my-branch (instead of master) as well as another/repo @ 1.2.3.

This is motivated by ROS 2 development. As a concrete example, the master ros2.repos file tracks a specific ros2_tracing tag. As a developer of that repository, I'd like to maintain a workspace based on that master ros2.repos file (which, for practical reasons, I don't want to modify locally), but using the master branch of ros2_tracing. This is why this solution uses an extension/overlay approach. Otherwise, when a new repo is added to ros2.repos, I update my local ros2.repos file (e.g. with git pull), run vcs import ... to update my workspace, and end up checking out the specific tag instead of staying on master.

Although I haven't tested it, this should allow mixing .repos and .rosinstall files. Note that this means that vcstool extends the rosinstall format (with root-level extends:).

Currently, an extension repos file still has to be valid by itself. Eventually, I'd like to change the import logic to only validate the result of the merge between the extension file and the base file so that the extension file doesn't need to re-define all required attributes (especially type). However, this is for another PR.

@christophebedard christophebedard force-pushed the add-import-extends-feature branch from 6d6f04c to ac25dc0 Compare June 13, 2020 18:54
@christophebedard
Copy link
Contributor Author

christophebedard commented Jun 18, 2020

Also, this could eventually fit nicely with #89, e.g. the extension .repos file could provide the fork remote on top of the original file's origin remote.

@christophebedard christophebedard force-pushed the add-import-extends-feature branch from ac25dc0 to 20d2f2a Compare July 1, 2020 22:26
@dirk-thomas
Copy link
Owner

Thank you for working on this. I can see how this feature would be valuable. Before going into the details of the PR I have a few high level thoughts:

  • Atm vcs import only takes a single input file. Arguably it could be updated to allow passing multiple files. I would expect the semantic of that operation be the same as running vcs import multiple times - once for each input file - in the order they are passed. I would think this already satisfies your use case for a custom branch of ros2_tracing.
  • So my main question would be what are the use cases the extends approach enables which aren't possible otherwise?
    • Is avoiding the intermediate state (if you were to follow the first bullet) worth the complexity?
    • Clearly a partial repos file which e.g. only sets a different version but doesn't repeat the type and url of an entry would be something enabled by this.
    • Especially in combination with Add patching functionality  #156 I could see partial files being useful.
  • Are there cases where specifying more than one file under extends would be useful? Should there a semantic difference between nested chaining vs. enumerating multiple extends files as siblings?

@j-rivero
Copy link

j-rivero commented Jul 7, 2020

Thanks @christophebedard and @dirk-thomas for the PR. Let me add my use case here: I maintain the gazebodistro set of files. There are many packages that use as dependency a subset of the other packages available in the same repository (i.e: ign-mathX uses ign-cmakeY, ign-transportZ uses ign-mathX and ign-cmakeY). There is a lot of configurations duplicated to reflect the dependency chain on every of the major versions released. As you can imaging maintaining that amount of duplicate configurations among multiple files is quite a nightmare. If I understand the PR correctly, with this approach I could do:

# ign-transportZ
repositories:
 a/ign-cmakeY:
   type: git
   url: https://github.com/ign/ign-cmake.git
   version: Y
 a/ign-mathX:
   type: git
   url: https://github.com/ign/ign-math.git
   version: X
 a/ign-transportZ:
   type: git
   url: https://github.com/ign/ign-transport.git
   version: Z

And will save me to list again that configurations if I use:

# foo software using only ign-transportZ
extends: ign-transportZ.repos
repositories:
 a/ign-foo:
   type: git
   url: https://github.com/ign/ign-foo.git
   version: lala

@christophebedard
Copy link
Contributor Author

  • So my main question would be what are the use cases the extends approach enables which aren't possible otherwise?

First of all, it of course allows the user to only run one command instead of vcs import multiple times. Although no one would actually do it, this PR allows extending .repos files ~infinitely, and running vcs import ~infinitely sounds like a pain 😜

My goal was to give the extension a meaning, if that makes sense. If you were to provide a list of .repos files to --input, they're only ever "linked" together when you run the vcs import --input a.repos b.repos ... command. This way, whatever name you give the extension .repos file will always be valid, since its link to the extended .repos file is clearly defined with extends: base.repos.

Also, in my use case above, and I guess in a lot of similar use cases, the relationship between the extension file and the extended file isn't going to change, so why not shorten the command?

And this also allows you to still easily provide the .repos file (in this case the extension file) through stdin, just like before.

  • Is avoiding the intermediate state (if you were to follow the first bullet) worth the complexity?

I think so.

  • Clearly a partial repos file which e.g. only sets a different version but doesn't repeat the type and url of an entry would be something enabled by this.
  • Especially in combination with Add patching functionality  #156 I could see partial files being useful.

Absolutely, with some changes to the parsing/validation logic.

  • Are there cases where specifying more than one file under extends would be useful? Should there a semantic difference between nested chaining vs. enumerating multiple extends files as siblings?

I can see that being useful, yes, e.g. if you only need to aggregate (and maybe add more repos on top) instead of needing to override versions. So the difference is aggregation (of repos) vs overriding (of versions), even if they are combined most of the time.

@christophebedard
Copy link
Contributor Author

If I understand the PR correctly, with this approach I could do:

Yep!

And, to reference my reply above, if the extension file has a meaning by itself, since you're only combining repos here and not overriding versions, you could instead use one .repos file which aggregates your two .repos files (if the enumeration/aggregation thing gets added).

@dirk-thomas
Copy link
Owner

So the difference is aggregation (of repos) vs overriding (of versions), even if they are combined most of the time.

I would assume the following semantic:

  • for aggregation / sibling files we want to make sure that the entries don't collide (either error or warn about it if they do) and
  • for nested files / potential overriding we want to explicitly allow "updating" entries from leafy files by entries closer to the root input file.

(I know this isn't covered by the PR atm but I am trying to clarify where we want to go.)

@christophebedard
Copy link
Contributor Author

  • for aggregation / sibling files we want to make sure that the entries don't collide (either error or warn about it if they do) and

Indeed. In this case, the file listed first (top to bottom) could have priority.

  • for nested files / potential overriding we want to explicitly allow "updating" entries from leafy files by entries closer to the root input file.

Exactly, with the order being from leaf (extended file) to root (extension file), like in the tests I've written.

Also, this starts getting more and more complex, but should "deletions" be allowed, i.e. removing a repo that is in the leaf/extended file using a special key/value in the extension file?

(I know this isn't covered by the PR atm but I am trying to clarify where we want to go.)

I should've opened an issue before the PR 😛

@christophebedard
Copy link
Contributor Author

@dirk-thomas friendly ping. What's the next step?

@dgpetrie
Copy link

dgpetrie commented Feb 2, 2021

Here is a thought on the approach for aggregation. Would it be easier to add a vcs client for type: vcs?

@christophebedard
Copy link
Contributor Author

christophebedard commented Feb 2, 2021

Here is a thought on the approach for aggregation. Would it be easier to add a vcs client for type: vcs?

While probably not impossible, I'm not sure that this fits vcstool's architecture. Besides, I'm not sure what meaning a type: vcs entry's path would have.

@dgpetrie
Copy link

dgpetrie commented Feb 3, 2021

Sorry if I was not clear. I was suggesting a way to do recursive aggregation where a VCS input file can include children VCS input files. So a repo dependent upon a base repo or set of repos can include the dependency VCS input yaml files(). To take @j-rivero's example above:

# foo software using only ign-transportZ
repositories:
  ign-transportZ.repos:
    type: vcs
    url: ign-transportZ.repos
 a/ign-foo:
   type: git
   url: https://github.com/ign/ign-foo.git
   version: lala

@christophebedard
Copy link
Contributor Author

christophebedard commented Feb 4, 2021

Sorry if I was not clear. I was suggesting a way to do recursive aggregation where a VCS input file can include children VCS input files. So a repo dependent upon a base repo or set of repos can include the dependency VCS input yaml files().

Sorry, I'm a bit confused, because that's kind of what this PR does. I'm just not sure if you're only talking about how we declare these dependencies in a yaml format in a .repos file.

We could just pre-process like I did for this PR and replace an entry of type: vcs under repositories: by the repositories contained in that file and do it recursively, etc. However, repeating the path and using type: vcs seems clunky, e.g.:

extends: ign-transportZ.repos
repositories:
  ...

vs

repositories:
  ign-transportZ.repos:
    type: vcs
    url: ign-transportZ.repos
  ...

However, if you're talking about using an actual vcs client (like the ones here: https://github.com/dirk-thomas/vcstool/tree/2e90383a09e810049ee505e1e618690957c631dc/vcstool/clients), I don't think that would be ideal, from what I can tell. Here's an example:

# base.repos
repositories:
  a/repo:
    type: git
    url: https://github.com/a/repo.git
    version:  master
  another/repo:
    type: git
    url: https://github.com/another/repo.git
    version: 1.2.3
# extension.repos
repositories:
  base.repos:
    type: vcs
    url: base.repos
  a/repo:
    type: git
    url: https://github.com/a/repo.git
    version: my-branch

which is just the example I added to the README, but using type: vcs to declare the extension.

Here, given extension.repos as your entrypoint file, you have to make sure to process base.repos before a/repo, because the extension file modifies the original a/repo's version: (from master to my-branch). Dependencies are supported, e.g. here for nested repositories:

def add_dependencies(jobs):

But then how does the vcs client deal with "merging" the a/repo entry in base.repos and the a/repo entry in extension.repos so that we only have one single entry with version: my-branch? From what I can tell, clients simply process their singular entry without knowing/caring about other entries (although they're called/executed in the right order according to the dependencies, of course).

Like I said, it might be doable given some work (e.g. by allowing clients to modify the jobs list), or maybe I'm just missing something. But to me it just seems simpler to not use an actual vcs client and to pre-process to resolve all the extension declarations first and then call the clients to execute a "flat"/simple jobs list.

@andry81
Copy link

andry81 commented Aug 28, 2023

repositories:
  base.repos:
    type: vcs
    url: base.repos

This is less readable because you have to check whether the base.repos is a file or a directory in the file system.

When the

extends: base.repos
...

gives you the exact meaning that the base.repos is an extension file, so is more readable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants