Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐛 Fix operator-controller not using updated registries configuration #1554

Merged
merged 1 commit into from
Jan 16, 2025

Conversation

dtfranz
Copy link
Contributor

@dtfranz dtfranz commented Jan 8, 2025

NOTE: Same issue applies to catalogd, I'll make a similar change there. Done

With this change, operator-controller will be able to invalidate the configuration cache of containers/image and accept new configuration at runtime when changes occur.

For code simplicity's sake, there's a few things to take note of that can be addressed if deemed necessary:

  • The cache always invalidates on the first unpack, as we've not yet read the registries.conf file. This is easily addressed but I wanted to keep main() clear of the file reading op.
  • We only check the registries.conf file. Checking the entire directory is more involved since we have to traverse the folder tree along with the symlinks dropped in there by k8s (WalkDir does not support symlinks , but it might get easier in go 1.25?)

@dtfranz dtfranz requested a review from a team as a code owner January 8, 2025 00:33
Copy link

netlify bot commented Jan 8, 2025

Deploy Preview for olmv1 ready!

Name Link
🔨 Latest commit 1231180
🔍 Latest deploy log https://app.netlify.com/sites/olmv1/deploys/678805966953120008f2692d
😎 Deploy Preview https://deploy-preview-1554--olmv1.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@dtfranz
Copy link
Contributor Author

dtfranz commented Jan 8, 2025

Working on tests
/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 8, 2025
cmd/manager/main.go Outdated Show resolved Hide resolved
cmd/manager/main.go Outdated Show resolved Hide resolved
@dtfranz dtfranz force-pushed the mirror-without-restart branch from bd195d1 to e871aa5 Compare January 9, 2025 01:50
internal/rukpak/source/containers_image.go Outdated Show resolved Hide resolved
internal/rukpak/source/containers_image.go Outdated Show resolved Hide resolved
internal/rukpak/source/containers_image.go Outdated Show resolved Hide resolved
internal/rukpak/source/containers_image.go Outdated Show resolved Hide resolved
internal/rukpak/source/containers_image.go Show resolved Hide resolved
@@ -250,6 +302,11 @@ func (i *ContainersImageRegistry) unpackImage(ctx context.Context, unpackPath st
if err != nil {
return fmt.Errorf("error creating image source: %w", err)
}
defer func() {
if err := layoutSrc.Close(); err != nil {
panic(err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a question - imho panicking here seems a little excessive, but I see it in multiple places, so I think it's getting caught

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just following convention here :) Not sure it's the right way to handle it either but I just noticed the source close was missing so I wanted to catch that while I was in here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe let's proceed with the convention for this PR, but create a discussion in upstream slack? I'm also a bit 🤔 about these panics. Maybe we should level set on a tenet for when to use it and why?

@dtfranz dtfranz force-pushed the mirror-without-restart branch from e871aa5 to a30588c Compare January 9, 2025 18:34
if err != nil {
continue
} else if !t.Equal(i.RegistriesConfTimestamp) {
// We've found the timestamped symlink and it's been updated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dtfranz @joelanford

turns out there might be an issue with relying on created symlinks: if you use subPath to mount a specific file from ConfigMap, the symlinks are not created at all. See https://stackoverflow.com/a/50687707 and kubernetes/kubernetes#50345 for context.

Note: can't seem to have the powers to unresolve a previous discussion, so adding it here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that, an fsnotify solution is starting to seem more attractive to me now 😅

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going back to a question I saw @perdasilva ask.. What is the cost of invalidating every single time? I assume that would mean that Unpack would be re-reading files and re-loading essentially the entire configuration at every pass.

WDYT about starting out with "invalidate every time" to keep code complexity low, with a comment that explains that we pay this reload cost unnecessarily, and that if it becomes a performance issue, we should follow up later and optimize to invalidate only on changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with that personally. Have we encountered any performance issues otherwise so far? Just so we aren't adding on top of anything else. AFAIK we aren't, so I'd be OK with invalidating on each pass with a follow up issue to track it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@perdasilva now kicking myself because I completely misinterpreted that question you asked - my bad!! I thought you were saying my first change was invalidating on each pass and didn't read it as a recommendation 😅 😅

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Running an extremely rudimentary time elapsed check on a local e2e run, the cache invalidation can take between 500ns and 3µs. I think that's acceptable though it may be worth mentioning every unpack will have to wait on the config mutex from any prior unpack.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all good ^^

Copy link

codecov bot commented Jan 9, 2025

Codecov Report

Attention: Patch coverage is 71.42857% with 2 lines in your changes missing coverage. Please review.

Project coverage is 66.70%. Comparing base (bf96d5f) to head (1231180).
Report is 6 commits behind head on main.

Files with missing lines Patch % Lines
internal/rukpak/source/containers_image.go 60.00% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1554      +/-   ##
==========================================
- Coverage   66.84%   66.70%   -0.15%     
==========================================
  Files          57       57              
  Lines        4588     4595       +7     
==========================================
- Hits         3067     3065       -2     
- Misses       1298     1304       +6     
- Partials      223      226       +3     
Flag Coverage Δ
e2e 53.25% <60.00%> (+0.41%) ⬆️
unit 53.66% <71.42%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 10, 2025
@joelanford
Copy link
Member

Now that catalogd is included in this repo, those changes can be incorporated in this PR as well.

@dtfranz
Copy link
Contributor Author

dtfranz commented Jan 10, 2025

Now that catalogd is included in this repo, those changes can be incorporated in this PR as well.

Absolutely - I'll include the change in there as well.

@dtfranz dtfranz force-pushed the mirror-without-restart branch from a30588c to 2389538 Compare January 15, 2025 00:02
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 15, 2025
@perdasilva
Copy link
Contributor

@dtfranz @joelanford one thing I was thinking about is that (and maybe this has changed since) the unpack flow blocks the reconciliation thread. I wonder if we should just spawn the unpack process, update the status and re-enqueue after 30 seconds (or whatever), check the status of the process, update status, repeat until the process reaches a terminal state.

@dtfranz dtfranz force-pushed the mirror-without-restart branch from 2389538 to 74bb174 Compare January 15, 2025 16:19
@dtfranz
Copy link
Contributor Author

dtfranz commented Jan 15, 2025

/unhold

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 15, 2025
…ates without restart

Signed-off-by: dtfranz <dfranz@redhat.com>
@dtfranz dtfranz force-pushed the mirror-without-restart branch from 74bb174 to 1231180 Compare January 15, 2025 18:59
Copy link
Contributor

@perdasilva perdasilva left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm =D great work! thank you!!!

@dtfranz dtfranz added this pull request to the merge queue Jan 16, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jan 16, 2025
@dtfranz dtfranz added this pull request to the merge queue Jan 16, 2025
Merged via the queue into operator-framework:main with commit f26bf23 Jan 16, 2025
22 of 23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants