chore: Use Ops.CollectStatus Event #53
Conversation
Force-pushed from 31741e8 to 7a0a1f6
Signed-off-by: gatici <gulsum.atici@canonical.com>
Force-pushed from 7a0a1f6 to f444049
Just a comment: this may be the chance to explore a new way of handling charm statuses, as is done in Charmed Kubeflow Profiles.
src/charm.py (Outdated):

```python
if self._is_unit_in_non_active_status():
    event.add_status(self._is_unit_in_non_active_status())  # type: ignore[arg-type]
    return
```
I'm not sure how expensive some of those operations are, but we shouldn't be evaluating all of this twice. Just do the following. However, you'll also want to set ActiveStatus if the above returns None:

```diff
-if self._is_unit_in_non_active_status():
-    event.add_status(self._is_unit_in_non_active_status())  # type: ignore[arg-type]
-    return
+if status := self._is_unit_in_non_active_status():
+    self.unit.status = status
+else:
+    self.unit.status = ActiveStatus()
```
Should we prefer `event.add_status` here? According to the documentation, "Each code path in a collect-status handler should call add_status at least once."
If we use ActiveStatus in the else block, we lose the statuses that are set by functions triggered by relation_broken events, so I am not setting ActiveStatus() through that hook. There is also a strange situation: after a relation is broken, although we check the relation's existence in `_is_unit_in_non_active_status`, it cannot catch it.
I wonder if the issue you're seeing is related to canonical/operator#940 or canonical/operator#888? (Which we're hoping to address this cycle -- probably earlyish in the cycle.)
Yeah, ideally if you're using collect-status you'd avoid setting blocked in the relation broken handlers and just query for the status of the relation. But perhaps due to one of the above issues that's not working? What code would you be using to check for the relation's existence?
Oh, yes, and to respond to your earlier comment -- sorry, that's my mistake! We should use `event.add_status(...)` here, oops!
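A minimal sketch of the suggestion with that correction applied, assuming the `_is_unit_in_non_active_status` helper returns an ops status object or `None`:

```python
def _on_collect_unit_status(self, event: ops.CollectStatusEvent):
    # Evaluate the helper exactly once, and report ActiveStatus
    # when it finds nothing non-active.
    if status := self._is_unit_in_non_active_status():
        event.add_status(status)
    else:
        event.add_status(ops.ActiveStatus())
```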
Yeah, that's a good question. Until #940 is addressed (which might be a little while, so don't hold your breath) you'll probably have to use a workaround. One workaround would be to set a variable/flag on the charm instance in the relation-broken hook, and then check that flag in the collect-status handler. Something like this (completely untested):
```python
class MyCharm(ops.CharmBase):
    def __init__(self, *args):
        super().__init__(*args)
        self.framework.observe(self.on.database_relation_broken, self._on_database_relation_broken)
        self.framework.observe(self.on.collect_unit_status, self._on_collect_unit_status)
        self._database_relation_breaking = False

    def _on_database_relation_broken(self, event):
        self._database_relation_breaking = True

    def _on_collect_unit_status(self, event):
        ...
        if not self.model.get_relation('database') or self._database_relation_breaking:
            event.add_status(ops.BlockedStatus('blah blah blah'))
            return
        ...
```
What about storing a key in the peer relation data, similar to this example: https://github.com/canonical/postgresql-operator/blob/e13b871022bb6cff44cfb3ca83427c0e9d03f606/src/relations/db.py#L198C1-L202C19?
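A rough sketch of that approach, assuming a hypothetical peer relation named `amf-peers` and a hypothetical `database-relation-broken` key (the unit's own databag is used so no leadership check is needed):

```python
PEER_RELATION_NAME = "amf-peers"  # hypothetical peer relation name

def _on_database_relation_broken(self, event: ops.RelationBrokenEvent):
    peers = self.model.get_relation(PEER_RELATION_NAME)
    if peers:
        # Persist the broken state in this unit's databag so later
        # hooks can read it back instead of relying on an instance flag.
        peers.data[self.unit]["database-relation-broken"] = "true"

def _on_collect_unit_status(self, event: ops.CollectStatusEvent):
    peers = self.model.get_relation(PEER_RELATION_NAME)
    if peers and peers.data[self.unit].get("database-relation-broken") == "true":
        event.add_status(ops.BlockedStatus("Waiting for database relation"))
        return
    event.add_status(ops.ActiveStatus())
```

If the relation were later re-established, the key would need to be cleared again (for example on relation-joined); this sketch only shows the mechanics.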
@benhoyt I updated the PR, adding a key into the relation data. Unit tests are all fine, but it is still failing in integration tests while detecting the broken relation. Could you please have a look?
I added it quickly just to show you the implementation. There could be some areas for improvement.
Implemented according to your suggestion. Ready to review.
Hi Dario,
Signed-off-by: gatici <gulsum.atici@canonical.com>
Force-pushed from af44289 to c66244d
This looks reasonable to me, thanks!
Hopefully once we resolve #940 you can update it to avoid the "broken" boolean variables and just query the relation status immediately.
Force-pushed from c66244d to 1a6ffb3
Signed-off-by: gatici <gulsum.atici@canonical.com>
Force-pushed from 1a6ffb3 to 1741815
```python
if not self.model.get_relation("database") or self._database_relation_breaking:
    return BlockedStatus("Waiting for database relation")
```
Are we sure that `self.model.get_relation("database")` returns `False` if a relation has not been completely removed by a previous hook? I am thinking about this case:

1. `amf` and `mongodb-k8s` are properly related
2. invoke `remove-integration`, and for some reason the hook fails only on the mongo side and the relation becomes stale (now `amf` should be in BlockedStatus)
3. invoke any other hook

At the moment the modification in the integration tests works because the relation is properly removed and `get_relation` returns `False` as expected.
In fact, `self.model.get_relation("database")` generally returns True because of a known issue (https://bugs.launchpad.net/juju/+bug/1979811): although the integration is removed, it still appears in the model. Since `self._database_relation_breaking` is True, the unit status is set to Blocked.
I also deployed the charm including this patch and tested it manually.
```
$ juju status --relations
Model                  Controller          Cloud/Region        Version  SLA          Timestamp
test-integration-twnt  microk8s-localhost  microk8s/localhost  3.1.6    unsupported  21:49:59+03:00

App                       Version  Status  Scale  Charm                     Channel  Rev  Address         Exposed  Message
mongodb-k8s                        active      1  mongodb-k8s               5/edge    36  10.152.183.198  no       Primary
sdcore-amf                         active      1  sdcore-amf                           0  10.152.183.22   no
sdcore-nrf                         active      1  sdcore-nrf                edge      58  10.152.183.58   no
self-signed-certificates           active      1  self-signed-certificates  beta      33  10.152.183.202  no

Unit                         Workload  Agent  Address      Ports  Message
mongodb-k8s/0*               active    idle   10.1.146.19         Primary
sdcore-amf/0*                active    idle   10.1.146.7
sdcore-nrf/0*                active    idle   10.1.146.51
self-signed-certificates/0*  active    idle   10.1.146.15

Integration provider                   Requirer                    Interface         Type     Message
mongodb-k8s:database                   sdcore-amf:database         mongodb_client    regular
mongodb-k8s:database                   sdcore-nrf:database         mongodb_client    regular
mongodb-k8s:database-peers             mongodb-k8s:database-peers  mongodb-peers     peer
sdcore-nrf:fiveg-nrf                   sdcore-amf:fiveg-nrf        fiveg_nrf         regular
self-signed-certificates:certificates  sdcore-amf:certificates     tls-certificates  regular
self-signed-certificates:certificates  sdcore-nrf:certificates     tls-certificates  regular
```
```
$ juju remove-relation mongodb-k8s:database sdcore-amf:database
$ juju status --relations
Model                  Controller          Cloud/Region        Version  SLA          Timestamp
test-integration-twnt  microk8s-localhost  microk8s/localhost  3.1.6    unsupported  21:50:38+03:00

App                       Version  Status   Scale  Charm                     Channel  Rev  Address         Exposed  Message
mongodb-k8s                        active       1  mongodb-k8s               5/edge    36  10.152.183.198  no       Primary
sdcore-amf                         waiting      1  sdcore-amf                           0  10.152.183.22   no       installing agent
sdcore-nrf                         active       1  sdcore-nrf                edge      58  10.152.183.58   no
self-signed-certificates           active       1  self-signed-certificates  beta      33  10.152.183.202  no

Unit                         Workload  Agent  Address      Ports  Message
mongodb-k8s/0*               active    idle   10.1.146.19         Primary
sdcore-amf/0*                blocked   idle   10.1.146.7          Waiting for database relation
sdcore-nrf/0*                active    idle   10.1.146.51
self-signed-certificates/0*  active    idle   10.1.146.15

Integration provider                   Requirer                    Interface         Type     Message
mongodb-k8s:database                   sdcore-nrf:database         mongodb_client    regular
mongodb-k8s:database-peers             mongodb-k8s:database-peers  mongodb-peers     peer
sdcore-nrf:fiveg-nrf                   sdcore-amf:fiveg-nrf        fiveg_nrf         regular
self-signed-certificates:certificates  sdcore-amf:certificates     tls-certificates  regular
self-signed-certificates:certificates  sdcore-nrf:certificates     tls-certificates  regular
```
Juju debug-log:
```
core-amf-0: 21:49:50 INFO unit.sdcore-amf/0.juju-log HTTP Request: PATCH https://10.152.183.1/api/v1/namespaces/test-integration-twnt/services/sdcore-amf "HTTP/1.1 200 OK"
unit-sdcore-amf-0: 21:49:50 INFO unit.sdcore-amf/0.juju-log Kubernetes service 'sdcore-amf' patched successfully
unit-sdcore-amf-0: 21:49:51 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)
unit-mongodb-k8s-0: 21:50:36 INFO juju.worker.uniter.operation ran "database-relation-departed" hook (via hook dispatching script: dispatch)
unit-sdcore-amf-0: 21:50:37 INFO juju.worker.uniter.operation ran "database-relation-departed" hook (via hook dispatching script: dispatch)
unit-mongodb-k8s-0: 21:50:37 INFO unit.mongodb-k8s/0.juju-log database:4: Remove relation user: relation-4
unit-mongodb-k8s-0: 21:50:37 INFO unit.mongodb-k8s/0.juju-log database:4: Update relation user: relation-1 on free5gc
unit-mongodb-k8s-0: 21:50:37 INFO unit.mongodb-k8s/0.juju-log database:4: Updating relation data according to diff
unit-sdcore-amf-0: 21:50:37 INFO juju.worker.uniter.operation ran "database-relation-broken" hook (via hook dispatching script: dispatch)
unit-mongodb-k8s-0: 21:50:37 INFO juju.worker.uniter.operation ran "database-relation-broken" hook (via hook dispatching script: dispatch)
```
```diff
@@ -97,6 +104,12 @@ def __init__(self, *args):
         self._database = DatabaseRequires(
             self, relation_name="database", database_name=DATABASE_NAME
         )
+        # Setting attributes to detect broken relations until
+        # https://github.com/canonical/operator/issues/940 is fixed
+        self._database_relation_breaking = False
```
We are in no rush to use collect-status, and I feel like I'd rather wait for the ops fix instead of forcing this change now and reverting some of it when the fix is available. This is quite a large change already, right before we want to promote to beta, and going ahead with it makes me slightly nervous. I'd at least wait until we have our charms in beta.
OK. I will freeze the PRs until the ops issue is fixed. Thank you!
Hi @benhoyt,
According to the Juju documentation (https://juju.is/docs/sdk/constructs#heading--framework), attributes such as `self._database_relation_breaking` should be reset to their default values on every hook execution. However, I tested this and those states are somehow kept.
I both modified the integration test and tested manually, adding an additional config-changed hook to see the effect.
After the relation was broken, running a config-changed hook did not reset the attribute set in `__init__`, which conflicts with the documentation. If the documentation were correct, the integration tests in this PR should fail, but they are not failing (?).
Please enlighten us.
@gatici I am not experiencing this behaviour. I just brutally added some logs in this revision.
What I did:

1. Deploy all required charms and properly relate everything.
2. Remove the relation between `amf` and `mongodb-k8s` (`database`).
3. Run `juju config amf external-amf-ip=1.1.1.1`

```
unit-amf-0: 16:33:35 CRITICAL unit.amf/0.juju-log database:5: self._database_relation_breaking: False (class init)
unit-amf-0: 16:33:35 CRITICAL unit.amf/0.juju-log database:5: self._database_relation_breaking: True (event: <RelationBrokenEvent via AMFOperatorCharm/on/database_relation_broken[105]>)
unit-amf-0: 16:33:35 INFO juju.worker.uniter.operation ran "database-relation-broken" hook (via hook dispatching script: dispatch)
unit-amf-0: 16:33:52 CRITICAL unit.amf/0.juju-log self._database_relation_breaking: False (class init)
unit-amf-0: 16:33:52 INFO juju.worker.uniter.operation ran "config-changed" hook (via hook dispatching script: dispatch)
```

The first line prints the default value of `_database_relation_breaking` on class init when the relation-broken event is triggered. The second line is right after we update the attribute in the same hook. In the third line you see the hook complete. In the fourth line the class is instantiated again (for the `config-changed` hook) and the attribute is reset to its default value. In the fifth line you see the hook complete.
@gatici First, there's a discrepancy here between Harness tests (with the same Harness instance) and reality: the charm instance isn't reinitialized between hook executions when the Harness triggers events.
This also applies to deferral, which may be what you're seeing here. Normally the charm is a completely new instance (with attributes reset by `__init__`) every hook execution, but if an event was deferred, the deferred events are run first against the charm instance, and then the "main" event is also executed on that same charm instance (without resetting attributes).
These discrepancies are a problem that we hope to address this cycle. See canonical/operator#736
So I wonder if the difference @dariofaccin is seeing is because there were no deferred events in that case (for whatever reason).
CC: @tonyandrewmeyer for visibility
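As a hypothetical illustration of the Harness discrepancy (a sketch, not a test from this PR; it assumes the charm class is importable from src/charm.py):

```python
from ops.testing import Harness

from charm import AMFOperatorCharm  # hypothetical import path

harness = Harness(AMFOperatorCharm)
harness.begin()
rel_id = harness.add_relation("database", "mongodb-k8s")
harness.remove_relation(rel_id)  # emits relation-departed, then relation-broken

# Unlike a real deployment, the same charm instance is reused for every
# event, so __init__ never re-runs and the flag survives:
assert harness.charm._database_relation_breaking is True
```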
@benhoyt Many thanks for your help and quick response!
The change is done in PR #84 because of merge conflicts.
Description
This PR uses the new ops CollectStatus event to manage the charm's status. The collect_unit_status event sets the status of the charm automatically at the end of every hook.
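A minimal sketch of the wiring, reusing the handler name from the discussion above (ops collects the statuses added during the hook and sets the highest-priority one when the hook ends):

```python
import ops

class AMFOperatorCharm(ops.CharmBase):
    def __init__(self, *args):
        super().__init__(*args)
        # Register the collect-status handler; it runs at the end of
        # every hook execution.
        self.framework.observe(self.on.collect_unit_status, self._on_collect_unit_status)

    def _on_collect_unit_status(self, event: ops.CollectStatusEvent):
        event.add_status(ops.ActiveStatus())
```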
Reference:
Checklist: