Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: Use Ops.CollectStatus Event #53

Closed
wants to merge 3 commits into from

Conversation

gatici
Copy link
Contributor

@gatici gatici commented Nov 22, 2023

Description

This PR uses new ops feature CollectStatus Event to manage status Charm. Event named collect_unit_status sets the status of charm automatically at the end of every hook.

Reference:

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • I have added tests that validate the behaviour of the software
  • I validated that new and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules
  • I have bumped the version of the library

@gatici gatici requested a review from a team as a code owner November 22, 2023 12:33
@gatici gatici force-pushed the TELCO-881-use-ops-CollectStatus branch from 31741e8 to 7a0a1f6 Compare November 22, 2023 12:41
@gatici gatici requested a review from benhoyt November 22, 2023 12:42
@gatici gatici changed the title Chore: Use Ops.CollectStatus Event chore: Use Ops.CollectStatus Event Nov 22, 2023
Signed-off-by: gatici <gulsum.atici@canonical.com>
@gatici gatici force-pushed the TELCO-881-use-ops-CollectStatus branch from 7a0a1f6 to f444049 Compare November 22, 2023 14:13
Copy link
Contributor

@dariofaccin dariofaccin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a comment: this may be the chance to explore a new way of handling charm statuses, like it is done in Charmed Kubeflow Profiles

src/charm.py Outdated Show resolved Hide resolved
src/charm.py Show resolved Hide resolved
src/charm.py Outdated
Comment on lines 180 to 182
if self._is_unit_in_non_active_status():
event.add_status(self._is_unit_in_non_active_status()) # type: ignore[arg-type]
return
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how expensive some of those operations are, but we shouldn't be evaluating all of this twice. Just do the following. However, you'll also want to set ActiveStatus if the above returns None:

Suggested change
if self._is_unit_in_non_active_status():
event.add_status(self._is_unit_in_non_active_status()) # type: ignore[arg-type]
return
if status := self._is_unit_in_non_active_status():
self.unit.status = status
else:
self.unit.status = ActiveStatus()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we prefer event.add_status here ? According to documentation it says "Each code path in a collect-status handler should call add_status at least once."

Copy link
Contributor Author

@gatici gatici Nov 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we use ActiveStatus in else block, we lose the states which are set by functions triggered by relation_broken events.
So, I am not setting ActiveStatus() through that hook. There is a strange situation. After a relation is broken, although we are checking the relations existence in def _is_unit_in_non_active_status, it could not catch it.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if the issue you're seeing is related to canonical/operator#940 or canonical/operator#888? (Which we're hoping to address this cycle -- probably earlyish in the cycle.)

Yeah, ideally if you're using collect-status you'd avoid setting blocked in the relation broken handlers and just query for the status of the relation. But perhaps due to one of the above issues that's not working? What code would you be using to check for the relation's existence?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, yes, and to respond to your earlier comment -- sorry, that's my mistake! We should use event.add_status(...) here, oops!

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's a good question. Until #940 is addresses (which might be a little while, so don't hold your breath) you'll probably have to use a workaround. One workaround would be to set a variable/flag on the charm instance in the relation-broken hook, and then check that flag in the collect-status handler. Something like this (completely untested):

class MyCharm(ops.CharmBase):
    def __init__(self, *args):
        super().__init__(*args)
        self.framework.observe(self.on.database_relation_broken, self._on_database_relation_broken)
        self.framework.observe(self.on.collect_unit_status, self._on_collect_unit_status)
        self._database_relation_breaking = False

    def _on_database_relation_broken(self, event):
        self._database_relation_breaking = True

    def _on_collect_unit_status(self, event):
        ...
        if not self.get_relation('database') or self._database_relation_breaking:
            event.add_status(ops.BlockedStatus('blah blah blah'))
            return
        ...

Copy link
Contributor Author

@gatici gatici Nov 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

@gatici gatici Nov 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@benhoyt I updated the PR adding a key into relation data. Unit tests are all fine but it is still failing in integration tests while detecting broken relation. Could you please have a look ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added it quickly just to show you the implementation. There could be some improvement areas.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented according to your suggestion. Ready to review.

src/charm.py Outdated Show resolved Hide resolved
@gatici
Copy link
Contributor Author

gatici commented Nov 23, 2023

Just a comment: this may be the chance to explore a new way of handling charm statuses, like it is done in Charmed Kubeflow Profiles

Hi Dario,
I checked the repository. It raises the error by indicating the status. When an event is triggered we know which status should be set if any error raises. By using ops.CollectStatus, the statuses are set after every hook run. We can use it to improve our status management more clear. But I do not want to use it in this PR. We can discuss it in our daily and use it separately as an improvement for ambiguous situations.

Signed-off-by: gatici <gulsum.atici@canonical.com>
@gatici gatici marked this pull request as draft November 24, 2023 00:20
@gatici gatici force-pushed the TELCO-881-use-ops-CollectStatus branch from af44289 to c66244d Compare November 24, 2023 08:59
@gatici gatici marked this pull request as ready for review November 24, 2023 09:04
@gatici gatici requested review from benhoyt, dariofaccin and a team November 24, 2023 09:05
Copy link

@benhoyt benhoyt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks reasonable to me, thanks!

Hopefully once we resolve #940 you can update it to avoid the "broken" boolean variables and just query the relation status immediately.

@gatici gatici force-pushed the TELCO-881-use-ops-CollectStatus branch from c66244d to 1a6ffb3 Compare November 27, 2023 09:00
Signed-off-by: gatici <gulsum.atici@canonical.com>
@gatici gatici force-pushed the TELCO-881-use-ops-CollectStatus branch from 1a6ffb3 to 1741815 Compare November 27, 2023 12:13
Comment on lines +165 to +166
if not self.model.get_relation("database") or self._database_relation_breaking:
return BlockedStatus("Waiting for database relation")
Copy link
Contributor

@dariofaccin dariofaccin Nov 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure that self.model.get_relation("database") returns False if a relation has not been completely removed by a previous hook?
I am thinking about this case:

  1. amf and mongodb-k8s are properly related
  2. invoke remove-integration and for some reason the hook fails only on mongo side and the relation becomes stale (now amf should be in BlockedStatus)
  3. invoke any other hook

At the moment the modification in integration tests works because the relation is properly removed and get_relation returns False as expected.

Copy link
Contributor Author

@gatici gatici Nov 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact self.model.get_relation("database") is generally returning True because of a known issue (https://bugs.launchpad.net/juju/+bug/1979811). Although the integration is removed, it still appears in the model. As self._database_relation_breaking returns True, then unit status is set to Blocked.
I also deployed the charm including this patch and tested it manually.

$ juju status --relations
Model                  Controller          Cloud/Region        Version  SLA          Timestamp
test-integration-twnt  microk8s-localhost  microk8s/localhost  3.1.6    unsupported  21:49:59+03:00

App                       Version  Status  Scale  Charm                     Channel  Rev  Address         Exposed  Message
mongodb-k8s                        active      1  mongodb-k8s               5/edge    36  10.152.183.198  no       Primary
sdcore-amf                         active      1  sdcore-amf                           0  10.152.183.22   no       
sdcore-nrf                         active      1  sdcore-nrf                edge      58  10.152.183.58   no       
self-signed-certificates           active      1  self-signed-certificates  beta      33  10.152.183.202  no       

Unit                         Workload  Agent  Address      Ports  Message
mongodb-k8s/0*               active    idle   10.1.146.19         Primary
sdcore-amf/0*                active    idle   10.1.146.7          
sdcore-nrf/0*                active    idle   10.1.146.51         
self-signed-certificates/0*  active    idle   10.1.146.15         

Integration provider                   Requirer                    Interface         Type     Message
mongodb-k8s:database                   sdcore-amf:database         mongodb_client    regular  
mongodb-k8s:database                   sdcore-nrf:database         mongodb_client    regular  
mongodb-k8s:database-peers             mongodb-k8s:database-peers  mongodb-peers     peer     
sdcore-nrf:fiveg-nrf                   sdcore-amf:fiveg-nrf        fiveg_nrf         regular  
self-signed-certificates:certificates  sdcore-amf:certificates     tls-certificates  regular  
self-signed-certificates:certificates  sdcore-nrf:certificates     tls-certificates  regular  

$ juju remove-relation mongodb-k8s:database  sdcore-amf:database 
$ juju status --relations
Model                  Controller          Cloud/Region        Version  SLA          Timestamp
test-integration-twnt  microk8s-localhost  microk8s/localhost  3.1.6    unsupported  21:50:38+03:00

App                       Version  Status   Scale  Charm                     Channel  Rev  Address         Exposed  Message
mongodb-k8s                        active       1  mongodb-k8s               5/edge    36  10.152.183.198  no       Primary
sdcore-amf                         waiting      1  sdcore-amf                           0  10.152.183.22   no       installing agent
sdcore-nrf                         active       1  sdcore-nrf                edge      58  10.152.183.58   no       
self-signed-certificates           active       1  self-signed-certificates  beta      33  10.152.183.202  no       

Unit                         Workload  Agent  Address      Ports  Message
mongodb-k8s/0*               active    idle   10.1.146.19         Primary
sdcore-amf/0*                blocked   idle   10.1.146.7          Waiting for database relation
sdcore-nrf/0*                active    idle   10.1.146.51         
self-signed-certificates/0*  active    idle   10.1.146.15         

Integration provider                   Requirer                    Interface         Type     Message
mongodb-k8s:database                   sdcore-nrf:database         mongodb_client    regular  
mongodb-k8s:database-peers             mongodb-k8s:database-peers  mongodb-peers     peer     
sdcore-nrf:fiveg-nrf                   sdcore-amf:fiveg-nrf        fiveg_nrf         regular  
self-signed-certificates:certificates  sdcore-amf:certificates     tls-certificates  regular  
self-signed-certificates:certificates  sdcore-nrf:certificates     tls-certificates  regular 

Juju debug-log:

core-amf-0: 21:49:50 INFO unit.sdcore-amf/0.juju-log HTTP Request: PATCH https://10.152.183.1/api/v1/namespaces/test-integration-twnt/services/sdcore-amf "HTTP/1.1 200 OK"
unit-sdcore-amf-0: 21:49:50 INFO unit.sdcore-amf/0.juju-log Kubernetes service 'sdcore-amf' patched successfully
unit-sdcore-amf-0: 21:49:51 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)
unit-mongodb-k8s-0: 21:50:36 INFO juju.worker.uniter.operation ran "database-relation-departed" hook (via hook dispatching script: dispatch)
unit-sdcore-amf-0: 21:50:37 INFO juju.worker.uniter.operation ran "database-relation-departed" hook (via hook dispatching script: dispatch)
unit-mongodb-k8s-0: 21:50:37 INFO unit.mongodb-k8s/0.juju-log database:4: Remove relation user: relation-4
unit-mongodb-k8s-0: 21:50:37 INFO unit.mongodb-k8s/0.juju-log database:4: Update relation user: relation-1 on free5gc
unit-mongodb-k8s-0: 21:50:37 INFO unit.mongodb-k8s/0.juju-log database:4: Updating relation data according to diff
unit-sdcore-amf-0: 21:50:37 INFO juju.worker.uniter.operation ran "database-relation-broken" hook (via hook dispatching script: dispatch)
unit-mongodb-k8s-0: 21:50:37 INFO juju.worker.uniter.operation ran "database-relation-broken" hook (via hook dispatching script: dispatch)

@@ -97,6 +104,12 @@ def __init__(self, *args):
self._database = DatabaseRequires(
self, relation_name="database", database_name=DATABASE_NAME
)
# Setting attributes to detect broken relations until
# https://github.com/canonical/operator/issues/940 is fixed
self._database_relation_breaking = False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are in no rush to use collect status and I feel like I'd rather wait for the ops fix instead of forcing this change now and reverting some of it when the fix is available. This is quite a large change already right before we want to promote to beta and going ahead with it makes me slightly nervous. I'd at least wait until we have our charms in beta.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I will freeze the PR's till the ops issue is fixed. Thank you!

@@ -97,6 +104,12 @@ def __init__(self, *args):
self._database = DatabaseRequires(
self, relation_name="database", database_name=DATABASE_NAME
)
# Setting attributes to detect broken relations until
# https://github.com/canonical/operator/issues/940 is fixed
self._database_relation_breaking = False
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @benhoyt,

According to Juju documentation https://juju.is/docs/sdk/constructs#heading--framework,
The attributes such as self._database_relation_breaking should be set to their default values upon every hook execution. However, I tested it and those states are kept somehow.
I both modified the integration test and tested manually by adding additional config-changed hook to see the effect.
After relation broken, running a config-changed hook did not reset the init attribute state which conflicts with the documentation. If the documentation was correct, the integration tests should fail in this PR but they are not failing (?).
Please enlighten us.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gatici I am not experiencing this behaviour. I just brutally added some logs in this revision.
What I did:

  1. Deploy all required charms and properly relate everything.
  2. Remove relation between amf and mongodb-k8s (database).
  3. Run juju config amf external-amf-ip=1.1.1.1
unit-amf-0: 16:33:35 CRITICAL unit.amf/0.juju-log database:5: self._database_relation_breaking: False (class init)
unit-amf-0: 16:33:35 CRITICAL unit.amf/0.juju-log database:5: self._database_relation_breaking: True (event: <RelationBrokenEvent via AMFOperatorCharm/on/database_relation_broken[105]>)
unit-amf-0: 16:33:35 INFO juju.worker.uniter.operation ran "database-relation-broken" hook (via hook dispatching script: dispatch)
unit-amf-0: 16:33:52 CRITICAL unit.amf/0.juju-log self._database_relation_breaking: False (class init)
unit-amf-0: 16:33:52 INFO juju.worker.uniter.operation ran "config-changed" hook (via hook dispatching script: dispatch)

First line prints the default value of _database_relation_breaking on class init when the relation broken event is triggered.
Second line is right after we update the attribute in the same hook.
In the third line you see the hook completes.
In the fourth line the class is instatiated again (for the config-changed hook) and the attribute is reset to its default value.
In the fifth line you see the hook completes.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gatici First, there's a discrepancy here between Harness tests (with the same Harness instance) and reality. The charm instance isn't reinitialized between hook executions when the Harness triggers events.

This also applies to deferred, which may be what you're seeing here? Normally the charm is a completely new instance (with attributes reset by __init__) every hook execution, but if an event was deferred, the deferred events are run first against the charm instance, and then the "main" event also executed on that same charm instance (without resetting attributes).

These discrepancies are a problem that we hope to address that this cycle. See canonical/operator#736

So I wonder if the difference @dariofaccin is seeing is because there were no deferred events in that case (for whatever reason).

CC: @tonyandrewmeyer for visibility

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@benhoyt Many thanks for your help and quick response!

@gatici
Copy link
Contributor Author

gatici commented Feb 19, 2024

The change is done in PR #84 because of merge conflicts.

@gatici gatici closed this Feb 19, 2024
@dariofaccin dariofaccin deleted the TELCO-881-use-ops-CollectStatus branch May 10, 2024 07:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants