Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement diff in ceph_orch_apply.py #296

Closed
wants to merge 1 commit into from

Conversation

foyerunix
Copy link

Hello,

Dynamic attributes breaking idempotency will be added when fetching the current orch spec from a running cluster. This simple PR remove these attributes before comparing the current and the expected spec and also implement diff mode. This will help users build idempotents roles and playbooks.

Best Regards.

@foyerunix foyerunix force-pushed the patch-1 branch 2 times, most recently from 7a3da03 to 3d37038 Compare May 24, 2024 06:51
@foyerunix
Copy link
Author

Hello,

I also fixed another issue where we cannot have multiples services of the same type with different names as yaml.safe_load expect to deserialize only one document.

Best Regards.

@asm0deuz
Copy link
Collaborator

asm0deuz commented May 27, 2024

Hello,

Dynamic attributes breaking idempotency will be added when fetching the current orch spec from a running cluster. This simple PR remove these attributes before comparing the current and the expected spec and also implement diff mode. This will help users build idempotents roles and playbooks.

Best Regards.

Thank you for raising this issue but I wonder if my initial code wouldn't be better https://github.com/ceph/cephadm-ansible/pull/291/files as with your solution any changes in the "dynamic attributes" will require a code change while in my case I only compare attributes coming from "expected_spec".
I don't know why at a later stage I lost track of the fact that I shouldn't compare "dynamic attributes" and just made a simple dict comparison. Feel free to share your thoughts

@foyerunix
Copy link
Author

Hello @asm0deuz,

This PR does 3 things:

  • Fix the case when we have multiples instances of the same service_type with different service_name
  • Implement diff mode
  • Ensure we don't falsely report the task as "changed" because of the "dynamic attributes" (I don't know if this is the good name for them).

As I understand your code in #291 was to fix the third issue.

I think you created this code as you saw that some attributes are implicitly added, like these one for OSDs in my case:

  filter_logic: AND
  objectstore: bluestore

With my code the user have to put back the implicitly added attributes in the playbook spec file or he get a "changed" in Ansible.

Therefore we have two choices:

  • The user is informed that some attributes were automatically added at the cost of fixing his script
  • The user don't have to fix his script but might ignore that some attributes were added

As a sysadmin (I'm not a developer), I would prefer the first. I might have to deploy a cluster with a new Ceph version or a new OS using the same Ansible playbook and variables. When doing this I would like to minimize the things that can get me to a different result than the first time. By forcing me to include the implicitly added attributes to my spec file, it ensure that a new version of Ceph or a new OS (or whatever could interfere with the Ceph configuration process) won't add or modify these attributes silently. This will prevent hours or more of debugging to find what is different between the two clusters.

Therefore maintaining a list of safe to ignore attributes is better for cluster administrator IMHO.

Best Regards.

@guits
Copy link
Collaborator

guits commented May 29, 2024

As a sysadmin (I'm not a developer), I would prefer the first. I might have to deploy a cluster with a new Ceph version or a new OS using the same Ansible playbook and variables.

In theory, this is wrong. If you deploy a cluster with a new Ceph release, you have to use the matching cephadm-ansible version. This means that if new attributes are introduced in a given Ceph release, they must be recognized by the corresponding cephadm-ansible version.

library/ceph_orch_apply.py Outdated Show resolved Hide resolved
@guits
Copy link
Collaborator

guits commented May 29, 2024

Therefore we have two choices:

  • The user is informed that some attributes were automatically added at the cost of fixing his script

From experience, I would argue that almost nobody looks at the Ansible log (which can be very verbose, by the way) as long as the playbook succeeds.

In my opinion, one of the fundamental principles of Ansible is idempotency. Therefore, both current workflow and what you suggest here don't adhere to this principle.

@foyerunix
Copy link
Author

Hello @guits,

From experience, I would argue that almost nobody looks at the Ansible log (which can be very verbose, by the way) as long as the playbook succeeds.

I don't agree with this. Nobody will look at the tasks which are "ok" or "skipping", but Ansible users will definitively look at tasks which are "changed". At least for the subsequent runs of the playbook (the first run being obviously full of "changed" tasks). This is how you fix idempotency issues, by running the playbook multiples times.

Anyway I edited my PR to only fix the error arising when we have multiples orch files of the same service_type but with different service_name. An use case for this is to have multiples orch files of host service type but with a different location.

Can we agree to merge this ?

Best Regards.

@asm0deuz
Copy link
Collaborator

asm0deuz commented Jun 4, 2024

@foyerunix Can you put a description and sign-off in your commit? Thx

There are valid use cases where we apply multiples
service specs of the same type but with a
different name to distinguish them. We need to
specify the service_name while retrieving in these
cases to avoid trying to deserialze multiples YAML
files.

Signed-off-by: Foyer Unix <foyerunix@foyer.lu>
@foyerunix
Copy link
Author

Hello @asm0deuz,

Done.

Best Regards.

@guits
Copy link
Collaborator

guits commented Jun 5, 2024

I don't agree with this. Nobody will look at the tasks which are "ok" or "skipping", but Ansible users will definitively look at tasks which are "changed".

What you said could be true for very small playbooks, but again, from experience, as soon as you start having hundreds of ansible tasks, I have a hard time imagining someone taking a look at so many 'changed' tasks.

This is how you fix idempotency issues, by running the playbook multiples times.

I think there's some confusion here. The concern is not regarding infra/config changes here. We are talking about potential code changes between two different Ceph releases which should be transparent for the user.

@foyerunix foyerunix closed this Jan 3, 2025
@foyerunix foyerunix deleted the patch-1 branch January 3, 2025 11:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants