Creating/Updating too many job templates in parallel seems to be too much for controller API #832

Closed
dbk-rabel opened this issue May 21, 2024 · 8 comments · Fixed by #844
Labels
bug Something isn't working enhancement New feature or request

Comments

@dbk-rabel
Contributor

Hi.

We are using the controller_configuration collection to configure our controller.
Here is a snippet from our playbook:

- name: Ensure organizations, teams, applications and credentials are present
  ansible.builtin.include_role:
    name: infra.controller_configuration.dispatch
  vars:
    controller_configuration_dispatcher_roles:
      - {role: organizations, var: controller_organizations, tags: organizations}
      - {role: teams, var: controller_teams, tags: teams}
      - {role: users, var: controller_user_accounts, tags: users}
      - {role: applications, var: controller_applications, tags: applications}
      - {role: credentials, var: controller_credentials, tags: credentials}
      - {role: credential_types, var: controller_credential_types, tags: credential_types}

- name: Ensure organizations have their pah credentials attached
  when: infrastructure_privileged_mode
  ansible.builtin.include_role:
    # We cannot use the dispatcher role here, because it overrides 'assign_galaxy_credentials_to_org'
    name: infra.controller_configuration.organizations
  vars:
    assign_galaxy_credentials_to_org: true

- name: Ensure projects, inventories, inventory sources and execution environments are present
  ansible.builtin.include_role:
    name: infra.controller_configuration.dispatch
  vars:
    controller_configuration_dispatcher_roles:
      - {role: projects, var: controller_projects, tags: projects}
      - {role: inventories, var: controller_inventories, tags: inventories}
      - {role: inventory_sources, var: controller_inventory_sources, tags: inventory_sources}
      - {role: execution_environments, var: controller_execution_environments, tags: execution_environments}

- name: Ensure inventory sources have tried to sync
  ansible.builtin.include_role:
    name: infra.controller_configuration.inventory_source_update
    apply:
      ignore_errors: true

- name: Ensure job templates, workflow job templates and roles are present
  ansible.builtin.include_role:
    name: infra.controller_configuration.dispatch
  vars:
    controller_configuration_dispatcher_roles:
      - {role: job_templates, var: controller_templates, tags: job_templates}
      - {role: workflow_job_templates, var: controller_workflows, tags: workflow_job_templates}
      - {role: roles, var: controller_roles, tags: roles}
      - {role: schedules, var: controller_schedules, tags: schedules}

- name: Ensure that the objects which are not defined in git are absent
  when: infrastructure_cleanup
  block:
    - name: Generate a list which contains all objects which should be present on the controller or deleted from the controller
      ansible.builtin.include_tasks:
        file: object_diff.yml

    - name: Ensure workflow job templates, job templates, inventory sources, inventories and projects which are not defined in git are absent
      ansible.builtin.include_role:
        name: infra.controller_configuration.dispatch
      vars:
        controller_configuration_dispatcher_roles:
          - {role: workflow_job_templates, var: controller_workflows, tags: workflow_job_templates}
          - {role: job_templates, var: controller_templates, tags: job_templates}
          - {role: inventory_sources, var: controller_inventory_sources, tags: inventory_sources}
          - {role: inventories, var: controller_inventories, tags: inventories}
          - {role: projects, var: controller_projects, tags: projects}

    - name: Ensure credentials and organizations which are not defined in git are absent
      ansible.builtin.include_role:
        name: infra.controller_configuration.dispatch
      vars:
        controller_configuration_dispatcher_roles:
          - {role: credentials, var: controller_credentials, tags: credentials}
          - {role: organizations, var: controller_organizations, tags: organizations}

This more or less works fine.

But now we are hitting Errno 104 "Connection reset by peer" in async_status for job templates. (Interestingly, only in the cleanup tasks at the bottom, after object_diff.)

It seems to have something to do with the controller API not being able to handle that many calls at the same time. If we just delete a few job templates from controller_templates, everything works fine again.

There seems to be no way to slow things down a little bit, or is there? That is what we would need.

We tried increasing controller_configuration_job_templates_async_delay, but it does not seem to help. The problem already occurs on the first try.
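For context, here is a simplified sketch of the fire-and-forget pattern that looping roles like this commonly use. It is not the collection's actual task file: the module parameters, the item fields, the `retries` value and the use of controller_configuration_job_templates_async_delay as the polling delay are assumptions for illustration. It suggests why a larger async delay would not help on the first try: the delay only spaces out the `async_status` polls, while the first task has already submitted every job template request back-to-back.

```yaml
# Simplified sketch, NOT the real infra.controller_configuration task file.
# Assumption: each item in controller_templates has name/organization/project/playbook/inventory.
# Controller connection settings (controller_host etc.) are omitted and assumed
# to come from the environment or module_defaults.
- name: Start one background create/update per job template
  ansible.controller.job_template:
    name: "{{ item.name }}"
    organization: "{{ item.organization | default(omit) }}"
    project: "{{ item.project | default(omit) }}"
    playbook: "{{ item.playbook | default(omit) }}"
    inventory: "{{ item.inventory | default(omit) }}"
    state: "{{ item.state | default('present') }}"
  loop: "{{ controller_templates }}"
  async: 1000   # let each request run in the background for up to 1000 s
  poll: 0       # do not wait; move on to the next item immediately
  register: launched

# The delay below only controls how often each job is polled. By the time this
# task starts, all API requests from the task above are already in flight.
- name: Wait for all background jobs to finish
  ansible.builtin.async_status:
    jid: "{{ item.ansible_job_id }}"
  loop: "{{ launched.results }}"
  register: job_result
  until: job_result.finished
  retries: 30
  delay: "{{ controller_configuration_job_templates_async_delay | default(1) }}"
```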

Yours
David

@dbk-rabel dbk-rabel added bug Something isn't working new New issue, this should be removed once reviewed labels May 21, 2024
@djdanielsson
Collaborator

Yes, this is likely an issue with object_diff. It was written quickly, based on an idea, and really needs to be rewritten from the ground up to fix a lot of its issues.

@djdanielsson djdanielsson added the enhancement New feature or request label May 28, 2024
@dbk-rabel
Contributor Author

dbk-rabel commented Jun 12, 2024

Thanks for the reply.

But I don't fully understand what is happening.

Too many API calls in too short a time is probably the issue, but I don't see a way to slow down the execution of the controller_configuration roles. Or is there one?

Yours
David

@djdanielsson djdanielsson removed the new New issue, this should be removed once reviewed label Jun 14, 2024
@dbk-rabel
Contributor Author

The PR does seem to help here. Thanks a lot!

@djdanielsson
Collaborator

Were you able to test the PR changes?

@dbk-rabel
Contributor Author

> Were you able to test the PR changes?

So far I have just added "pause: 1" locally, and it helped.
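A "pause: 1" like this presumably refers to Ansible's `loop_control.pause`, which sleeps for the given number of seconds between loop iterations (an assumption; the exact change in the PR is not shown in this thread). A minimal sketch of adding it to a loop that starts background job template requests, with the module fields reduced for brevity:

```yaml
# Minimal sketch; module fields other than name/state are omitted for brevity.
- name: Start one background create/update per job template, spaced out
  ansible.controller.job_template:
    name: "{{ item.name }}"
    state: "{{ item.state | default('present') }}"
  loop: "{{ controller_templates }}"
  loop_control:
    pause: 1    # seconds to sleep between iterations; fractional values such as 0.5 are accepted
  async: 1000   # requests still run in the background, in parallel
  poll: 0
  register: launched
```

The requests still overlap in the background; the pause only lowers the rate at which new ones hit the controller API.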

If you want, I can test the PR as a whole, but probably not before tomorrow.

@djdanielsson
Collaborator

Yeah, if you are able to, that would be great. You could also try something like 0.2 to see whether that is enough to fix the issue.

@dbk-rabel
Contributor Author

I will do so. Unfortunately, I am not able to reproduce the issue locally at the moment. It only fails when I run the playbook on the AAP2 controller.

That means 2 things:

  1. I'm no longer sure whether the PR really fixes our issue, because I only tested it locally.
  2. To test it properly, I'll have to build the collection and load it into the PAH of our test environment. So I'll have to do it later, because it will take a while ;)

David

@dbk-rabel
Contributor Author

OK, I was finally able to test it. I put 100 dummy job templates in the inventory for that. A pause of 0.2 does not seem to be enough, but 0.5 works fine.
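As a rough estimate of what the safer value costs, assume the pause is applied between consecutive iterations of a single loop (as `loop_control.pause` does). Then a loop over $N$ items with a pause of $p$ seconds adds about $(N-1)\,p$ seconds of wall-clock time:

$$
t_{\text{extra}} \approx (N - 1)\,p = 99 \times 0.5\ \text{s} = 49.5\ \text{s} \qquad (N = 100,\ p = 0.5\ \text{s})
$$

With $p = 0.2$ s the same loop would add about $19.8$ s, so 0.5 costs roughly 30 s more per pass over the 100 test templates.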
