
Development notes


Interfaces

This describes what information is required as input to configure the cluster/nodes.

Groups:

  • login
  • compute
  • control
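
For illustration, a minimal YAML inventory sketch with these groups (hostnames are hypothetical):

```yaml
# Hypothetical minimal inventory: one login, one control, two compute nodes.
all:
  children:
    login:
      hosts:
        cluster-login-0:
    control:
      hosts:
        cluster-control-0:
    compute:
      hosts:
        cluster-compute-0:
        cluster-compute-1:
```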

Group/host vars:

Odd things

  • For smslabs, the control node needs to know the login node's private IP, because openondemand_servername is defined from it in group_vars/all/openondemand.yml (access is via a SOCKS proxy). More generally, grafana (default: control) will need to know the external address of openondemand (default: login).
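
A sketch of the sort of definition meant here; the exact derivation is an assumption, so check the real group_vars/all/openondemand.yml:

```yaml
# Assumed sketch: point openondemand_servername at the first login node's
# private address, so access works via the SOCKS proxy.
openondemand_servername: "{{ hostvars[groups['login'] | first].internal_address }}"
```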

Full list for the 'everything' cluster:

  • openhpc_cluster_name: Cluster name. NO DEFAULT, REQUIRED. Required for all openhpc hosts here, but I think this is over-broad: actually probably only the control host & db host (openhpc.enable=database) should require it. NB: slurm.conf templating assumes this is only done on a single controller! A consolidated sketch of the defaults in this list follows it.

  • openhpc_slurm_control_host: Slurmctld address. Default in common:all:openhpc = {{ groups['control'] | first }}.

    • NB: maybe should use .internal_address?
    • Host requirements & comments as above.
    • Note Slurm assumes slurmdbd.conf and slurm.conf are in the same directory; how does this work configless?
  • openhpc_slurm_partitions: Partition definitions. Default in common:all:openhpc is a single 'compute' partition. NB: this requires a group "{{ openhpc_cluster_name }}_compute" in the environment inventory. Could check groups during validation?

    • Host requirements & comments as above (but for control only)
  • nfs_server: Default in common:all:nfs is nfs_server_default -> "{{ hostvars[groups['control'] | first].internal_address }}".

  • elasticsearch_address: Default in common:all:defaults is {{ hostvars[groups['opendistro'].0].api_address }}

  • prometheus_address: Default in common:all:defaults is {{ hostvars[groups['prometheus'].0].api_address }}

  • openondemand_address: Default in common:all:defaults is {{ hostvars[groups['openondemand'].0].api_address if groups['openondemand'] | count > 0 else '' }}

  • All the secrets in environment:all:secrets - see the secret role's defaults:

    • grafana, elasticsearch, mysql (x2) passwords (all potentially depending on group placement)
    • munge key (for all openhpc nodes)
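
For orientation, a condensed sketch pulling together the defaults listed above (illustrative only, not the literal contents of any single group_vars file):

```yaml
# Condensed view of the defaults described above - illustrative, not the
# literal contents of any one file in the appliance.
openhpc_cluster_name: mycluster  # NO DEFAULT - required; 'mycluster' is an example
openhpc_slurm_control_host: "{{ groups['control'] | first }}"
openhpc_slurm_partitions:        # default: a single 'compute' partition
  - name: compute
nfs_server: "{{ hostvars[groups['control'] | first].internal_address }}"
elasticsearch_address: "{{ hostvars[groups['opendistro'].0].api_address }}"
prometheus_address: "{{ hostvars[groups['prometheus'].0].api_address }}"
openondemand_address: "{{ hostvars[groups['openondemand'].0].api_address if groups['openondemand'] | count > 0 else '' }}"
```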

Running install tasks only

For which roles can we run ONLY the install tasks, to build a cluster-independent(*)/no-config image?
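
As a mechanism sketch, a role's install tasks could be pulled in on their own at image-build time with tasks_from; the host pattern and role name below are hypothetical placeholders, and whether each role actually supports this split is what the lists below assess:

```yaml
# Sketch: run only a role's install.yml during image build.
# 'builder' and 'some.role' are hypothetical placeholders.
- name: Build-time installs only
  hosts: builder
  tasks:
    - ansible.builtin.import_role:
        name: some.role
        tasks_from: install.yml
```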

In-appliance roles:

  • basic_users: n/a
  • block_devices: n/a
  • filebeat: n/a, but downloads a Docker container at service start.
  • grafana-dashboards: Downloads grafana dashboards
  • grafana-datasources: n/a
  • hpctests: n/a, but required packages are installed as part of openhpc_default_packages.
  • opendistro: n/a, but downloads a Docker container at service start.
  • openondemand:
  • passwords: n/a
  • podman: prereqs.yml does package installs.

Out-of-appliance roles:

  • stackhpc.nfs: [main.yml](https://github.com/stackhpc/ansible-role-cluster-nfs/blob/master/tasks/main.yml) installs packages.
  • stackhpc.openhpc: required packages and openhpc_packages (see above) are installed in install.yml, but this requires the openhpc_slurm_service fact set from main.yml.
  • cloudalchemy.node_exporter:
    • install.yml does a binary download from GitHub but also propagation. Could pre-download it and use node_exporter_binary_local_dir (see the sketch at the end of this section), but install.yml still needs running as it also does user creation.
    • selinux.yml also does package installations.
  • cloudalchemy.blackbox-exporter: Currently unused.
  • cloudalchemy.prometheus: install.yml. Same comments as for cloudalchemy.node_exporter above.
  • cloudalchemy.alertmanager: Currently unused.
  • cloudalchemy.grafana: install.yml does package updates.
  • geerlingguy.mysql: setup-RedHat.yml does package updates BUT needs variables.yml running to load appropriate variables.
  • jriguera.configdrive: Unused, should be deleted.
  • osc.ood: See openondemand above.
(*) It's not really cluster-independent, as which features are turned on where may vary.
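
For the node_exporter pre-download mentioned above, a sketch; the path is hypothetical and the exact semantics of node_exporter_binary_local_dir should be checked against the role's defaults:

```yaml
# Hypothetical: point the role at a pre-downloaded binary so install.yml
# skips the GitHub download (it still needs to run for user creation).
node_exporter_binary_local_dir: /opt/cache/node_exporter
```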