Development notes
What information is required as input to the cluster/nodes.
Groups:
login
compute
control
Group/host vars:
Odd things
- For smslabs, the control node needs to know the login node's private IP, because `openondemand_servername` is defined using it in group_vars/all/openondemand.yml, as we use a SOCKS proxy for access. But generally, `grafana` (default: control) will need to know the `openondemand` (default: login) external address.
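As a rough illustration of the dependency described above, the variable might be defined along these lines. This is only a sketch: the actual expression in group_vars/all/openondemand.yml may differ, and it assumes every login host defines an `internal_address` hostvar and that the `login` group is non-empty.

```yaml
# group_vars/all/openondemand.yml (sketch only; the real definition may differ)
# Assumption: the first host in the `login` group defines `internal_address`.
openondemand_servername: "{{ hostvars[groups['login'] | first].internal_address }}"
```

Because the expression reads another host's hostvars, any host that templates it (here, control) needs the login host's variables to be available.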
Full list for everything
cluster:
- `openhpc_cluster_name`: Cluster name. NO DEFAULT, REQUIRED. Required for all `openhpc` hosts here, but I think this is over-broad: probably only the control host & db host (`openhpc.enable=database`) should require it. NB: slurm.conf templating assumes this is only done on a single controller!
- `openhpc_slurm_control_host`: Slurmctld address. Default in common:all:openhpc = `{{ groups['control'] | first }}`.
  - NB: maybe should use `.internal_address`?
  - Host requirements & comments as above.
  - Note Slurm assumes slurmdbd and slurm.conf are in same directory, how does this work configless?
- `openhpc_slurm_partitions`: Partition definitions. Default in common:all:openhpc is a single 'compute' partition. NB: requires group `"{{ openhpc_cluster_name }}_compute"` in environment inventory. Could check groups during validation??
  - Host requirements & comments as above (but for control only)
- `nfs_server`: Default in common:all:nfs is `nfs_server_default` -> `"{{ hostvars[groups['control'] | first ].internal_address }}"`.
- `elasticsearch_address`: Default in common:all:defaults is `{{ hostvars[groups['opendistro'].0].api_address }}`
- `prometheus_address`: Default in common:all:defaults is `{{ hostvars[groups['prometheus'].0].api_address }}`
- `openondemand_address`: Default in common:all:defaults is `{{ hostvars[groups['openondemand'].0].api_address if groups['openondemand'] | count > 0 else '' }}`
- All the secrets in environment:all:secrets - see secret role's defaults:
  - grafana, elasticsearch, mysql (x2) passwords (all potentially depending on group placement)
  - munge key (for all openhpc nodes)
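The note on `openhpc_slurm_partitions` above asks whether inventory groups could be checked during validation. A minimal sketch of such a check follows. It assumes each partition entry has a `name` key and maps directly onto an inventory group called `<cluster name>_<partition name>`; partitions that define their own sub-groups would need a different check. The play and task names are illustrative only.

```yaml
# Sketch only: fails early if a partition has no matching inventory group.
- name: Validate partition groups exist in inventory
  hosts: control
  gather_facts: false
  tasks:
    - name: Check each partition has a matching inventory group
      ansible.builtin.assert:
        that:
          # Assumption: the group name is "<cluster name>_<partition name>"
          - "(openhpc_cluster_name ~ '_' ~ item.name) in groups"
        fail_msg: "No inventory group '{{ openhpc_cluster_name }}_{{ item.name }}' for partition '{{ item.name }}'"
      loop: "{{ openhpc_slurm_partitions }}"
```

Since `openhpc_cluster_name` has no default, this would also fail with an undefined-variable error if it is missing, which may or may not be the desired message.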
Which roles can we ONLY run the install tasks from, to build a cluster-independent(*)/no-config image?
In-appliance roles:
- basic_users: n/a
- block_devices: n/a
- filebeat: n/a (but downloads Docker container at service start)
- grafana-dashboards: Downloads grafana dashboards
- grafana-datasources: n/a
- hpctests: n/a but reqd. packages are installed as part of `openhpc_default_packages`.
- opendistro: n/a but downloads Docker container at service start.
- openondemand:
  - `main.yml`: unnamed task does rpm installs using osc.ood:install-rpm.yml
  - `main.yml`: unnamed task does rpm installs using pam_auth.yml
  - `main.yml`: unnamed task does git downloads using osc.ood:install-apps.yml
  - `jupyter_compute.yml`: Does package installs
  - `vnc_compute.yml`: Does package installs
- passwords: n/a
- podman: `prereqs.yml` does package installs
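For roles where the install steps live in their own task file, one way to run only those steps during an image build is `include_role` with `tasks_from`. The play below is a sketch, not the appliance's actual build playbook: it assumes the named task files do not depend on cluster-specific variables or on facts set elsewhere in the role, which would need checking per role.

```yaml
# Sketch only: run just the package-install task files from two in-appliance roles.
- name: Install-only tasks for image build
  hosts: all
  become: true
  tasks:
    - name: Install podman prerequisites only
      ansible.builtin.include_role:
        name: podman
        tasks_from: prereqs.yml

    - name: Install VNC packages only
      ansible.builtin.include_role:
        name: openondemand
        tasks_from: vnc_compute.yml
```

This approach does not work directly for roles whose install file relies on facts set in `main.yml`.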
Out of appliance roles:
- stackhpc.nfs: [main.yml](https://github.com/stackhpc/ansible-role-cluster-nfs/blob/master/tasks/main.yml) installs packages.
- stackhpc.openhpc: Required and `openhpc_packages` (see above) installed in install.yml, but requires `openhpc_slurm_service` fact set from `main.yml`.
- cloudalchemy.node_exporter:
  - install.yml does binary download from github but also propagation. Could pre-download it and use `node_exporter_binary_local_dir`, but install.yml still needs running as it does user creation too.
  - selinux.yml also does package installations
- cloudalchemy.blackbox-exporter: Currently unused.
- cloudalchemy.prometheus: install.yml. Same comments as for `cloudalchemy.node_exporter` above.
- cloudalchemy.alertmanager: Currently unused.
- cloudalchemy.grafana: install.yml does package updates.
- geerlingguy.mysql: setup-RedHat.yml does package updates BUT needs variables.yml running to load appropriate variables.
- jriguera.configdrive: Unused, should be deleted.
- osc.ood: See `openondemand` above.
(*) It's not really cluster-independent, as which features are turned on where may vary.
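The pre-download option mentioned for cloudalchemy.node_exporter could look roughly like the following. This is a sketch under assumptions: the directory path is hypothetical, and it must already contain the node_exporter binary on the host where Ansible runs. As noted above, the role's install tasks still run, since they also create the user and propagate the binary.

```yaml
# Sketch only: use a locally staged binary instead of downloading from GitHub.
- name: Install node_exporter from a pre-downloaded binary
  hosts: all
  become: true
  roles:
    - role: cloudalchemy.node_exporter
      vars:
        # Hypothetical path on the Ansible control host containing the binary
        node_exporter_binary_local_dir: /opt/prebuilt/node_exporter
```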