Jarvist Moore Frost edited this page May 17, 2024 · 2 revisions

Imperial HX1

DO NOT FORGET

You MUST submit to the HX1 queue, otherwise your jobs will queue and then evaporate.

For example: qsub -q hx mace_foundation_test.qsub
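One way to make the queue harder to forget is to put it inside the job script itself. The following is a minimal sketch, assuming HX1's PBS honours the standard `#PBS -q` directive in the script header. The file name `hx1_example.qsub`, the job name and every resource value are hypothetical placeholders, not tested HX1 settings; check the RCS user guide for the real limits.

```shell
#!/usr/bin/env bash
# Sketch only: writes a hypothetical PBS job script with the HX1 queue embedded.
# The file name, job name and resource requests are placeholders (assumptions).
set -euo pipefail

job_script="hx1_example.qsub"

# Quoted heredoc delimiter: nothing below is expanded until the job actually runs.
cat > "$job_script" <<'EOF'
#!/bin/bash
#PBS -q hx
#PBS -N mace_test
#PBS -l walltime=01:00:00
#PBS -l select=1:ncpus=8:mem=32gb:ngpus=1

# PBS starts jobs in $HOME; move back to the directory qsub was called from.
cd "$PBS_O_WORKDIR"
echo "Running on $(hostname)"
EOF

echo "Wrote $job_script; submit with: qsub -q hx $job_script"
```

Keeping `-q hx` on the command line as well does no harm, and covers scripts that lack the directive.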

Background

HX1 is our new (2024) cluster at Imperial. It is a thing of beauty and a joy to behold.

They are currently restricting access to people who know what they are doing, so the shared file system is not being overloaded, and generally the queues are short. It is therefore a thing of beauty and a joy to work on.

https://icl-rcs-user-guide.readthedocs.io/en/latest/hpc/pilot/hx1/

Specs:

  • Compute nodes: Lenovo SD630v2 servers each with 2 x Intel Xeon Platinum 8358 (Ice Lake) 2.60GHz 32-core processors; 64 cores per node; 288 nodes; 18,432 compute cores; 512 GB RAM per node
  • GPU nodes: Lenovo servers each with 4 x NVIDIA A100 80 GB RAM GPUs; 2 x Intel Xeon Platinum 8360Y (Ice Lake) 2.40GHz 36-core processors; 1 TB RAM per node; 15 nodes; 60 GPUs in total
  • Interconnect: Mellanox ConnectX-6 HDR200 (200 Gbit/s) InfiniBand
  • Storage: 2 PB IBM Spectrum Scale (GPFS) on Lenovo DSS-G Storage system. Approximately 300 TB of additional solid state storage

MACE on HX1

# Jarvist Moore Frost - build MACE on HX1
# 16th May 2024 - first version

module load PyTorch/1.12.1-foss-2022a-CUDA-11.7.0

eval "$(~/miniconda3/bin/conda shell.bash hook)"

# now adapted from https://mace-docs.readthedocs.io/en/latest/guide/installation.html

conda create --name mace
conda activate mace

# (optional) Install MACE's dependencies from Conda as well
conda config --add channels conda-forge
conda config --set channel_priority strict

conda install numpy scipy matplotlib ase opt_einsum prettytable pandas e3nn

# Clone and install MACE (and all required packages)
git clone https://github.com/ACEsuit/mace.git
pip install ./mace
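
After the build, a quick sanity check is to see which of the key Python packages actually import. This is a minimal sketch, assuming it is run inside the activated `mace` environment and that `python3` on the PATH is that environment's interpreter. The helper name `check_import` is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: report whether each key package can be imported.
# check_import is a hypothetical helper, not part of MACE or conda.

check_import() {
  # Try the import quietly; print a one-line verdict either way.
  if python3 -c "import $1" 2>/dev/null; then
    echo "ok: $1"
  else
    echo "MISSING: $1"
  fi
}

for module in torch e3nn ase mace; do
  check_import "$module"
done
```

A `MISSING` line for `torch` most likely means the interpreter being used cannot see the PyTorch provided by the loaded module.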