
feature: Add CUDA-Q BYOC examples #645

Merged
merged 58 commits on Dec 2, 2024
Changes from 1 commit
c542dba
add byoc tutorial and examples
yitchen-tim Sep 12, 2024
d677440
fix: typo
yitchen-tim Sep 26, 2024
8b28c80
restructure example notebooks for CUDA-Q jobs
yitchen-tim Oct 9, 2024
8820cae
Merge branch 'public-main' into feature/cudaq-byoc
yitchen-tim Oct 9, 2024
38ed928
update cudaq job examples
yitchen-tim Oct 10, 2024
85942e7
Merge branch 'feature/cudaq-byoc' of https://github.com/amazon-braket…
yitchen-tim Oct 10, 2024
2ce5109
Update 0_hello_quantum.ipynb
yitchen-tim Oct 10, 2024
0ec6401
Update 0_hello_quantum.ipynb
yitchen-tim Oct 10, 2024
1c07093
Update 2_parallel_simulations.ipynb
yitchen-tim Oct 11, 2024
1c9a8d8
updates from feedbacks
yitchen-tim Oct 11, 2024
bd1ac2c
move detail of container build to the end
yitchen-tim Oct 11, 2024
19d41f9
change notebook names
yitchen-tim Oct 14, 2024
402aaa6
fix references between notebooks
yitchen-tim Oct 14, 2024
998565b
update notebooks and script
yitchen-tim Oct 22, 2024
3570ed3
CUDA-Q notebooks minor changes (#76)
mbeach-aws Oct 31, 2024
584280f
skip cuda-q jobs integ tests
yitchen-tim Oct 31, 2024
bd0f995
Merge branch 'public-main' into feature/cudaq-byoc
yitchen-tim Oct 31, 2024
484aaf8
fix lint
yitchen-tim Oct 31, 2024
fc537d7
add example links to readme
yitchen-tim Oct 31, 2024
90f4951
remove: lines for debugging
yitchen-tim Oct 31, 2024
ca6e816
add Braket device example
yitchen-tim Nov 7, 2024
9483e4e
change folder structure
yitchen-tim Nov 7, 2024
d9db7d6
update notebook links in readme
yitchen-tim Nov 7, 2024
1ba0309
fix notebook links
yitchen-tim Nov 7, 2024
a193325
add instruction about ECR write policy
yitchen-tim Nov 15, 2024
e2f9502
grammar
yitchen-tim Nov 15, 2024
79405ee
Update examples/nvidia_cuda_q/0_hello_cudaq_jobs.ipynb
yitchen-tim Nov 15, 2024
ae15bbc
Update examples/nvidia_cuda_q/0_hello_cudaq_jobs.ipynb
yitchen-tim Nov 15, 2024
ebbf3e3
Update examples/nvidia_cuda_q/2_parallel_simulations.ipynb
yitchen-tim Nov 15, 2024
c5f9fe8
Update examples/nvidia_cuda_q/2_parallel_simulations.ipynb
yitchen-tim Nov 15, 2024
093d6ed
Update examples/nvidia_cuda_q/2_parallel_simulations.ipynb
yitchen-tim Nov 15, 2024
5e23efb
Update 2_parallel_simulations.ipynb
yitchen-tim Nov 15, 2024
aa51264
Update examples/nvidia_cuda_q/0_hello_cudaq_jobs.ipynb
yitchen-tim Nov 15, 2024
833916b
Update examples/nvidia_cuda_q/2_parallel_simulations.ipynb
yitchen-tim Nov 15, 2024
b52aaad
Update examples/nvidia_cuda_q/0_hello_cudaq_jobs.ipynb
yitchen-tim Nov 15, 2024
b5798e2
Update examples/nvidia_cuda_q/0_hello_cudaq_jobs.ipynb
yitchen-tim Nov 15, 2024
f6f2261
Update examples/nvidia_cuda_q/2_parallel_simulations.ipynb
yitchen-tim Nov 15, 2024
dc59c3c
Update examples/nvidia_cuda_q/0_hello_cudaq_jobs.ipynb
yitchen-tim Nov 15, 2024
e77483b
Update README.md
yitchen-tim Nov 15, 2024
1cf2a3a
Update examples/nvidia_cuda_q/1_simulation_with_GPUs.ipynb
yitchen-tim Nov 15, 2024
7c7be90
Update README.md
yitchen-tim Nov 15, 2024
08b5c27
Update examples/nvidia_cuda_q/0_hello_cudaq_jobs.ipynb
yitchen-tim Nov 15, 2024
182fed5
Update examples/nvidia_cuda_q/1_simulation_with_GPUs.ipynb
yitchen-tim Nov 15, 2024
e426654
Update examples/nvidia_cuda_q/1_simulation_with_GPUs.ipynb
yitchen-tim Nov 15, 2024
62edfb3
Update examples/nvidia_cuda_q/1_simulation_with_GPUs.ipynb
yitchen-tim Nov 15, 2024
aed89f6
Update requirements.txt
yitchen-tim Nov 15, 2024
be542d1
Update README.md
yitchen-tim Nov 15, 2024
8d18646
add chmod to the shell script
yitchen-tim Nov 15, 2024
06c838a
update notebooks
yitchen-tim Nov 19, 2024
3832a0c
Update 2_parallel_simulations.ipynb
yitchen-tim Nov 19, 2024
a2d8c6f
fix linting
yitchen-tim Nov 19, 2024
e1ac925
remove seeds
yitchen-tim Nov 20, 2024
f962bff
update dependency name from "cuda-quantum" to "cudaq"
yitchen-tim Nov 27, 2024
43ae92b
update notebooks
yitchen-tim Nov 27, 2024
c082d89
byoc with cudaq wheel file
yitchen-tim Nov 27, 2024
c5d236b
clear notebook outputs
yitchen-tim Nov 27, 2024
d59a3f3
fix: linting
yitchen-tim Nov 27, 2024
6d149d2
Merge branch 'main' into feature/cudaq-byoc
rmshaffer Dec 2, 2024
Update 2_parallel_simulations.ipynb
yitchen-tim committed Oct 11, 2024

Verified

This commit was signed with the committer’s verified signature.
bryanculver Bryan Culver
commit 1c07093bd26e2a658f3d4bf326f38dae8a88bfc3
31 changes: 26 additions & 5 deletions examples/cuda_quantum/byoc_job/2_parallel_simulations.ipynb
@@ -7,9 +7,9 @@
"source": [
"# Parallel simulations on multiple GPUs\n",
"\n",
"For a circuit with a large qubit count, a single GPU may not be able to host a statevector due to the memory constraint. CUDA-Q supports distribution of a statevector over multiple GPUs and nodes. In addition, many quantum algorithms require sampling a batch of circuits. For example, evaluating a Hamiltonian requires evaluating many terms of the Hamiltonian. For variational algorithms, it often requires sampling a parametric circuit with many different sets of parameters. For error mitigation algorithms, it often requires sampling a large number of unrelated circuits. \n",
"Many quantum algorithms require sampling a batch of circuits and observables. For example, evaluating a Hamiltonian requires evaluating many of its terms, variational algorithms often require sampling a parametric circuit with many different sets of parameters, and error mitigation algorithms often require sampling a large number of unrelated circuits.\n",
"\n",
"In this notebook, you will learn how to use parallelization to tackle these challenges. With CUDA-Q and Braket Jobs, simulations of statevectors and circuit batches can be parallelized over multiple GPUs.\n"
"In this notebook, you will learn how to use parallelization to tackle these challenges. With CUDA-Q and Braket Jobs, the simulation of a batch of observables and circuits can be parallelized over multiple GPUs.\n"
]
},
{
@@ -135,9 +135,9 @@
"id": "083df36f-8ec4-468d-8e68-36c1fc1f4f5c",
"metadata": {},
"source": [
"## Distribution of a statevector over multiple GPUs\n",
"## Parallelize the simulation of a batch of observables\n",
"\n",
"Let's tackle the same problem again. But this time, we will run the simulation on multiple GPUs across multiple nodes using the [MPI interface](https://nvidia.github.io/cuda-quantum/latest/using/install/data_center_install.html#mpi). To do so, you add the keyword argument `execution=cudaq.parallel.mpi` to the `cudaq.observe()` call. With this keyword argument, CUDA-Q will distribute the simulation over the GPUs.\n",
"Let's tackle the same problem again. But this time, we will run the simulation on multiple GPUs across multiple nodes using the [MPI interface](https://nvidia.github.io/cuda-quantum/latest/using/install/data_center_install.html#mpi). To do so, you add the keyword argument `execution=cudaq.parallel.mpi` to the `cudaq.observe()` call. With this keyword argument, CUDA-Q will distribute the simulation over the GPUs available in a job.\n",
"\n",
"In order for CUDA-Q to distribute the simulation, there are some prerequisites. First, the job needs to run on instances that have multiple GPUs. To achieve this, you can specify an instance type that has multiple GPUs (e.g., ml.p3.8xlarge). If the number of GPUs on a single instance is not enough, you can extend the parallelization to multiple nodes by specifying an `instanceCount` larger than 1. Then, you need to add the hyperparameter `sagemaker_mpi_enabled=True` to the job, which initializes the job environment to support parallelization with MPI. Next, you need to select a CUDA-Q backend that supports distribution (e.g., the `nvidia` backend with the `mqpu` option). Finally, you need to initialize the MPI interface in your CUDA-Q code. The code snippet below provides an example of all these steps."
]
@@ -244,7 +244,7 @@
"id": "08ab1628-4622-4e24-8cdd-f4a686b05e5a",
"metadata": {},
"source": [
"In this example, the circuit batch is formed by a single parametric circuit with many different sets of parameters. To assign a parameter set to a particular GPU, you can use the `qpu_id` keyword in the `cudaq.observe_async()` call. For example, to assign a simulation to GPU with rank 5, you set `qpu_id=5`."
"In this example, the circuit batch is formed by a single parametric circuit with many different sets of parameters. To assign a parameter set to a particular GPU, you can use the `qpu_id` keyword in the `cudaq.observe_async()` call. For example, to assign a simulation to the GPU with rank 5, you set `qpu_id=5`. "
]
},
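The round-robin assignment described above can be sketched in plain Python. This is a hypothetical illustration, not code from the notebook: `num_gpus` and the parameter values are made-up placeholders, and the commented-out `observe_async` call stands in for the real CUDA-Q invocation.

```python
# Hypothetical sketch: round-robin assignment of parameter sets to GPUs.
# num_gpus and parameter_sets are illustrative placeholders.
num_gpus = 4
parameter_sets = [[0.1 * i] for i in range(10)]

# Assign parameter set i to GPU i % num_gpus, wrapping around the available GPUs.
assignments = [(params, i % num_gpus) for i, params in enumerate(parameter_sets)]

# Each pair would then be dispatched asynchronously, e.g.:
# cudaq.observe_async(kernel, hamiltonian, params, qpu_id=gpu)
print([gpu for _, gpu in assignments])  # → [0, 1, 2, 3, 0, 1, 2, 3, 0, 1]
```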
{
@@ -328,6 +328,27 @@
"print(\"Job ARN: \", parallel_batch_job.arn)"
]
},
{
"cell_type": "markdown",
"id": "4a30ffdb-dbc5-4000-af8e-c63dd713d153",
"metadata": {},
"source": [
"Currently, `observe_async()` only supports distribution over the GPUs on a single node, so the `qpu_id` must be smaller than the number of GPUs on a single instance used in the job. However, if you wish to distribute the circuit batch over multiple nodes, you can manually assign a different circuit batch to each node with the following MPI logic:\n",
"```python\n",
"ngpu_per_node = ...  # number of GPUs per node\n",
"circuit_batch_0 = ...  # circuit batch for node 0\n",
"circuit_batch_1 = ...  # circuit batch for node 1\n",
"\n",
"# local GPU id of this rank on its node\n",
"qpu_id = cudaq.mpi.rank() % ngpu_per_node\n",
"\n",
"if cudaq.mpi.rank() // ngpu_per_node == 0:\n",
"    for circuit in circuit_batch_0:\n",
"        cudaq.observe_async(circuit, hamiltonian, shots_count=n_shots, qpu_id=qpu_id)\n",
"elif cudaq.mpi.rank() // ngpu_per_node == 1:\n",
"    for circuit in circuit_batch_1:\n",
"        cudaq.observe_async(circuit, hamiltonian, shots_count=n_shots, qpu_id=qpu_id)\n",
"```"
]
},
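The rank arithmetic used in that snippet can be sketched as a small helper. This is a hypothetical illustration assuming one MPI rank per GPU; `node_and_local_gpu` is a made-up name, not a CUDA-Q function.

```python
# Hypothetical helper, assuming one MPI rank per GPU (an assumption, not
# something the notebook guarantees).
def node_and_local_gpu(rank: int, ngpu_per_node: int) -> tuple[int, int]:
    """Map a global MPI rank to (node index, local GPU id on that node)."""
    return rank // ngpu_per_node, rank % ngpu_per_node

# With 4 GPUs per node, global rank 5 lands on node 1, local GPU 1.
print(node_and_local_gpu(5, 4))  # → (1, 1)
```

In the notebook's snippet, `cudaq.mpi.rank() // ngpu_per_node` plays the role of the node index, selecting which circuit batch a rank works on, while the remainder gives the `qpu_id` within that node.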
{
"cell_type": "markdown",
"id": "18004fe8-24a0-4316-ab0a-a4e0aac6ba1e",