diff --git a/src/docs/ocean/ocean-aks-cloud-cluster-overview.md b/src/docs/ocean/ocean-aks-cloud-cluster-overview.md index 1a60c6cda..a7bb2edfc 100644 --- a/src/docs/ocean/ocean-aks-cloud-cluster-overview.md +++ b/src/docs/ocean/ocean-aks-cloud-cluster-overview.md @@ -3,7 +3,8 @@ The Cloud Cluster Overview dashboard provides enhanced functionality to analyze the Ocean Autoscaler actions with high granularity and gain deeper insights into its behavior. This topic delves into this dashboard's various components and sections, offering a detailed exploration of its capabilities. -Ocean continuously analyzes the utilization of your nodes in the cloud infrastructure. It automatically scales compute resources to optimize utilization and availability. It achieves this by intelligently combining spot, reserved, and regular compute instances. +Ocean continuously analyzes the utilization of your nodes in the cloud infrastructure. It automatically scales compute resources to optimize utilization and availability. +It achieves this by intelligently combining spot, reserved, and regular compute instances. * Ocean Savings panel shows the amount of money, CPU, memory (GiB/TiB), and GPU compute resources saved when you utilize Ocean to manage your Kubernetes cluster. Specifically, these are savings from running spot instances, bin packing, and reverting to lower-cost nodes. * The Ocean Managed Nodes and Resources panel shows information about your Ocean-managed and unmanaged nodes and your managed CPU, memory, and GPU resources. 
@@ -18,12 +19,13 @@ To access the Ocean Cluster Overview dashboard: ## Ocean Savings Panel -![savings-tab-sample2](https://github.com/user-attachments/assets/d36f3ea4-215a-4eb1-bad3-a6acdd3533f6) +![aks-run-on-spot](https://github.com/user-attachments/assets/a2576d2c-1cf3-4d4e-b92b-ca8aef0fff03) The Ocean Savings panel contains a set of savings widgets (displayed as tabs), which show your savings according to Ocean’s main autoscaling processes for cluster optimization in a selected time range: * Running on Spot: Savings from running on spot nodes instead of OD nodes. * Bin Packing: Ocean proactively identifies underutilized nodes and efficiently bin-packs the pods on them to scale down the nodes and reduce the cluster cost. +* Revert to Lower Cost: Applied to nodes with underutilized compute resources that cannot be scaled down from the cluster's set of nodes. In this panel: @@ -55,7 +57,7 @@ This process ensures high resource utilization, reducing the number of nodes req To view these savings, click the **Bin Packing** tab (unless already displayed). -![bin-packing-tab](https://github.com/user-attachments/assets/9f1de767-b7c3-4336-9f16-bc150a914397) +![aks-bin-packing](https://github.com/user-attachments/assets/3ac56cac-9637-4806-865f-a2ae4771622c) This tab displays: @@ -69,10 +71,14 @@ This tab displays: * Lifecycle: (regular, Savings Plans, Reserved Instances, and spots). * Scale-Down Timestamp, for example, 06/25/2023 09:23:15 -
- View image +
+ View image... + +
- + + +
@@ -80,10 +86,53 @@ This tab displays: * CPU resources saved in vCPU Hours. * Memory resources saved in GiB Hours. * GPU resources saved in GPU Hours. + + ## Ocean Savings from Reverting to a Lower Cost Node + +This process is applied to nodes with underutilized compute resources that cannot be scaled down from the cluster's set of nodes. For example, if a pod was initially placed on a more expensive node due to resource constraints, the Ocean Autoscaler can replace that node with a less expensive one when it becomes available, saving costs. This tab lets you see how much you save from this dynamic resource allocation. + +Ocean savings for reverting to a lower-cost node are calculated from the difference in price between the old node and the new node. For more information, see [Revert to Lower Cost Node](https://docs.spot.io/ocean/features/revert-to-lower-cost-node?id=revert-to-lower-cost-node). + +To view these savings, click the **Revert to Lower Cost** tab (unless already displayed). + +![aks-rev-to-lower-cost](https://github.com/user-attachments/assets/d6dd4598-7e9e-4348-a98a-e68fb0fdd478) + +This tab displays: + +* Nodes reverted to lower cost: If one or more nodes have been reverted to lower cost, the number of reverted nodes appears. Click the number to open the Revert to Lower Cost window, which shows the nodes reverted in the selected time range, including details such as node types and costs. + +* Avg. percentage hourly cost saved: Average percentage cost reduction from reverting to the new VM type [SUM (% hourly cost saved)]. + + * Pie chart - Virtual Node Group percentage breakdown for nodes. + + List showing replacement information: an entry for each reverted node is listed with these details: + + * Node pools for the original and reverted nodes. + * Number of nodes in the old and new node pools. + * Hourly cost of the original and reverted nodes. Displays the total cost of all nodes in the same node pool: [nodes count * hourly cost]. 
+ * Hourly cost saved as a percentage: Cost reduction from converting to the new VM type: [(old hourly cost - SUM (new hourly cost)) / old hourly cost * 100]. + * The Virtual Node Group's name (click the link for a listed VNG to display your custom VNG details). + * Scale-down timestamp in the format MM/DD/YYYY HH:MM:SS + 
+ View image... + +
+ + + + +
+
+ +* Resource savings from reverting to lower cost in the following units: + * CPU resources saved in vCPU Hours. + * Memory resources saved in GiB Hours. + * GPU resources saved in GPU Hours. ## Ocean Managed Nodes and Resources Panel -![managed-nodes-resources](https://github.com/user-attachments/assets/3b5a9bb0-b3d2-4b75-b510-88f5de04afbc) + This panel contains a set of widgets that display categorized information on your managed nodes and resources. @@ -243,7 +292,7 @@ Total Allocation Calculation: ## Resource Allocation Panel -![resource-allocation-sample](https://github.com/user-attachments/assets/7aa25b61-8260-4aa4-a122-274786d1ef15) + This panel displays a **cluster-level** summary with widgets for CPU /Memory /GPU resources allocated to pods. You can review allocation trends over time. Use this information to verify that infrastructure utilization is maintained at 70-80%. * CPU @@ -264,10 +313,14 @@ To view more details: > * % Workload allocation. > * % Total allocation (Including headroom). -
- View image +
+ Click to view image + +
+ + -![nodes-tab](https://github.com/user-attachments/assets/75ec851a-71c0-4170-8c78-8361f3944f16) +