System Info
Infinity docker image : michaelf34/infinity:0.0.74
Docker compose command and deploy parts
GPU Card
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.05 Driver Version: 560.35.05 CUDA Version: 12.6 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 4090 Off | 00000000:0A:00.0 Off | Off |
| 0% 37C P8 13W / 450W | 1174MiB / 24564MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
Information
Docker + cli
pip + cli
pip + usage of Python interface
Tasks
An officially supported CLI command
My own modifications
Reproduction
Embedded around 13k documents and waited 6 hours after the embedding batches finished; the GPU memory is still allocated to the process and is not freed of the embedding artifacts (not talking about the model itself).
Is there a way to free the memory other than restarting the container?
Memory just after the Container Start
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.05 Driver Version: 560.35.05 CUDA Version: 12.6 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 4090 Off | 00000000:0A:00.0 Off | Off |
| 0% 39C P2 73W / 450W | 1174MiB / 24564MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 128020 C /app/.venv/bin/python 1164MiB |
+-----------------------------------------------------------------------------------------+
Memory 6 hours after the encoding batch is completed
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.05 Driver Version: 560.35.05 CUDA Version: 12.6 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 4090 Off | 00000000:0A:00.0 Off | Off |
| 0% 37C P8 13W / 450W | 10516MiB / 24564MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 1931 C /app/.venv/bin/python 10506MiB |
+-----------------------------------------------------------------------------------------+
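For what it's worth, the pattern above looks like PyTorch's caching allocator keeping the blocks reserved after the embedding tensors are gone; that is an assumption on my side. A minimal plain-torch sketch (not Infinity code) that reproduces the same nvidia-smi behaviour:

```python
import torch

# Sketch: torch's caching allocator keeps memory reserved after tensors are
# deleted, so nvidia-smi still charges it to the process even though torch
# can reuse it internally.
batches = [torch.randn(512, 1024, device="cuda") for _ in range(64)]
print("allocated:", torch.cuda.memory_allocated() // 2**20, "MiB")
print("reserved: ", torch.cuda.memory_reserved() // 2**20, "MiB")

del batches                               # the tensors are gone ...
print("allocated:", torch.cuda.memory_allocated() // 2**20, "MiB")  # drops to ~0
print("reserved: ", torch.cuda.memory_reserved() // 2**20, "MiB")   # ... but stays reserved

torch.cuda.empty_cache()                  # hand cached blocks back to the driver
print("reserved: ", torch.cuda.memory_reserved() // 2**20, "MiB")   # now lower in nvidia-smi too
```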
Aelentel changed the title to "GPU Memory is not freed after embedding operations" on Jan 14, 2025.
According to the linked issue the batch size should be a power of 2 and a multiple of 8, so I'll try 16 for the batch size.
However the memory usage is still problematic: the model itself is around 1.2 GB when loaded (as seen in the startup nvidia-smi), so is there a way to clean up the used memory after heavy usage?
Maybe restart/respawn the model inference process? Can it be done via the API? Or is that a problem at a lower level (CUDA or torch)?
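If it is just torch's caching allocator holding on to the blocks (again, an assumption on my part), something like this executed inside the serving process after a batch should hand them back to the driver; these are plain gc/torch calls, not Infinity's API:

```python
import gc
import torch

def release_embedding_cache() -> None:
    # Drop unreachable tensors first, then return torch's cached CUDA blocks
    # to the driver. This only helps if the retained memory is allocator
    # cache; memory still referenced by live tensors cannot be freed this way.
    gc.collect()
    torch.cuda.empty_cache()
```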
Give or take, the container gets ready to process in less than a minute, so I think I can implement a restart after batching, but a less forceful solution would help long-term running and availability in production.
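Something like this at the end of my batching script would be the brute-force version (the service name "infinity" is just an example from my compose file):

```python
import subprocess

# Hypothetical fallback after a batching run: restart the Infinity container
# so the process gives all its GPU memory back to the driver.
subprocess.run(["docker", "compose", "restart", "infinity"], check=True)
```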
Note that by freeing memory I mean the memory used by the inferred vectors, not the memory taken by the model.