Commit

add manifests
samos123 committed Nov 2, 2024
Commit da2f09c (parent: ecccafa)
Showing 1 changed file with 24 additions and 0 deletions.
manifests/models/llama-3.1-70b-instruct-fp8-gh200.yaml (new file, 24 additions, 0 deletions)
@@ -0,0 +1,24 @@
# Source: models/templates/models.yaml
apiVersion: kubeai.org/v1
kind: Model
metadata:
  name: llama-3.1-70b-instruct-fp8-gh200
spec:
  features: [TextGeneration]
  owner:
  url: hf://neuralmagic/Meta-Llama-3.1-70B-Instruct-FP8
  engine: VLLM
  args:
    - --max-model-len=32768
    - --max-num-batched-tokens=32768
    - --max-num-seqs=512
    - --gpu-memory-utilization=0.9
    - --enable-prefix-caching
    - --enable-chunked-prefill
    - --disable-log-requests
    - --kv-cache-dtype=fp8
    - --enforce-eager
  env:
    VLLM_ATTENTION_BACKEND: FLASHINFER
  targetRequests: 512
  resourceProfile: nvidia-gpu-gh200:1
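
The `--kv-cache-dtype=fp8` and `--max-model-len=32768` settings interact through the size of the KV cache. As a rough illustration, the sketch below does that arithmetic, assuming the layer count, KV-head count, and head dimension of the published Llama 3.1 70B configuration (80 layers, 8 KV heads under grouped-query attention, head dimension 128). `KVConfig`, `kv_bytes_per_token`, and `kv_bytes_for_tokens` are hypothetical names for illustration, not part of KubeAI or vLLM, and the estimate ignores any per-block overhead vLLM may add.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass(frozen=True)
class KVConfig:
    """Shape of a decoder's KV cache (values are assumptions, see lead-in)."""
    num_layers: int
    num_kv_heads: int
    head_dim: int
    bytes_per_elem: int  # 1 for fp8, 2 for fp16/bf16


def kv_bytes_per_token(cfg: KVConfig) -> int:
    # Each layer stores one key and one value vector per KV head for every token.
    return 2 * cfg.num_layers * cfg.num_kv_heads * cfg.head_dim * cfg.bytes_per_elem


def kv_bytes_for_tokens(cfg: KVConfig, num_tokens: int) -> int:
    # Total cache bytes needed to hold num_tokens tokens.
    return kv_bytes_per_token(cfg) * num_tokens


# Assumed Llama 3.1 70B shape: 80 layers, 8 KV heads (GQA), head_dim 128.
LLAMA_70B_FP8 = KVConfig(num_layers=80, num_kv_heads=8, head_dim=128, bytes_per_elem=1)
LLAMA_70B_BF16 = KVConfig(num_layers=80, num_kv_heads=8, head_dim=128, bytes_per_elem=2)

MAX_MODEL_LEN = 32768  # --max-model-len from the manifest

if __name__ == "__main__":
    print(f"fp8 KV cache per token: {kv_bytes_per_token(LLAMA_70B_FP8) // 1024} KiB")
    fp8_seq = kv_bytes_for_tokens(LLAMA_70B_FP8, MAX_MODEL_LEN) / GIB
    bf16_seq = kv_bytes_for_tokens(LLAMA_70B_BF16, MAX_MODEL_LEN) / GIB
    print(f"one {MAX_MODEL_LEN}-token sequence: {fp8_seq:.1f} GiB fp8 vs {bf16_seq:.1f} GiB bf16")
```

Under these assumptions an fp8 cache needs 160 KiB per token, so one sequence at the full 32768-token context occupies 5 GiB, half of what a bf16 cache would need. Because vLLM's paged KV cache is shared by all running sequences, `--max-num-seqs=512` caps concurrency; it does not mean 512 full-length sequences fit at once.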
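
For readers mapping the manifest onto a plain vLLM launch, the following sketch renders part of the spec as an illustrative `vllm serve` command line. It is an approximation built on stated assumptions: `MODEL_SPEC`, `hf_repo_from_url`, `parse_resource_profile`, and `render_command` are hypothetical; the `hf://` prefix is assumed to map to a Hugging Face repo ID; `resourceProfile` is assumed to take a `<profile>:<count>` form; and the exact entrypoint and any extra flags KubeAI's operator adds are not shown in this commit. Only a subset of the manifest's `args` is reproduced.

```python
from __future__ import annotations

import shlex

# Hypothetical dict mirroring part of the manifest's spec (subset of args only).
MODEL_SPEC = {
    "url": "hf://neuralmagic/Meta-Llama-3.1-70B-Instruct-FP8",
    "args": [
        "--max-model-len=32768",
        "--max-num-seqs=512",
        "--gpu-memory-utilization=0.9",
        "--enable-prefix-caching",
        "--kv-cache-dtype=fp8",
        "--enforce-eager",
    ],
    "env": {"VLLM_ATTENTION_BACKEND": "FLASHINFER"},
    "resourceProfile": "nvidia-gpu-gh200:1",
}


def hf_repo_from_url(url: str) -> str:
    """Strip the assumed hf:// scheme to get a Hugging Face repo ID."""
    prefix = "hf://"
    if not url.startswith(prefix):
        raise ValueError(f"expected an hf:// URL, got {url!r}")
    return url[len(prefix):]


def parse_resource_profile(profile: str) -> tuple[str, int]:
    """Split an assumed '<profile>:<count>' string into name and count."""
    name, sep, count = profile.rpartition(":")
    if not sep or not name or not count.isdigit():
        raise ValueError(f"expected '<profile>:<count>', got {profile!r}")
    return name, int(count)


def render_command(spec: dict) -> str:
    """Render env vars plus an illustrative `vllm serve` call as one shell line."""
    env = [f"{key}={shlex.quote(value)}" for key, value in sorted(spec["env"].items())]
    argv = ["vllm", "serve", hf_repo_from_url(spec["url"]), *spec["args"]]
    return " ".join(env + [shlex.join(argv)])


if __name__ == "__main__":
    print(render_command(MODEL_SPEC))
    print(parse_resource_profile(MODEL_SPEC["resourceProfile"]))
```

Keeping the engine flags as a flat list of `--name=value` strings, as the manifest does, means they can be passed to the server verbatim; the sketch only prepends the model ID and the environment.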