Benchmark Guide

Use benchmarks/run_benchmark.sh to run a throughput/latency benchmark against a deployed Llumnix gateway. The script submits a Kubernetes Job, waits for the pod to be scheduled, and prints the commands needed to retrieve results.

Quick Start

cd benchmarks/
./run_benchmark.sh -n <namespace>

# Example
./run_benchmark.sh -n llumnix

By default the script runs:

vllm bench serve \
  --base-url http://gateway:8089 \
  --model Qwen/Qwen2.5-7B \
  --num-prompts 100 \
  --ready-check-timeout-sec 0 \
  --request-rate 5 \
  --save-result \
  --save-detailed \
  --result-dir /tmp/benchmark-result

Options

Option	Description	Default
`-n`, `--namespace`	Target Kubernetes namespace (required)	—
`-i`, `--image`	Benchmark container image	`llumnix-registry.cn-beijing.cr.aliyuncs.com/llumnix/vllm:20260130-105854`
`-j`, `--job-name`	Kubernetes Job name	`llumnix-benchmark`
`-c`, `--command`	Full benchmark command to run inside the pod	see above
`-t`, `--ttl`	Seconds before the job is auto-deleted after failure	`3600`
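
Because the default Job name is fixed, a second run in the same namespace may collide with a Job left over from an earlier run (the source does not say how the script handles this). A minimal sketch of one workaround, assuming only the `-j` and `-t` options from the table above; `unique_job_name` is a hypothetical helper, not part of `run_benchmark.sh`:

```shell
# Hypothetical helper (not part of run_benchmark.sh): build a Job name that is
# unique per run. Job names should be lowercase alphanumerics or '-' and at
# most 63 characters, so the result is lowercased and truncated.
unique_job_name() {
  prefix="${1:-llumnix-benchmark}"
  # Default suffix is a timestamp; pass a second argument to override it.
  suffix="${2:-$(date +%Y%m%d-%H%M%S)}"
  printf '%s-%s\n' "$prefix" "$suffix" | tr '[:upper:]' '[:lower:]' | cut -c1-63
}

# Usage (commented out so the snippet can be sourced without a cluster):
# ./run_benchmark.sh -n llumnix -j "$(unique_job_name)" -t 600
```

If you override the Job name, substitute it for `llumnix-benchmark` in any later `kubectl` command that refers to the job.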

Retrieving Results

After the benchmark succeeds, the pod sleeps indefinitely so you can copy results at any time. The script prints the exact command with the actual pod name, for example:

# Copy the entire result directory to local
kubectl cp llumnix-benchmark-fxfx9:/tmp/benchmark-result ./llumnix-benchmark-fxfx9 -n llumnix

# Delete the job once you are done
kubectl delete job llumnix-benchmark -n llumnix
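
If you no longer have the script output, the pod name can usually be looked up from the Job name. The sketch below assumes the pod carries the `job-name=<job>` label that the Kubernetes Job controller adds to the pods it creates; `fetch_results` is a hypothetical helper name, not something the script provides:

```shell
# Hypothetical helper: copy results without pasting the pod name by hand.
# Assumes the Job's pod has the "job-name=<job>" label set by the Job controller.
fetch_results() {
  ns="$1"
  job="${2:-llumnix-benchmark}"
  # Look up the name of the first pod that belongs to the Job.
  pod="$(kubectl get pods -n "$ns" -l "job-name=$job" \
    -o jsonpath='{.items[0].metadata.name}')"
  if [ -z "$pod" ]; then
    echo "no pod found for job $job in namespace $ns" >&2
    return 1
  fi
  # Same copy command the script prints, with the pod name filled in.
  kubectl cp "$pod:/tmp/benchmark-result" "./$pod" -n "$ns"
}

# Usage (requires cluster access):
# fetch_results llumnix
```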

The result directory (/tmp/benchmark-result/) contains:

File	Description
`benchmark-log.txt`	Full stdout/stderr of the benchmark run
`*.json`	Result files produced by `--save-result` / `--save-detailed`
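
After copying, a quick sanity check can confirm that the local directory contains the files listed above. This is only a sketch: `check_results` is a hypothetical helper, and it expects at least one `*.json` file, which holds only if the benchmark command used `--save-result` or `--save-detailed`:

```shell
# Hypothetical helper: confirm that a copied result directory holds
# benchmark-log.txt plus at least one *.json result file.
check_results() {
  dir="$1"
  if [ ! -f "$dir/benchmark-log.txt" ]; then
    echo "missing $dir/benchmark-log.txt" >&2
    return 1
  fi
  for f in "$dir"/*.json; do
    # If the glob matched nothing, $f is the literal pattern and -f fails.
    if [ -f "$f" ]; then
      echo "ok: $dir"
      return 0
    fi
  done
  echo "no *.json result files in $dir" >&2
  return 1
}

# Usage:
# check_results ./llumnix-benchmark-fxfx9
```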

On failure, the pod exits immediately and the job is auto-deleted after --ttl seconds (3600 by default).

Custom Commands

Pass a custom benchmark command with -c (or --command). If the command uses --save-result or --save-detailed, always set --result-dir /tmp/benchmark-result so that the output files are written alongside the log and are picked up when you copy the directory with kubectl cp:

./run_benchmark.sh -n llumnix \
  --command "vllm bench serve \
    --base-url http://gateway:8089 \
    --model Qwen/Qwen2.5-7B \
    --num-prompts 500 \
    --request-rate 10 \
    --save-result \
    --save-detailed \
    --result-dir /tmp/benchmark-result"
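
To avoid forgetting `--result-dir` when varying the load parameters, the command string can be generated by a small function. A minimal sketch, assuming the same flags as the example above; `bench_cmd` is a hypothetical helper, and the gateway URL and default model are copied from this guide:

```shell
# Hypothetical helper: build a benchmark command string that always writes
# its results to /tmp/benchmark-result.
# Arguments: <num-prompts> <request-rate> [model]
bench_cmd() {
  num_prompts="$1"
  request_rate="$2"
  model="${3:-Qwen/Qwen2.5-7B}"
  # printf repeats the '%s' format for each argument, concatenating them.
  printf '%s' "vllm bench serve" \
    " --base-url http://gateway:8089" \
    " --model $model" \
    " --num-prompts $num_prompts" \
    " --request-rate $request_rate" \
    " --save-result" \
    " --save-detailed" \
    " --result-dir /tmp/benchmark-result"
}

# Usage (requires cluster access):
# ./run_benchmark.sh -n llumnix -c "$(bench_cmd 500 10)"
```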

Useful Commands

# Check job status
kubectl get job llumnix-benchmark -n llumnix

# Follow live logs
kubectl logs -f job/llumnix-benchmark -n llumnix

# Delete job manually
kubectl delete job llumnix-benchmark -n llumnix