Batch Inference#
Llumnix LLM Gateway supports OpenAI’s Batch API (i.e., batch inference), following the OpenAI interface specification. Batch inference processes large volumes of inference requests asynchronously, making it ideal for running during idle periods to fully utilize machine resources.
The workflow is as follows:
Upload a file. Upload a JSONL file containing multiple inference requests.
Create a batch task. Create a batch task using the uploaded input file. LLM Gateway automatically shards the input file and processes multiple shards in parallel, sending inference requests to the inference service and recording the results.
Wait for the batch task to complete. You can check the task status through the query API.
Download the results. Once the task is completed, an output file and an error file (if any) are generated. You can retrieve the results through the file API.
Architecture#
graph LR
Client[Client]
subgraph Gateway["LLM Gateway"]
FileAPI[File API]
BatchAPI[Batch API]
Reactor[Shard & Parallel Dispatch]
InferenceAPI[Inference API]
end
OSS[(OSS)]
Redis[(Redis)]
Scheduler[Scheduler]
Engine1[Inference Engine]
Engine2[Inference Engine]
Client -->|"1. Upload file"| FileAPI
FileAPI -->|store| OSS
Client -->|"2. Create batch"| BatchAPI
BatchAPI -->|read/write metadata| Redis
BatchAPI --> Reactor
Reactor -->|"3. HTTP loopback<br/>(127.0.0.1)"| InferenceAPI
InferenceAPI --> Scheduler
Scheduler --> Engine1
Scheduler --> Engine2
Reactor -->|"4. Write results"| OSS
Client -->|"5. Download results"| FileAPI
FileAPI -->|read| OSS
Service Deployment Example#
To enable Batch API support in LLM Gateway, you need:
An OSS bucket for storing batch files (input, output, and temporary files), with valid access credentials.
A Redis instance for storing batch metadata.
A scheduler and inference engine (e.g., vLLM) deployed alongside the gateway.
A complete deployment example is provided under deploy/batch-inference/. It consists of the following components:
Component |
Description |
|---|---|
|
Redis instance for service discovery and batch metadata storage. |
|
Scheduler for load-balanced request routing. |
|
vLLM inference engine (the actual LLM serving backend). |
|
LLM Gateway with Batch API enabled. |
Gateway Configuration (gateway.yaml)#
The gateway is the core component for Batch API. Below are the key configuration parameters. Fields marked # required must be filled in by the user.
args:
- "--port"
- "8089"
- "--scheduler-endpoints"
- "scheduler:8088"
- "--llm-backend-discovery"
- "redis"
- "--scheduler-discovery"
- "endpoints"
- "--discovery-redis-host"
- "redis"
- "--discovery-redis-port"
- "6379"
- "--scheduling-policy"
- "load-balance"
- "--tokenizer-path"
- "/tokenizers/Qwen/Qwen3-30B-A3B-FP8"
- "--max-model-len"
- "40960"
- "--enable-full-mode-scheduling=false"
# Batch API specific configuration
- "--batch-oss-path"
- "" # required: OSS bucket path, e.g. "my-bucket/batch-prefix"
- "--batch-oss-endpoint"
- "" # required: OSS endpoint, e.g. "oss-cn-beijing.aliyuncs.com"
- "--batch-redis-addrs"
- "redis:6379" # required: Redis address for batch metadata
- "--batch-redis-username"
- "" # required: Redis username (empty string if no auth)
- "--batch-redis-password"
- "" # required: Redis password (empty string if no auth)
env:
- name: "OSS_ACCESS_KEY_ID"
value: "" # required: your OSS access key ID
- name: "OSS_ACCESS_KEY_SECRET"
value: "" # required: your OSS access key secret
Inference Engine Configuration (neutral.yaml)#
The inference engine runs the actual model. Make sure the model name matches the gateway’s --tokenizer-path:
CUDA_VISIBLE_DEVICES=$i vllm serve \
Qwen/Qwen3-30B-A3B-FP8 \ # model name
--port $PORT \
--trust-remote-code \
--gpu-memory-utilization 0.7 \
...
Deploying#
Deploy all components using kustomize:
cd deploy/
bash group_deploy.sh batch-inference-test batch-inference
Deployment Parameters#
Batch API Parameters (Gateway)#
Parameter |
Required |
Default |
Description |
|---|---|---|---|
|
Yes |
- |
OSS bucket path for storing batch files (input, output, and temporary files). |
|
Yes |
- |
OSS endpoint URL. |
|
Yes |
|
Redis address for batch metadata storage. |
|
Yes |
|
Redis username. |
|
Yes |
|
Redis password. |
|
No |
8 |
Concurrency for processing batch shards. |
|
No |
1000 |
Maximum number of lines per shard. |
|
No |
3m |
Timeout for a single request (Go duration format). |
|
No |
3 |
Retry count for a single request. |
|
No |
0 |
Override the model max length from |
Environment Variables (Gateway)#
Variable |
Required |
Description |
|---|---|---|
|
Yes |
Access key ID for OSS authentication. |
|
Yes |
Access key secret for OSS authentication. |
Usage Guide#
Before using the Batch API, you need to have the service endpoint URL, access credentials, and the input file for the batch task ready.
Refer to the OpenAI batch usage workflow:
1. Upload the input file via the File API#
curl -s "<YOUR_GATEWAY_URL>/v1/files" \
-H "Authorization: Bearer <YOUR_TOKEN>" \
-F purpose="batch" \
-F file="@input.jsonl"
Example input.jsonl:
{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "Qwen3-VL-2B-Instruct", "messages": [{"role": "system", "content": "You are a helpful assistant."},{"role": "user", "content": "Hello world!"}],"max_tokens": 1000}}
...
Response Example
{
"bytes": 564813,
"created_at": 1765868482,
"filename": "input.jsonl",
"id": "batch_input_11fb297e-653d-47cf-bb6a-a80209dc562b",
"object": "file",
"purpose": "batch"
}
The file will be stored at oss://your-bucket/path/to/prefix/batch_input_11fb297e-653d-47cf-bb6a-a80209dc562b
2. Create a Batch using the file ID#
curl -s "<YOUR_GATEWAY_URL>/v1/batches" \
-H "Authorization: Bearer <YOUR_TOKEN>" \
-H "Content-Type: application/json" \
-d '{
"input_file_id": "batch_input_11fb297e-653d-47cf-bb6a-a80209dc562b",
"endpoint": "/v1/chat/completions",
"completion_window": "24h"
}'
Response Example
{
"id": "batch_5f968571-b0b6-413f-a2a8-69bf750112af",
"object": "batch",
"endpoint": "/v1/chat/completions",
"model": "",
"errors": null,
"input_file_id": "batch_input_11fb297e-653d-47cf-bb6a-a80209dc562b",
"completion_window": "24h",
"status": "pending",
"output_file_id": null,
"error_file_id": null,
"created_at": 1765868672,
"in_progress_at": null,
"expires_at": 1765955072,
"finalizing_at": null,
"completed_at": null,
"failed_at": null,
"expired_at": null,
"cancelling_at": null,
"cancelled_at": null,
"request_counts": {
"total": 0,
"completed": 0,
"failed": 0
},
"usage": null,
"metadata": null
}
3. Query Batch Status#
curl -s "<YOUR_GATEWAY_URL>/v1/batches/batch_98d4d6e3-c7ec-4aa9-969e-fb8531059523" \
-H "Authorization: Bearer <YOUR_TOKEN>" \
-H "Content-Type: application/json"
Response Example
{
"id": "batch_5f968571-b0b6-413f-a2a8-69bf750112af",
"object": "batch",
"endpoint": "/v1/chat/completions",
"model": "",
"errors": null,
"input_file_id": "batch_input_11fb297e-653d-47cf-bb6a-a80209dc562b",
"completion_window": "24h",
"status": "completed",
"output_file_id": "batch_output_20bafd19-6d73-4b61-a770-ebcb377d286d",
"error_file_id": null,
"created_at": 1765868672,
"in_progress_at": 1765868674,
"expires_at": 1765955072,
"finalizing_at": 1765868740,
"completed_at": 1765868741,
"failed_at": null,
"expired_at": null,
"cancelling_at": null,
"cancelled_at": null,
"request_counts": {
"total": 2160,
"completed": 2160,
"failed": 0
},
"usage": null,
"metadata": {}
}
4. Retrieve Batch Results#
When the batch status is completed, you can retrieve the results via the File API. The file ID is the batch’s output_file_id, and the actual location is oss://your-bucket/path/to/prefix/batch_output_20bafd19-6d73-4b61-a770-ebcb377d286d
curl -s "<YOUR_GATEWAY_URL>/v1/files/batch_output_20bafd19-6d73-4b61-a770-ebcb377d286d/content" \
-H "Authorization: Bearer <YOUR_TOKEN>" > output.jsonl
Example output.jsonl content:
{"id":"batch_5f968571-b0b6-413f-a2a8-69bf750112af","custom_id":"request-1","response":{"status_code":200,"request_id":"282f82b5-577a-44f3-9bf7-a17522ac7d1c","body":{"id":"chatcmpl-282f82b5-577a-44f3-9bf7-a17522ac7d1c","object":"chat.completion","created":1765868675,"model":"Qwen3-VL-2B-Instruct","choices":[{"index":0,"message":{"role":"assistant","content":"Hello! How can I assist you today?","refusal":null,"annotations":null,"audio":null,"function_call":null,"tool_calls":[],"reasoning_content":null},"logprobs":null,"finish_reason":"stop","stop_reason":null,"token_ids":null}],"service_tier":null,"system_fingerprint":null,"usage":{"prompt_tokens":22,"total_tokens":32,"completion_tokens":10,"prompt_tokens_details":null},"prompt_logprobs":null,"prompt_token_ids":null,"kv_transfer_params":null}}}
...
Batch Task Statuses#
Status List#
Status |
Description |
Phase |
Actions |
|---|---|---|---|
pending |
Pending: Waiting to be processed. |
Preparation |
Can be cancelled. |
validating |
Validating: Checking the input file format and validating content parameters. |
Preparation |
Can be cancelled (if validation fails, the task transitions to |
failed |
Failed: Input file or task parameter validation failed; the task cannot start processing. |
Terminal (Failure) |
No actions available. |
in_progress |
In Progress: The task has been picked up and is executing batch requests. |
Processing |
Can be cancelled. |
finalizing |
Finalizing: All shards have been processed; waiting for aggregation. |
Terminal (Processing) |
No actions available. |
finalize |
Finalize: Aggregating results and generating output files. |
Terminal (Completed) |
No actions available. |
completed |
Completed: All requests have been processed; the output file is available for download. |
Terminal (Success) |
No actions available. |
cancelling |
Cancelling: The user has requested cancellation; the system is stopping in-progress requests. |
Terminal (Cancelling) |
No actions available. |
cancelled |
Cancelled: The task was successfully cancelled before completion. |
Terminal (Cancelled) |
No actions available. |
expired |
Expired: The task did not complete within the specified time window; the system has automatically terminated it. |
Terminal (Failure) |
No actions available. |
API Reference#
The LLM Gateway Batch APIs are largely consistent with the OpenAI specification. The following APIs are supported:
Batch#
Batch Object#
Parameter |
Type |
Description |
|---|---|---|
id |
string |
Unique identifier for the batch task. |
object |
string |
The type of the object, always “batch”. |
endpoint |
string |
The API endpoint to be called by the batch task (e.g., /v1/chat/completions). Currently supported endpoints: |
model |
string |
The model used in batch requests. Currently empty. |
errors |
object |
Error details when the task fails (only present when status is |
errors.data |
array |
Error information generated during validation. |
errors.data[].code |
string |
Error code. |
errors.data[].line |
int |
Not yet supported; always 0. |
errors.data[].message |
string |
Error message. |
errors.data[].param |
string |
Not yet supported; always empty. |
input_file_id |
string |
The ID of the file containing input requests (JSONL format). |
completion_window |
string |
The time window for task completion (e.g., 24h). |
status |
string |
The current status of the batch task (e.g., validating, in_progress, completed). |
output_file_id |
string |
The ID of the file containing processed responses (only present when the task completes successfully). |
error_file_id |
string |
The ID of the file containing failed requests. |
created_at |
integer |
The time the task was created (Unix timestamp in seconds). |
in_progress_at |
integer |
The time the task started processing (Unix timestamp in seconds). |
expires_at |
integer |
The configured task expiration time; tasks not completed by this time will expire (Unix timestamp in seconds). |
finalizing_at |
integer |
The time the task entered the finalizing phase (Unix timestamp in seconds). |
completed_at |
integer |
The time the task completed successfully (Unix timestamp in seconds). |
failed_at |
integer |
The time the task failed (Unix timestamp in seconds). |
expired_at |
integer |
The time the task expired (Unix timestamp in seconds). |
cancelling_at |
integer |
The time the task entered the cancelling phase (Unix timestamp in seconds). |
cancelled_at |
integer |
The time the task was actually cancelled (Unix timestamp in seconds). |
request_counts |
object |
Statistics on the number of requests in the task (total, completed, failed, etc.). |
request_counts.total |
integer |
Total number of requests. |
request_counts.completed |
integer |
Number of successful requests. |
request_counts.failed |
integer |
Number of failed requests. |
usage |
object |
Not yet supported. Usage information consumed by the task (e.g., token counts). |
metadata |
map |
Optional user-provided key-value metadata. |
Create Batch: POST /v1/batches#
Request Example
curl -s "<YOUR_GATEWAY_URL>/v1/batches" \
-H "Authorization: Bearer <YOUR_TOKEN>" \
-H "Content-Type: application/json" \
-d '{
"input_file_id": "batch_input_11fb297e-653d-47cf-bb6a-a80209dc562b",
"endpoint": "/v1/chat/completions",
"completion_window": "24h"
}'
Response Example
{
"id": "batch_5f968571-b0b6-413f-a2a8-69bf750112af",
"object": "batch",
"endpoint": "/v1/chat/completions",
"model": "",
"errors": null,
"input_file_id": "batch_input_11fb297e-653d-47cf-bb6a-a80209dc562b",
"completion_window": "24h",
"status": "completed",
"output_file_id": "batch_output_20bafd19-6d73-4b61-a770-ebcb377d286d",
"error_file_id": null,
"created_at": 1765868672,
"in_progress_at": 1765868674,
"expires_at": 1765955072,
"finalizing_at": 1765868740,
"completed_at": 1765868741,
"failed_at": null,
"expired_at": null,
"cancelling_at": null,
"cancelled_at": null,
"request_counts": {
"total": 2160,
"completed": 2160,
"failed": 0
},
"usage": null,
"metadata": {}
}
Request Parameters
Parameter |
Type |
Required |
Description |
|---|---|---|---|
input_file_id |
string |
Yes |
The ID of an uploaded file. The file must be a valid JSONL format, containing at most 50,000 requests and with a maximum file size of 200 MB. |
endpoint |
string |
Yes |
The API endpoint for the batch requests. |
completion_window |
string |
No |
The time window for batch task completion. Currently only |
metadata |
map |
No |
Metadata for the batch task. Supports up to 16 key-value pairs, where keys can be up to 16 characters and values up to 512 characters. |
output_expires_after |
object |
No |
Not yet supported; all files will not be automatically expired or cleaned up. Expiration policy for the batch output or error files. |
Response Parameters
Returns the created Batch object.
Get Batch: GET /v1/batches/{batch_id}#
Request Example
curl -s "<YOUR_GATEWAY_URL>/v1/batches/batch_98d4d6e3-c7ec-4aa9-969e-fb8531059523" \
-H "Authorization: Bearer <YOUR_TOKEN>" \
-H "Content-Type: application/json"
Response Example
{
"id": "batch_5f968571-b0b6-413f-a2a8-69bf750112af",
"object": "batch",
"endpoint": "/v1/chat/completions",
"model": "",
"errors": null,
"input_file_id": "batch_input_11fb297e-653d-47cf-bb6a-a80209dc562b",
"completion_window": "24h",
"status": "completed",
"output_file_id": "batch_output_20bafd19-6d73-4b61-a770-ebcb377d286d",
"error_file_id": null,
"created_at": 1765868672,
"in_progress_at": 1765868674,
"expires_at": 1765955072,
"finalizing_at": 1765868740,
"completed_at": 1765868741,
"failed_at": null,
"expired_at": null,
"cancelling_at": null,
"cancelled_at": null,
"request_counts": {
"total": 2160,
"completed": 2160,
"failed": 0
},
"usage": null,
"metadata": {}
}
Request Parameters
Parameter |
Type |
Required |
Description |
|---|---|---|---|
batch_id |
string |
Yes |
The ID of the batch task to query. |
Response Parameters
Returns the queried Batch object.
Cancel Batch: POST /v1/batches/{batch_id}/cancel#
Only tasks in validating or in_progress status can be cancelled.
Request Example
curl -s "<YOUR_GATEWAY_URL>/v1/batches/batch_98d4d6e3-c7ec-4aa9-969e-fb8531059523/cancel" \
-H "Authorization: Bearer <YOUR_TOKEN>"
-X POST
Response Example
{
"id": "batch_93559b00-67bf-4615-895e-16fd30196ecb",
"object": "batch",
"endpoint": "/v1/chat/completions",
"model": "",
"errors": null,
"input_file_id": "batch_input_11fb297e-653d-47cf-bb6a-a80209dc562b",
"completion_window": "24h",
"status": "cancelling",
"output_file_id": null,
"error_file_id": null,
"created_at": 1765870619,
"in_progress_at": 1765870620,
"expires_at": 1765957019,
"finalizing_at": null,
"completed_at": null,
"failed_at": null,
"expired_at": null,
"cancelling_at": 1765870629,
"cancelled_at": null,
"request_counts": {
"total": 2160,
"completed": 0,
"failed": 0
},
"usage": null,
"metadata": {}
}
Request Parameters
Parameter |
Type |
Required |
Description |
|---|---|---|---|
batch_id |
string |
Yes |
The ID of the batch task to cancel. |
Response Parameters
Returns the cancelled Batch object.
List Batches: GET /v1/batches#
Request Example
curl -s "<YOUR_GATEWAY_URL>/v1/batches" \
-H "Authorization: Bearer <YOUR_TOKEN>"
Response Example
{
"data": [
{
"id": "batch_5f968571-b0b6-413f-a2a8-69bf750112af",
"object": "batch",
"endpoint": "/v1/chat/completions",
"model": "",
"errors": null,
"input_file_id": "batch_input_11fb297e-653d-47cf-bb6a-a80209dc562b",
"completion_window": "24h",
"status": "completed",
"output_file_id": "batch_output_20bafd19-6d73-4b61-a770-ebcb377d286d",
"error_file_id": null,
"created_at": 1765868672,
"in_progress_at": 1765868674,
"expires_at": 1765955072,
"finalizing_at": 1765868740,
"completed_at": 1765868741,
"failed_at": null,
"expired_at": null,
"cancelling_at": null,
"cancelled_at": null,
"request_counts": {
"total": 2160,
"completed": 2160,
"failed": 0
},
"usage": null,
"metadata": {}
}
],
"first_id": "batch_5f968571-b0b6-413f-a2a8-69bf750112af",
"has_more": false,
"last_id": "batch_5f968571-b0b6-413f-a2a8-69bf750112af",
"object": "list"
}
Request Parameters
Parameter |
Type |
Required |
Description |
|---|---|---|---|
limit |
integer |
No |
Maximum number of results to return. |
after |
string |
No |
The last batch_id from the previous request. |
Response Parameters
Parameter |
Type |
Description |
|---|---|---|
object |
string |
Always “list”. |
data |
array |
A list of Batch objects, sorted in reverse chronological order (newest first). |
first_id |
string |
The first batch_id in this response. |
last_id |
string |
The last batch_id in this response. |
has_more |
bool |
Whether there are more results after the last batch_id. |
File#
File Object#
Parameter |
Type |
Description |
|---|---|---|
id |
string |
Unique identifier for the file. |
object |
string |
The type of the object, always “file”. |
bytes |
integer |
The size of the file in bytes. |
created_at |
integer |
The time the file was created (Unix timestamp in seconds). |
expires_at |
integer |
The time the file will expire (Unix timestamp in seconds). |
filename |
string |
The filename specified by the user during upload. |
purpose |
string |
The purpose of the file, always “batch”. |
Input File Format
Parameter |
Type |
Required |
Description |
|---|---|---|---|
custom_id |
string |
Yes |
A custom request ID; must be unique across all requests. |
method |
string |
Yes |
The HTTP method for the inference request, typically POST. |
url |
string |
Yes |
The endpoint for the inference request. Currently must match the endpoint specified when creating the batch task. |
body |
object |
Yes |
The request body for the inference request; it is forwarded to the inference service without modification. |
Output File Format
Note: The order of responses in the output file is not guaranteed to match the order of requests in the input file. Use custom_id to match each request with its response.
Parameter |
Type |
Description |
|---|---|---|
id |
string |
The batch_id. |
custom_id |
string |
The custom request ID. |
response |
object |
The response for the request. |
response.status_code |
int |
The HTTP status code returned by the inference service. |
response.request_id |
string |
The inference request ID. |
response.body |
object |
The response body returned by the inference service. |
error |
object |
Error information; present when an error occurs. Possible causes include inference service unavailability, invalid request format, or LLM Gateway internal errors. |
error.code |
string |
The specific error code. |
error.message |
string |
The specific error message. |
Upload File: POST /v1/files#
Request Example
curl -s "<YOUR_GATEWAY_URL>/v1/files" \
-H "Authorization: Bearer <YOUR_TOKEN>" \
-F purpose="batch" \
-F file="@input.jsonl"
Response Example
{
"bytes": 564813,
"created_at": 1765868482,
"filename": "input.jsonl",
"id": "batch_input_11fb297e-653d-47cf-bb6a-a80209dc562b",
"object": "file",
"purpose": "batch"
}
Request Parameters
Parameter |
Type |
Required |
Description |
|---|---|---|---|
file |
File |
Yes |
The file to upload (multipart/form-data). |
purpose |
string |
Yes |
The purpose of the file; must be “batch” (multipart/form-data). |
Response Parameters
Returns the created File object.
List Files: GET /v1/files#
Request Example
curl -s "<YOUR_GATEWAY_URL>/v1/files" \
-H "Authorization: Bearer <YOUR_TOKEN>" \
-H "Content-Type: application/json"
Response Example
{
"data": [
{
"id": "batch_output_20bafd19-6d73-4b61-a770-ebcb377d286d",
"object": "file",
"bytes": 1740241,
"created_at": 1765868741,
"expires_at": 0,
"filename": "output.jsonl",
"purpose": "batch_output"
},
{
"id": "batch_input_11fb297e-653d-47cf-bb6a-a80209dc562b",
"object": "file",
"bytes": 564813,
"created_at": 1765868482,
"expires_at": 0,
"filename": "1.jsonl",
"purpose": "batch"
}
],
"first_id": "batch_output_20bafd19-6d73-4b61-a770-ebcb377d286d",
"has_more": false,
"last_id": "batch_input_11fb297e-653d-47cf-bb6a-a80209dc562b",
"object": "list"
}
Request Parameters
None.
Response Parameters
Parameter |
Type |
Description |
|---|---|---|
object |
string |
Always “list”. |
data |
array |
A list of File objects, sorted in reverse chronological order (newest first). |
first_id |
string |
The first file_id in this response. |
last_id |
string |
The last file_id in this response. |
Get File: GET /v1/files/{file_id}#
Request Example
curl -s "<YOUR_GATEWAY_URL>/v1/files/batch_input_11fb297e-653d-47cf-bb6a-a80209dc562b" \
-H "Authorization: Bearer <YOUR_TOKEN>"
Response Example
{
"id": "batch_input_11fb297e-653d-47cf-bb6a-a80209dc562b",
"object": "file",
"bytes": 564813,
"created_at": 1765868482,
"expires_at": 0,
"filename": "input.jsonl",
"purpose": "batch"
}
Request Parameters
Parameter |
Type |
Required |
Description |
|---|---|---|---|
file_id |
string |
Yes |
The ID of the file to query. |
Response Parameters
Returns the queried File object.
Delete File: DELETE /v1/files/{file_id}#
Note: This only deletes the file metadata. The actual file on OSS will not be deleted.
Request Example
curl -s "<YOUR_GATEWAY_URL>/v1/files/batch_input_11fb297e-653d-47cf-bb6a-a80209dc562b" \
-H "Authorization: Bearer <YOUR_TOKEN>"
-X DELETE
Response Example
{
"deleted": true,
"id": "batch_output_a31a8f26-3abe-4522-9e3f-5c845fa56af7",
"object": "file"
}
Request Parameters
Parameter |
Type |
Required |
Description |
|---|---|---|---|
file_id |
string |
Yes |
The ID of the file to delete. |
Response Parameters
Parameter |
Type |
Description |
|---|---|---|
id |
string |
The file ID. |
object |
string |
The file type, always “file”. |
deleted |
bool |
true |
Get File Content: GET /v1/files/{file_id}/content#
Request Example
curl -s "<YOUR_GATEWAY_URL>/v1/files/batch_input_11fb297e-653d-47cf-bb6a-a80209dc562b/content" \
-H "Authorization: Bearer <YOUR_TOKEN>"
Response Example
{"id":"batch_5f968571-b0b6-413f-a2a8-69bf750112af","custom_id":"request-1","response":{"status_code":200,"request_id":"282f82b5-577a-44f3-9bf7-a17522ac7d1c","body":{"id":"chatcmpl-282f82b5-577a-44f3-9bf7-a17522ac7d1c","object":"chat.completion","created":1765868675,"model":"Qwen3-VL-2B-Instruct","choices":[{"index":0,"message":{"role":"assistant","content":"Hello! How can I assist you today?","refusal":null,"annotations":null,"audio":null,"function_call":null,"tool_calls":[],"reasoning_content":null},"logprobs":null,"finish_reason":"stop","stop_reason":null,"token_ids":null}],"service_tier":null,"system_fingerprint":null,"usage":{"prompt_tokens":22,"total_tokens":32,"completion_tokens":10,"prompt_tokens_details":null},"prompt_logprobs":null,"prompt_token_ids":null,"kv_transfer_params":null}}}
...
Request Parameters
Parameter |
Type |
Required |
Description |
|---|---|---|---|
file_id |
string |
Yes |
The ID of the file whose content to retrieve. |
Response Parameters
Returns the file content.