Llumnix

Scheduler

Llumnix’s scheduler handles dynamic request scheduling for distributed LLM inference.

Contents

  • Policy Framework
    • Scheduling modes: full-mode and lite-mode
    • Metrics: quantifying instance load and latency
    • Filters: filtering candidate instances
    • Selectors: choosing the final instance
    • Built-in policies and modes
    • Scheduling features
    • Usage and extension guidelines
    • Future work
  • Instant and Accurate Load
    • Goal and definition
    • Existing paradigms
    • Limitations of existing paradigms
    • Llumnix IAL solutions
  • Cache-aware Scheduling
    • Introduction
    • Design and implementation
    • Usage
  • Predictor-Enhanced Scheduling
    • Introduction
    • Design and Implementation
    • Configuration
    • Current Limitations and Future Direction
  • SLO-aware Scheduling
    • Overview
    • Generating Profiling Data
    • Latency Prediction
    • Scheduling Pipeline
    • Policy Configuration
    • Best Practices
  • Adaptive PD Scheduling
    • Introduction
    • Prerequisites
    • Overview
    • Reserve State Management
    • Scheduling Strategy
    • Rescheduling Strategy
    • Configuration
    • Performance
    • Future Work
  • Rescheduler
    • Introduction
    • Architecture
    • Rescheduling Policies
    • Migration Implementation
    • Configuration
    • Deployment Modes

By AlibabaPAI

© Copyright 2026, AlibabaPAI Team.