Scheduler

Llumnix’s scheduler handles dynamic request scheduling for distributed LLM inference.

Contents

- Policy Framework
  - Scheduling modes: full-mode and lite-mode
  - Metrics: quantifying instance load and latency
  - Filters: filtering candidate instances
  - Selectors: choosing the final instance
  - Built-in policies and modes
  - Scheduling features
  - Usage and extension guidelines
  - Future work
- Instant and Accurate Load
  - Goal and definition
  - Existing paradigms
  - Limitations of existing paradigms
  - Llumnix IAL solutions
- Cache-aware Scheduling
  - Introduction
  - Design and implementation
  - Usage
- Predictor-Enhanced Scheduling
  - Introduction
  - Design and Implementation
  - Configuration
  - Current Limitations and Future Direction
- SLO-aware Scheduling
  - Overview
  - Generating Profiling Data
  - Latency Prediction
  - Scheduling Pipeline
  - Policy Configuration
  - Best Practices
- Adaptive PD Scheduling
  - Introduction
  - Prerequisites
  - Overview
  - Reserve State Management
  - Scheduling Strategy
  - Rescheduling Strategy
  - Configuration
  - Performance
  - Future Work
- Rescheduler
  - Introduction
  - Architecture
  - Rescheduling Policies
  - Migration Implementation
  - Configuration
  - Deployment Modes