Scheduler

Llumnix’s scheduler handles dynamic request scheduling for distributed LLM inference.

Contents

- Policy Framework
  - Scheduling modes: full-mode and lite-mode
  - Metrics: quantifying instance load and latency
  - Filters: filtering candidate instances
  - Selectors: choosing the final instance
  - Built-in policies and modes
  - Scheduling features
  - Future work
- Instant and Accurate Load
  - Goal and definition
  - Existing paradigms
  - Limitations of existing paradigms
  - Llumnix Ial solutions
- Cache-aware Scheduling
  - Introduction
  - Design and implementation
- Predictor-Enhanced Scheduling
  - Introduction
  - Design and Implementation
  - Current Limitations and Future Direction
- SLO-aware Scheduling
  - Overview
  - Generating Profiling Data
  - Latency Prediction
  - Scheduling Pipeline
- Adaptive PD Scheduling
  - Introduction
  - Prerequisites
  - Overview
  - Reserve State Management
  - Scheduling Strategy
  - Rescheduling Strategy
  - Performance
  - Future Work
- Rescheduler
  - Introduction
  - Architecture
  - Rescheduling Policies
  - Migration Implementation