AI/ML Infra Meetup | A Faster and More Cost Efficient LLM Inference Stack
January 23, 2025
LLM inference can be slow and costly, particularly with long contexts. In this on-demand video, Junchen Jiang, Assistant Professor at the University of Chicago, presents a 10x solution for long-context inference: an easy-to-deploy stack over multiple vLLM engines with a tailored KV-cache backend.
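The full multi-engine stack and its KV-cache backend are covered in the talk itself; as a rough illustration of the underlying idea of reusing KV cache across long-context requests, here is a minimal sketch using a single vLLM engine with prefix caching enabled. The model name and prompts are placeholders, and the multi-engine routing and shared cache backend described in the presentation are not shown.

```python
# Minimal sketch: KV-cache reuse via vLLM's prefix caching on a single engine.
# Assumes vLLM is installed and a GPU is available; the model name is a placeholder.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model
    enable_prefix_caching=True,                # reuse KV cache for shared prompt prefixes
)

long_context = "<a long document shared across requests> "
prompts = [
    long_context + "Question 1: summarize the document.",
    long_context + "Question 2: list the key findings.",
]

# The second request can reuse the KV cache built for the shared prefix,
# avoiding recomputation over the long context.
outputs = llm.generate(prompts, SamplingParams(temperature=0.0, max_tokens=128))
for out in outputs:
    print(out.outputs[0].text)
```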
Videos:
Presentation Slides: