Viewing 1 current event matching “inference” by Date.
Thursday, May 14

Cloud Native May Meetup: Change-driven Architecture and Serving Inference at Scale – Reperio Health

Cloud Native PDX May: Change-driven Architecture and Serving Inference at Scale

This May is all about scaling: scaling your infrastructure management, or scaling your LLM inference serving. Join us to find out about some open source tools that make managing large modern stacks easier.

Date: Thursday, May 14
Time: 5:30–7:30 PM
Location: Reperio Health, 4784 SE 17th Ave Suite 120, Portland, OR
Recording: Talks are typically recorded (opt-in by speakers)

A big thank you to Microsoft for sponsoring food & beverage, and to Reperio Health, our venue host.

Drasi, a new take on Change Driven Architectures
Aman Singh, Microsoft

Modern cloud-native systems constantly generate data changes, and applications often need to react to them. Building change-driven solutions that respond to specific changes in distributed data is challenging. This talk introduces Drasi, a CNCF Sandbox project that simplifies the design and implementation of change-driven architectures using Graph Queries and pluggable components. For example, with Drasi you can declaratively write automation to detect and respond to running containers with newly identified vulnerabilities across pods and deployments in a Kubernetes cluster. Join us for a walkthrough of real-world use cases that show how Drasi’s approach brings structure and responsiveness to complex distributed environments, without writing custom code.

Dynamo: Large Scale Distributed Inference
David Zeir, Director, DL System Software, Nvidia
Neelay Shah, Distinguished Engineer, Nvidia

This talk introduces Dynamo, NVIDIA's open-source Kubernetes-native distributed inference platform. We'll cover the problem space, walk through Dynamo's architecture (disaggregated prefill/decode, KV-cache-aware routing, and a transport layer that moves KV blocks directly between GPUs), and dig into the Kubernetes integration for scheduling, autoscaling, and graceful failure handling. We'll close with a demo of Dynamo serving a real workload.
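
For readers unfamiliar with the term, the idea behind KV-cache-aware routing in the Dynamo abstract can be illustrated with a toy sketch. This is not Dynamo's API or implementation: the `Worker` class, the `route` function, the block size, and the tie-breaking rule are all hypothetical, chosen only to show the general technique of sending a request to the worker that already holds the longest cached prefix of the prompt.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    """Hypothetical inference worker (illustrative only, not a Dynamo type)."""
    name: str
    active_requests: int = 0
    # Hashes of the prompt prefixes whose KV blocks this worker has cached.
    cached_blocks: set[int] = field(default_factory=set)


def block_hashes(tokens: list[int], block_size: int = 4) -> list[int]:
    """Return one hash per full block boundary, each covering the whole prefix."""
    return [
        hash(tuple(tokens[:end]))
        for end in range(block_size, len(tokens) + 1, block_size)
    ]


def cached_prefix_blocks(worker: Worker, hashes: list[int]) -> int:
    """Count how many leading blocks of the prompt the worker already caches."""
    count = 0
    for h in hashes:
        if h not in worker.cached_blocks:
            break
        count += 1
    return count


def route(workers: list[Worker], tokens: list[int], block_size: int = 4) -> Worker:
    """Pick the worker with the longest cached prefix; break ties by lower load."""
    hashes = block_hashes(tokens, block_size)
    best = max(
        workers,
        key=lambda w: (cached_prefix_blocks(w, hashes), -w.active_requests),
    )
    # Assume the chosen worker now caches every block of this prompt.
    best.cached_blocks.update(hashes)
    best.active_requests += 1
    return best
```

In this sketch a follow-up request that shares a prefix with an earlier prompt lands on the same worker, so the prefill work for the shared blocks could be skipped. A production router would also have to account for cache eviction, request completion, and cache state reported by the workers themselves.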