Portland Linux/Unix Group General Monthly Meeting: Inside AI Supercomputers, with Jesse Lopez

Portland State University Fourth Avenue Building (FAB) Room FAB 86-01
1900 Southwest 4th Avenue
Portland, OR 97201, US

Enter through the Engineering Building. The room is downstairs; follow the signs. If the outside door is locked and no one is there to let you in, look for the sign with a cell number to text.

Description

Summary:

Inside AI Supercomputers: From GPUs to Multi-DC Clusters

Large language models and other frontier AI systems are trained on clusters with thousands to over a hundred thousand GPUs. But what does that infrastructure actually look like? This talk walks through the anatomy of an AI supercomputer from the ground up: individual GPUs, multi-GPU nodes, racks, and full clusters. We'll cover the three pillars of compute, storage, and networking, then look at how training and inference workloads place very different demands on hardware. Finally, we'll explore how Linux runs the show at every layer, from the OS on each node to InfiniBand fabric management to job scheduling with Slurm, Kubernetes, and Ray.

No AI/HPC background required - just curiosity about what it takes to build and run the machines behind the models.

Bio:

Jesse Lopez is an AI/ML and Technical Program Manager in the Azure HPC/AI organization, where he helps deploy large-scale AI infrastructure and works with customers to put it to use. A former scientist, he has a background in high-performance computing and AI/ML and has been a Linux user since the nineteen hundreds.