Wednesday, March 18, 2026 at 2:32pm.
Portland Linux/Unix Group General Monthly Meeting: Inside AI Supercomputers, with Jesse Lopez
Enter through the Engineering Building. The room is downstairs; follow the signs. If the outside door is locked and no one is there to let you in, look for the sign with a cell number to text.
Description
Summary:
Inside AI Supercomputers: From GPUs to Multi-DC Clusters
Large language models and other frontier AI systems are trained on clusters of thousands to over a hundred thousand GPUs. But what does that infrastructure actually look like? This talk walks through the anatomy of an AI supercomputer from the ground up: individual GPUs, multi-GPU nodes, racks, and full clusters. We'll cover the three pillars of compute, storage, and networking, then look at how training and inference workloads place very different demands on hardware. Finally, we'll explore how Linux runs the show at every layer: from the OS on each node, to InfiniBand fabric management, to job scheduling with Slurm, Kubernetes, and Ray.
No AI/HPC background required, just curiosity about what it takes to build and run the machines behind the models.
Bio:
Jesse Lopez is an AI/ML and Technical Program Manager in the Azure HPC/AI organization, where he helps deploy large-scale AI infrastructure and works with customers to put it to use. A former scientist, he has a background in high-performance computing and AI/ML, and has been a Linux user since the nineteen hundreds.