Tuesday, 9 Sep 2025
3:30-5:30 PM
AI Chip Design Startup Forum - Hosted by Plug and Play & Synopsys
Thinking of building your own AI Chip? Join us for a panel discussion, “Building & Scaling an AI Chip Startup,” where founders and leaders from AI chip startups will share their experiences and insights on growing their companies from initial idea to first tapeout and beyond. Discover how they addressed key challenges such as securing seed funding, hiring the right team, and identifying the best customer and market fit. We’ll also discuss how they tackled strategic and operational challenges to come out ahead in this dynamic industry.
This forum event will be followed by a networking happy hour at 5:30 PM.
Sponsor(s): Synopsys
Time: 3:30-5:30 PM
Session Type: General Session (Presentation)

Wednesday, 10 Sep 2025
1:30 PM
Scaling LLM Inference with vLLM and AWS Trainium
Join us in this hands-on workshop to learn how to deploy and optimize large language models (LLMs) for scalable inference at enterprise scale. Participants will learn to orchestrate distributed LLM serving with vLLM on Amazon EKS, enabling robust, flexible, and highly available deployments. The session demonstrates how to utilize AWS Trainium hardware within EKS to maximize throughput and cost efficiency, leveraging Kubernetes-native features for automated scaling, resource management, and seamless integration with AWS services.
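From the client side, the serving setup described above can be sketched as follows. This is a minimal illustration, not the workshop's own code: the Service DNS name, port, and model name are hypothetical placeholders, assuming vLLM's OpenAI-compatible HTTP API is exposed behind a Kubernetes Service on EKS.

```python
import json
from urllib import request

# Hypothetical in-cluster address for a vLLM Deployment exposed via a
# Kubernetes Service; replace with your cluster's actual endpoint.
VLLM_URL = "http://vllm-service.default.svc.cluster.local:8000/v1/completions"


def build_completion_request(prompt: str, model: str, max_tokens: int = 128) -> dict:
    """Build a payload for vLLM's OpenAI-compatible /v1/completions route."""
    return {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0.7,
    }


def query_vllm(prompt: str, model: str) -> str:
    """Send a completion request; requires a live vLLM server at VLLM_URL."""
    payload = json.dumps(build_completion_request(prompt, model)).encode()
    req = request.Request(
        VLLM_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]
```

Because vLLM speaks the OpenAI wire format, the same client works whether the backend pods schedule onto Trainium or GPU nodes; only the Deployment's resource requests change.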
Location: Room 206
Duration: 1 hour
Sponsor(s): AWS
Speaker(s): Asheesh Goja, Principal GenAI Solutions Architect, AWS; Pinak Panigrahi, Sr. Machine Learning Architect - Annapurna ML, AWS
Session Type: General Session (Presentation)

From Blind Spots to Breakthroughs: Real-Time AI Factory Observability that Cuts Costs and Boosts Performance
Your AI infrastructure is only as effective as your visibility into it, and right now, most teams are flying blind. In this hands-on workshop, you’ll learn how to use real-time observability to reduce costs, eliminate waste, and keep your AI Factory running at peak performance. We’ll dive into practical techniques to:
- Identify GPU underutilization, throttling, and idle capacity across both cloud and on-premises deployments before they burn through your budget.
- Monitor token usage for inference workloads (including NVIDIA NIM containers) to catch cost spikes and inefficiencies as they happen.
- Correlate slow inference jobs or degraded model performance to root-cause issues anywhere in the stack, so you can fix problems without throwing more hardware or cloud spend at them.
Through live demonstrations, you’ll see how real-time telemetry and AI-driven correlation turn raw metrics into immediate, actionable insights, helping you cut unnecessary spend, speed up troubleshooting, and ensure your models deliver maximum value. If you’re responsible for making AI infrastructure faster, leaner, and more cost-efficient, this is the one workshop you can’t afford to miss.
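The first technique above, spotting underutilized and idle accelerators before they burn budget, reduces to a simple aggregation over telemetry. The sketch below is illustrative only: the sample data is made up, standing in for utilization percentages a collector might scrape from a source such as DCGM or nvidia-smi.

```python
from statistics import mean


def flag_underutilized(samples: dict[str, list[float]],
                       threshold: float = 30.0) -> dict[str, float]:
    """Return GPUs whose average utilization (percent) falls below threshold,
    mapped to their rounded average."""
    return {
        gpu: round(mean(vals), 1)
        for gpu, vals in samples.items()
        if mean(vals) < threshold
    }


# Toy per-GPU utilization samples (percent); real collectors would stream
# these continuously rather than batch them.
telemetry = {
    "gpu-0": [85.0, 90.5, 78.2],   # healthy
    "gpu-1": [5.0, 2.1, 0.0],      # mostly idle: candidate for reclamation
    "gpu-2": [22.0, 31.0, 18.5],   # borderline underutilization
}
print(flag_underutilized(telemetry))  # {'gpu-1': 2.4, 'gpu-2': 23.8}
```

In practice the same shape of check runs against live time-series data, with the threshold tuned per workload class.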
Sponsor(s): Virtana
Speaker(s): Devin Avery, Product Development Architect, Virtana
Devin Avery brings over 20 years of experience in software engineering, specializing in enterprise and service provider software development. Currently serving as Product Development Architect at Virtana, he is driving the company’s approach for AI Factory Observability, Generative AI capabilities, and Infrastructure Observability. His career includes previous principal software engineering roles at Brocade and software engineering roles at CA Technologies, establishing a solid foundation in developing complex enterprise solutions.
Devin has a proven track record of designing and delivering scalable, high-quality software solutions through a disciplined, iterative, and use-case-driven design and testing philosophy. He holds multiple patents for his work on algorithms related to applying collection policies, traversing topologies, and testing abstractions. With a Bachelor of Science in Computer Science from the University of New Hampshire, Devin has a keen ability to decompose high-level user requirements into executable stories and effectively bridge the gap between development teams and product management. His skills have been instrumental in transforming legacy products into modern, customer-centric solutions.
Session Type: General Session (Presentation)

2:45 PM
From Rack to Response: Build & Deploy Generative AI in 30 Minutes with NeuReality
Experience the future of GenAI inference architecture with NeuReality’s fully integrated, enterprise-ready NR1® Inference Appliance. In this hands-on workshop, you'll go from cold start to live GenAI applications in under 30 minutes using our AI-CPU-powered system. The NR1® Chip – the world’s first AI-CPU purpose-built for inference – pairs with any GPU or AI accelerator and optimizes any AI data workload. We’ll walk you through setup, deployment, and real-time inference using models like LLaMA, Mistral, and DeepSeek on our disaggregated architecture, built for smooth scalability, superior price/performance, and near 100% GPU utilization (vs. <50% with traditional CPU/NIC architecture). Join us to see how NeuReality eliminates infrastructure complexity and delivers enterprise-ready performance and ROI today.
Location: Room 201
Duration: 1 hour
Sponsor(s): NeuReality
Speaker(s): Paul Piezzo, Enterprise Sales Director, NeuReality; Gaurav Shah, VP of Business Development, NeuReality; Naveh Grofi, Customer Success Engineer, NeuReality
Session Type: Workshop

The Future of Enterprise AI Infrastructure: What The AI Deployments of Today Will Mean for Enterprise Deployments in the Future
DataBank, one of the nation’s leading data center operators, with more facilities in more markets than any other provider, has seen the future of enterprise AI infrastructure and knows how to help enterprises get there.
With a customer base that spans 2500+ enterprises – in addition to hyperscalers and emerging AI service providers – DataBank has a unique perspective on the trends and lessons learned from customer AI deployments to date, which include some of the industry’s first NVL72/GB200 installations.
In this 60-minute session, John Solensky, DataBank’s VP of Sales Engineering, and Mike Alvaro, DataBank’s Principal Solutions Architect, will share what DataBank has learned from its early GPU installations for hyperscalers and AI service providers, how those lessons were applied to later enterprise installations, the impact that next-generation GPUs are having on data center designs and solution costs, and the lessons for future enterprise deployments.
Location: Room 206
Duration: 1 hour
Sponsor(s): DataBank
Session Type: Workshop

Improve Price Performance for LLM Serving with vLLM on TPU & GKE
Dive into a hands-on workshop designed exclusively for AI developers. Learn to leverage the power of Google Cloud TPUs, the custom accelerators behind Google Gemini, for highly efficient LLM inference using vLLM. In this trial run for Google Developer Experts (GDEs), you'll build and deploy Gemma 3 27B on Trillium TPUs with vLLM and Google Kubernetes Engine (GKE). Explore advanced tooling like Dynamic Workload Scheduler (DWS) for TPU provisioning, Google Cloud Storage (GCS) for model checkpoints, and essential observability and monitoring solutions. Your live feedback will directly shape the future of this workshop, and we encourage you to share your experience with the vLLM/TPU integration on your social channels.
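"Price performance" in this context usually means sustained throughput per unit of accelerator cost. The arithmetic can be sketched in a few lines; the throughput and hourly-price figures below are made-up placeholders for illustration, not Google Cloud pricing.

```python
def tokens_per_dollar(tokens_per_sec: float, price_per_hour: float) -> float:
    """Sustained output tokens generated per dollar of accelerator time."""
    return tokens_per_sec * 3600 / price_per_hour


# Hypothetical serving configurations: raw throughput alone can mislead,
# since the faster option may cost disproportionately more per hour.
config_a = tokens_per_dollar(tokens_per_sec=500, price_per_hour=2.0)
config_b = tokens_per_dollar(tokens_per_sec=800, price_per_hour=4.0)
print(config_a, config_b)  # 900000.0 720000.0 -> config A wins on price/perf
```

The same metric lets you compare TPU and GPU serving stacks on equal footing once you have measured throughput for each under your own traffic.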
Location: Room 207
Duration: 1 hour
Sponsor(s): Google
Speaker(s): Niranjan Hira, Senior Product Manager, Google Cloud
As a Product Manager in our AI Infrastructure team, Hira looks out for how Google Cloud offerings can help customers and partners build more helpful AI experiences for users. With over 30 years of experience building applications and products across multiple industries, he likes to hog the whiteboard and tell developer tales.
Session Type: Workshop

4:00 PM
Runtime Attested HPC Cluster Reference Architecture for Confidential Computing
The rapid evolution of high-performance computing (HPC) clusters has been instrumental in driving transformative advancements in AI research and applications. These sophisticated systems enable the processing of complex datasets and support groundbreaking innovation. However, as their adoption grows, so do the critical security challenges they face, particularly when handling sensitive data in multi-tenant environments where diverse users and workloads coexist. Organizations are increasingly turning to Confidential Computing as a framework to protect AI workloads, emphasizing the need for robust HPC architectures that incorporate runtime attestation capabilities to ensure trust and integrity.
In this session, we present an advanced HPC cluster architecture designed to address these challenges, focusing on how runtime attestation of critical components – such as the kernel, Trusted Execution Environments (TEEs), and eBPF layers – can effectively fortify HPC clusters for AI applications operating across disjoint tenants. This architecture leverages cutting-edge security practices, enabling real-time verification and anomaly detection without compromising the performance essential to HPC systems.
Through use cases and examples, we will illustrate how runtime attestation integrates seamlessly into HPC environments, offering a scalable and efficient solution for securing AI workloads. Participants will leave this session equipped with a deeper understanding of how to leverage runtime attestation and Confidential Computing principles to build secure, reliable, and high-performing HPC clusters tailored for AI innovations.
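The core of any attestation flow is comparing a fresh measurement of a component against a known-good baseline. The sketch below is a deliberately simplified stand-in: real runtime attestation measures kernel, TEE, and eBPF state through hardware roots of trust and signed quotes, whereas here a plain SHA-256 digest represents a measurement and the "golden" values are hypothetical baselines.

```python
import hashlib


def measure(component_bytes: bytes) -> str:
    """Stand-in for a runtime measurement: a SHA-256 digest of the component."""
    return hashlib.sha256(component_bytes).hexdigest()


def attest(measurements: dict[str, bytes],
           golden: dict[str, str]) -> list[str]:
    """Return names of components whose measurement deviates from the baseline."""
    return [
        name for name, blob in measurements.items()
        if measure(blob) != golden.get(name)
    ]


# Record a golden baseline for a (toy) kernel image, then verify.
baseline_kernel = b"kernel-image-v1"
golden = {"kernel": measure(baseline_kernel)}

assert attest({"kernel": baseline_kernel}, golden) == []            # intact
assert attest({"kernel": b"tampered-image"}, golden) == ["kernel"]  # drift detected
```

The interesting engineering in the session's architecture is doing this check continuously at runtime, across tenants, without the measurement itself becoming a performance bottleneck.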
Location: Room 201
Duration: 1 hour
Sponsor(s): Confidential Computing Consortium
Speaker(s): Jason Rogers, CEO, Invary; Ayal Yogev, CEO & Co-founder, Anjuna
Jason Rogers is the Chief Executive Officer of Invary, a cybersecurity company that ensures the security and confidentiality of critical systems by verifying their Runtime Integrity. Leveraging NSA-licensed technology, Invary detects hidden threats and reinforces confidence in an existing security posture. Previously, Jason served as the Vice President of Platform at Matterport, successfully launched a consumer-facing IoT platform for Lowe's, and developed numerous IoT and network security software products for Motorola.
Session Type: Workshop

Train and Deploy High-Performing AI Model Development at Scale
In this session, we will explore the end-to-end workflow of managing foundation model (FM) development on Amazon SageMaker HyperPod. Our discussion will cover both distributed model training and inference using frameworks like PyTorch and KubeRay. Additionally, we will dive into operational aspects, including system observability and resiliency features for scale and cost-performance using Amazon EKS on SageMaker HyperPod. By the end of this hands-on session, you will gain a robust understanding of training and deploying FMs efficiently on AWS. You will learn to leverage cutting-edge techniques and tools to ensure high-performance, reliable, and scalable FM development.
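At the heart of the distributed training the session covers is the all-reduce step that data-parallel frameworks (such as PyTorch DDP) perform each batch: every worker computes gradients on its shard, and the cluster averages them so all replicas apply the same update. A framework-free sketch of just that averaging, with toy numbers, looks like this; real systems use collective communication libraries rather than Python lists.

```python
def allreduce_mean(per_worker_grads: list[list[float]]) -> list[float]:
    """Average gradients element-wise across workers (the all-reduce result)."""
    n = len(per_worker_grads)
    width = len(per_worker_grads[0])
    return [
        sum(grads[i] for grads in per_worker_grads) / n
        for i in range(width)
    ]


# Two workers, each holding gradients for a 3-parameter model.
grads = [
    [0.25, -0.5, 1.0],   # worker 0's local gradients
    [0.75, -0.5, 0.0],   # worker 1's local gradients
]
print(allreduce_mean(grads))  # [0.5, -0.5, 0.5] -> applied by every replica
```

Keeping every replica's weights identical after each averaged update is what makes checkpointing and the resiliency features the session describes (restarting failed workers from shared state) tractable.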
Location: Room 206
Duration: 1 hour
Sponsor(s): AWS
Speaker(s): Mark Vinciguerra, Assoc. WW Solution Architect, AWS GenAI; Aravind Neelakantan, WW Solution Architect, AWS GenAI; Aman Shanbhag, WW Solution Architect, AWS GenAI
Aman Shanbhag is a Specialist Solutions Architect on the ML Frameworks team at Amazon Web Services (AWS), where he helps customers and partners deploy ML training and inference solutions at scale. Before joining AWS, Aman graduated from Rice University with degrees in computer science, mathematics, and entrepreneurship.
Session Type: Workshop

New technology modalities for the AI fabric chipset: when advanced ASICs and photonics ICs come together with 3D packaging
In the new era of AI infrastructure, CMOS scaling remains the workhorse for heavy computational workloads. But the need for an energy-efficient solution imposes a paradigm shift at the interconnect level, requiring an intimate 3D co-integration of advanced ASICs and optical connectivity.
As the architectural complexity of new products increases, they must rely on state-of-the-art platforms with a short path to manufacturing. In this workshop, we will highlight how you can access the following technologies for your future products:
- Advanced-node ASIC down to TSMC N2
- Imec’s integrated photonics platforms from 200G up to co-packaged optics
- Imec’s advanced 3D packaging techniques, from interposer to hybrid bonding
Location: Room 207
Duration: 1 hour
Sponsor(s): IMEC
Speaker(s): Philippe Soussan, Technology Portfolio Director, IMEC
Philippe Soussan is Technology Portfolio Director at imec. For 20 years, he has held various R&D management positions at imec in the fields of sensors, photonics, and 3D packaging, addressing these technologies from R&D up to manufacturing.
His expertise lies in wafer-scale technologies, and he has authored over 100 publications and holds more than 20 patents in these fields.
Since 2024, Philippe has been in charge of strategy definition within the “IC-link by imec” sector. This imec business line provides access to design and manufacturing services in the most advanced ASIC and specialty technologies.
Session Type: Workshop