AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & Serving

May 24, 2024

Lu Qiu

Speed and efficiency are two requirements for the underlying infrastructure for machine learning model development. Data access can bottleneck end-to-end machine learning pipelines as training data volume grows and when large model files are more commonly used for serving. For instance, data loading can constitute nearly 80% of the total model training time, resulting in less than 30% GPU utilization. Also, loading large model files for deployment to production can be slow because of slow network or storage read operations. These challenges are prevalent when using popular frameworks like PyTorch, Ray, or HuggingFace, paired with cloud object storage solutions like S3 or GCS, or downloading models from the HuggingFace model hub.

In this presentation, Lu and Siyuan will offer comprehensive insights into improving speed and GPU utilization for model training and serving. You will learn:

The data loading challenges hindering GPU utilization
The reference architecture for running PyTorch and Ray jobs while reading data from S3, with benchmark results of training ResNet50 and BERT
Real-world examples of boosting model performance and GPU utilization through optimized data access

Video:

Presentation slides:

AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & Serving from Alluxio, Inc.

Videos:

Presentation Slides:

AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & Serving from Alluxio, Inc.

Complete the form below to access the full overview:

Videos

AI/ML Infra Meetup | Building Production Platform for Large-Scale Recommendation Applications

March 6, 2025

AI/ML Infra Meetup | How Uber Optimizes LLM Training and Finetune

March 6, 2025

AI/ML Infra Meetup | Optimizing ML Data Access with Alluxio: Preprocessing, Pretraining, & Inference at Scale

March 6, 2025

Sign-up for a Live Demo or Book a Meeting with a Solutions Engineer

Request a demo

Alluxio Enterprise AI

Alluxio Enterprise Data

Videos:

Presentation Slides:

Complete the form below to access the full overview:

Videos

Sign-up for a Live Demo or Book a Meeting with a Solutions Engineer