Model Training Across Regions and Clouds – Challenges, Solutions and Live Demo
October 15, 2024
By 
Tom Luckenbach

AI training workloads running on compute engines like PyTorch, TensorFlow, and Ray require consistent, high-throughput access to training data to maintain high GPU utilization. However, with the decoupling of compute and storage and with today’s hybrid and multi-cloud landscape, AI Platform and Data Infrastructure teams are struggling to cost-effectively deliver the high-performance data access needed for AI workloads at scale.

Join Tom Luckenbach, Alluxio Solutions Engineering Manager, to learn how Alluxio enables high-speed, cost-effective data access for AI training workloads in hybrid and multi-cloud architectures, while eliminating the need to manage data copies across regions and clouds.

What Tom will share:

  • AI data access challenges in cross-region, cross-cloud architectures.​
  • The architecture and integration of Alluxio with frameworks like PyTorch, TensorFlow, and Ray using POSIX, REST, or Python APIs across AWS, GCP and Azure.
  • A live demo of an AI training workload accessing cross-cloud datasets leveraging Alluxio's distributed cache, unified namespace, and policy-driven data management.
  • MLPerf and FIO benchmark results and cost-savings analysis.

AI training workloads running on compute engines like PyTorch, TensorFlow, and Ray require consistent, high-throughput access to training data to maintain high GPU utilization. However, with the decoupling of compute and storage and with today’s hybrid and multi-cloud landscape, AI Platform and Data Infrastructure teams are struggling to cost-effectively deliver the high-performance data access needed for AI workloads at scale.

Join Tom Luckenbach, Alluxio Solutions Engineering Manager, to learn how Alluxio enables high-speed, cost-effective data access for AI training workloads in hybrid and multi-cloud architectures, while eliminating the need to manage data copies across regions and clouds.

What Tom will share:

  • AI data access challenges in cross-region, cross-cloud architectures.​
  • The architecture and integration of Alluxio with frameworks like PyTorch, TensorFlow, and Ray using POSIX, REST, or Python APIs across AWS, GCP and Azure.
  • A live demo of an AI training workload accessing cross-cloud datasets leveraging Alluxio's distributed cache, unified namespace, and policy-driven data management.
  • MLPerf and FIO benchmark results and cost-savings analysis.

AI training workloads running on compute engines like PyTorch, TensorFlow, and Ray require consistent, high-throughput access to training data to maintain high GPU utilization. However, with the decoupling of compute and storage and with today’s hybrid and multi-cloud landscape, AI Platform and Data Infrastructure teams are struggling to cost-effectively deliver the high-performance data access needed for AI workloads at scale.

Join Tom Luckenbach, Alluxio Solutions Engineering Manager, to learn how Alluxio enables high-speed, cost-effective data access for AI training workloads in hybrid and multi-cloud architectures, while eliminating the need to manage data copies across regions and clouds.

What Tom will share:

  • AI data access challenges in cross-region, cross-cloud architectures.​
  • The architecture and integration of Alluxio with frameworks like PyTorch, TensorFlow, and Ray using POSIX, REST, or Python APIs across AWS, GCP and Azure.
  • A live demo of an AI training workload accessing cross-cloud datasets leveraging Alluxio's distributed cache, unified namespace, and policy-driven data management.
  • MLPerf and FIO benchmark results and cost-savings analysis.
Videos:
Presentation Slides:

Complete the form below to access the full overview:

Videos

Sign-up for a Live Demo or Book a Meeting with a Solutions Engineer