As data lakes expand from on-prem to the cloud and new object stores see increasing use, data platform teams are challenged to provide consistent, high-throughput access to distributed data sources for analytics and AI/ML applications. In today’s hybrid cloud and multi-cloud era, data-intensive applications such as Presto, Spark, Hive, and TensorFlow suffer from slower response times and added complexity as data and compute grow further apart.
Join Alluxio’s distributed systems experts as they explore today’s data access challenges and open source data orchestration solutions for modernizing your data platform.
In this tech talk, you’ll learn:
- How data access and throughput challenges are hindering large-scale analytics and AI/ML applications
- How a data orchestration layer can simplify distributed data access and improve performance
- Real-world production use cases and example journeys for architecting a modern data platform
ALLUXIO WEBINAR
Videos
In the rapidly evolving landscape of AI and machine learning, platform and data infrastructure teams face critical challenges in building and managing large-scale AI platforms. Performance bottlenecks, limited platform scalability, and GPU scarcity all make it harder to support large-scale model training and serving.
In this talk, we introduce how Alluxio helps platform and data infrastructure teams deliver faster, more scalable platforms to ML engineering teams developing and training AI models. Alluxio’s highly distributed cache accelerates AI workloads by eliminating data loading bottlenecks and maximizing GPU utilization. Customers report up to 4x faster training performance with high-speed access to petabytes of data spread across billions of files, regardless of persistent storage type or proximity to GPU clusters. Alluxio’s architecture lowers data infrastructure costs, increases GPU utilization, and enables workload portability for navigating GPU scarcity challenges.
TorchTitan is a proof of concept for large-scale LLM training using native PyTorch. The repo showcases PyTorch's latest distributed training features in a clean, minimal codebase.
In this talk, Tianyu will share TorchTitan’s design and optimizations for the Llama 3.1 family of LLMs, spanning 8 billion to 405 billion parameters, and showcase its performance, composability, and scalability.