Apache Spark and Alluxio were both born in UC Berkeley’s AMPLab as research projects. As an open source data orchestration platform, Alluxio is able to achieve seamless docking and acceleration of different data sources, and improve the efficiency and fault tolerance of Spark’s big data computing business.
Alluxio has been deployed and running on a large scale managing petabytes level data in the production environment of companies such as Microsoft, Tiktok, Tencent, Singapore Development Bank, China Unicom, etc.
This talk shares the designs and use cases of the Alluxio and Spark integrated solutions, as well as the best practice and “what not to do” in designing and implementing Alluxio distributed systems.
ALLUXIO DAY VI 2021
October 12, 2021
Apache Spark and Alluxio were both born in UC Berkeley’s AMPLab as research projects. As an open source data orchestration platform, Alluxio is able to achieve seamless docking and acceleration of different data sources, and improve the efficiency and fault tolerance of Spark’s big data computing business.
Alluxio has been deployed and running on a large scale managing petabytes level data in the production environment of companies such as Microsoft, Tiktok, Tencent, Singapore Development Bank, China Unicom, etc.
This talk shares the designs and use cases of the Alluxio and Spark integrated solutions, as well as the best practice and “what not to do” in designing and implementing Alluxio distributed systems.
Video:
Presentation Slides:
Apache Spark and Alluxio were both born in UC Berkeley’s AMPLab as research projects. As an open source data orchestration platform, Alluxio is able to achieve seamless docking and acceleration of different data sources, and improve the efficiency and fault tolerance of Spark’s big data computing business.
Alluxio has been deployed and running on a large scale managing petabytes level data in the production environment of companies such as Microsoft, Tiktok, Tencent, Singapore Development Bank, China Unicom, etc.
This talk shares the designs and use cases of the Alluxio and Spark integrated solutions, as well as the best practice and “what not to do” in designing and implementing Alluxio distributed systems.
Videos:
Presentation Slides:
Complete the form below to access the full overview:
.png)
Videos
Deepseek’s recent announcement of the Fire-flyer File System (3FS) has sparked excitement across the AI infra community, promising a breakthrough in how machine learning models access and process data.
In this webinar, an expert in distributed systems and AI infrastructure will take you inside Deepseek 3FS, the purpose-built file system for handling large files and high-bandwidth workloads. We’ll break down how 3FS optimizes data access and speeds up AI workloads as well as the design tradeoffs made to maximize throughput for AI workloads.
This webinar you’ll learn about how 3FS works under the hood, including:
✅ The system architecture
✅ Core software components
✅ Read/write flows
✅ Data distribution/placement algorithms
✅ Cluster/node management and disaster recovery
Whether you’re an AI researcher, ML engineer, or infrastructure architect, this deep dive will give you the technical insights you need to determine if 3FS is the right solution for you.