Burst Presto & Spark workloads to AWS EMR with no data copies

April 28, 2020

Adit Madan

Director of Product Management

Alluxio

Bin Fan

VP of Technology

Alluxio

Conventional wisdom holds that network latency between the two ends of a hybrid cloud prevents you from running analytic workloads in the cloud while the data stays on-prem. As a result, most companies copy their data into a cloud environment and maintain that duplicate. All of this means that it is challenging to make on-prem HDFS data accessible to cloud workloads with the desired application performance.

In this talk, we will show you how to leverage any public cloud (AWS, Google Cloud Platform, or Microsoft Azure) to scale analytics workloads directly on on-prem data without copying and synchronizing the data into the cloud.

In this Office Hour, we will go over:

A strategy to embrace the hybrid cloud, including an architecture for running ephemeral compute clusters using on-prem HDFS.
An example of running on-demand Presto, Spark, and Hive with Alluxio in the public cloud.
An analysis of experiments with TPC-DS to demonstrate the benefits of the given architecture.

ALLUXIO COMMUNITY OFFICE HOUR

Video:

Slides:

Burst Presto & Spark workloads to AWS EMR with no data copies from Alluxio, Inc.
