Burst Presto & Spark workloads to AWS EMR with no data copies
April 28, 2020
By Adit Madan and Bin Fan

Today’s conventional wisdom states that network latency across the two ends of a hybrid cloud prevents you from running analytic workloads in the cloud with the data on-prem. As a result, most companies copy their data into a cloud environment and maintain that duplicate data. In short, it is challenging to make on-prem HDFS data accessible from the cloud while still getting the desired application performance.

In this talk, we will show you how to leverage any public cloud (AWS, Google Cloud Platform, or Microsoft Azure) to scale analytics workloads directly on on-prem data without copying and synchronizing the data into the cloud.

In this Office Hour, we will go over:

  • A strategy to embrace the hybrid cloud, including an architecture for running ephemeral compute clusters using on-prem HDFS.
  • An example of running on-demand Presto, Spark, and Hive with Alluxio in the public cloud.
  • An analysis of experiments with TPC-DS to demonstrate the benefits of the given architecture.
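
To make the architecture in the first bullet more concrete, here is a minimal sketch of the read-through caching idea behind placing a data layer next to ephemeral cloud compute. It is a toy model, not Alluxio's API: the class `ReadThroughCache`, the function `fake_hdfs_read`, and the path used are all hypothetical names chosen for illustration, and the "remote read" is simulated in memory.

```python
from typing import Callable, Dict


class ReadThroughCache:
    """Toy model of a caching layer placed next to cloud compute.

    Hypothetical illustration only; this is not Alluxio's API.
    """

    def __init__(self, fetch_remote: Callable[[str], bytes]) -> None:
        self._fetch_remote = fetch_remote     # stands in for a read from on-prem HDFS
        self._blocks: Dict[str, bytes] = {}   # data held near the compute cluster
        self.remote_reads = 0                 # how many times we crossed the network

    def read(self, path: str) -> bytes:
        # Serve from the nearby copy when present; otherwise cross the
        # network once and keep the result for later queries.
        if path not in self._blocks:
            self._blocks[path] = self._fetch_remote(path)
            self.remote_reads += 1
        return self._blocks[path]


def fake_hdfs_read(path: str) -> bytes:
    # Placeholder for a slow cross-datacenter read (assumption: in-memory stub).
    return f"contents of {path}".encode()


cache = ReadThroughCache(fake_hdfs_read)
for _ in range(3):
    cache.read("/warehouse/store_sales/part-0")
print(cache.remote_reads)  # prints 1
```

In this sketch only the first query pays the cross-network cost; repeated reads of the same data are served locally, and nothing has to be copied or synchronized into cloud storage ahead of time.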
ALLUXIO COMMUNITY OFFICE HOUR

Slides:

Burst Presto & Spark workloads to AWS EMR with no data copies from Alluxio, Inc.
