Community Office Hour: Building a Cloud Native Stack with EMR Spark, Alluxio, and S3
August 27, 2019
By Bin Fan and Nakkul Sreenivas

Many organizations use EMR to run big data analytics on the public cloud. However, reading and writing data to S3 directly can result in slow and inconsistent performance. Alluxio is a data orchestration layer for the cloud; in this use case it caches S3 data, which gives high, predictable performance and reduces network traffic.

In this Office Hour we go over:

  • How to set up EMR Spark with Alluxio such that Spark jobs can seamlessly read from and write to S3
  • How the performance of Spark on S3 compares with that of Spark and Alluxio on S3
  • An open session for discussion of any topic, such as handling the separation of compute and storage
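
As a rough illustration of what "seamlessly" reading from S3 through Alluxio can look like, the sketch below maps an S3 URI to the Alluxio URI a Spark job would use instead. It is not taken from the session. It assumes the S3 bucket is mounted at the Alluxio root and that the Alluxio master listens on its default RPC port, 19998; the helper name `to_alluxio_uri` and the host name `alluxio-master` are hypothetical.

```python
from urllib.parse import urlparse

# Assumption: the Alluxio master uses its default RPC port.
ALLUXIO_DEFAULT_PORT = 19998


def to_alluxio_uri(s3_uri: str, master_host: str,
                   port: int = ALLUXIO_DEFAULT_PORT) -> str:
    """Map an S3 URI to an Alluxio URI (hypothetical helper).

    Assumes the whole bucket is mounted at the Alluxio root, so the
    object key becomes the Alluxio path and the bucket name is dropped.
    """
    parsed = urlparse(s3_uri)
    if parsed.scheme not in ("s3", "s3a", "s3n"):
        raise ValueError(f"not an S3 URI: {s3_uri}")
    # An empty key (bucket only) maps to the Alluxio root directory.
    path = parsed.path or "/"
    return f"alluxio://{master_host}:{port}{path}"


if __name__ == "__main__":
    print(to_alluxio_uri("s3://my-bucket/data/events.parquet", "alluxio-master"))
```

With the Alluxio client available to Spark, a job would then pass the returned URI to its usual read or write call in place of the S3 path; the rest of the job stays the same.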

Video:

Presentation slides: Building a Cloud Native Stack with EMR Spark, Alluxio, and S3, from Alluxio, Inc.
