Tech Talk: From limited Hadoop compute capacity to increased data scientist efficiency
October 17, 2019

Using “zero-copy” hybrid bursting with Spark to solve capacity problems

Want to leverage your existing Hadoop investments, keep your data on-premises, and still benefit from the elasticity of the cloud?

Like many Hadoop users, you probably run very large, busy clusters where compute capacity is the main constraint. Bursting HDFS data to the cloud brings its own challenges: network latency hurts performance, copying data with DistCp means maintaining duplicate data, and you may have to change your applications to accommodate S3.

“Zero-copy” hybrid bursting with Alluxio keeps your data on-prem and syncs data to compute in the cloud so you can expand compute capacity, particularly for ephemeral Spark jobs.

In this tech talk, we’ll discuss:

  • Approaches to burst data to the cloud
  • How Alluxio can enable “zero-copy” bursting of Spark workloads to cloud data services like EMR and Dataproc
  • How DBS Bank uses Alluxio to solve for limited on-prem compute capacity by zero-copy bursting Spark workloads to AWS EMR
