Community Office Hour: Improving Memory Utilization of Spark Jobs Using Alluxio
November 26, 2019
By Bin Fan

Apache Spark has been widely adopted for in-memory data analytics at scale. However, efficient memory utilization is a common challenge, and users often either run out of memory or experience low and unstable performance. Many Spark users may not be aware of the differences in memory utilization between caching data directly in memory in the Spark JVM and storing data off-heap via an in-memory storage service like Alluxio. In this office hour, I will highlight the two approaches with a demo and open the floor for discussion.

In this Office Hour we’ll go over:

  • How to run the Spark shell with Alluxio
  • A demo comparing memory usage between the Spark cache and Alluxio as an external off-heap caching service
  • An open session for discussion of any topic, such as running Presto on Alluxio
