Community Office Hour: Improving Memory Utilization of Spark Jobs Using Alluxio

November 26, 2019

Bin Fan

VP of Technology

Alluxio

Apache Spark has been widely adopted for in-memory data analytics at scale. However, efficient memory utilization is a common challenge, and users often either run out of memory or experience low and unstable performance. Many Spark users may not be aware of the differences in memory utilization between caching data directly in the Spark JVM and storing data off-heap via an in-memory storage service like Alluxio. In this office hour, I will highlight the two approaches with a demo and open up the floor for discussion.

In this Office Hour we’ll go over:

How to run Spark shell with Alluxio such that Spark jobs
A demo to compare the memory usage between Spark cache and using Alluxio as the external off-heap caching service
Open Session for discussion on any topics such as running Presto on Alluxio, and more
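
As a rough sketch of the first agenda item, and not the exact setup used in the demo, one common way to make the Alluxio client visible to Spark shell is to add its jar to the driver and executor classpaths in spark-defaults.conf. The jar path below is a placeholder (an assumption) and depends on where and which version of Alluxio is installed:

```properties
# spark-defaults.conf (sketch; the jar path is a placeholder, not a real location)
# Put the Alluxio client jar on the classpath of the Spark driver and executors
spark.driver.extraClassPath   /<PATH_TO_ALLUXIO>/client/alluxio-<VERSION>-client.jar
spark.executor.extraClassPath /<PATH_TO_ALLUXIO>/client/alluxio-<VERSION>-client.jar
```

With the client on the classpath, Spark jobs can refer to data through alluxio:// URIs, so datasets can be kept in Alluxio memory outside the Spark JVM heap rather than in the Spark cache.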

Video:

Presentation slides:

Improving Memory Utilization of Spark Jobs Using Alluxio from Alluxio, Inc.