Scalable and Highly-available Distributed File System Metadata Service Using gRPC, RocksDB and RAFT

April 7, 2020

Bin Fan

VP of Technology

Alluxio

Alluxio (alluxio.io) is an open-source data orchestration system that provides a single namespace federating multiple external distributed storage systems. It is critical for Alluxio to be able to store and serve the metadata of all files and directories from all mounted external storage both at scale and at speed.

This talk shares the design, implementation, and optimization of the Alluxio metadata service (master node) to address these scalability challenges. In particular, we focus on how to apply and combine several techniques: tiered metadata storage (based on the off-heap KV store RocksDB), a fine-grained locking scheme for the file system inode tree, an embedded replicated state machine (based on Raft), and selecting and tuning the right RPC framework (Thrift vs. gRPC). By combining these techniques, Alluxio 2.0 is able to store at least 1 billion files with a significantly reduced memory requirement while serving 3,000 workers and 30,000 clients concurrently.
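To make the idea of tiered metadata storage concrete, here is a minimal sketch, not Alluxio's actual implementation. It assumes a bounded in-memory LRU cache for hot inodes in front of a backing key-value store, with a plain Python dict standing in for an off-heap store such as RocksDB. The class name `TieredInodeStore`, its methods, and the inode layout are all hypothetical and chosen only for illustration.

```python
from collections import OrderedDict
from typing import Dict, List, Optional


class TieredInodeStore:
    """Hypothetical two-tier inode store (illustration only).

    Hot tier: bounded in-memory LRU cache.
    Cold tier: a dict standing in for an off-heap KV store such as RocksDB.
    """

    def __init__(self, cache_capacity: int) -> None:
        if cache_capacity < 1:
            raise ValueError("cache_capacity must be at least 1")
        self._capacity = cache_capacity
        self._cache: "OrderedDict[int, dict]" = OrderedDict()
        self._backing: Dict[int, dict] = {}

    def put(self, inode_id: int, inode: dict) -> None:
        # Writes land in the hot tier; older entries spill to the cold tier.
        self._cache[inode_id] = inode
        self._cache.move_to_end(inode_id)
        self._evict_if_needed()

    def get(self, inode_id: int) -> Optional[dict]:
        if inode_id in self._cache:
            # Cache hit: mark the entry as most recently used.
            self._cache.move_to_end(inode_id)
            return self._cache[inode_id]
        inode = self._backing.get(inode_id)
        if inode is not None:
            # Cache miss: promote the entry back into the hot tier.
            self._cache[inode_id] = inode
            self._evict_if_needed()
        return inode

    def delete(self, inode_id: int) -> None:
        # Remove the inode from both tiers so no stale copy survives.
        self._cache.pop(inode_id, None)
        self._backing.pop(inode_id, None)

    def _evict_if_needed(self) -> None:
        # Write the least recently used entries back to the cold tier.
        while len(self._cache) > self._capacity:
            evicted_id, evicted = self._cache.popitem(last=False)
            self._backing[evicted_id] = evicted

    def cached_ids(self) -> List[int]:
        return list(self._cache.keys())

    def backing_ids(self) -> List[int]:
        return sorted(self._backing.keys())


if __name__ == "__main__":
    store = TieredInodeStore(cache_capacity=2)
    for i, name in enumerate(["a.txt", "b.txt", "c.txt"], start=1):
        store.put(i, {"name": name})
    print(store.cached_ids(), store.backing_ids())
```

The point of the design is that memory use is bounded by the cache capacity rather than by the total number of files, which is one way a metadata service could hold far more inodes than fit on the heap. A real deployment would also need concurrency control and durable writes, both of which this sketch omits.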

In this Office Hour, we will go over:

Metadata storage challenges
How to combine different open source technologies as building blocks
The design, implementation, and optimization of Alluxio metadata service

ALLUXIO COMMUNITY OFFICE HOUR


