Alluxio 2.0 is designed to support 1 billion files in a single file system namespace. Namespace scaling is vital to Alluxio for a couple of reasons:
- Alluxio provides a single namespace where multiple storage systems can be mounted. So the size of Alluxio's namespace is the sum of the sizes of all mounted storages.
- Object storage is increasing in popularity, and object stores often hold many more small files compared with filesystems like HDFS.
In Alluxio 1.x, the namespace was limited to around 200 million files in practice. Scaling further caused garbage collection problems because all metadata lived on the Alluxio master's JVM heap; even at 200 million files, the heap footprint was large (around 200GB). To scale the namespace in 2.0, we added support for storing part of the namespace on disk in RocksDB. Recently-accessed metadata stays in memory, while colder metadata ends up on disk. This reduces the memory required to serve the Alluxio namespace, and also takes pressure off the Java garbage collector by reducing the number of objects it must track.
Setting up off-heap metadata
Off-heap metadata is the default in Alluxio 2.0, so no action is required to start using it. If you want to go back to storing all metadata in memory, set the following in conf/alluxio-site.properties:

```properties
alluxio.master.metastore=HEAP # default is ROCKS for off-heap
```
The default location for off-heap metadata is ${alluxio.work.dir}/metastore, which is usually under the Alluxio home directory. You can configure the location in conf/alluxio-site.properties:

```properties
alluxio.master.metastore.dir=/path/to/metastore
```
In-memory metadata storage
While Alluxio can store all of its metadata on disk, doing so would reduce performance because disk is much slower than memory. To mitigate this, Alluxio keeps a large number of files cached in memory for fast access. When the number of cached files approaches the maximum cache size, Alluxio evicts the least-recently-used entries to disk. Users can trade off memory usage against speed by setting the cache size in conf/alluxio-site.properties:

```properties
alluxio.master.metastore.inode.cache.max.size=10000000
```
A cache size of 10 million requires about 10GB of heap.
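The caching behavior described above can be sketched as a small LRU cache that spills cold entries to a slower backing store. This is an illustrative model only, not Alluxio's actual implementation; the class and method names are hypothetical, and a plain dict stands in for RocksDB.

```python
from collections import OrderedDict

class InodeCache:
    """Illustrative sketch: an LRU inode cache that evicts cold
    entries to a backing store (standing in for RocksDB)."""

    def __init__(self, max_size, backing_store):
        self.max_size = max_size
        self.cache = OrderedDict()    # inode_id -> metadata, in LRU order
        self.backing = backing_store  # slow tier, e.g. a dict here

    def put(self, inode_id, metadata):
        self.cache[inode_id] = metadata
        self.cache.move_to_end(inode_id)  # mark as most recently used
        while len(self.cache) > self.max_size:
            # Evict the least-recently-used entry to the disk tier.
            cold_id, cold_meta = self.cache.popitem(last=False)
            self.backing[cold_id] = cold_meta

    def get(self, inode_id):
        if inode_id in self.cache:
            self.cache.move_to_end(inode_id)
            return self.cache[inode_id]
        if inode_id in self.backing:
            # Cache miss: promote the entry back into memory.
            metadata = self.backing.pop(inode_id)
            self.put(inode_id, metadata)
            return metadata
        return None
```

The key property is that hot metadata is always served from memory, while only cache misses pay the cost of the disk tier.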
Sizing Memory
For the best performance, the metadata of the working set should fit within the master's memory. Estimate the number of files in your working set and multiply by 1KB per file. For example, for 5 million files, estimate 5GB of master memory. To configure master memory, add a JVM option in conf/alluxio-env.sh:

```bash
ALLUXIO_MASTER_JAVA_OPTS+=" -Xmx5G"
```
If you have the memory available, we recommend giving the master a 31GB heap in production deployments. In conf/alluxio-env.sh:

```bash
ALLUXIO_MASTER_JAVA_OPTS+=" -Xmx31G"
```

Then update the cache size in conf/alluxio-site.properties to match the amount of memory allocated:

```properties
# 31 million
alluxio.master.metastore.inode.cache.max.size=31000000
```
This will accommodate most workloads while keeping the heap under 32GB, the threshold at which the JVM must switch from compressed 32-bit object pointers to 64-bit pointers.
Disk
Alluxio stores inodes more compactly on disk than in memory: about 500 bytes per file instead of 1KB. To support 1 billion files on disk, provision an HDD or SSD with 500GB of space and point the RocksDB metastore at that disk in conf/alluxio-site.properties:

```properties
alluxio.master.metastore.dir=/path/to/metastore
```
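The heap and disk estimates above follow directly from the per-inode figures quoted in this post (~1KB of heap per cached file, ~500 bytes per file on disk). A back-of-the-envelope calculator, with a hypothetical helper name:

```python
# Per-inode sizing estimates quoted in this post (assumptions, not
# exact Alluxio internals).
BYTES_PER_INODE_HEAP = 1000  # ~1KB of master heap per cached inode
BYTES_PER_INODE_DISK = 500   # ~500 bytes per inode stored in RocksDB

def estimate_sizing(total_files, cached_files):
    """Return (heap_gb, disk_gb): heap scales with the inode cache
    size, disk scales with the total namespace size."""
    heap_gb = cached_files * BYTES_PER_INODE_HEAP / 1e9
    disk_gb = total_files * BYTES_PER_INODE_DISK / 1e9
    return heap_gb, disk_gb
```

For a 1-billion-file namespace with a 5-million-inode cache, this reproduces the article's numbers: about 5GB of heap and 500GB of metastore disk.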
Conclusion
Alluxio 2.0 significantly improves metadata scalability by taking advantage of disk resources for cold metadata. This prepares Alluxio to handle larger namespaces as users mount more storage systems into Alluxio and make greater use of object storage. Stay tuned for a follow-up blog on the technical details of how we leverage RocksDB for storing our metadata.