Introducing Alluxio Enterprise AI and A Vision Beyond Unintelligent Storage
October 18, 2023
By Adit Madan

We take great pride that the Alluxio Data Platform serves many of the most critical data-driven applications in the world today. Each of us interacts with platforms powered by Alluxio on a daily basis, and, perhaps without knowing it, so do you.

These platforms range from the voice assistant we speak to, the bank we transact with, and the healthcare provider we rely on, to the telecommunications and internet provider we can't live without. The largest enterprises and internet giants in the world are using the technology we’ve built over the last eight years, and we are now at the next stage in realizing our vision with the launch of Alluxio Enterprise AI.

Alluxio holds a unique position in the data stack, neither as a compute engine nor just another storage system, but instead sitting right at the intersection of compute and storage. By being close to storage, we have a universal view of the workloads on the data platform across stages of a data pipeline. This is the knowledge we tap into. Being close to compute is what makes the Alluxio Data Platform smart, by tapping into a view of what the applications on the compute engines are trying to achieve. Leveraging this unique position is what differentiates us from the myriad of offerings in the market.

We built Alluxio Enterprise, a software-defined solution in the data infrastructure stack, to bring efficiency and ease of access to data for growing data platforms. Since the early days, even before the term “data lake” was coined, Alluxio has embraced the separation of storage and compute. Our existing customers use Alluxio Enterprise to scale out the capacity of their data platforms, adopt a hybrid and multi-cloud strategy, and lower costs while staying agile. We started by targeting data and analytics, and have steadily expanded our footprint in machine learning and AI.

Today we power critical data-intensive AI workloads, including deep learning and generative AI, in some of the largest technology-first global enterprises. The new product, Alluxio Enterprise AI, is focused on the deep learning segment of machine learning, commonly used to train computer vision, natural language processing, and multi-modal models that are prevalent across industries such as autonomous driving. When we say model training, we include both training models from scratch and fine-tuning a base model.

Alluxio Enterprise has been used by Data & Analytics Platform teams in large enterprises, working with multi-petabyte data lakes. Oftentimes, Alluxio itself is managing hundreds of terabytes to nearly a petabyte of active data at any given moment from the primary data lake. With Alluxio Enterprise AI, AI Platform teams utilizing tools like PyTorch should expect to benefit from high compute efficiency and increased capacity with specialized compute infrastructure like GPU servers. Not just the largest enterprises in the world, but anyone scaling beyond training on a single GPU server to a cluster operating on tens or hundreds of terabytes will benefit.

Alluxio Enterprise AI has a revolutionary new architecture, which we call DORA (Decentralized Object Repository Architecture), purpose-built for the needs of deep learning and generative AI. High-performance I/O is not a new problem. Parallel file systems have been around for decades, serving HPC-style applications like weather simulations, and are now being repurposed for AI. What makes DORA unique is that we have designed the system as a cache instead of storage: it brings data close to data-hungry compute as needed while operating directly on top of the primary data lake, without redundancy. The new architecture scales out to the limits of the data center and does not require dedicated hardware other than what is available on the compute cluster itself.
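
To make the "cache instead of storage" idea concrete, here is a minimal, purely illustrative sketch of a read-through cache in Python. It is not Alluxio's implementation and says nothing about how DORA works internally; the class name `ReadThroughCache`, the `fetch_from_lake` callback, and the in-memory dictionary standing in for the data lake are all assumptions made for the example.

```python
from typing import Callable, Dict


class ReadThroughCache:
    """Toy read-through cache: serve locally if present, else fetch and keep a copy."""

    def __init__(self, fetch_from_lake: Callable[[str], bytes]) -> None:
        # Hypothetical loader that reads an object from the primary data lake.
        self._fetch = fetch_from_lake
        # Stands in for storage available on the compute node itself.
        self._local: Dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    def read(self, key: str) -> bytes:
        if key in self._local:
            self.hits += 1
            return self._local[key]
        # Cache miss: go to the data lake once, then keep the bytes nearby.
        self.misses += 1
        data = self._fetch(key)
        self._local[key] = data
        return data


# A dictionary plays the role of the data lake in this sketch.
lake = {"train/0001.bin": b"sample-bytes"}
cache = ReadThroughCache(lake.__getitem__)

first = cache.read("train/0001.bin")   # fetched from the "lake"
second = cache.read("train/0001.bin")  # served from the local copy
```

The data lake remains the single source of truth in this sketch; the cache only holds copies of what the compute side has actually asked for.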

Alluxio Enterprise AI is also intelligent: we use knowledge of what the applications above are trying to achieve, with features such as pre-loading data into the cache based on specific data access patterns. Alluxio Enterprise AI also exposes interfaces, such as POSIX and a REST API, that are commonly required by PyTorch and other widely used compute engines in this space. The interfaces provide an order of magnitude improvement in I/O throughput and significant end-to-end performance improvements.
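
A practical consequence of a POSIX interface is that training code can read data with ordinary file I/O. The sketch below assumes a dataset is visible as a local directory; the helper name `iter_samples` and the `*.bin` file pattern are hypothetical, and a temporary directory stands in for a mounted path so the example runs anywhere.

```python
import tempfile
from pathlib import Path
from typing import Iterator, Tuple


def iter_samples(root: Path, pattern: str = "*.bin") -> Iterator[Tuple[str, bytes]]:
    """Yield (file name, raw bytes) for each matching file under root, in sorted order."""
    for path in sorted(root.glob(pattern)):
        # Plain file reads: nothing here is specific to any storage system.
        yield path.name, path.read_bytes()


# Demo against a temporary directory standing in for a mounted dataset path.
with tempfile.TemporaryDirectory() as tmp:
    demo_root = Path(tmp)
    (demo_root / "a.bin").write_bytes(b"\x01\x02")
    (demo_root / "b.bin").write_bytes(b"\x03")
    samples = list(iter_samples(demo_root))
```

In a real setup, the same function would be pointed at wherever the dataset is mounted, and a data loader could wrap it without any storage-specific client code.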

With the launch of Alluxio Enterprise AI, we are officially expanding our product portfolio to include Alluxio Enterprise Data and Alluxio Enterprise AI. Alluxio Enterprise Data is the next-gen version of Alluxio Enterprise Edition, and will continue to be the ideal choice for businesses focused primarily on analytic workloads. 

I hope you share our excitement about the new product, and I invite you to learn more about the new offering.
