Alluxio strengthens its commitment to Open Source and announces preview of Alluxio 2.0 to simplify data access for cloud workloads

March 7, 2019

Community Feedback Invited on Biggest Alluxio Release Ever, Including Support for More Than One Billion Files and New Alluxio POSIX API for AI Applications

SAN MATEO, Calif., March 07, 2019 (GLOBE NEWSWIRE) -- Alluxio, developer of the world's first software system that unifies data at memory speed, today announced the preview release of Alluxio 2.0, the most ambitious platform upgrade since the inception of Alluxio. This preview release, now available for free download, is the largest open source release with the most new features added since the creation of the project and is designed to allow the community to experiment with new capabilities and explore Alluxio for new use cases such as simplifying data engineering and access for AI model training.

“Today, our users already deploy Alluxio at very large scale with many thousand node single cluster production deployments across telecommunications, retail and internet companies,” said Haoyuan Li, CEO and co-founder of Alluxio. “This release allows our users to take Alluxio deployments to the next level of scale with support for extreme data requirements. Our users as well as the data engineering community will find a much more intuitive interface with greatly expanded capabilities to help them run analytics and AI workloads on private, public or hybrid cloud infrastructures leveraging valuable data wherever it might be stored.”

“At China Unicom, we use Alluxio at scale as a core component of our modern data stack along with Apache Spark, HDFS, Hive and Apache Kafka. We are excited about Alluxio 2.0, particularly the new metadata management and scale out capabilities that will allow us to continue elastically scaling our deployment for the explosive data growth we see coming,” said Ce Zhang, Senior Software Engineer at China Unicom Software Research Institute.

“AVA -- our cloud-native deep learning platform -- is built on TensorFlow, Caffe, Alluxio and KODO (a customized object store and Ceph). Alluxio orchestrates data movement from storage systems to data science environments, eliminating complex data engineering tasks and speeding up model training. Alluxio 2.0’s improved file system API to access data stored in any storage system will allow for accelerating machine learning training even further for faster innovation,” said Chaoguang Li, Technical Director of Atlab at Qiniu Cloud.

The Alluxio 2.0 preview release provides new features across several key areas:

Support for hyperscale data workloads:

Support for more than 1 billion files - New option for tiered metadata storage for files and objects enabling the unified namespace to scale to more than a billion files with metadata for hot data stored in the process memory while the rest is managed by Alluxio outside the process memory.
Highly distributed data services - 2.0 introduces the Alluxio Job Service, a distributed, clustered service that now runs data operations such as replication, persistence, cross-storage moves and distributed loads, enabling high performance at massive scale.
Adaptive replication for increased data locality - New feature to configure a range for the number of copies of data stored in Alluxio; the copies are then managed automatically within that range.
High availability with embedded journal - A new fault tolerance and high availability mode for file and object metadata, called the embedded journal, that uses the Raft consensus algorithm and is independent of any external storage system. This is particularly helpful for abstracting object storage.
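
The tiered metadata idea described above can be illustrated with a minimal Python sketch. It is not Alluxio's implementation: the class name `TieredMetadataStore`, the `hot_capacity` parameter and the example paths are hypothetical, and a plain dictionary stands in for whatever storage holds metadata outside the process memory. It only shows the general pattern of keeping recently used entries in a bounded in-memory tier and demoting the rest.

```python
from collections import OrderedDict


class TieredMetadataStore:
    """Toy two-tier store: a bounded in-memory LRU tier plus a backing tier."""

    def __init__(self, hot_capacity):
        self.hot_capacity = hot_capacity
        self.hot = OrderedDict()  # recently used entries, kept in process memory
        self.cold = {}            # stand-in for storage outside process memory

    def put(self, path, meta):
        # A new or updated entry always lands in the hot tier.
        self.cold.pop(path, None)
        self.hot[path] = meta
        self.hot.move_to_end(path)
        self._evict()

    def get(self, path):
        if path in self.hot:
            self.hot.move_to_end(path)  # mark as most recently used
            return self.hot[path]
        meta = self.cold.pop(path)      # raises KeyError for unknown paths
        self.hot[path] = meta           # promote on access
        self._evict()
        return meta

    def _evict(self):
        # Demote least recently used entries until the hot tier fits.
        while len(self.hot) > self.hot_capacity:
            old_path, old_meta = self.hot.popitem(last=False)
            self.cold[old_path] = old_meta


store = TieredMetadataStore(hot_capacity=2)
store.put("/data/a", {"size": 1})
store.put("/data/b", {"size": 2})
store.put("/data/c", {"size": 3})  # "/data/a" is demoted to the cold tier
```

In this sketch the namespace can grow beyond what fits in memory because only `hot_capacity` entries are held in the hot tier at any time; reading a demoted path promotes it again.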

Enabling machine learning and deep learning workloads on any storage:

Machine learning and deep learning frameworks need to extract data from Hadoop and object stores, which is typically a very manual and time-consuming process.

Alluxio POSIX API. Alluxio’s FUSE feature enables a POSIX-compatible API so that frameworks such as TensorFlow and Caffe, as well as other Python-based models, can directly access data from any storage system via Alluxio using traditional file system access.
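
As a rough illustration of what traditional file system access means for a Python training script, the sketch below uses only the standard library to list and read files under a directory. The helper names `list_training_files` and `read_bytes` are hypothetical, and the code assumes an Alluxio FUSE mount already exists at a local path; nothing in it is specific to Alluxio, which is the point of a POSIX-compatible interface.

```python
import os


def list_training_files(mount_point, suffix=".csv"):
    """Collect matching files under a directory using ordinary POSIX calls."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(mount_point):
        for name in filenames:
            if name.endswith(suffix):
                found.append(os.path.join(dirpath, name))
    return sorted(found)


def read_bytes(path, chunk_size=1 << 20):
    """Stream a file in chunks and return the number of bytes read."""
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total
```

A script could call `list_training_files("/mnt/alluxio-fuse/datasets")`, where the mount path is an assumed example, and pass the resulting paths to any data loader that accepts local file names.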

Better storage abstraction for completely independent and elastic compute:

Support for HDFS clusters across different versions - Explosive growth of data has led enterprises to have many data silos, including multiple Hadoop clusters running many different versions. Unified access across these clusters is currently very difficult. With Alluxio 2.0, users can connect multiple HDFS clusters of any version to Alluxio and unify data access across them.
Active sync with Hadoop - New capability integrates with HDFS inotify to pick up any data and metadata changes made to files stored in Hadoop, so that applications accessing data via Alluxio proactively receive the latest updates.
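
One way to picture unified access across several HDFS clusters is a mount table that maps paths in a single namespace to locations in different underlying clusters. The Python sketch below is an illustration under stated assumptions, not Alluxio's code: the `MountTable` class, the cluster host names and the paths are all hypothetical. It resolves a path by the longest matching mount prefix.

```python
class MountTable:
    """Toy mount table mapping unified-namespace paths to storage URIs."""

    def __init__(self):
        self._mounts = {}

    def mount(self, namespace_path, storage_uri):
        # Normalize trailing slashes; keep "/" as the root mount point.
        key = namespace_path.rstrip("/") or "/"
        self._mounts[key] = storage_uri.rstrip("/")

    def resolve(self, path):
        best = None
        for prefix in self._mounts:
            base = prefix.rstrip("/")
            # Match the mount point itself or anything beneath it.
            if path == prefix or path.startswith(base + "/"):
                if best is None or len(prefix) > len(best):
                    best = prefix
        if best is None:
            raise KeyError(f"no mount covers {path}")
        return self._mounts[best] + path[len(best.rstrip("/")):]


table = MountTable()
table.mount("/sales", "hdfs://cluster-a:8020/warehouse/sales")  # hypothetical cluster
table.mount("/logs", "hdfs://cluster-b:9000/raw/logs")          # hypothetical cluster
```

With these two mounts, an application asks for `/logs/2019/03/07.gz` or `/sales/q1.parquet` without needing to know which cluster, or which HDFS version, sits behind each path.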

For more information on the biggest advance in Alluxio capabilities ever, register here for the free San Francisco Bay Area 2.0 Preview Meetup on March 14, 2019 and the free 2.0 Preview webinar on March 28, 2019.

About Alluxio

Alluxio, a leading provider of the high performance data platform for analytics and AI, accelerates time-to-value of data and AI initiatives and maximizes infrastructure ROI. Uniquely positioned at the intersection of compute and storage systems, Alluxio has a universal view of workloads on the data platform across stages of a data pipeline. This enables Alluxio to provide high performance data access regardless of where the data resides, simplify data engineering, optimize GPU utilization, and reduce cloud and storage costs. With Alluxio, organizations can achieve magnitudes faster model training and serving without the need for specialized storage, and build AI infrastructure on existing data lakes. Backed by leading investors, Alluxio powers technology, internet, financial services, and telecom companies, including 9 out of the top 10 internet companies globally. To learn more, visit www.alluxio.io.

Media Contact:
Beth Winkowski
Winkowski Public Relations, LLC for Alluxio
978-649-7189
beth@alluxio.com
