Alluxio Virtualizes Distributed Storage for Petabyte Scale Computing at In-Memory Speeds
February 23, 2016

Supported by Alibaba, Baidu, Barclays, IBM, EMC, Intel and Other Industry Leaders, Alluxio Is the Next Major Innovation Out of UC Berkeley's AMPLab

SAN MATEO, CA -- (Marketwired - Feb 23, 2016) -- Alluxio (formerly known as Tachyon), the world's first memory-centric virtual distributed storage system, today announced its open source version 1.0 release. The vision for Alluxio is to become the de facto storage unification layer for big data and other scale-out application environments in the same manner that Apache Spark became the standard computation layer.

Alluxio's memory-centric architecture provides orders of magnitude performance gains over existing solutions and superior manageability by allowing developers to interact with a single storage layer API without worrying about the configurations and complexities of underlying storage and file systems.

Co-created by Haoyuan Li, CEO of Alluxio, Inc. and a founding committer of Spark, Alluxio ushers in the next generation of storage virtualization for petabyte scale computing.

"A storage unification layer that bridges computation frameworks and underlying storage systems is long overdue in the enterprise," said Haoyuan Li. "Alluxio is that unification layer with a memory-centric architecture. Alluxio enables any framework to access any data, from any storage at memory speeds."

Organizations can run any computation framework (e.g., Apache Spark, Apache MapReduce, Presto) with any storage system (e.g., Amazon S3, EMC, Google Cloud Storage, NetApp) and utilize any storage media (DRAM, SSD, HDD, etc.). As a memory-centric system, Alluxio yields orders-of-magnitude performance gains and improved manageability for existing configurations.

Only three years in existence, Alluxio has gained broad industry support as an open source project. With more than 200 contributors, 12,000 commits, and participation from over 50 commercial organizations, Alluxio has surpassed many other open source projects in the same timeframe. Alluxio runs in production at some of the largest cloud providers for petabyte scale workloads, in financial services to meet government regulations, for research by leading universities, and at technology vendors globally.

Intel recently published its findings on the diverse range of big data storage challenges that Alluxio can address. "Big data analytics is driving new requirements for distributed memory across clusters for real-time streaming, interactive queries, analytics and graph processing," said Michael Greene, Intel vice president, Software and Services Group and general manager of System Technologies and Optimization. "We are excited to work with developer communities on Alluxio and to optimize Alluxio solutions on Intel platforms. Ultimately, this helps our customers create more innovative and high performance cloud and big data solutions."

In financial services, Alluxio brings many advantages. It helps banks make faster and better trading decisions through dramatic performance improvements and also helps satisfy regulatory requirements. Barclays, the global financial services firm with 48 million customers and clients, recently published a report about how it uses Alluxio to boost big data analytics performance without duplicating confidential customer information to disk.

Last summer, IBM Research published a study about using Tachyon for "ultra-fast big data processing" to overcome "critical bottlenecks for system workloads."

For some of the world's cloud computing giants, Alluxio is allowing business analysts to discover insights interactively by analyzing petabytes of data in near real-time to improve customer experience. "As one of the largest Internet companies in the world, Baidu constantly faces the challenges of managing data at multi-petabyte scale. By adopting innovative technologies like Alluxio we are able to help our users extract meaningful and useful data almost instantly," said James Peng, Chief Architect at Baidu. "Our deployment of an Alluxio cluster has already reached 1,000 workers, which is one of the largest Alluxio clusters in the world. The tiered storage of Alluxio has provided us great flexibility in managing data in large-scale. We are seeing an average 10-fold, and up to 30-fold performance improvement in supporting interactive query system and other types of workloads. This greatly improved the speed in making important business decisions."

"As the cloud computing business for Alibaba Group, the world's leading e-commerce business, Alibaba manages many of the world's largest data centers, including the largest big data cluster ever built in China," said Wensong Zhang, CTO and Senior Research Fellow of AliCloud, founder of Linux Virtual Server. "With Alluxio combined with AliCloud OSS as well as other AliCloud cloud service products, our customers can leverage the technology trends of hardware to run important jobs at the fastest performance. We have been contributing to the Alluxio open source community and believe that Alluxio will play a critical role in the future of big data infrastructure."

Background
As a PhD candidate at UC Berkeley, Haoyuan Li saw Spark adoption driving demand for more developer-friendly ways for big data frameworks to access persistent data at in-memory speeds. Formerly known as Tachyon, the Alluxio system quickly gained prominence in use cases that required in-memory storage speeds for Spark computation and received early backing from enterprise software and storage leaders, including EMC and Pivotal.

Where storage and file systems have historically required high customization and tuning, Alluxio brings a unified interface that's intuitive for developers, easy for operators, and delivers unprecedented speeds for data access to support the broadest range of big data use cases such as machine learning, real-time analytics and streaming data.

"As a layer that abstracts away the differences of existing storage systems from the cluster computing frameworks such as Apache Spark and Hadoop MapReduce, Alluxio can enable the rapid evolution of the big data storage, similarly to the way the Internet Protocol (IP) has enabled the evolution of the Internet," said Prof. Ion Stoica, co-author of Spark, co-founder and executive chairman of DataBricks, co-director of UC Berkeley AMPLab and Ph.D. co-advisor to Haoyuan Li.

"AMPLab has created some of the most important open source technologies in the new big data stack, including Apache Spark," said Michael Franklin, Professor of Computer Science and Director of the AMPLab at UC Berkeley. "Alluxio is the next project with roots in the AMPLab to have major impact. We see it playing a huge disruptive role in the evolution of the storage layer to handle the expanding range of big data use cases."

To protect the project from potential trademark litigation and to preserve the intellectual property of the open source software community contributions internationally, the community changed the project name from Tachyon to Alluxio. A newly created non-profit organization, Alluxio Open Foundation, will host the project.

In 2015, Andreessen Horowitz invested $7.5M in Alluxio Inc., which has since assembled a team consisting of the world's leading distributed computing experts from Carnegie Mellon University, Google, Palantir, UC Berkeley AMPLab and VMware to continue to innovate and realize the vision for Alluxio.

About Alluxio

Alluxio, a leading provider of the high performance data platform for analytics and AI, accelerates time-to-value of data and AI initiatives and maximizes infrastructure ROI. Uniquely positioned at the intersection of compute and storage systems, Alluxio has a universal view of workloads on the data platform across stages of a data pipeline. This enables Alluxio to provide high performance data access regardless of where the data resides, simplify data engineering, optimize GPU utilization, and reduce cloud and storage costs. With Alluxio, organizations can achieve orders of magnitude faster model training and serving without the need for specialized storage, and build AI infrastructure on existing data lakes. Backed by leading investors, Alluxio powers technology, internet, financial services, and telecom companies, including 9 out of the top 10 internet companies globally. To learn more, visit www.alluxio.io.

Media Contact:
Beth Winkowski
Winkowski Public Relations, LLC for Alluxio
978-649-7189
beth@alluxio.com
