Alluxio, formerly Tachyon, began as a research project in 2012, when I was a Ph.D. student at UC Berkeley's AMPLab. At the time, Spark and Mesos were taking off. We saw what Spark and Mesos could do for compute and resource management, respectively, but the storage piece of the story was missing. My research group and I began investigating how to enable memory-speed data sharing across different applications.
I built the first version of Alluxio during Christmas of 2012 and open-sourced it in April 2013. Two years later, Alluxio, Inc. was founded with a $7.5 million investment from Andreessen Horowitz. The company's goals are to provide a commercial backer for the project and to realize the vision of Alluxio becoming the de facto storage unification layer for big data and other scale-out application environments.
Today, we are very excited to announce the 1.0 release of Alluxio, the world's first memory-centric virtual distributed storage system. Alluxio unifies data access and bridges computation frameworks and underlying storage systems: applications only need to connect to Alluxio to access data stored in any of the underlying storage systems. In addition, Alluxio's memory-centric architecture can make data access orders of magnitude faster than existing solutions.
Now, organizations can run any computation framework (Apache Spark, Apache Hadoop MapReduce, Apache Flink, etc.) with any storage system (Alibaba OSS, Amazon S3, OpenStack Swift, GlusterFS, Ceph, etc.), on any storage media (DRAM, SSD, HDD, etc.).
What we have accomplished
Over the past three years, Alluxio has evolved from a small research prototype into a stable, reliable system with a vibrant community, deployed by companies around the world. Since the first open source release, our community has grown from 1 contributor to more than 200 contributors from over 50 companies. Some production deployments of Alluxio span hundreds of machines. Our meetup group has grown to more than 800 members, and the most recent meetup had over 300 registrants. The number of commits has grown from 200 to more than 12,000.
Beyond the numbers, we have seen Alluxio solve critical problems for enterprises across industries around the world. For example, search giant Baidu has been running Alluxio in production for more than a year, where it delivers a 30x performance improvement. Barclays, a world-leading bank, uses Alluxio to reduce end-to-end latency from hours to seconds, making previously impossible workloads feasible. Public cloud providers such as Alibaba and Rackspace have shown how Alluxio virtualizes their object storage systems. Intel has published articles showcasing several ways to leverage Alluxio in its customers' environments, and IBM has presented how Alluxio can abstract OpenStack storage to enable fast data analytics.
It has been exciting to see adoption of the project grow from zero companies to many, including several industry leaders. These achievements validate Alluxio's tremendous potential and demonstrate the great excitement around it in both industry and the community.
What to look forward to
Alluxio and its community have grown tremendously in many respects over the past three years. With Alluxio's increasing adoption and growing community, we have established a nonprofit organization, the Alluxio Open Foundation, to provide a better home for the project; stay tuned for details. The project has been rebranded from Tachyon to Alluxio to protect it from potential trademark litigation and to preserve the intellectual property of the open source community's contributions internationally.
Furthermore, in response to growing demand for a forum to communicate and learn about Alluxio, we are planning the first Alluxio Conference, to take place later this year in the San Francisco Bay Area. If you are interested in presenting, attending, or sponsoring, please let us know.
Kudos to the Alluxio community for all we have achieved. Let us look forward to the future!