2021 by the numbers
Let’s take a look at the Alluxio project growth by the numbers:
- 8 Alluxio Community Day 🍕 meetups
- 84 live community developer sync meetings and online Office Hours
- 27 Webinars
- 62 blog ✍️ posts in multiple languages
- 5 new PMC members and 1 new PMC maintainer
- 2 new Committers promoted
- 983 pull requests ✅ merged 💻 in GitHub with 308 coming from community contributors
- 3144 new members 👋 and 24531 messages in Slack
- 512 issues 📝 created in GitHub
- 11 Alluxio 🚀 releases published
Alluxio Day – community oriented, user driven, developer centered
Though 2021 continued to be a tough year with COVID, the community persisted in getting together virtually. We kicked off the year by starting a new online community series called Alluxio Day, aimed at bringing users together to share their stories and experiences with one another and at providing a platform for more connections and technical collaboration among contributors and users. Throughout the year, we were pleasantly surprised by how the community welcomed and embraced this new way of getting together. The Alluxio Day virtual community meetups resulted in 37 high-quality technical talks with 51 speakers from all over the world, highlighting not just the Alluxio community but also other popular open source projects such as Presto, Apache Hudi, Apache Iceberg, and Apache Spark, as well as public cloud vendors like Alibaba Cloud, Microsoft Azure, and Tencent Cloud.
Faster release cycles to enable innovations/improvements
In 2021, Alluxio continues to increase its popularity for big-data analytics applications. Our open source community is working closely with engineers from companies such as Facebook and Uber to optimize their Presto workloads with Alluxio as the data caching layer. Learn more from the whitepaper Presto with Alluxio Overview – Architecture Evolution for Interactive Queries.
We also observed another trend this year: Alluxio is being adopted for data-intensive AI/ML workloads to provide both distributed, high-performance I/O and data management functionality across users and cloud providers. For this emerging workload, we have been working closely with Alibaba Cloud, Microsoft Azure, Nanjing University, Tencent Cloud, and many other contributors to create and optimize JNI-based Alluxio POSIX clients. For more details, check out the whitepaper Accelerating Machine Learning / Deep Learning in the Cloud: Architecture and Benchmark.
As users scale up their Alluxio usage with more complicated workloads, it is increasingly challenging to optimize the system. To meet these needs, Alluxio improved its scalability significantly in 2021, all the way from deploying Alluxio on clusters with thousands of nodes to loading data sets with billions of files in production workloads. Users such as Tencent are running a single Alluxio cluster with 1000+ Alluxio nodes to speed up AI applications.
We are thrilled to see rapid growth in the number of Alluxio users, and we are thankful for the community feedback. More than 500 GitHub issues were created in 2021, and most of them came from community users. To respond quickly to this feedback, especially bug reports, we started experimenting with releasing our software on a faster cadence. Over the course of 2021, a total of 11 releases were published.
Thriving community with remarkable user contributions
From its inception, the Alluxio Open Source project has followed a merit-driven “Contributor-Committer-Member” progression as its central governing process. In 2021, the project welcomed 5 new Project Committee Members who contributed significantly to the project’s growth, and 2 individual contributors were granted Committer status by the PMC. More than 60 new contributors were recognized. Their contributions were not limited to code and documentation but also included a myriad of valuable initiatives such as project promotion, technical sharing, Q&A support, user blogs, and beta testing of new releases. All of these efforts contributed to Alluxio’s growth and awareness across the globe. If you are interested in becoming a contributor, a committer, or a PMC member, please check out our guide on GitHub.
The Alluxio Open Source community would like to give special thanks to the Tencent Alluxio OTeam led by Baolong Mao, a PMC Maintainer of the Project. This year, the Tencent Alluxio OTeam contributed significantly to improving or creating features such as JNI-fuse, dynamic configuration updates, and UFS modules for cephfs-hadoop and ozone. On top of this, two contributors from the OTeam were granted Committer status by the Project. These two individuals alone contributed more than 3,000 lines of code to the Alluxio code base.
We would also like to express our appreciation to all the top community users who provided valuable feedback as we work together to make this project more adaptable and stable across environments. These users include Facebook (talk), Uber (talk), TikTok (talk), Microsoft (blog, talk), Tencent (talk), Alibaba (talk), Robinhood (talk), BossZP (talk), Bilibili, MOMO (talk), JD.com, Shopee, Intel (talk), NVIDIA (blog, talk), WeRide (talk), T3Go (blog), Unisound (blog), and others.
Diversity in the community by empowering women
Diversity is important to the Alluxio Open Source community. We believe in achieving greater equality in society and strongly encourage a wider range of young people to consider careers in tech regardless of their gender or race. We are proud to empower female leaders like Peijie Zhou and Lu Qiu with the support of their colleagues and the community.
Peijie Zhou is an infrastructure engineer at BossZP China and a top community contributor of the Alluxio Open Source project. Currently, Peijie leads a small team working on improving the stability and performance of Alluxio in machine learning and deep learning training.
Lu Qiu is a PMC maintainer and a Machine Learning Engineer at Alluxio. She currently leads bi-weekly Special Interest Group discussions on AI and Machine Learning workloads. Check out her presentation here.
2022, here we come!
As we head into 2022, the Alluxio team is setting lofty goals to accelerate development, expand existing work with our users, and adopt popular new use cases with other open source communities in our ecosystem.
We are certain that the new year will be filled with new users, new features, and plenty of feedback from the Alluxio Open Source community. Thank you all for sharing your experiences and joining us on this incredible journey. Wishing you the best in 2022 and looking forward to a fruitful new year!