The Alluxio team is thrilled to announce the release of v2.6 of the Alluxio Data Orchestration Platform. Version 2.6 of both the free open source Community Edition and Alluxio Enterprise Edition is now generally available.
Alluxio 2.6 significantly improves the performance of data-intensive AI/ML workloads across any storage, and also improves the general maintainability and visibility of Alluxio clusters, especially for large-scale deployments. We have taken the feedback and contributions from the community and introduced features that simplify deployment, add new data management capabilities, optimize performance, and provide enhanced visibility into system behavior.
With this release, Alluxio expands the spectrum of data-driven workloads that benefit from Alluxio’s data orchestration capabilities. At the same time, time to production and time to value are significantly reduced with improved visibility and ease of operation.
Free downloads of Alluxio Community Edition and free trials of Alluxio Enterprise Edition can be found here. Join thousands of members in our Slack channel to ask questions and provide feedback. And thank you to everyone who contributed to this release.
Streamlined Data Orchestration for AI/ML Workloads
Performance is a key benefit Data Orchestration brings to AI/ML workloads, and Alluxio has provided significant value to users in that respect.
Alluxio 2.6 builds upon the work done in Alluxio 2.5 to provide a complete solution for AI/ML workloads. The improvements span from the very first task of deploying Alluxio to the ever-difficult goal of monitoring the system once it is running production workloads.
From a deployment perspective, most users take a containerized approach, often leveraging Kubernetes for container orchestration. To simplify deployments, we combined the Alluxio worker and FUSE processes. Users no longer need to configure multiple processes to run in the same Kubernetes pods or use other workarounds to ensure that both the Alluxio worker and the FUSE process are available on all of the required nodes.
Another benefit of consolidating the two processes is that it avoids inter-process communication. Reducing this communication overhead yielded significant performance improvements for workloads with a large number of small files, a common case for training workloads such as image recognition.
Finally, improvements to the user experience of data loading through Alluxio native commands such as distributed load make setting up training data much easier. This avoids the need for custom scripts or another system to load the data into the Alluxio cache.
Improved System Visibility
Data Orchestration frameworks are involved in many mission-critical workflows. Therefore, visibility into Alluxio’s system status is paramount to successful operation, maintenance, and optimization of the system.
Alluxio 2.6 takes system visibility to the next level, providing detailed information that lets users drill down into specific component behavior and trace request handling and timing. These new metrics and capabilities give DevOps teams a much better toolkit for troubleshooting a problem that has been narrowed down to a subset of the system.
System administrators should still rely on general statistics for Alluxio system observability. Alluxio 2.6 provides templates for common monitoring dashboards such as Grafana, so new deployments get a quick start on tracking Alluxio’s health. Drawing on collaboration with and reports from Alluxio users, we have also added documentation on interpreting the default metrics and adjusting system configuration or capacity accordingly.
System visibility is a key focus for the Alluxio project. In coming releases, we plan to further improve visibility by introducing logical Alluxio metrics such as file and block access rates, job progress and history, and cluster load heatmaps.
More Info
You can find more information in the 2.6.0 official release notes. Have questions? Come join the Community Slack Channel.