What's New in Alluxio 2.6: Better Performance for AI/ML Workloads plus Increased Operating Metrics Visibility

July 1, 2021

Adit Madan

The Alluxio team is thrilled to announce the release of v2.6 of the Alluxio Data Orchestration Platform. Version 2.6 of both the free, open source Community Edition and the Alluxio Enterprise Edition is now generally available.

Alluxio 2.6 significantly improves the performance of data-intensive AI/ML workloads across any storage system, and also improves the general maintainability and visibility of Alluxio clusters, especially in large-scale deployments. Building on feedback and contributions from the community, we have introduced features that simplify deployment, add new data management capabilities, optimize performance, and provide enhanced visibility into system behavior.

With this release, Alluxio expands the spectrum of data-driven workloads that benefit from Alluxio’s data orchestration capabilities. At the same time, time to production and time to value are significantly reduced with improved visibility and ease of operation.

Free downloads of Alluxio Community Edition and free trials of Alluxio Enterprise Edition can be found here. Join thousands of members in our Slack channel to ask questions and share your feedback. And thank you to everyone who contributed to this release.

Streamlined Data Orchestration for AI/ML Workloads

Performance is a key benefit that data orchestration brings to AI/ML workloads, and Alluxio has provided significant value to users in this respect.

Alluxio 2.6 builds upon the work done in Alluxio 2.5 to provide a complete solution for AI/ML workloads. The improvements span everything from the very first task of deploying Alluxio to the perennially difficult goal of monitoring the system once it is running production workloads.

From a deployment perspective, most users deploy Alluxio in containers, often using Kubernetes for container orchestration. To simplify these deployments, we combined the Alluxio worker and FUSE processes into a single process. Users no longer need to configure multiple processes to run in the same Kubernetes pod, or rely on other workarounds, to ensure that both the Alluxio worker and the FUSE process are available on every required node.

Another benefit of consolidating the two processes is that it eliminates inter-process communication between them. Removing this communication overhead yielded significant performance improvements for workloads with a large number of small files, which happens to be a common case for training workloads such as image recognition.
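To see why removing an inter-process hop matters most for small files, consider a simple cost model in which every file read pays a fixed per-request overhead plus a transfer time proportional to the file's size. The sketch below uses hypothetical parameter values chosen purely for illustration (they are not Alluxio measurements) and compares the same total data volume split into many small files versus a few large ones.

```python
# Illustrative cost model: time per file = fixed overhead + bytes / bandwidth.
# All parameter values are hypothetical and are NOT measured Alluxio figures.


def read_time_s(num_files: int, file_size_bytes: int,
                overhead_s: float, bandwidth_bps: float) -> float:
    """Total time to read num_files files, each paying a fixed per-file overhead."""
    per_file = overhead_s + file_size_bytes / bandwidth_bps
    return num_files * per_file


def speedup(num_files: int, file_size_bytes: int, bandwidth_bps: float,
            overhead_with_ipc_s: float, overhead_without_ipc_s: float) -> float:
    """Ratio of read time with the extra IPC hop to read time without it."""
    before = read_time_s(num_files, file_size_bytes, overhead_with_ipc_s, bandwidth_bps)
    after = read_time_s(num_files, file_size_bytes, overhead_without_ipc_s, bandwidth_bps)
    return before / after


if __name__ == "__main__":
    BANDWIDTH = 1e9       # hypothetical: 1 GB/s effective bandwidth
    WITH_IPC = 200e-6     # hypothetical: 200 us fixed cost per file with the extra hop
    WITHOUT_IPC = 100e-6  # hypothetical: 100 us fixed cost once the hop is removed

    # Same total volume (about 1 GB) as many small files vs. a few large files.
    small = speedup(10_000, 100_000, BANDWIDTH, WITH_IPC, WITHOUT_IPC)
    large = speedup(10, 100_000_000, BANDWIDTH, WITH_IPC, WITHOUT_IPC)
    print(f"100 KB files: {small:.2f}x faster")
    print(f"100 MB files: {large:.3f}x faster")
```

With these assumed values, the small-file case gets roughly 1.5x faster while the large-file case barely changes, because the fixed per-file cost is a large share of each small read and a negligible share of each large one.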

Finally, improvements to the user experience of loading data through native Alluxio commands such as distributed load make setting up training data much easier. This avoids the need for custom scripts or a separate system to load the data into the Alluxio cache.
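As a hedged illustration, the Alluxio 2.x CLI exposes distributed loading as `alluxio fs distributedLoad <path>`. The Python sketch below assembles that command for a list of training directories and runs it only when an `alluxio` executable is found on the PATH. The helper names and the dataset paths are hypothetical, and any additional flags should be checked against the command-line documentation for your release.

```python
import shutil
import subprocess
from typing import List, Sequence


def build_load_commands(paths: Sequence[str], alluxio_bin: str = "alluxio") -> List[List[str]]:
    """Build one `alluxio fs distributedLoad <path>` command per Alluxio path."""
    return [[alluxio_bin, "fs", "distributedLoad", p] for p in paths]


def preload_training_data(paths: Sequence[str], alluxio_bin: str = "alluxio") -> bool:
    """Run each load command if the CLI is available; return True only if all succeed."""
    if shutil.which(alluxio_bin) is None:
        print(f"{alluxio_bin!r} not found on PATH; skipping preload")
        return False
    for cmd in build_load_commands(paths, alluxio_bin):
        # check=False so we can report which path failed instead of raising.
        result = subprocess.run(cmd, check=False)
        if result.returncode != 0:
            print(f"load failed for {cmd[-1]} (exit code {result.returncode})")
            return False
    return True


if __name__ == "__main__":
    # Hypothetical dataset locations in the Alluxio namespace.
    preload_training_data(["/training/images/train", "/training/images/val"])
```

Loading one directory per command keeps failures easy to attribute; a training job could call a helper like this before its first epoch instead of relying on a custom copy script.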

Improved System Visibility

Data Orchestration frameworks are involved in many mission-critical workflows. Therefore, visibility into Alluxio’s system status is paramount to successful operation, maintenance, and optimization of the system.

Alluxio 2.6 takes system visibility to the next level, providing detailed information that lets users drill down into the behavior of specific components and trace how requests are handled and how long each step takes. These new metrics and capabilities give DevOps teams a much better toolkit for troubleshooting a problem that has been narrowed down to a subset of the system.

System administrators should still rely on general statistics for overall Alluxio system observability. Alluxio 2.6 provides dashboard templates for common monitoring tools such as Grafana, so new deployments get a quick start on tracking Alluxio’s health. Based on collaboration with and reports from Alluxio users, we have also added documentation on interpreting the default metrics and on adjusting system configuration or capacity accordingly.
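One common way to interpret metrics is to compute how much data is served from the Alluxio cache rather than fetched from the under storage. The sketch below parses a Prometheus-style text snapshot and derives a cache hit ratio. The metric names in the sample (`cluster_bytes_read_cache`, `cluster_bytes_read_ufs`) are hypothetical placeholders, not Alluxio's actual metric names; substitute the names your cluster reports as listed in the metrics documentation.

```python
from typing import Dict


def parse_prometheus_text(text: str) -> Dict[str, float]:
    """Parse simple Prometheus text-format lines into {metric_name: value}.

    Labels are discarded and values sharing a name are summed. This is a
    simplification (e.g. label values containing '}' are not handled), not a
    full implementation of the exposition format.
    """
    metrics: Dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "{" in line:
            name, rest = line.split("{", 1)
            rest = rest.split("}", 1)[1]
        else:
            name, rest = line.split(None, 1)
        value = float(rest.split()[0])  # ignore an optional trailing timestamp
        metrics[name] = metrics.get(name, 0.0) + value
    return metrics


def cache_hit_ratio(metrics: Dict[str, float], cache_key: str, ufs_key: str) -> float:
    """Fraction of bytes served from cache; 0.0 when nothing has been read."""
    cache = metrics.get(cache_key, 0.0)
    ufs = metrics.get(ufs_key, 0.0)
    total = cache + ufs
    return cache / total if total > 0 else 0.0


# Hypothetical snapshot; metric names and values are illustrative only.
SAMPLE = """\
# HELP cluster_bytes_read_cache Hypothetical: bytes served from the Alluxio cache
# TYPE cluster_bytes_read_cache counter
cluster_bytes_read_cache{worker="w1"} 600
cluster_bytes_read_cache{worker="w2"} 300
cluster_bytes_read_ufs 100
"""

if __name__ == "__main__":
    m = parse_prometheus_text(SAMPLE)
    ratio = cache_hit_ratio(m, "cluster_bytes_read_cache", "cluster_bytes_read_ufs")
    print(f"cache hit ratio: {ratio:.0%}")
```

A ratio that stays low under a steady workload might suggest the cache is undersized for the working set, which is the kind of capacity question the new documentation is meant to help answer.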

System visibility is a key focus for the Alluxio project. In coming releases, we plan to further improve visibility by introducing logical Alluxio metrics such as file and block access rates, job progress and history, and cluster load heatmaps.

More Info

You can find more information in the 2.6.0 official release notes. Have questions? Come join the Community Slack Channel.
