We are pleased to unveil the latest version of Alluxio. This release marks a significant milestone in improving system reliability under resource constraints and stress scenarios, with a particular focus on getting the most out of limited hardware to scale at manageable cost.
Enhanced Functionality:
- Dramatic Improvements in High Availability (HA): Mission-critical applications at extreme scale are no longer interrupted by Alluxio Master failovers. SLA improvements include a 20x improvement (95% reduction) in the maximum time it takes for Alluxio to start serving requests again after a restart, and most common scenarios will observe a 2x improvement.
- Significant Reduction in Resource Usage: These improvements allow Alluxio servers to run on far cheaper hardware. Memory usage of the Alluxio Master process is reduced by as much as 10x in scenarios where Alluxio must rapidly keep up with out-of-band updates to underlying storage sources. Similarly, the cluster-wide resource allocation for Alluxio's pre-loading capabilities has also been reduced by 10x.
Improved High Availability at Scale
A 20x improvement (95% reduction) in the worst-case Alluxio Master failover time, for deployments using the embedded journal, comes from optimizations to the snapshot creation and restoration mechanism. This mechanism, which bounds the total number of journal entries the system needs to replay during a failover event, also allows administrators to improve the common failover time by 2x.
These improvements bring down the failover time from minutes to tens of seconds when over a hundred million files are actively managed by an Alluxio cluster. Planned downtime is obviated, and the impact has already been verified in production scenarios with a large number of small files.
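For clusters on the embedded journal, how often snapshots are taken, and therefore how much journal replay a failover can require, is governed by standard journal configuration. A minimal sketch (the property name is from the Alluxio 2.x configuration reference; the value shown is illustrative, not a recommendation from this release):

```properties
# alluxio-site.properties (illustrative value)
# Take a checkpoint/snapshot after this many journal entries,
# bounding the number of entries replayed during a failover.
alluxio.master.journal.checkpoint.period.entries=2000000
```

Lowering this value trades more frequent snapshot work during normal operation for faster recovery when a failover does occur.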
Resource Efficiency
In the past, some users have observed surging memory consumption on the Alluxio Master due to an internal mechanism, metadata synchronization, which keeps files and directories in Alluxio's namespace consistent with underlying data sources. Oftentimes, this process is triggered unintentionally while listing or pre-loading large directories.
High resource consumption has led to overprovisioning of resources. With the 2.10 release, the memory requirements for the Alluxio Master can be reduced by as much as 10x for rapid synchronization intervals, while also improving end-to-end performance by 2x.
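The synchronization interval that drives this mechanism is configurable per cluster or per client. A minimal sketch (the property name is from the Alluxio 2.x configuration reference; the interval shown is illustrative):

```properties
# alluxio-site.properties (illustrative value)
# Re-sync metadata with the underlying storage at most once every
# 5 minutes on access. Use -1 to disable sync after the initial load,
# or 0 to sync on every access (the most resource-intensive setting).
alluxio.user.file.metadata.sync.interval=5min
```

Short intervals are exactly the "rapid synchronization" scenario described above, so they benefit most from the 2.10 memory improvements.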
Pre-loading capabilities, using the load operation, are commonly used both to improve SLAs when remote data is predictably accessed at a scheduled time of day for analytical workloads, and to accelerate model training and deployment. Compared to 2.9, these capabilities now require 10x fewer resources across the cluster to achieve the same or higher throughput.
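As a usage sketch, a scheduled pre-load is typically issued through the Alluxio CLI ahead of the workload. The command below is from the Alluxio 2.x filesystem CLI; the path is a hypothetical example:

```
# Pre-load a dataset directory into Alluxio before a scheduled job,
# distributing the load work across the cluster's workers.
bin/alluxio fs distributedLoad /datasets/training-data
```

Running this from a cron job or workflow scheduler shortly before the analytical or training job starts is a common pattern for hitting SLAs on first access.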
Upgrade Alluxio Today
This new release is a testament to our unwavering commitment to delivering an extremely stable product with low maintenance overhead at scale. Upgrade today and get more out of your Alluxio deployments with ease. To learn more about the features and benefits of Alluxio 2.10, view the release notes and get in touch with our dedicated support team to explore the upgrade options.