We are excited to announce the Alluxio Enterprise Edition (AEE) 1.6.0 and Alluxio Community Edition (ACE) 1.6.0 releases. The AEE release brings a new embedded journal as well as enhancements in the areas of Security and Fast Durable Write. In addition, both the AEE and ACE releases bring new client support (Amazon S3 API and Python client), major usability improvements, and enhanced integrations with the ecosystem. Highlights of the release are listed below.
AEE New Features
Embedded Fault Tolerance and High Availability
- Previously, Alluxio required a UFS for storing its write-ahead journal metadata. This made Alluxio metadata operations very slow when using an object UFS, because every journal flush needs to create a new object. 1.6.0 supports an embedded journal stored on the Alluxio masters' local disks. The masters communicate among themselves to elect a leader and keep the journal consistent. The new leader election capability removes the need to rely on an external ZooKeeper cluster. With the embedded journal, Alluxio is available as long as a majority of its masters are available, with no dependency on a UFS or ZooKeeper.
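The majority rule stated above can be made concrete with a little arithmetic. The following is a minimal sketch, not Alluxio code; the function names are hypothetical and only illustrate how many master failures a cluster of a given size can tolerate under a majority requirement.

```python
def majority(num_masters: int) -> int:
    """Smallest number of masters that forms a strict majority."""
    if num_masters < 1:
        raise ValueError("need at least one master")
    return num_masters // 2 + 1


def tolerated_failures(num_masters: int) -> int:
    """How many masters can be down while a majority is still up."""
    return num_masters - majority(num_masters)


def is_available(num_masters: int, live_masters: int) -> bool:
    """True when the live masters still form a majority (hypothetical helper)."""
    return live_masters >= majority(num_masters)


if __name__ == "__main__":
    for n in (1, 3, 5):
        print(f"{n} masters: majority={majority(n)}, "
              f"tolerated failures={tolerated_failures(n)}")
```

Under this rule an even number of masters tolerates no more failures than the next smaller odd number, which is why odd cluster sizes are the usual choice for majority-based systems.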
Fast Durable Write Enhancements
- AEE 1.6 improves the Fast Durable Write feature. When Alluxio clients write files using "fast-durable-write" and the Alluxio workers are full, the client automatically falls back to THROUGH mode, which bypasses Alluxio storage and writes directly to the UFS. This process is completely transparent to the user and the application.
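The fallback described above might be pictured roughly as follows. This is an illustrative sketch only, not the Alluxio client implementation: `WorkersFullError`, `fast_durable_write`, and the two writer callbacks are hypothetical names, and the real client handles this internally.

```python
class WorkersFullError(Exception):
    """Hypothetical signal that no Alluxio worker has space for the data."""


def fast_durable_write(path, data, alluxio_write, ufs_write):
    """Try the Alluxio write path first; fall back to the UFS if workers are full.

    Returns the name of the mode that was actually used. The caller does not
    need to handle the fallback itself.
    """
    try:
        alluxio_write(path, data)
        return "FAST_DURABLE"
    except WorkersFullError:
        # Bypass Alluxio storage and write directly to the under store.
        ufs_write(path, data)
        return "THROUGH"


# Demo with in-memory stand-ins for the workers and the UFS.
ufs = {}


def full_workers(path, data):
    raise WorkersFullError(path)


def ufs_write(path, data):
    ufs[path] = data


mode = fast_durable_write("/demo/file", b"hello", full_workers, ufs_write)
print(mode)
```

The point of the sketch is that the calling code is identical in both cases; only the returned mode reveals which path was taken.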
Security Enhancement
- AEE 1.6 supports separating the UFS principal from the Alluxio service principal. The UFS principal and keytab are used only to communicate with a Kerberized UFS, not for Alluxio internal authentication. This enables a separate or headless UFS principal.
Alluxio Manager New Features
Support for Offline Installation
- Alluxio Manager can now be delivered and used to install Alluxio in data centers that are not connected to the internet. The 1.6.0 release allows users to get the Alluxio Manager and Alluxio Enterprise Edition binary, without requiring that a web server is running in the data center. Please contact the Alluxio team for detailed setup instructions.
Support for Mount Point Options
- Alluxio Manager now supports specifying options for each new mount point. These specified configuration options are only applied to the selected mount point.
AEE and ACE New Features
New Client APIs
S3 Client
- The new S3 API in the Alluxio 1.6 release allows applications to interact with Alluxio in the same way that they interact with S3. It adds a RESTful API that is compatible with the basic operations of the Amazon S3 API. Applications that previously used S3 as a backend can be transitioned to Alluxio backed by S3 (or any other storage) without code changes.
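As a rough illustration of what an S3-style REST call against Alluxio could look like, the sketch below only builds the HTTP requests using Python's standard library and does not send them. The endpoint (host, port 39999, and the /api/v1/s3 path prefix) is an assumption about a local Alluxio proxy, and the bucket and key names are hypothetical; check the Alluxio documentation for the values in your deployment.

```python
from urllib.parse import quote
from urllib.request import Request

# Assumed endpoint of a local Alluxio proxy; adjust for your deployment.
ALLUXIO_S3_ENDPOINT = "http://localhost:39999/api/v1/s3"


def object_url(bucket: str, key: str) -> str:
    """Build the URL of an object; '/' in the key is kept as a path separator."""
    return f"{ALLUXIO_S3_ENDPOINT}/{quote(bucket)}/{quote(key)}"


def put_object_request(bucket: str, key: str, body: bytes) -> Request:
    """Prepare (but do not send) an S3-style PUT Object request."""
    return Request(object_url(bucket, key), data=body, method="PUT")


def get_object_request(bucket: str, key: str) -> Request:
    """Prepare (but do not send) an S3-style GET Object request."""
    return Request(object_url(bucket, key), method="GET")


req = put_object_request("testbucket", "hello.txt", b"hello")
print(req.get_method(), req.full_url)
```

Against a running proxy, a prepared request could be sent with `urllib.request.urlopen(req)`. Existing S3 tools and SDKs would normally be pointed at the same endpoint instead of hand-built requests.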
Python Client
- A Python Client has been developed for interacting with Alluxio from Python environments. The Python client exposes an API similar to the native Java API. See this example of how to perform basic filesystem operations in Alluxio.
Usability Improvements
Audit Logging
- Alluxio supports audit logging of user accesses. With audit logging, system administrators can keep track of which operations each user has attempted to perform. The format and functionality are consistent with HDFS audit logging.
Dynamically Adjustable Log Levels
- Users can now modify Alluxio server log levels without needing to restart the server.
Remote Logging
- Alluxio supports sending logs to a remote log server over the network. This feature can be useful to system administrators who want to consolidate Alluxio logs in a central location.
Performance Improvements
Alluxio Partial Caching Enhancement
- Alluxio provides an option for caching a block even when only a part of it is read. Partial caching is enabled when alluxio.user.file.cache.partially.read.block is set to true (the default). In this case, the entire block is cached in Alluxio even if the client reads only a part of it. Alluxio now handles partial caching more efficiently, leading to a potential 4x performance improvement.
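To make the partial caching behavior concrete, here is a toy model, not Alluxio code. `ToyCache`, its tiny `BLOCK_SIZE`, and the `cache_partially_read_block` flag are hypothetical stand-ins that mirror the property above; for simplicity, each read is assumed to stay within a single block.

```python
BLOCK_SIZE = 8  # bytes; deliberately tiny for the demonstration


class ToyCache:
    """Toy block cache in front of an in-memory 'UFS' byte string."""

    def __init__(self, ufs_data: bytes, cache_partially_read_block: bool = True):
        self.ufs_data = ufs_data
        self.cache_partial = cache_partially_read_block
        self.blocks = {}  # block id -> full block contents

    def read(self, offset: int, length: int) -> bytes:
        """Read a byte range that lies within one block."""
        block_id = offset // BLOCK_SIZE
        start = block_id * BLOCK_SIZE
        if block_id in self.blocks:
            block = self.blocks[block_id]
        else:
            block = self.ufs_data[start:start + BLOCK_SIZE]
            whole_block_read = offset == start and length >= len(block)
            # With partial caching on, even a partial read caches the full block.
            if self.cache_partial or whole_block_read:
                self.blocks[block_id] = block
        rel = offset - start
        return block[rel:rel + length]


cache = ToyCache(b"0123456789abcdef")
print(cache.read(2, 3), sorted(cache.blocks))
```

With the flag set to False, the same partial read returns the same bytes but leaves the cache empty, so a later read of that block goes back to the under store.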
For users upgrading from Alluxio version <= 1.4.x, note that your Alluxio journal will need to be upgraded in order to migrate to the new version.