We are excited to announce the release of Alluxio Enterprise Edition (AEE) 1.5.0 and Alluxio Community Edition (ACE) 1.5.0. The AEE release brings enhancements in security, multi-tenancy, and working with multiple under stores. In addition, both the AEE and ACE releases bring major usability and performance improvements as well as enhanced integrations with the ecosystem. Highlights of the release are listed below.
AEE New Features
Encryption
- AEE 1.5 introduces client-side encryption: data is encrypted by the Alluxio client on the write path and decrypted on the read path. The feature also supports OpenSSL hardware acceleration and is compatible with the Hadoop Key Management Server (KMS).
Multi-version HDFS Mount
- AEE 1.5 enables Alluxio to mount HDFS clusters with different versions and configurations. A single client can now easily access data from clusters running different HDFS versions.
LDAP and Active Directory Support
- AEE 1.5 adds integration with LDAP for user group mapping and with Active Directory for authentication. This allows authentication and authorization in Alluxio to work in conjunction with existing enterprise security policies.
Cross Data Center Data Replication
- This feature enables users to mount multiple storage systems to the same Alluxio path. On the write path, data is replicated to all mounted storage systems. On the read path, high availability is provided: data is served as long as at least one of the mounted storage systems is available. This capability allows Alluxio to be used in disaster recovery (DR) scenarios.
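To make the write and read paths concrete, here is a minimal Python sketch of one path backed by several stores. It is not Alluxio's implementation: `InMemoryStore`, `ReplicatedMount`, and `UnavailableError` are hypothetical stand-ins, and the sketch assumes a write simply fails if any store is unreachable.

```python
class UnavailableError(Exception):
    """Raised when a (hypothetical) mounted store cannot be reached."""


class InMemoryStore:
    """Stand-in for one under storage system, e.g. one per data center."""

    def __init__(self, name):
        self.name = name
        self.available = True
        self._objects = {}

    def put(self, path, data):
        if not self.available:
            raise UnavailableError(self.name)
        self._objects[path] = data

    def get(self, path):
        if not self.available:
            raise UnavailableError(self.name)
        return self._objects[path]


class ReplicatedMount:
    """Hypothetical set of stores mounted at a single Alluxio path."""

    def __init__(self, stores):
        self.stores = list(stores)

    def write(self, path, data):
        # Write path: replicate the data to every mounted store.
        for store in self.stores:
            store.put(path, data)

    def read(self, path):
        # Read path: serve from the first store that is reachable.
        for store in self.stores:
            try:
                return store.get(path)
            except UnavailableError:
                continue
        raise UnavailableError("no mounted store is available")


if __name__ == "__main__":
    dc1, dc2 = InMemoryStore("dc1"), InMemoryStore("dc2")
    mount = ReplicatedMount([dc1, dc2])
    mount.write("/data/part-0", b"hello")
    dc1.available = False  # simulate an outage in the first data center
    print(mount.read("/data/part-0"))  # prints b'hello'
```

A real system also has to decide what to do when one replica is down during a write, for example retrying or recording the missed copy for later repair; the sketch leaves that out.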
Admin Privilege
- AEE 1.5 introduces privileges that allow a system administrator to control access to certain privileged operations such as pin, mount, or setTtl. This is beneficial in multi-tenant clusters because it gives administrators control over each user's share of the total resources available in Alluxio.
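A minimal sketch of how such a privilege check could gate operations, assuming a simple model in which administrators hold every privilege and other users hold only explicitly granted ones. `PrivilegeChecker`, `PermissionDenied`, and the grants dictionary are hypothetical; only the operation names pin, mount, and setTtl come from the release notes.

```python
PRIVILEGED_OPS = {"pin", "mount", "setTtl"}


class PermissionDenied(Exception):
    """Raised when a user attempts a privileged operation without the privilege."""


class PrivilegeChecker:
    """Hypothetical gate consulted before each file system operation."""

    def __init__(self, admins, grants=None):
        self.admins = set(admins)
        # Per-user grants, e.g. {"alice": {"pin"}} lets alice pin but not mount.
        self.grants = {user: set(ops) for user, ops in (grants or {}).items()}

    def check(self, user, op):
        if op not in PRIVILEGED_OPS:
            return  # ordinary operations are not gated by privileges
        if user in self.admins or op in self.grants.get(user, set()):
            return
        raise PermissionDenied(f"{user} lacks the {op} privilege")


if __name__ == "__main__":
    checker = PrivilegeChecker(admins={"root"}, grants={"alice": {"pin"}})
    checker.check("alice", "pin")  # allowed: explicitly granted
    try:
        checker.check("alice", "mount")
    except PermissionDenied as err:
        print(err)  # prints: alice lacks the mount privilege
```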
Alluxio Manager New Features
Support for offline data centers
- Alluxio Manager can now be used to install Alluxio in data centers that are not connected to the internet.
Alluxio agent improvements
- The Alluxio agent is now installed and started if and when needed, so the agent process no longer has to be restarted manually when Alluxio cluster nodes are restarted.
Ecosystem Integrations
Golang Client
- Golang applications can communicate with Alluxio through Alluxio's Golang client, without writing custom code.
Presto Integration
- Support for running Presto on top of Alluxio. Our customers have seen up to 10x performance improvement.
Docker Integration
- Support for deploying the Alluxio service in a Docker container.
Ceph S3A Connector
- AEE 1.5 can connect to Ceph under storage using the S3A connector. This brings significant functionality improvements and up to 3x better performance.
Improved Mesos and Marathon Integration
- Improved support for deploying Alluxio in a Mesos and Marathon environment. This includes convenience functionality for deploying Alluxio servers as well as configuring Alluxio clients.
Performance Improvements
Alluxio Restarting Time Improvement
- Alluxio now periodically compacts the journal edit log into an updated journal checkpoint. This reduces restart times, and failover times in HA mode, which were previously unbounded, to less than a minute. Periodic journal checkpointing is enabled in both HA and non-HA modes.
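The idea behind checkpointing can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Alluxio's journal format: the hypothetical `Journal` class compacts after a fixed number of edits instead of on a timer, and the file system state is a plain dictionary of paths.

```python
import copy


class Journal:
    """Hypothetical journal: a checkpoint (snapshot) plus an edit log tail."""

    def __init__(self, checkpoint_every):
        self.checkpoint = {}  # state as of the last checkpoint
        self.edit_log = []    # edits recorded since that checkpoint
        self.checkpoint_every = checkpoint_every

    def append(self, op, path, value=None):
        self.edit_log.append((op, path, value))
        # Stand-in for "periodic": compact once enough edits accumulate.
        if len(self.edit_log) >= self.checkpoint_every:
            self.compact()

    def compact(self):
        # Fold the edit log into the checkpoint, then truncate the log.
        self.checkpoint = self.replay(self.checkpoint, self.edit_log)
        self.edit_log = []

    @staticmethod
    def replay(state, edits):
        state = copy.deepcopy(state)
        for op, path, value in edits:
            if op == "create":
                state[path] = value
            elif op == "delete":
                state.pop(path, None)
        return state

    def recover(self):
        # Restart: load the checkpoint and replay only the short log tail.
        return self.replay(self.checkpoint, self.edit_log)


if __name__ == "__main__":
    journal = Journal(checkpoint_every=3)
    journal.append("create", "/a", 1)
    journal.append("create", "/b", 2)
    journal.append("delete", "/a")  # third edit triggers compaction
    journal.append("create", "/c", 3)
    print(journal.checkpoint, journal.edit_log)
    print(journal.recover())
```

Because recovery replays the checkpoint plus only the edits since the last compaction, restart time is bounded by the compaction interval rather than by the full history of the file system.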
Network I/O Performance Improvement
- All network I/O now uses the packet streaming protocol, providing up to 3x performance improvement. In addition, metadata management between client and worker is greatly simplified, reducing the number of connections between the two components by about half.
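Packet streaming, in general terms, means sending a file as a sequence of bounded, self-describing packets rather than as one unframed byte stream. The sketch below illustrates only that general idea; the 12-byte header (sequence number and payload length) and the function names are hypothetical and do not describe Alluxio's wire format.

```python
import struct

# Hypothetical header: 8-byte sequence number + 4-byte payload length, big-endian.
HEADER = struct.Struct(">QI")


def to_packets(data, packet_size):
    """Split a byte string into framed packets: header followed by payload."""
    packets = []
    for seq, offset in enumerate(range(0, len(data), packet_size)):
        payload = data[offset:offset + packet_size]
        packets.append(HEADER.pack(seq, len(payload)) + payload)
    return packets


def from_packets(packets):
    """Reassemble payloads in sequence order, checking each length header."""
    chunks = {}
    for packet in packets:
        seq, length = HEADER.unpack_from(packet)
        payload = packet[HEADER.size:]
        if len(payload) != length:
            raise ValueError(f"packet {seq} is truncated")
        chunks[seq] = payload
    return b"".join(chunks[i] for i in range(len(chunks)))


if __name__ == "__main__":
    data = b"x" * 150_000
    packets = to_packets(data, packet_size=64 * 1024)
    print(len(packets))  # 3
    assert from_packets(packets) == data
```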
Domain Socket Based Short Circuit I/O
- AEE 1.5 introduces an option to use domain sockets for short-circuit I/O, so that data destined for local Alluxio storage is written by the worker process rather than directly by the client. This mode is recommended for users running write-heavy workloads in a container environment.
10x Under Storage Metadata Operations Improvement
- Alluxio's internal logic and concurrency mechanisms have been greatly improved for handling metadata-heavy workloads against under storage systems, most notably object stores like Amazon S3, resulting in up to 10x performance improvement in the metadata-heavy stages of compute tasks.
Usability Improvements
High and Low Watermark Space Reserver
- Asynchronous space reservation on Alluxio workers is now triggered when storage usage reaches a specified high watermark, and evicts data until usage falls to the user-specified low watermark. Using the space reserver is highly recommended for workloads with bursty writes.
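A minimal sketch of the high/low watermark policy, assuming an LRU eviction order and running the reservation synchronously for simplicity (the release describes it as asynchronous on workers). `WatermarkReserver` and its default thresholds are hypothetical.

```python
from collections import OrderedDict


class WatermarkReserver:
    """Hypothetical worker storage tier with high/low watermark eviction."""

    def __init__(self, capacity, high=0.9, low=0.7):
        if not 0 < low < high <= 1:
            raise ValueError("require 0 < low < high <= 1")
        self.capacity = capacity
        self.high = high
        self.low = low
        self.blocks = OrderedDict()  # block id -> size, least recently used first
        self.used = 0

    def access(self, block_id):
        # Reading a block makes it the most recently used.
        self.blocks.move_to_end(block_id)

    def add(self, block_id, size):
        # Replace any previous copy of the block, then record the new one.
        self.used += size - self.blocks.pop(block_id, 0)
        self.blocks[block_id] = size
        if self.used >= self.high * self.capacity:
            self.reserve_space()

    def reserve_space(self):
        # Evict LRU blocks until usage is at or below the low watermark.
        evicted = []
        while self.blocks and self.used > self.low * self.capacity:
            block_id, size = self.blocks.popitem(last=False)
            self.used -= size
            evicted.append(block_id)
        return evicted


if __name__ == "__main__":
    tier = WatermarkReserver(capacity=100)
    tier.add("b1", 30)
    tier.add("b2", 30)
    tier.access("b1")   # b2 is now the least recently used block
    tier.add("b3", 35)  # usage reaches 95, above the 90 high watermark
    print(list(tier.blocks), tier.used)  # ['b1', 'b3'] 65
```

Evicting down to a lower watermark, rather than to just below the high one, frees a buffer of space in one pass, so a burst of writes does not trigger an eviction for every new block.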
Improved Object Store Directory Management
- The need for dummy metadata files suffixed with $folder$ has been removed. This makes Alluxio more compatible with other systems that interact with the backing object store, and allows users to mount buckets in Alluxio without write permissions and without setting up the bucket with dummy files.
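Without marker files, directories can be inferred from object key prefixes, much like the prefix-and-delimiter listing supported by Amazon S3's ListObjects API. The function below is a hypothetical sketch of that inference over an in-memory list of keys, not Alluxio's code.

```python
def list_directory(keys, prefix):
    """Hypothetical one-level listing of an object store "directory".

    Directories are inferred from key prefixes instead of $folder$ markers.
    Returns (sorted subdirectory names, sorted file names).
    """
    if prefix and not prefix.endswith("/"):
        prefix += "/"
    files, dirs = set(), set()
    for key in keys:
        if not key.startswith(prefix) or key == prefix:
            continue
        rest = key[len(prefix):]
        head, sep, _ = rest.partition("/")
        if sep:
            dirs.add(head)  # key lies deeper, so `head` is an implied directory
        else:
            files.add(head)
    return sorted(dirs), sorted(files)


if __name__ == "__main__":
    keys = ["logs/2017/a.log", "logs/2017/b.log", "logs/readme.txt", "data/x.csv"]
    print(list_directory(keys, "logs"))  # (['2017'], ['readme.txt'])
```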
Alluxio Client Module Isolation
- Alluxio 1.5.0 introduces a runtime module designed specifically for drop-in use of an Alluxio client. This client is equivalent to the core-client uber jar of previous versions. Alluxio's other client interfaces, such as the HDFS-compatible interface and Alluxio's native file system API, are now in independent modules, making it much easier to control dependencies in downstream projects. Best practices have been updated in the latest documentation.
Mount Point Configuration Properties
- Richer configuration settings are exposed at mount point granularity, enabling users to mount under storage systems with different properties. This enables use cases such as using different credentials to connect to the same type of under storage; for example, two Google Cloud Storage buckets that require different credentials can both be mounted in Alluxio.
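A minimal sketch of per-mount configuration, assuming that mount options simply override global properties for paths under that mount point. `MountTable` and the property names (`gcs.credential`, `gcs.timeout`) are hypothetical placeholders, not real Alluxio property keys.

```python
class MountTable:
    """Hypothetical mount table with per-mount property overrides."""

    def __init__(self, global_conf):
        self.global_conf = dict(global_conf)
        self.mounts = {}  # Alluxio path -> (under storage URI, per-mount options)

    def mount(self, alluxio_path, ufs_uri, options=None):
        self.mounts[alluxio_path.rstrip("/")] = (ufs_uri, dict(options or {}))

    def conf_for(self, path):
        """Resolve the longest matching mount point and merge its options."""
        best = None
        for mount_point in self.mounts:
            if path == mount_point or path.startswith(mount_point + "/"):
                if best is None or len(mount_point) > len(best):
                    best = mount_point
        if best is None:
            raise KeyError(f"no mount point covers {path}")
        ufs_uri, options = self.mounts[best]
        merged = {**self.global_conf, **options}  # per-mount values win
        return ufs_uri, merged


if __name__ == "__main__":
    table = MountTable({"gcs.credential": "default", "gcs.timeout": "30s"})
    table.mount("/mnt/bucket-a", "gs://bucket-a", {"gcs.credential": "team-a"})
    table.mount("/mnt/bucket-b", "gs://bucket-b", {"gcs.credential": "team-b"})
    print(table.conf_for("/mnt/bucket-a/part-0.csv"))
```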
For users upgrading from previous Alluxio versions, note that the Alluxio journal must be upgraded as part of migrating to the new version.