Today, we are thrilled to announce that Alluxio 2.9 is generally available (GA) for both the free open source Alluxio Community Edition and Alluxio Enterprise Edition! With GA, you can expect stability, support, and enterprise-readiness from Alluxio. In this blog post, we explore how Alluxio is enabling growth and agility for analytics and AI applications at the world’s leading companies, often running across regions, compute engines, and storage systems.
Alluxio 2.9 delivers a scale-out, multi-tenant architecture through a new cross-cluster synchronization feature; easier management through significantly improved tooling and guidelines for deploying Alluxio on Kubernetes; and stronger security and performance through an enhanced S3 API.
Alluxio enables a compute- and storage-agnostic multi-cloud data platform. Alluxio can be used with Spark, Presto, Trino, PyTorch, and TensorFlow, among others, on cloud platforms such as AWS, GCP, and Azure, and on Kubernetes in private data centers or public clouds.
Alluxio Community Edition Highlights
The following features are included in both the Alluxio Community and Enterprise editions.
Master Health Status
The Alluxio master now periodically checks resource usage, including CPU and memory, along with several performance-critical internal data structures, to infer the overall state of the system. The possible statuses, which can be retrieved from the master.system.status metric, are:
- IDLE
- ACTIVE
- STRESSED
- OVERLOADED
To get started, view the documentation for more information about this monitoring heuristic.
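As a minimal sketch of how a monitoring script might consume this metric, the Python snippet below maps the four status names to an ordered severity and decides whether to raise an alert. The severity ordering, the hypothetical `needs_attention` helper, and STRESSED as the default alert threshold are illustrative assumptions; the snippet does not reproduce Alluxio's internal heuristic, and how your metrics pipeline exports the status value is also assumed.

```python
from enum import IntEnum


class MasterStatus(IntEnum):
    """Statuses reported by the master.system.status metric.

    The numeric ordering (IDLE < ACTIVE < STRESSED < OVERLOADED) is an
    assumption based on the names, not a documented encoding.
    """
    IDLE = 0
    ACTIVE = 1
    STRESSED = 2
    OVERLOADED = 3


def needs_attention(reported: str,
                    alert_at: MasterStatus = MasterStatus.STRESSED) -> bool:
    """Return True if the reported status is at or above the alert level.

    `reported` is the status name as text (e.g. "STRESSED"); this helper is
    hypothetical and not part of any Alluxio API.
    """
    status = MasterStatus[reported.strip().upper()]
    return status >= alert_at


if __name__ == "__main__":
    for value in ["IDLE", "ACTIVE", "STRESSED", "OVERLOADED"]:
        print(value, needs_attention(value))
```

Treating the statuses as an ordered scale lets a single threshold parameter express alerting policy, so a team that wants earlier warnings can pass `MasterStatus.ACTIVE` instead.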
Paging Storage on Workers (Experimental)
The new release adds support for fine-grained, page-level storage (e.g., 1 MB pages) for caching on Alluxio workers, as an alternative to the existing block-based storage (e.g., 64 MB blocks).
This feature aims to improve caching efficiency and performance by reducing read amplification, that is, the extra data fetched beyond what an application requested, when applications access the underlying storage for the first time.
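To make the amplification argument concrete, the Python sketch below computes how many bytes a single cold read pulls from the underlying storage, under the simplifying assumption that every cache unit (page or block) overlapping the requested range is filled in full. That fill-whole-unit model and the helper names are assumptions for illustration; actual behavior depends on Alluxio's configuration. The 1 MB and 64 MB sizes are the example sizes quoted above.

```python
MB = 1024 * 1024
KB = 1024


def bytes_fetched(offset: int, length: int, unit_size: int, file_size: int) -> int:
    """Bytes read from under storage for one cold read, assuming every cache
    unit overlapping [offset, offset + length) is filled in full.
    The last unit is truncated at end of file."""
    end = min(offset + length, file_size)
    if end <= offset:
        return 0
    first_unit = offset // unit_size
    last_unit = (end - 1) // unit_size
    start_byte = first_unit * unit_size
    end_byte = min((last_unit + 1) * unit_size, file_size)
    return end_byte - start_byte


def amplification(offset: int, length: int, unit_size: int, file_size: int) -> float:
    """Ratio of bytes fetched to bytes the application actually requested."""
    requested = min(offset + length, file_size) - offset
    if requested <= 0:
        return 0.0
    return bytes_fetched(offset, length, unit_size, file_size) / requested


if __name__ == "__main__":
    file_size = 1024 * MB  # a 1 GB file
    read = 256 * KB        # a small random read at offset 0
    print("64 MB blocks:", amplification(0, read, 64 * MB, file_size))  # 256.0
    print("1 MB pages:  ", amplification(0, read, 1 * MB, file_size))   # 4.0
```

Under this model, a 256 KB read fetches 64 MB with block-sized units but only 1 MB with page-sized units, which is the kind of reduction page-level storage targets for small or random first reads.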
To get started, view the documentation here.
Alluxio Enterprise Edition Highlights
The following features are part of the Alluxio Enterprise Edition only.
Multi-Cluster Synchronization
Tenant isolation prevents different teams from competing for access to shared data lake storage. With the new cross-cluster synchronization feature, Alluxio 2.9 improves scalability when deploying multiple Alluxio clusters across tenants in Kubernetes or across environments.
Federation of multiple Alluxio clusters makes each Alluxio instance aware of changes made through another by actively synchronizing metadata via a stream of update events. This feature is particularly useful in a satellite architecture, where data producers update the data lake storage in isolation from data consumers.
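The publish-and-invalidate pattern described above can be sketched conceptually in Python. This is a toy, in-process model: `EventStream`, `Cluster`, and `UpdateEvent` are hypothetical names, not Alluxio's implementation or API. It also assumes that an update event simply invalidates the receiving cluster's cached metadata so the next access reloads it, rather than pushing the new metadata directly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UpdateEvent:
    origin: str  # name of the cluster that made the change
    path: str    # data lake path whose metadata changed


class EventStream:
    """In-process stand-in for the metadata update-event stream (hypothetical)."""

    def __init__(self) -> None:
        self._subscribers = []

    def subscribe(self, cluster: "Cluster") -> None:
        self._subscribers.append(cluster)

    def publish(self, event: UpdateEvent) -> None:
        for cluster in self._subscribers:
            cluster.on_update(event)


class Cluster:
    """Toy cluster that caches file metadata (here, just the size) locally."""

    def __init__(self, name: str, stream: EventStream, lake: dict) -> None:
        self.name = name
        self.stream = stream
        self.lake = lake          # shared data lake storage: path -> bytes
        self.metadata_cache = {}  # path -> cached size
        stream.subscribe(self)

    def get_size(self, path: str) -> int:
        # Load metadata from storage only on a cache miss.
        if path not in self.metadata_cache:
            self.metadata_cache[path] = len(self.lake[path])
        return self.metadata_cache[path]

    def write(self, path: str, data: bytes) -> None:
        self.lake[path] = data
        self.metadata_cache[path] = len(data)
        self.stream.publish(UpdateEvent(origin=self.name, path=path))

    def on_update(self, event: UpdateEvent) -> None:
        # Invalidate stale metadata for changes made by other clusters.
        if event.origin != self.name:
            self.metadata_cache.pop(event.path, None)


if __name__ == "__main__":
    lake = {}
    stream = EventStream()
    producer = Cluster("producer", stream, lake)
    consumer = Cluster("consumer", stream, lake)

    producer.write("/lake/events/part-0", b"abc")
    print(consumer.get_size("/lake/events/part-0"))  # 3
    producer.write("/lake/events/part-0", b"abcdef")
    print(consumer.get_size("/lake/events/part-0"))  # 6
```

Without the event stream, the consumer would keep serving its cached size of 3 after the producer's second write; the event lets it notice the change without polling the storage.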
To get started, view the documentation here.
Manageability with the New Kubernetes Operator
Running Alluxio on Kubernetes helps standardize deployment methodologies to make the data stack portable to any environment. This new release introduces an Alluxio Operator, which simplifies deploying and managing multiple Alluxio clusters.
Administrators can now deploy and manage Alluxio through a Custom Resource Definition (CRD). Using the Alluxio Operator reduces the burden of managing multiple Alluxio instances.
To get started, view the documentation here.
Enhanced S3 API Security
Through the Alluxio S3 API, authentication and access control policies can be managed centrally over a unified namespace, providing a consistent security experience across heterogeneous storage, whether on-premises or in the cloud.
By adopting an open authentication protocol for the S3 API, Alluxio verifies user identities before processing their requests. This new feature lets Alluxio connect to identity management systems such as PingFederate and leverage Single Sign-On (SSO).
To get started, view the documentation here.
If you’d like to speak with a solutions engineer to learn more about the latest in Alluxio 2.9, you can directly book a meeting here.
More Info
For an exhaustive list of major features and bug fixes of Alluxio 2.9, please refer to the Community Edition release notes and Enterprise Edition release notes.
Free downloads of the Alluxio 2.9 open source Community Edition and trials of Alluxio Enterprise Edition are available now at https://www.alluxio.io/download/. Join 9000+ members in our community Slack channel to ask questions and share your feedback.