Blog
We are thrilled to announce the general availability of Alluxio Enterprise for Data Analytics 3.2! With data volumes continuing to grow at exponential rates, data platform teams face challenges in maintaining query performance, managing infrastructure costs, and ensuring scalability. This latest version of Alluxio addresses these challenges head-on with groundbreaking improvements in scalability, performance, and cost-efficiency.
We’re excited to introduce Rapid Alluxio Deployer (RAD) on AWS, which allows you to experience the performance benefits of Alluxio in less than 30 minutes. RAD is designed with a split-plane architecture, which ensures that your data remains secure within your AWS environment, giving you peace of mind while leveraging Alluxio’s capabilities.
PyTorch is one of the most popular deep learning frameworks in production today. As models become increasingly complex and dataset sizes grow, optimizing model training performance becomes crucial to reduce training times and improve productivity.
This is a tutorial to guide a newbie to complete a new-contributor task and become an open-source contributor of the Alluxio project.
This article highlights synergy between the two widely adopted open-source projects, Alluxio and Presto, and demonstrates how together they deliver a self-serve data architecture across clouds.
Running Presto with Alluxio is gaining popularity in the community. It avoids long latency reading data from remote storage by utilizing SSD or memory to cache hot dataset close to Presto workers. Presto supports hash-based soft affinity scheduling to enforce that only one or two copies of the same data are cached in the entire cluster, which improves cache efficiency by allowing more hot data cached locally. The current hashing algorithm used, however, does not work well when cluster size changes. This article introduces a new hashing algorithm for soft affinity scheduling, consistent hashing, to address this problem.
As data stewards and security teams provide broader access to their organization’s data lake environments, having a centralized way to manage fine-grained access policies becomes increasingly important. Alluxio can use Apache Ranger’s centralized access policies in two ways: 1) directly controlling access to virtual paths in the Alluxio virtual file system or 2) enforcing existing access policies for the HDFS under stores.
This blog will introduce how Tencent uses Prometheus and Grafana to set up monitoring system for Alluxio in 10 minutes.
To provide model training with the best experience, Tencent has implemented a 1000-node Alluxio cluster and designed a scalable, robust, and performant architecture to speed up Ceph storage for game AI training. This blog will give you insight into how Alluxio has been implemented and optimized at Tencent.
2021 marked accelerated growth for the Alluxio Open Source Project. We could not be more grateful for what the community has achieved together in this past year. This blog provides a glimpse of the year long summary of our community growth.
This blog is the last one in the machine learning series. Our first blog introduced the what and why of our solution, and the second blog compared traditional and Alluxio solutions. This blog will demonstrate how to set up and benchmark the end-to-end performance of the training process.
This blog is the second in the machine learning series following the previous one, which discussed Alluxio's solution to improve training performance and simplify data management. With the help of Alluxio, loading data from cloud storage, training and caching data can be done in a transparent and distributed way as a part of the training process, thus improving training performance and simplifying data management. In this blog 2 of the series, we focus on comparing traditional solutions with Alluxio’s.
In this blog, we provide an overview of Alluxio's AI/ML model training solution. For more details about the reference architecture and benchmarking results, please refer to the full length whitepaper.
As more organizations advance their data revolution strategy, and run more diverse workloads on a wider variety of platforms across clouds and hybrid clouds, 2022 will see even more advances in AI, machine learning and analytic workloads and technologies and services to support them.