Resources

Blog

Recommendations to Level Up Your Machine Learning Platform

With machine learning (ML) and artificial intelligence (AI) applications becoming more business-critical, organizations are racing to advance their AI/ML capabilities. To realize the full potential of AI/ML, having the right underlying machine learning platform is a prerequisite.

Blog

Orchestrating Data for Machine Learning Pipelines

This article discusses a new approach to orchestrating data for end-to-end machine learning pipelines. I outline common challenges and pitfalls, then propose a new technique, data orchestration, to optimize the data pipeline for machine learning.

Blog

From Cache to Cash: Introducing NFT for Data Orchestration

Today, we are excited to announce the launch of non-fungible tokens (NFT) as a new feature in our leading data orchestration platform.

Blog

Improving Presto Architectural Decisions with Alluxio Shadow Cache at Meta (Facebook)

In a collaboration between Meta (Facebook), Princeton University, and Alluxio, we have developed "Shadow Cache", a lightweight Alluxio component that tracks the working set size and the infinite cache hit ratio. Shadow Cache dynamically tracks the working set size over a past time window and is implemented with a series of Bloom filters. It is deployed in Presto at Meta (Facebook), where it is used to understand system bottlenecks and to inform routing design decisions.

On Demand Videos

Geo-distributed Analytics with NetApp StorageGRID and Alluxio

Blog

Accelerate Auto Data Tagging with Alluxio and Spark in Hybrid Cloud: A Practice in WeRide

This blog shares the practice of using Alluxio and Spark to accelerate the auto data tagging system in WeRide, an autonomous driving technology company.

On Demand Videos

Speed Up Uber’s Presto with Alluxio

A Simple Approach to Speed Up Presto Interactive Queries at Uber’s Scale Through Caching

On Demand Videos

Building an Efficient AI Training Platform at bilibili with Alluxio

ALLUXIO DAY X 2022

On Demand Videos

Alluxio Journal Evolution – Towards high availability and fault tolerance

ALLUXIO DAY X 2022

White Paper

Spark + Alluxio Overview | Pair Spark with Alluxio to Modernize Your Data Platform

Blog

Pair Spark with Alluxio to Modernize Your Data Platform

Alluxio is a data orchestration platform that unifies data silos across heterogeneous environments. This is the last article in a series covering the basics of Alluxio’s architecture and solution.

Blog

How to Become a Contributor to Alluxio Open Source Project

This tutorial guides a newcomer through completing a new-contributor task and becoming an open-source contributor to the Alluxio project.

Blog

Self-Serve Data Architecture with Presto and Alluxio Across Clouds

This article highlights the synergy between two widely adopted open-source projects, Alluxio and Presto, and demonstrates how together they deliver a self-serve data architecture across clouds.

Blog

Using Consistent Hashing in Presto to Improve Caching Data Locality in Dynamic Clusters

Running Presto with Alluxio is gaining popularity in the community. It avoids the long latency of reading data from remote storage by using SSD or memory to cache hot datasets close to Presto workers. Presto supports hash-based soft affinity scheduling, which ensures that only one or two copies of the same data are cached in the entire cluster. This improves cache efficiency by allowing more hot data to be cached locally. The current hashing algorithm, however, does not work well when the cluster size changes. This article introduces a new hashing algorithm for soft affinity scheduling, consistent hashing, to address this problem.

Blog

Alluxio and Apache Ranger Best Practices

As data stewards and security teams provide broader access to their organization’s data lake environments, having a centralized way to manage fine-grained access policies becomes increasingly important. Alluxio can use Apache Ranger’s centralized access policies in two ways: 1) directly controlling access to virtual paths in the Alluxio virtual file system or 2) enforcing existing access policies for the HDFS under stores.

Blog

How to Set Up Monitoring System for Alluxio with Prometheus and Grafana in 10 Minutes

This blog introduces how Tencent uses Prometheus and Grafana to set up a monitoring system for Alluxio in 10 minutes.

On Demand Videos

Architecting a Heterogeneous Data Platform Across Clusters, Regions, and Clouds

Blog

Thousand-Node Alluxio Cluster Powers Game AI Platform – A Production Case Study from Tencent

To provide the best model training experience, Tencent has implemented a 1000-node Alluxio cluster and designed a scalable, robust, and performant architecture to speed up Ceph storage for game AI training. This blog gives you insight into how Alluxio has been implemented and optimized at Tencent.

Case Study

Tencent Game AI Platform

Thousand-Node Alluxio Cluster Powers Game AI Platform – A Production Case Study from Tencent

On Demand Videos

The Evolution of an Open Data Platform with Alluxio

ALLUXIO DAY IX 2022

On Demand Videos

Industrial Bank’s Alluxio Deployment

ALLUXIO DAY IX 2022

On Demand Videos

Vipshop Offline Data Cache Acceleration System – Alluxio Integration

ALLUXIO DAY IX 2022

Blog

A Year with Alluxio Community 2021

2021 marked accelerated growth for the Alluxio Open Source Project. We could not be more grateful for what the community has achieved together in this past year. This blog provides a glimpse of our community's growth over the year.

Blog

Machine Learning Model Training with Alluxio: Part 3 – Benchmarking

This blog is the last one in the machine learning series. Our first blog introduced the what and why of our solution, and the second blog compared traditional and Alluxio solutions. This blog will demonstrate how to set up and benchmark the end-to-end performance of the training process.
