Resources

On Demand Videos

Alluxio Product School Webinar – Transparent URI

Blog

Cross Cluster Synchronization in Alluxio: Part 1 - Scenarios and Background

This blog series covers the design and implementation of the Cross Cluster Synchronization mechanism in Alluxio, which keeps metadata consistent when running multiple Alluxio clusters. Part 1 discusses the scenarios and background.

Blog

Cross Cluster Synchronization in Alluxio: Part 3 - Discussions and Conclusion

Following parts 1 and 2, this final post of the series discusses some design decisions and details, as well as future work.

Blog

Cross Cluster Synchronization in Alluxio: Part 2 - Mechanism

This is part 2 of the blog series on the design and implementation of the Cross Cluster Synchronization mechanism in Alluxio. The previous post discussed the scenarios, the background, and how metadata sync works within a single Alluxio cluster. This post describes how that metadata sync is extended to provide metadata consistency in a multi-cluster scenario.

Blog

Data Access as a Service at Shopee: Using Alluxio to Accelerate Interactive Queries and Enhance Developer Experience with Flexible APIs

Blog

Get Started with Trino and Alluxio in 5 Minutes

Blog

Hopping into the Year of the Rabbit with the Alluxio Community

Case Study

Comcast

Maximizing Efficiency and Reducing S3 Egress Cost with Hybrid Cloud Data Access

Blog

What's Next for Data Analytics, AI, and Cloud in 2023

Blog

Integrate Alluxio With Your Existing Data Stack Without Redefining Hive Tables

On Demand Videos

Alluxio 2.9 Release Overview

Blog

What's New in Alluxio 2.9: Multi-Alluxio Synchronization, Kubernetes Operator, and Flexible S3 Access Control

Blog

Architecting Data Orchestration: Four Use Cases

Modern analytics projects rely on a hodgepodge of compute clusters, data stores, and pipelines, flung across countries and continents. Enterprises struggle to meet performance SLAs without replicating lots of data or moving and re-coding applications.

On Demand Videos

Building a Distributed File System For The Cloud-Native Era

Big Data Bellevue Meetup

Blog

Tutorial: Building a Multi-Cloud Data Lake Using Delta Lake and Alluxio

On Demand Videos

Zookeeper vs Raft: Stateful Distributed Coordination with HA and Fault Tolerance

Big Data Bellevue & Cloudy With a Chance of Data Meetup

Case Study

Achieving Hybrid and Multi-Cloud Architecture With Application Portability

A Fortune 50 technology company that serves over 1 billion users successfully implemented Alluxio to achieve a hybrid cloud strategy, become multi-cloud ready, cut costs, and boost agility.

Case Study

Expedia Group

Unify Data Lakes Across Multiple Geographic Regions in the Cloud

On Demand Videos

Architecting Data Platform Across Regions and Clouds for Analytics and AI

Blog

Data Orchestration: Simplifying Data Access for Analytics

The problem with data modernization initiatives is that they result in distributed datasets that impede analytics projects. As enterprises start their cloud migration journey and adopt new types of applications, data stores, and infrastructure, they still leave residual data in the original location. This results in far-flung silos that can be slow, complex, and expensive to analyze. As business demands for analytics rise, along with cloud costs, enterprises need to rationalize how they access and process distributed data. They cannot afford to replicate entire datasets or rewrite software every time they study data in more than one location.

Presentation

Unified Data API for Distributed Cloud Analytics and AI

ALLUXIO DAY x APAC Modern Data Stack 2022

Alluxio (www.alluxio.io) is an open-source virtual distributed file system that provides a unified data access layer for hybrid and multi-cloud deployments. It enables distributed compute engines such as Spark and Presto, and machine learning frameworks such as TensorFlow, to transparently access different persistent storage systems (including HDFS, S3, and Azure) while using an in-memory cache to accelerate data access. Originally developed at UC Berkeley's AMPLab as the research project “Tachyon”, Alluxio has more than 1,200 contributors and is used by over 100 companies worldwide, with the largest production deployment exceeding 1,000 nodes.