Resources

Blog

Machine Learning Model Training with Alluxio: Part 2, Comparable Analysis

This blog is the second in our machine learning series. The previous post discussed how Alluxio improves training performance and simplifies data management: with Alluxio, loading data from cloud storage and caching it happen transparently and in a distributed way as part of the training process. In this second post, we compare traditional solutions with Alluxio's.

Blog

Speed Up Large-scale ML/DL Offline Inference Jobs with Alluxio at Microsoft Bing

Running inference at scale is challenging. In this blog, we share our observations and our practice of using Alluxio to speed up I/O for large-scale ML/DL offline inference at Microsoft Bing.

Blog

Machine Learning Model Training with Alluxio: Part 1, Solution Overview

In this blog, we provide an overview of Alluxio's AI/ML model training solution. For more details about the reference architecture and benchmarking results, please refer to the full-length white paper.

Blog

Top Data Predictions for 2022

As more organizations advance their data strategies and run more diverse workloads on a wider variety of platforms across clouds and hybrid clouds, 2022 will see further advances in AI, machine learning, and analytics workloads, as well as in the technologies and services that support them.

Blog

Metadata Synchronization in Alluxio: Design, Implementation and Optimization

Metadata synchronization (sync) is a core feature in Alluxio that keeps files and directories consistent with their source of truth in the under storage systems, making it simple for users to reason about the data retrieved from Alluxio. Understanding the internal process is also important for tuning performance. This article describes how Alluxio keeps metadata synchronized, covering its design and implementation.

On Demand Videos

Building an Open Data Platform with Apache Iceberg

ALLUXIO DAY VIII 2021

On Demand Videos

Iceberg + Alluxio for Fast Data Analytics

ALLUXIO DAY VIII 2021

On Demand Videos

Alluxio + Spark: Accelerating Auto Data Tagging in WeRide

ALLUXIO DAY VIII 2021

White Paper

Accelerating Machine Learning / Deep Learning in the Cloud: Architecture and Benchmark

Blog

What's New in Alluxio 2.7: Enhanced Scalability, Stability and Major Improvements in AI/ML Training Efficiency

With this release, Alluxio has strengthened its position as a de facto data unification and acceleration solution in data analytics and machine learning pipelines. The solution is optimized to support Spark, Presto, TensorFlow, and PyTorch, and is available on multiple cloud platforms such as AWS, GCP, and Azure, as well as on Kubernetes in private data centers or public clouds.

White Paper

Presto with Alluxio Overview – Architecture Evolution for Interactive Queries

Blog

Presto with Alluxio Overview: Architecture Evolution for Interactive Queries

Alluxio is the data orchestration platform for unifying data silos across heterogeneous environments. This blog discusses the architecture that combines Presto with Alluxio.

Blog

Speeding Up the Atlas Supercomputing Platform with Fluid + Alluxio

Unisound is an artificial intelligence company focusing on Internet of Things services. Unisound's AI technology stack spans perception and expression capabilities for signals, voice, images, and text, as well as cognitive technologies such as knowledge, understanding, analysis, and decision-making, working toward a multi-modal AI system. Atlas is the supercomputing platform that supports all kinds of AI applications, including model training and inference.

Case Study

Unisound

Speeding Up the Atlas Supercomputing Platform with Fluid + Alluxio

White Paper

Alluxio Use Cases Overview

Blog

Alluxio Use Cases Overview: Unify Silos with Data Orchestration

This blog is the first in a series introducing Alluxio as the data platform for unifying data silos across heterogeneous environments. The next blog will include insights from PrestoDB committer Beinan Wang on the value Alluxio brings to analytics use cases, specifically with PrestoDB as the compute engine.