CNCF Member Webinar: Improving Data Locality for Analytics Jobs on Kubernetes Using Alluxio

January 22, 2020

Gene Pang

Adit Madan

In the on-prem days, one key performance optimization for Apache Hadoop or Apache Spark workloads was to run tasks on nodes with local HDFS data. However, while adoption of the cloud and Kubernetes makes scaling compute workloads exceptionally easy, HDFS is often not an option. Efficiently accessing data from cloud-native storage services like AWS S3, or even from on-premises HDFS, becomes harder as data locality is lost.

Originating from UC Berkeley AMPLab, the open source project Alluxio approaches this problem in a new way: it moves data closer to compute workloads efficiently and on demand, and unifies data across multiple or remote clouds. This webinar will describe the concept and internal mechanisms of running the Spark+Alluxio stack in Kubernetes to enhance data locality even when the storage service is external or remote.

In particular, we will go over:

Why Spark is able to make locality-aware scheduling decisions when working with Alluxio in a K8s environment using the host network
Why a pod running Alluxio can share data efficiently with a pod running Spark on the same host using a domain socket and a host path volume
The Alluxio roadmap for further improving analytics jobs such as Spark and Presto, including the ongoing closer integration with Presto
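The first point can be illustrated with a minimal sketch. It assumes, for illustration only, that a scheduler decides locality by comparing the address where a cached block lives with the address each executor reports. The names Executor and pick_executor and all addresses are hypothetical and are not Spark or Alluxio APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Executor:
    executor_id: str
    host: str  # address the executor reports to the scheduler


def pick_executor(block_hosts, executors):
    """Return (executor, is_local), preferring an executor on a host that holds the block."""
    wanted = set(block_hosts)
    for ex in executors:
        if ex.host in wanted:
            return ex, True
    # No address matched: fall back to any executor and read remotely.
    return executors[0], False


# Hypothetical: the worker caching the block reports its node address.
block_hosts = ["node-a"]

# With the host network, executors report node addresses, so a match is possible.
host_network = [Executor("exec-1", "node-b"), Executor("exec-2", "node-a")]

# With pod-level addresses, nothing matches the node address, so locality is lost.
pod_network = [Executor("exec-1", "10.244.1.7"), Executor("exec-2", "10.244.2.9")]

chosen_host, local_host = pick_executor(block_hosts, host_network)
chosen_pod, local_pod = pick_executor(block_hosts, pod_network)
```

In this sketch the only difference between the two cases is the address each executor reports, which is why the choice of network mode matters for locality.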


Presentations

Alluxio in Suning, Kyligence, Didi, JD.com, and Tencent [Chinese]

Use Alluxio to Unify Storage Systems in Suning

Suning is one of the leading commercial enterprises in China with two public companies in China and Japan respectively. It uses Alluxio to unify storage systems and manage multiple HDFS clusters.

January 20, 2018

Using Alluxio as a Fault-Tolerant Pluggable Optimization Component to Compute Frameworks of JD System

STRATA DATA CONFERENCE LONDON 2018

JD.com is China’s largest online retailer and its biggest overall retailer, as well as the country’s biggest internet company by revenue. Currently, JD.com’s BDP platform runs more than 400,000 jobs (15+ PB) daily, on a system with more than 15,000 cluster nodes and a total capacity of 210 PB.

Alluxio, formerly Tachyon, is the world’s first system that unifies disparate storage systems at memory speed. In the big data ecosystem, Alluxio lies between computation frameworks or jobs and various kinds of storage systems. Additionally, Alluxio’s memory-centric architecture enables data access orders of magnitude faster than existing solutions.

Alluxio has run in JD.com’s production environment on 100 nodes for six months. Mao Baolong, Yiran Wu, and Yupeng Fu explain how JD.com uses Alluxio to provide support for ad hoc and real-time stream computing, using Alluxio-compatible HDFS URLs and Alluxio as a pluggable optimization component. To give just one example, one framework, JDPresto, has seen a 10x performance improvement on average. This work has also extended Alluxio and enhanced the syncing between Alluxio and HDFS for consistency.

May 21, 2018

Alluxio in MOMO, JD.com, TalkingData, and Vipshop [Chinese]

Alluxio in MOMO: Accelerating Ad Hoc Analysis

From our friends at MOMO

MOMO, a leading pan-entertainment social platform in China, has deployed Alluxio to accelerate ad hoc query analytics. In the course of evaluating the best fit for Alluxio in their infrastructure, they conducted several performance tests to understand how ad hoc query analytics behaved in several scenarios. These tests give real-world insight into the performance benefits Alluxio provides. The MOMO findings include:

With Alluxio, performance was improved 3-5x over the current mode
Even when initially reading ‘cold’ data, Alluxio delivered superior performance in most cases
Alluxio can effectively scale out to improve performance as requirements grow

August 24, 2018
