Community Office Hour: Hands-on with Alluxio Structured Data Management
January 14, 2020
By Bin Fan and Gene Pang

Users deploy Alluxio in a wide range of use cases, from analytics to AI platforms, for its unified access to data and transparent caching for acceleration. However, many of these frameworks are SQL engines, such as Presto, Apache Spark SQL, or Apache Hive, which consume data structured as tables of rows and columns. Because Alluxio is commonly used as a filesystem of files and directories, there is a mismatch between how Alluxio exposes data (files, directories) and how SQL engines work with data (tables, rows, columns). This gap creates various challenges and inefficiencies.

Therefore, in the Alluxio 2.1 release, we introduce Alluxio Structured Data Management, a new set of services that enables structured data applications to interact with data more efficiently. The new services include a catalog service and a transformation service, which work together to bridge the gap between storage and SQL engines and enable physical data independence.
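
To make "physical data independence" concrete, here is a toy model of the idea, not Alluxio's API. It assumes a hypothetical in-memory catalog that maps a logical table name to its current physical layout, and a hypothetical transform step that repoints that entry to a new layout. Queries keep using the same table name while the files underneath change. The class names, the path, and the CSV-to-Parquet example are all illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class TableLayout:
    """Physical layout of one table (hypothetical): location, file format, file count."""
    location: str
    file_format: str
    num_files: int


@dataclass
class ToyCatalog:
    """Hypothetical catalog: maps logical table names to their current physical layout."""
    tables: Dict[str, TableLayout] = field(default_factory=dict)

    def attach(self, name: str, layout: TableLayout) -> None:
        # Register (or overwrite) the layout behind a logical table name.
        self.tables[name] = layout

    def resolve(self, name: str) -> TableLayout:
        # A SQL engine asks the catalog where a table lives instead of hard-coding paths.
        return self.tables[name]


def transform(catalog: ToyCatalog, name: str, target_format: str, target_files: int) -> TableLayout:
    """Hypothetical transformation: produce a new layout and repoint the catalog entry.

    A real system would rewrite the data files here; this toy only updates metadata.
    """
    old = catalog.resolve(name)
    new = TableLayout(
        location=old.location.rstrip("/") + "_transformed/",
        file_format=target_format,
        num_files=target_files,
    )
    catalog.attach(name, new)
    return new


if __name__ == "__main__":
    catalog = ToyCatalog()
    # Hypothetical path; 1000 small CSV files behind the logical table "sales".
    catalog.attach("sales", TableLayout("alluxio://master:19998/warehouse/sales/", "csv", 1000))
    print(catalog.resolve("sales"))
    transform(catalog, "sales", "parquet", 10)
    # Same logical name, different physical layout.
    print(catalog.resolve("sales"))
```

The design point the sketch tries to capture is that the engine only ever holds the logical name "sales". Because layout lives in the catalog, data can be rewritten into a different format or file count without changing the queries that read it.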

In this office hour, we introduce the concepts and components of Alluxio Structured Data Management, and go through a demo with Presto.

We’ll go over:

  • Introduction and motivation of Alluxio Structured Data Management
  • Overview of the different services of Alluxio Structured Data Management in Alluxio 2.1
  • A demo of using Alluxio Structured Data Management with Presto