Resource Hub

Presentation

Workshop: Presto on Alluxio Hands-On Lab

DATA ORCHESTRATION SUMMIT 2019

This hands-on training, run by the creators of Presto and Alluxio, covers how to get started with both. Attendees will get hands-on experience launching an EC2 instance, exploring the Alluxio filesystem and cluster status, and running queries with Presto on Alluxio to see the performance benefits of using Alluxio in an analytics stack.

Presto is a widely popular SQL query engine that is well suited to interactive SQL analytics. However, when the data is remote or in object stores, performance becomes a challenge. Alluxio can improve Presto’s query performance by serving as a distributed cache layer co-located with Presto. Presto with Alluxio brings together two open source technologies to give you better performance and multi-cloud capabilities for interactive analytic workloads. Presto’s open source distributed SQL query engine, coupled with Alluxio, enables true separation of storage and compute while preserving data locality, provides memory-speed response times, and aggregates data from any file or object store.

On Demand Videos

What’s New in Alluxio 2

DATA ORCHESTRATION SUMMIT 2019

On Demand Videos

Enterprise Distributed Query Service Powered by Presto & Alluxio Across Clouds at WalmartLabs

DATA ORCHESTRATION SUMMIT 2019

On Demand Videos

Presto: Query Anything – Data Engineer’s Perspective

DATA ORCHESTRATION SUMMIT 2019

On Demand Videos

Open Source Panel: How to create an open source project

DATA ORCHESTRATION SUMMIT 2019

On Demand Videos

Orchestrate a Data Symphony

DATA ORCHESTRATION SUMMIT 2019

On Demand Videos

Modern Data Platforms – Thinking Data Flywheel on the Cloud

DATA ORCHESTRATION SUMMIT 2019

On Demand Videos

From Files to Tables: Alluxio Structured Data Management

DATA ORCHESTRATION SUMMIT 2019

On Demand Videos

How to Develop and Operate Cloud Native Data Platforms and Applications

DATA ORCHESTRATION SUMMIT 2019

On Demand Videos

Legend, Legacy, Orchestration: Challenge and Evolution of Data Orchestration at Rakuten Data System

DATA ORCHESTRATION SUMMIT 2019

On Demand Videos

How to Run Fast Presto Analytics with Alluxio in Cloud – a Production Experience

DATA ORCHESTRATION SUMMIT 2019

On Demand Videos

Deep Learning and Gene Computing Acceleration with Alluxio in Kubernetes

DATA ORCHESTRATION SUMMIT 2019

On Demand Videos

Apache Iceberg – A Table Format for Huge Analytic Datasets

DATA ORCHESTRATION SUMMIT 2019

On Demand Videos

tf.data: TensorFlow Input Pipeline

DATA ORCHESTRATION SUMMIT 2019

Presentation

Simplified Data Preparation for Machine Learning in Hybrid and Multi Clouds

ODSC WEST 2019

Cloud storage brings great flexibility in management and cost-efficiency to data scientists, but also introduces new challenges related to data accessibility and data locality for machine learning applications. For instance, when the input data is stored in remote cloud storage such as AWS S3 or Azure Blob Storage, direct data access is often slow and expensive, while manually moving data to the training clusters can be time-consuming and complicated, and often requires data engineering or ETL pipelines.

This session is designed for data scientists and data engineers who work with remote and possibly multiple data sources in hybrid or multi-cloud environments. We will show the audience how to use Alluxio to greatly simplify data preparation in these environments, covering the following topics:

- How to set up and create a POSIX endpoint for the Alluxio service to unify file system data access to S3, HDFS, and Azure Blob Storage
- How to run Apache Spark to read input from and write output to remote storage with Alluxio as the distributed data caching layer
- How to run TensorFlow to train models on remote input data, accessed as if it were on the local file system
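
As a rough illustration of the POSIX endpoint idea in the topics above: once remote storage is exposed as an ordinary directory, data preparation code can use plain file I/O with no storage-specific client. The sketch below is a minimal example under that assumption only. The function names and the mount path in the usage note are hypothetical and are not taken from the session material.

```python
from pathlib import Path


def list_dataset_files(mount_root, suffix=".csv"):
    """Return sorted paths of files ending in `suffix` under a directory.

    `mount_root` is assumed to be a POSIX mount point (hypothetical here);
    the code treats it exactly like any local directory.
    """
    root = Path(mount_root)
    return sorted(p for p in root.rglob(f"*{suffix}") if p.is_file())


def count_records(paths):
    """Count non-empty lines across the given files using ordinary file I/O."""
    total = 0
    for path in paths:
        with open(path, "r", encoding="utf-8") as handle:
            total += sum(1 for line in handle if line.strip())
    return total
```

With a mount in place, a call might look like `count_records(list_dataset_files("/mnt/alluxio/training"))`, where `/mnt/alluxio/training` is a placeholder path chosen for illustration.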

Blog

Improving Spark Memory Resource with Off-Heap In-Memory Storage

On Demand Videos

Online Meetup: Powering Data Science and AI with Apache Spark, Alluxio, and IBM

Blog

Introducing Wormhole: Dockerized Presto & Alluxio setups for blazing fast analytics

Blog

Tutorial: Presto + Alluxio + Hive Metastore on Your Laptop in 10 min

This tutorial guides users through setting up a stack of Presto, Alluxio, and Hive Metastore on a local server, and demonstrates how to use Alluxio as the caching layer for Presto queries.

On Demand Videos

Tech Talk: From limited Hadoop compute capacity to increased data scientist efficiency

Presentation

Enabling big data & AI workloads on the object store at DBS

The big data stack has evolved over the past few years with an explosion of data frameworks, starting with MapReduce and expanding to Apache Spark and Presto. The approach to managing and storing data has evolved as well, moving from primarily using the Hadoop Distributed File System (HDFS) to newer, cheaper, and easier technologies like object stores. But the design of most object stores inhibits real-time big data and AI workloads running directly on them.

Vitaliy Baklikov and Dipti Borkar explore a different architecture for analytic workloads, particularly those deployed in cloud environments. Alluxio, an open-source virtual distributed file system, provides a unified data access layer for hybrid and multicloud deployments. Alluxio enables distributed compute engines like Spark or Presto, or machine learning frameworks like TensorFlow, to transparently access different persistent storage systems (including HDFS, S3, Azure, etc.) while actively leveraging an in-memory cache to accelerate data access.

Vitaliy and Dipti dive into how DBS Bank built a modern big data analytics stack, leveraging an object store as persistent storage even for data-intensive workloads, and how it uses Alluxio to orchestrate data locality and data access for Spark workloads. In addition, deploying Alluxio to access data solves many challenges that cloud deployments bring with separated compute and storage.

Presentation

Online Meetup: AWS S3 + Alluxio + Presto = ❤️ The Ryte Use Case

At Ryte, we analyze unstructured, semi-structured and structured data for more than one million users worldwide. The whole Ryte-Platform is built with a scalable architecture to support our heavy load and make it possible for our customers to drill down from a high-level overview into the last byte of their websites.

In this presentation, I will show why & how we solve some challenging technical issues, improve the speed, and reduce the costs of our AWS EMR Hadoop & Presto backend with Alluxio to an awesome level!

Topics:

What is Ryte: Platform to optimize your Online-Marketing
Requirements for the Ryte-Platform
Why we use Presto on AWS EMR with S3
When problems pop up
How we solve them with Alluxio in a perfect way

Blog

Q&A with Alluxio’s Bin Fan on Data Orchestration, Cloud Migration, and Data Engineering Challenges

For today’s blog post I interviewed Bin Fan, Founding Engineer and VP of Open Source at Alluxio. Bin is the PMC maintainer of the Alluxio open source project. Prior to Alluxio, he worked for Google on the next-generation storage infrastructure.

Presentation

Alluxio – Data Orchestration for Analytics and AI in the Cloud

Meetup at AI NextCon 2019: In-Stream data process, Data Orchestration & More

Data storage is migrating from the colocated model (e.g., HDFS) to a more cost-effective and scalable, but often fully disaggregated and remote, data lake model (e.g., S3). This has created a strong need for data orchestration in the cloud, similar to what K8s does for container-based workloads, so that data can be presented in the right layout at the right location for data applications in the cloud. Originally developed as the UC Berkeley AMPLab project “Tachyon”, Alluxio (www.alluxio.io) implements the world’s first open-source data orchestration system in the cloud: a unified access layer for data-driven applications in big data and ML, enabling Spark, Presto, or TensorFlow to transparently access different external storage systems while actively leveraging an in-memory cache to accelerate data access. In this talk, we will present trends and challenges in the data ecosystem in the cloud era; data engineering in the cloud with data orchestration; and use cases of tech stacks (Presto or TensorFlow) with Alluxio on S3.
