

Presentation
Enabling Ultra-fast Presto in the Cloud with Alluxio
PRESTO SUMMIT NYC
This talk describes a stack of open-source projects that serves highly concurrent, low-latency SQL queries on big data in the cloud using Presto with Alluxio. By deploying Alluxio as a data orchestration layer for accessing cloud object storage (e.g., AWS S3), this architecture greatly improves Presto's data locality through distributed, cross-query caching, thus avoiding repeated reads of the same data from cloud storage.
In addition, since the v2.1 release, Alluxio provides structured data management: beyond caching the raw bytes of input files or objects, it can also manage and transform structured data for additional performance. For example, Alluxio can convert data in raw formats (such as CSV) into a more compact and performant file format (such as Parquet), accelerating Presto queries by 10x for certain workloads while using much less CPU.
This talk will give an overview of Alluxio’s core concepts, architecture, and data flow, as well as use cases from internet companies such as Walmart, JD.com, and Ryte that run this Presto and Alluxio stack at scale in production.
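To make the cross-query caching idea concrete, here is a minimal sketch assuming a simple in-memory read-through cache keyed by object path. The names `ReadThroughCache` and `run_query` and the fake S3 contents are hypothetical illustrations, not Alluxio's or Presto's API; a real Alluxio deployment spreads cached blocks across worker nodes and handles eviction, which this sketch omits.

```python
from typing import Callable, Dict, List


class ReadThroughCache:
    """Hypothetical cache shared across queries: remote storage is read
    only on a miss, and later reads of the same path are served locally."""

    def __init__(self, fetch_remote: Callable[[str], bytes]) -> None:
        self._fetch_remote = fetch_remote  # stand-in for an object-store GET
        self._blocks: Dict[str, bytes] = {}
        self.remote_reads = 0  # counts how often remote storage was hit

    def read(self, path: str) -> bytes:
        if path not in self._blocks:
            # Miss: fetch once from remote storage and keep the bytes.
            self.remote_reads += 1
            self._blocks[path] = self._fetch_remote(path)
        return self._blocks[path]


def run_query(cache: ReadThroughCache, paths: List[str]) -> int:
    """Hypothetical 'query' that scans files and returns the bytes scanned."""
    return sum(len(cache.read(p)) for p in paths)


if __name__ == "__main__":
    # Hypothetical object-store contents keyed by path.
    fake_s3 = {"s3://bucket/t/part-0": b"a" * 100, "s3://bucket/t/part-1": b"b" * 50}
    cache = ReadThroughCache(lambda p: fake_s3[p])
    files = list(fake_s3)
    print(run_query(cache, files))  # 150
    print(run_query(cache, files))  # 150, served from the cache
    print(cache.remote_reads)       # 2
```

Running the same query twice triggers only two remote reads in total, because the second scan is served entirely from the cache instead of going back to object storage.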


Presentation
Accelerating workloads and bursting data with Google Dataproc & Alluxio
BIG DATA APPLICATION MEETUP @ GOOGLE
Google Cloud Dataproc is a popular managed, on-demand service for running Spark, Presto, and many other compute workloads. Alluxio, an open source data orchestration technology, helps speed up Dataproc workloads by providing a distributed caching layer within the Dataproc cluster. In addition, Alluxio enables “zero-copy” bursting, allowing users to run compute workloads on data that remains on-premises or in another cloud. In this session, Dipti from Alluxio and Roderick from Google Cloud will give an overview of Alluxio and Google Dataproc and the benefits the two bring together, including a demo of initializing a Dataproc cluster with Alluxio to run workloads on remote data.


Presentation
Ultra-fast SQL Analytics using PAS (Presto on Alluxio Stack)
Presto Meetup Hosted @ UBER
This talk describes a stack of open-source projects that serves highly concurrent, low-latency SQL queries on big data in the cloud using Presto with Alluxio. By deploying Alluxio as a data orchestration layer for accessing cloud object storage (e.g., AWS S3), this architecture greatly improves Presto's data locality through distributed, cross-query caching, thus avoiding repeated reads of the same data from cloud storage.
In addition, in the latest v2.1 release, Alluxio provides structured data management: beyond caching the raw bytes of input files or objects, it can also manage and transform structured data for additional performance. For example, Alluxio can convert data in raw formats (such as CSV) into a more compact and performant file format (such as Parquet), accelerating Presto queries by 10x for certain workloads while using much less CPU.
This talk will give an overview of Alluxio’s core concepts, architecture, and data flow, as well as use cases from internet companies such as Walmart and JD.com that run this Presto and Alluxio stack at scale in production.

Blog
Kubernetes Alluxio and the Disaggregated Analytics Stack
TL;DR: First, the news: Alluxio support for Kubernetes (K8s) Helm charts is now available, and K8s is a certified environment for Alluxio. Now the takeaway: Alluxio brings back data locality for the disaggregated analytics stack in K8s. How? Read on.
Large Scale Analytics Acceleration


Presentation
The Practice of Presto & Alluxio in E-Commerce Big Data Platform
JD.com is China’s largest online retailer. It uses Alluxio to support ad hoc queries and real-time stream computing, accessing Alluxio through HDFS-compatible URLs and deploying it as a pluggable optimization component. One of its computing frameworks, JDPresto, gained a 10x average performance improvement after deploying Alluxio.

Blog
Data Orchestration Summit Recap and Highlights
We are delighted by the success of the inaugural Data Orchestration Summit on Nov. 7, 2019! Organized by Alluxio, this one-day event sold out with nearly 400 attendees. Data engineers, cloud engineers, and data scientists attended talks by 24 industry leaders from around the globe, who shared their experiences building cloud-native data and AI platforms. All session recordings and slides are now available.


Presentation
Workshop: Presto on Alluxio Hands-On Lab
DATA ORCHESTRATION SUMMIT 2019
This hands-on training, run by the creators of Presto and Alluxio, covers how to get started with Presto and Alluxio. Attendees will get hands-on experience launching an EC2 instance, exploring the Alluxio filesystem and cluster status, and running queries with Presto on Alluxio to experience the performance benefits of adding Alluxio to their analytics stack.
Presto is a widely popular SQL query engine that is well suited to interactive SQL analytics. However, when the data is remote or in object stores, performance becomes a challenge. Alluxio can improve Presto’s query performance when deployed as a distributed cache layer co-located with Presto. Presto with Alluxio brings together two open source technologies to deliver better performance and multi-cloud capabilities for interactive analytic workloads. Coupling Presto’s open source distributed SQL query engine with Alluxio enables true separation of storage and compute while preserving data locality, provides memory-speed response times, and aggregates data from any file or object store.