Two Sigma Meetup Recap: Achieving Compute and Storage Independence for Data-driven Workloads

April 12, 2019

This is a recap of the joint Two Sigma and Alluxio meetup hosted in New York. Two Sigma is a leading hedge fund that uses cutting-edge technology to train its models on petabytes of data held in on-premises storage. Special thanks to Two Sigma for hosting. Here are the slides from the presentation.

At this meetup, Bin Fan from Alluxio and Wenbo Zhao from Two Sigma co-presented a reference stack, running Alluxio as a data access layer for Apache Spark, that separates compute from storage so that each can scale independently for big data and machine learning workloads.

Two Sigma’s use case is a great example of the benefits of this reference stack: it bursts machine learning computation to the public cloud while still accessing data stored on-premises efficiently. Their data scientists want to use the public cloud as a scalable, elastic compute resource to speed up the end-to-end model training process. By using Alluxio as the data access layer co-located with compute in the cloud, their researchers achieved 10x faster end-to-end processing, which lets them perform more iterations on their models.

We had a great time interacting with the audience on the East Coast, and we look forward to the next NYC event!

To stay up to date on future events, join our meetup groups: Alluxio Open Source New York Meetup, Alluxio Open Source Bay Area Meetup.

If you are interested in hosting or presenting at a future event, please contact us at community@alluxio.com.
