On-Demand Videos
Scaling experimentation in digital marketplaces is crucial for driving growth and enhancing user experiences. However, varied methodologies and a lack of experiment governance can hinder the impact of experimentation leading to inconsistent decision-making, inefficiencies, and missed opportunities for innovation.
At Poshmark, we developed a homegrown experimentation platform, Lightspeed, that allowed us to make reliable and confident reads on product changes, which led to a 10x growth in experiment velocity and positive business outcomes along the way.
This session will provide a deep dive into the best practices and lessons learned from successful implementations of large-scale experiments. We will explore the importance of experimentation, overcome scalability challenges, and gain insights into the frameworks and technologies that enable effective testing.
In the rapidly evolving world of e-commerce, visual search has become a game-changing technology. Poshmark, a leading fashion resale marketplace, has developed Posh Lens – an advanced visual search engine that revolutionizes how shoppers discover and purchase items.
Under the hood of Posh Lens lies Milvus, a vector database enabling efficient product search and recommendation across our vast catalog of over 150 million items. However, with such an extensive and growing dataset, maintaining high-performance search capabilities while scaling AI infrastructure presents significant challenges.
In this talk, Mahesh Pasupuleti shares:
- The architecture and strategies to scale Milvus effectively within the Posh Lens infrastructure
- Key considerations include optimizing vector indexing, managing data partitioning, and ensuring query efficiency amidst large-scale data growth
- Distributed computing principles and advanced indexing techniques to handle the complexity of Poshmark’s diverse product catalog
As machine learning and deep learning models grow in complexity, AI platform engineers and ML engineers face significant challenges with slow data loading and GPU utilization, often leading to costly investments in high-performance computing (HPC) storage. However, this approach can result in overspending without addressing the core issues of data bottlenecks and infrastructure complexity.
A better approach is adding a data caching layer between compute and storage, like Alluxio, which offers a cost-effective alternative through its innovative data caching strategy. In this webinar, Jingwen will explore how Alluxio's caching solutions optimize AI workloads for performance, user experience and cost-effectiveness.
What you will learn:
- The I/O bottlenecks that slow down data loading in model training
- How Alluxio's data caching strategy optimizes I/O performance for training and GPU utilization, and significantly reduces cloud API costs
- The architecture and key capabilities of Alluxio
- Using Rapid Alluxio Deployer to install Alluxio and run benchmarks in AWS in just 30 minutes
Video: Presentation Slides: Deep Learning in the Cloud at Scale: A Data Orchestration Story from Alluxio, Inc.
Today, many people run deep learning applications with training data from separate storage such as object storage or remote data centers. This presentation will demo the Intel Analytics Zoo + Alluxio stack, an architecture that enables high performance while keeping cost and resource efficiency balanced without network being I/O bottlenecked.
Intel Analytics Zoo is a unified data analytics and AI platform open-sourced by Intel. It seamlessly unites TensorFlow, Keras, PyTorch, Spark, Flink, and Ray programs into an integrated pipeline, which can transparently scale from a laptop to large clusters to process production big data. Alluxio, as an open-source data orchestration layer, accelerates data loading and processing in Analytics Zoo deep learning applications.
This talk, we will go over:
- What is Analytics Zoo and how it works
- How to run Analytics Zoo with Alluxio in deep learning applications
- Initial performance benchmark results using the Analytics Zoo + Alluxio stack
Dataproc is Google’s managed Hadoop and Spark platform. In this talk, we will showcase how to swiftly build a hybrid cloud data platform with Alluxio and Presto and migrate data seamlessly.
Data infrastructure on-premises is increasingly complex and cloud adoption is attractive for business agility. Operating a hybrid environment is an approach to start benefiting from cloud elasticity quickly without abandoning the infrastructure on-premises. In this session I will discuss the benefits of using Alluxio’s Data Orchestration Platform to dynamically burst Apache Spark and Presto workloads to Amazon EMR for best performance and agility.
Describe benefits and methods Alluxio enables secure data access in the Comcast’s dx hybrid data cloud.
- Review the data access challenges and tradeoffs in hybrid cloud
- Review our hybrid architecture and the important role Alluxio plays
- Provide performance metrics to highlight the benefits
Unicom’s traditional batch architecture consists mainly of IOE, Hive, and Greenplum systems. With the development of business, a large number of computing application modules based on diverse scenarios, chimney-like, decentralized applications have emerged. To solve the problem of resource fragmentation, we have introduced a unified computing platform for computing ecology with Spark and Alluxio as the core. Alluxio plays an important role in accelerating data processing and ensuring process stability.
This talk introduces T3Go’s solution in building an enterprise-level data lake based on Apache Hudi & Alluxio, and how to use Alluxio to accelerate the reading and writing of data on the data lake when compute and storage are segregated.
Presto & Alluxio on AWS: How we build a Up-To-Date Data-Platform at Ryte. Video: Presentation Slides: Introducing the Hub for Data Orchestration from Alluxio, Inc.
Presto, an open source distributed SQL engine, is widely recognized for its low-latency queries, high concurrency, and native ability to query multiple data sources. Proven at scale in a variety of use cases at Comcast, GrubHub, FINRA, LinkedIn, Lyft, Netflix, Slack, Zalando, in the last few years Presto experienced an unprecedented growth in popularity in both on-premises and cloud deployments over Object Stores, HDFS, NoSQL and RDBMS data stores.
Delta Lake, a storage layer originally invented by Databricks and recently open sourced, brings ACID capabilities to big datasets held in Object Storage. While initially designed for Spark, Delta Lake now supports multiple query compute engines including Presto.
In this talk we discuss how Presto enables query-time correlations between Delta Lake, Snowflake, and Elasticsearch to drive interactive BI analytics across disparate datasets.
Video: Presentation Slides: Exploring Alluxio for Daily Tasks at Robinhood from Alluxio, Inc.
For many latency-sensitive SQL workloads, Presto is often bound by retrieving distant data. In this talk, Rohit Jain from Facebook will introduce their teams’ collaboration with Alluxio on adding a local on-SSD Alluxio cache inside Presto workers at Facebook to improve queries with unsatisfied latency.
Video: Presentation Slides: Presentation Slides: Powering Interactive Analytics with Alluxio and Presto from Alluxio, Inc.