On-Demand Videos
Scaling experimentation in digital marketplaces is crucial for driving growth and enhancing user experiences. However, varied methodologies and a lack of experiment governance can limit the impact of experimentation, leading to inconsistent decision-making, inefficiencies, and missed opportunities for innovation.
At Poshmark, we developed a homegrown experimentation platform, Lightspeed, that allowed us to make reliable and confident reads on product changes, which led to a 10x growth in experiment velocity and positive business outcomes along the way.
This session will provide a deep dive into the best practices and lessons learned from successful implementations of large-scale experiments. We will explore the importance of experimentation, discuss how to overcome scalability challenges, and share insights into the frameworks and technologies that enable effective testing.
In the rapidly evolving world of e-commerce, visual search has become a game-changing technology. Poshmark, a leading fashion resale marketplace, has developed Posh Lens – an advanced visual search engine that revolutionizes how shoppers discover and purchase items.
Under the hood of Posh Lens lies Milvus, a vector database enabling efficient product search and recommendation across our vast catalog of over 150 million items. However, with such an extensive and growing dataset, maintaining high-performance search capabilities while scaling AI infrastructure presents significant challenges.
In this talk, Mahesh Pasupuleti shares:
- The architecture and strategies to scale Milvus effectively within the Posh Lens infrastructure
- Key considerations, including optimizing vector indexing, managing data partitioning, and ensuring query efficiency amidst large-scale data growth
- Distributed computing principles and advanced indexing techniques to handle the complexity of Poshmark’s diverse product catalog
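To make the ideas of data partitioning and similarity search concrete, here is a minimal sketch in plain Python. It does not use Milvus or reflect the Posh Lens implementation; the `PartitionedIndex` class, the partition names, and the item IDs are all hypothetical, and the brute-force cosine scan stands in for the approximate indexes a real vector database would use.

```python
import heapq
import math
from collections import defaultdict


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class PartitionedIndex:
    """Hypothetical in-memory index that groups vectors by partition key."""

    def __init__(self):
        self._partitions = defaultdict(list)

    def insert(self, partition, item_id, vector):
        self._partitions[partition].append((item_id, vector))

    def search(self, query, top_k=3, partition=None):
        # Restricting the scan to one partition reduces the work per query;
        # with partition=None every stored vector is scored.
        if partition is None:
            candidates = [e for part in self._partitions.values() for e in part]
        else:
            candidates = self._partitions.get(partition, [])
        scored = ((cosine_similarity(query, vec), item_id) for item_id, vec in candidates)
        return [item_id for _, item_id in heapq.nlargest(top_k, scored)]


# Illustrative data only: three tiny embeddings in two partitions.
index = PartitionedIndex()
index.insert("shoes", "sneaker-1", [0.9, 0.1, 0.0])
index.insert("shoes", "boot-1", [0.7, 0.3, 0.1])
index.insert("bags", "tote-1", [0.1, 0.9, 0.2])

shoe_results = index.search([1.0, 0.0, 0.0], top_k=2, partition="shoes")
all_results = index.search([1.0, 0.0, 0.0], top_k=3)
```

Searching a single partition scores only the vectors stored under that key, which is the intuition behind partitioning a large catalog; a production system would combine this with approximate nearest-neighbor indexing rather than an exhaustive scan.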
As machine learning and deep learning models grow in complexity, AI platform engineers and ML engineers face significant challenges with slow data loading and low GPU utilization, often leading to costly investments in high-performance computing (HPC) storage. However, this approach can result in overspending without addressing the core issues of data bottlenecks and infrastructure complexity.
A better approach is to add a data caching layer, such as Alluxio, between compute and storage. Its data caching strategy offers a cost-effective alternative. In this webinar, Jingwen will explore how Alluxio's caching solutions optimize AI workloads for performance, user experience, and cost-effectiveness.
What you will learn:
- The I/O bottlenecks that slow down data loading in model training
- How Alluxio's data caching strategy optimizes I/O performance for training and GPU utilization, and significantly reduces cloud API costs
- The architecture and key capabilities of Alluxio
- Using Rapid Alluxio Deployer to install Alluxio and run benchmarks in AWS in just 30 minutes
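The general idea of a caching layer between compute and storage can be illustrated with a small read-through cache. This is a sketch under stated assumptions, not Alluxio's API or architecture: the `ReadThroughCache` class, the `slow_storage_read` function, and the file paths are hypothetical, and a simple LRU eviction policy is assumed.

```python
from collections import OrderedDict


class ReadThroughCache:
    """Hypothetical LRU read-through cache placed in front of a slow storage read."""

    def __init__(self, fetch_from_storage, capacity):
        self._fetch = fetch_from_storage
        self._capacity = capacity
        self._entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, path):
        if path in self._entries:
            # Cache hit: mark as most recently used and skip the storage call.
            self._entries.move_to_end(path)
            self.hits += 1
            return self._entries[path]
        # Cache miss: fetch from storage once, then keep a local copy.
        self.misses += 1
        data = self._fetch(path)
        self._entries[path] = data
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return data


storage_calls = []


def slow_storage_read(path):
    """Stand-in for a remote object-store request (assumed, for illustration)."""
    storage_calls.append(path)
    return f"bytes-of-{path}"


cache = ReadThroughCache(slow_storage_read, capacity=2)

# Three training epochs over the same two files: only the first epoch hits storage.
for _epoch in range(3):
    for path in ["train/part-0", "train/part-1"]:
        cache.read(path)
```

In this toy run the storage function is called twice in total while the remaining four reads are served from the cache, which mirrors why repeated epochs over the same dataset benefit from caching and why fewer storage requests can translate into lower cloud API costs.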
In February’s Product School, Greg Palmer, Lead Solution Engineer at Alluxio, will present a live demo featuring Transparent URI, a key feature in Alluxio Enterprise Edition that makes it easy to integrate Alluxio with your existing data stack without any changes to the location metadata of the Hive Metastore. Join us to learn the configurations and other advanced settings for employing Transparent URI to simplify the DevOps of an Alluxio implementation, allowing users to access their existing storage systems without changing URIs at the application level.
In November’s Product School, Adit Madan, Director of Product Management at Alluxio, will highlight new features, enhanced manageability, and improved security and performance in the Alluxio 2.9 release.
Today, data engineering in modern enterprises has become increasingly complex and resource-consuming, particularly because (1) the wealth of organizational data is often distributed across data centers, cloud regions, or even cloud providers, and (2) the complexity of the big data stack has grown quickly over the past few years with an explosion in big-data analytics and machine-learning engines (like MapReduce, Hive, Spark, Presto, TensorFlow, and PyTorch, to name a few).
To address these challenges, it is critical to provide a single logical namespace that federates different storage services, on-prem or cloud-native, and abstracts away data heterogeneity while providing data locality to improve computation performance. Bin Fan will share his observations and lessons learned in designing, architecting, and implementing such a system, the Alluxio open-source project, since 2015.
Alluxio, formerly called Tachyon, originated at UC Berkeley's AMPLab and was initially proposed as a daemon service that lets Spark share RDDs across jobs for performance and fault tolerance. Today, it has become a general-purpose, high-performance, highly available distributed file system that provides a generic data service abstracting away complexity in data and I/O. Many companies and organizations, such as Uber, Meta, Tencent, TikTok, and Shopee, use Alluxio in production as a building block in their data platforms to create a data abstraction and access layer. We will talk about the journey of this open source project, especially its design challenges in tiered metadata storage (based on RocksDB), the embedded replicated state machine (based on Raft) for HA, and the evolution of its RPC framework (based on gRPC).
Distributed systems are made up of many components such as authentication, a persistence layer, stateless services, load balancers, and stateful coordination services. These coordination services are central to the operation of the system, performing tasks such as maintaining system configuration state, ensuring service availability, name resolution, and storing other system metadata. Given their central role in the system, it is essential that these services remain available, fault tolerant, and consistent. Because it provides a highly available, file system-like abstraction as well as powerful recipes such as leader election, Apache ZooKeeper is often used to implement these services. This talk will walk through a generic example of a stateful coordination service moving from ZooKeeper to Raft.
Data platform teams are increasingly challenged with accessing multiple data stores that are separated from compute engines, such as Spark, Presto, TensorFlow or PyTorch. Whether your data is distributed across multiple datacenters and/or clouds, a successful heterogeneous data platform requires efficient data access.
In October’s Product School, Alluxio’s Lead Solutions Engineer Greg Palmer will present and demo how Alluxio enables you to embrace a cloud migration strategy or multi-cloud architecture for large-scale analytics and AI workloads. Alluxio also helps scale out your platform adoption for analytics and AI across multiple tenants and application teams.
Shopee is the leading e-commerce platform in Southeast Asia. In this presentation, Luo Li from Shopee will share the Data Infra team’s recent project on acceleration with Presto and storage servitization. He will share the details of how Shopee leverages Alluxio to accelerate Presto queries and provide standardized methods of accessing data through Alluxio-Fuse and Alluxio-S3.
Apache Hudi’s open-source community is very active and healthy. This talk presents an overview of major community-driven features, followed by a deep dive into two of them, the metastore and the table management service, both driven by ByteDance, to illustrate Hudi’s platform vision.
In this presentation, Yingjun Wu, founder of RisingWave Labs, will talk about the birth, growth, and prosperity of the modern data stack. He will show why the modern data stack is more than a buzzword and how it may evolve in the next couple of years.
Streaming systems form the backbone of the modern data pipeline as the stream processing capabilities provide insights on events as they arrive. But what if we want to go further than this and execute analytical queries on this real-time data? That’s where Apache Pinot comes in.
OLAP databases used for analytical workloads traditionally executed queries on yesterday’s data with query latency in the tens of seconds. The emergence of real-time analytics has changed all this, and the expectation is that we should now be able to run thousands of queries per second on fresh data with query latencies typically seen on OLTP databases.
Apache Pinot is a real-time distributed OLAP datastore that is used to deliver scalable real-time analytics with low latency. It can ingest data from streaming sources like Kafka, as well as from batch data sources (S3, HDFS, Azure Data Lake, Google Cloud Storage), and provides a layer of indexing techniques that can be used to maximize query performance.
Come to this talk to learn how you can add real-time analytics capability to your data pipeline.
With the advent of the Big Data era, calculating the resource usage of a SQL query is usually computationally expensive. Can we estimate the resource usage of SQL queries more efficiently, without any computation in a SQL engine kernel? In this session, Chunxu and Beinan will introduce how Twitter’s data platform leverages a machine learning-based approach in Presto and BigQuery to estimate query resource utilization with 90%+ accuracy.
This talk introduces three game-level progressions for using Alluxio to speed up your cloud training, with production use cases from Microsoft, Alibaba, and BossZhipin.
- Level 1: Speed up data ingestion from cloud storage
- Level 2: Speed up data preprocessing and training workloads
- Level 3: Speed up full training workloads with a unified data orchestration layer
OceanBase Database is an open-source, distributed Hybrid Transactional/Real-time Operational Analytics (HTAP) database management system that has set new world records in both the TPC-C and TPC-H benchmark tests. OceanBase Database started in 2010 and has been serving all of the critical systems in Alipay. Besides Alipay, OceanBase also serves customers from a variety of sectors, including Internet, financial services, telecommunications, and retail.
In this tech talk, we will talk about the architecture of OceanBase and some typical use cases. The talk will cover technical topics such as Paxos replication, two-phase commit (2PC), LSM-tree-like storage, the SQL optimizer and executor, and city-level disaster recovery.