AI and Analytics Solutions for Autonomous Driving and Smart Vehicles
Scaling Analytics and AI Platforms on Existing Infrastructure for Autonomous Driving and Smart Vehicles
What’s New
Alluxio Webinar | 10x Faster Trino Queries on Your Data Platform | Tue, June 18 @ 11:00am PST
As Trino users increasingly rely on cloud object storage for data retrieval, query speed and cloud cost have become major challenges. Separating compute from storage adds latency to queries, because scanning data between the storage and compute tiers becomes I/O bound. In addition, cloud API costs from GET/LIST operations and cross-region data transfer fees add up quickly.
The newly introduced Trino file system cache by Alluxio aims to overcome these challenges. In this session, Jianjian will dive into Trino data caching strategies, share the latest test results, and discuss the multi-level caching architecture. This architecture makes Trino 10x faster for data lakes of any scale, from GB to EB.
Alluxio optimizes your analytics and AI platform on your existing infrastructure, enabling your organization to gain a competitive edge in the automotive industry with innovations that yield better outcomes.
Unify Data Access
Provide a single point of access to multiple data lakes, making hybrid and multi-cloud data infrastructure a reality.
Save Data Infrastructure Costs
Save up to 70% on data infrastructure TCO. Eliminate I/O stalls to increase the ROI of GPU resources.
Accelerate ML and Data Pipelines
Deliver unparalleled performance, with up to 20x faster model training and 10x faster model deployment.
“WeRide aims at delivering Level 4 autonomous driving technology for the future. Data access is a critical part of developing smart mobility. Alluxio is the right partner – we experienced many improvements using Alluxio, including reducing the complexity of data synchronization by having a single interface to access data, and saving S3 egress cost of downloading redundant data.”
Derek Tan
Executive Director of Infra & Simulation at WeRide
Data and AI Challenges
Analytics
Data Silo Challenges
When scaling across multiple data lakes, data becomes siloed among different storage systems, clouds, and regions. These silos require complex pipelines that replicate data, adding data engineering complexity and cost.
Slow Analytics Speed
Immediate access to data lakes is critical as engineers need results from data queries quickly. However, replicating data and managing pipelines can easily consume more than half of an engineer’s working time, causing delays in analytics results.
High Data Infrastructure Costs
As telemetry volume and fleet size keep growing, cloud costs rise accordingly and are difficult to manage and predict. Every time data is replicated from one silo to another, costs are incurred, and API costs (GET object operations) and egress costs (cross-region data transfer fees) accumulate over time.
AI/ML
Slow AI Pipeline
Loading a massive number of small files (usually images) with insufficient I/O throughput slows model training and leaves GPUs idle. For example, traditional NAS solutions cannot deliver enough throughput to keep GPUs fed. Copying training data from NAS to local storage on GPU servers for faster training does not scale in production.
Hard to Scale
Models must be tested on huge datasets to provide confidence. Vehicles generate a mountain of data every day, resulting in petabyte- or even exabyte-scale data lakes with billions of files. Current storage systems are not designed to scale to future capacity requirements.
Rising Costs
Cloud object storage costs add up quickly with frequent GET object operations and cross-region data transfers. Specialized storage provides good performance but is very expensive. In addition, GPUs are underutilized, wasting expensive compute resources. Together, these factors drive costs steadily upward.
“Alluxio is the best choice for optimizing our model training platform. Traditional NAS could not scale to handle an increasing volume of data and could cause issues with data synchronization. Alluxio, on the other hand, provides excellent scalability and allows us to expand our AI infrastructure without sacrificing performance. Additionally, our data scientists have seen a significant improvement in productivity since they no longer have to waste time copying data between systems.”
An autonomous driving computing platform company
Alluxio solves data challenges by providing a platform between your existing compute engines and storage systems, optimizing data access at every step of your data pipeline to accelerate analytics and AI workloads, on-prem, in the cloud, or both.
The Alluxio Data Platform has two product offerings: Alluxio Enterprise Data and Alluxio Enterprise AI.
Related Resources
Accelerate Auto Data Tagging with Alluxio and Spark in Hybrid Cloud – A Practice in WeRide
WeRide Case Study
Building High-Performance Data Lake Using Apache Hudi and Alluxio at T3Go
T3Go Case Study
Automaker Geely’s Efficient Data Lake Architecture with Alluxio
Geely Case Study
End-to-End Machine Learning Pipeline with Alluxio
Product Demo
The Ultimate Guide to Saving Data Egress Costs in the Cloud
Ebook
Accelerate Distributed PyTorch/Ray Workloads in the Cloud
On-demand Video
We offer a complimentary technical consultation that reviews your current and future needs and includes a proof-of-concept proposal.