Blog

Uptycs Chooses Alluxio to Power GenAI Natural Language Analytics at Terabyte Scale

Suresh Kumar Veerapathiran and Anudeep Kumar, engineering leaders at Uptycs, recently shared their experience of evolving their data platform and analytics architecture to power analytics through a generative AI interface. In their Medium post, Cache Me If You Can: Building a Lightning-Fast Analytics Cache at Terabyte Scale, Veerapathiran and Kumar describe in detail the challenges they faced, and how they solved them, in scaling an analytics solution that collects and reports on terabytes of telemetry data per day as part of the Uptycs Cloud-Native Application Protection Platform (CNAPP) solutions.

AI/ML Infra Meetup at Uber Seattle: Tackling Scalability Challenges of AI Platforms

Insights from Uber, Snap, and Alluxio on LLM training, fine-tuning, deployment, designing scalable architectures, GPU optimization, and building recommendation systems.

New Features in Alluxio Enterprise AI 3.5

With the new year comes new features in Alluxio Enterprise AI! Just weeks into 2025 and we are already bringing you exciting new features to better manage, scale, and secure your AI data with Alluxio. From advanced cache management and improved write performance to our Python SDK and S3 API enhancements, our latest release of Alluxio Enterprise AI delivers more power and performance to your AI workloads. Without further ado, let’s dig into the details.

Speed Trino Queries with These Performance-Tuning Tips

Top Tips and Tricks for PyTorch Model Training Performance Tuning 2024

Trino Optimization With Distributed Caching on Data Lakes: Trino Fest 2023 Session Recap

Data Caching Strategies for Data Analytics and AI: Data+AI Summit 2023 Session Recap

What's New in Alluxio Enterprise 2.10: Radically Resource-efficient for Improved Speed at Lower Cost

Building High-performance Data Access Layer for Model Training and Model Serving for LLM

Bringing a large language model from its initial training to deployment requires numerous systems and components. At Zhihu, we grappled with a multi-cloud, cross-region AI platform, requiring an efficient solution to facilitate the rapid training and delivery of models for production use cases. This led us to adopt Alluxio, the high-performance data access layer for LLM. This blog provides an in-depth look at Zhihu’s challenges, journey, and solution for LLM training and deployment. Through adopting Alluxio, we’ve significantly enhanced model training performance by 2 to 3 times and can deploy updated models every minute instead of hours or days. Also, our GPU utilization has doubled, infrastructure and operation costs have been halved, and we have established a resilient, efficient infrastructure capable of meeting our escalating AI demands.

Millions Saved Annually: Unleashing the Power of Alluxio HDFS at Uber

Saving Cloud Costs in 2023: Top Five Strategies to Reduce AWS Cloud Data Transfer Fees

Announcing Our First AI PMC Member -- CacheGPT

Alipay: Optimizing Alluxio for Efficient Large-Scale Training on Billions of Files

Cross Cluster Synchronization in Alluxio: Part 2 - Mechanism

This is part 2 of the blog series on the design and implementation of the Cross Cluster Synchronization mechanism in Alluxio. In the previous blog, we discussed the scenario, the background, and how metadata sync is done with a single Alluxio cluster. This blog describes how that metadata sync is extended to provide metadata consistency in a multi-cluster scenario.

Cross Cluster Synchronization in Alluxio: Part 3 - Discussions and Conclusion

Following part 1 and part 2, this final blog of the series discusses some design decisions and details, as well as possible future work.
