Lenovo is an Alluxio customer with a common problem and use case in the world of data analytics. They have petabytes of data in multiple data centers in different geographic locations. Analyzing it requires an ETL process to get all of the data in the right place. This is both slow, because data has to be transferred across the network, and costly because multiple copies of the data need to be stored. Freshness and quality of the data can also suffer as the data is also potentially out of date and incomplete because regulatory issues prevent certain data from being transferred. Lenovo deployed Alluxio to address these issues and eliminate the need for ETL altogether. Data from multiple locations is cached in Alluxio and multiple applications can access it for analysis. The data is either accessed locally from memory or fetched by Alluxio for new requests as needed from persistent storage. The working data set is always available without the need for ETL. Alluxio fits within the existing security frameworks and enforces the policies in place, ensuring regulatory and compliance requirements from different countries and jurisdictions are met. You can learn more in the full case study
.png)
Blog

Alluxio's strong Q2 featured Enterprise AI 3.7 launch with sub-millisecond latency (45× faster than S3 Standard), 50%+ customer growth including Salesforce and Geely, and MLPerf Storage v2.0 results showing 99%+ GPU utilization, positioning the company as a leader in maximizing AI infrastructure ROI.

In this blog, Greg Lindstrom, Vice President of ML Trading at Blackout Power Trading, an electricity trading firm in North American power markets, shares how they leverage Alluxio to power their offline feature store. This approach delivers multi-join query performance in the double-digit millisecond range, while maintaining the cost and durability benefits of Amazon S3 for persistent storage. As a result, they achieved a 22 to 37x reduction in large-join query latency for training and a 37 to 83x reduction in large-join query latency for inference.