New features streamline data pre-processing and loading phases, enabling better utilization of GPUs, greatly improving AI/ML training efficiency and reducing overall cost
SAN MATEO, CA – July 1, 2021 - Alluxio, the developer of open source data orchestration software for large-scale workloads, today announced the immediate availability of version 2.6 of its Data Orchestration Platform. This new release features an enhanced system architecture enabling AI/ML platform teams using GPUs to accelerate their data pipelines for business intelligence, applied machine learning and model training.
“Enterprises seeking competitive advantage are making greater use of machine learning and AI to derive insights from massive datasets,” said Haoyuan Li, Founder and CEO, Alluxio. “These datasets are often distributed across hybrid cloud environments, making more consistent and efficient data access critical to realizing the value from their AI/ML initiatives.”
“The success of machine learning depends on accurate ML models, which in turn depend on lots of heterogeneous training data,” said Kevin Petrie, VP of Research at Eckerson Group. “This creates a bottleneck unless you efficiently apply the right compute to the right data. Alluxio aims to apply GPU compute to large datasets faster, which can help speed data ingestion, data transformation, and model training.”
In the latest release, Alluxio improves its system architecture to best support AI/ML applications using the POSIX interface. System performance is maximized by removing inter-process latency overheads, which is critical for enabling full utilization of compute resources. Aside from I/O performance, the end-to-end workflow of data preprocessing, loading, training, and result writing is well supported by Alluxio’s data management capabilities.
“Machine learning applications benefit greatly from the performance acceleration offered by GPUs. However, when utilizing powerful compute hardware, the limiting factor of the workload often shifts to I/O where workloads become bound on how fast data can be made available to the GPUs as opposed to how fast the GPUs can do training computations,” said Adit Madan, Product Manager, Alluxio. “Alluxio 2.6 bridges this gap in performance with a data orchestration layer for AI/ML workloads, allowing applications to fully utilize expensive and powerful hardware without encountering the data access and I/O bottlenecks.”
Alluxio 2.6 Community and Enterprise Edition features new capabilities, including:
Faster Data Access with a Large Number of Small Files
Alluxio 2.6 unifies the Alluxio worker and FUSE process. Coupling the two reduces inter-process communication, which yields significant performance improvements. This is especially evident in AI/ML workloads, where file sizes are small and RPC overheads make up a significant portion of the I/O time. In addition, running both components in a single process greatly simplifies deploying the software in containerized environments such as Kubernetes. These enhancements substantially reduce data access latency, enabling users to process greater amounts of data more efficiently and deliver more AI/ML benefits to the business.
Simplified Data Management and Operability
Alluxio 2.6 enhances the mechanism for loading data into Alluxio-managed storage and introduces more traceability and metrics for easier operability. This distributed load operation is a key part of the AI/ML workflow, and its internal mechanisms have been adjusted to optimize for the common case of loading prepared data for model training.
Improved System Visibility and Control
Alluxio 2.6 adds a large set of metrics and traceability features that let users drill into the system's operating state. These range from the aggregated throughput of the system to summarized metadata latency when serving client requests. This visibility can be used to measure the current serving capacity of the system and to identify potential resource bottlenecks. Request-level tracing and timing information can also be obtained for deep performance analysis. Together, these features give users the visibility and control needed to improve the SLAs of their large-scale data pipelines across a wide variety of use cases.
Availability
Free downloads of Alluxio 2.6 open source Community Edition and trials of Alluxio Enterprise Edition are generally available here: https://www.alluxio.io/download/
Resources
- To learn more about the Alluxio 2.6 release, read the product blog or the official release notes.
- For general information about Alluxio, visit https://www.alluxio.io.
About Alluxio
Alluxio, a leading provider of the high-performance data platform for analytics and AI, accelerates time-to-value of data and AI initiatives and maximizes infrastructure ROI. Uniquely positioned at the intersection of compute and storage systems, Alluxio has a universal view of workloads on the data platform across stages of a data pipeline. This enables Alluxio to provide high-performance data access regardless of where the data resides, simplify data engineering, optimize GPU utilization, and reduce cloud and storage costs. With Alluxio, organizations can achieve orders of magnitude faster model training and serving without the need for specialized storage, and build AI infrastructure on existing data lakes. Backed by leading investors, Alluxio powers technology, internet, financial services, and telecom companies, including 9 out of the top 10 internet companies globally. To learn more, visit www.alluxio.io.
Media Contact:
Beth Winkowski
Winkowski Public Relations, LLC for Alluxio
978-649-7189
beth@alluxio.com