Resource Hub
The problem with data modernization initiatives is that they result in distributed datasets that impede analytics projects. As enterprises start their cloud migration journey and adopt new types of applications, data stores, and infrastructure, they still leave residual data in the original location. This results in far-flung silos that can be slow, complex, and expensive to analyze. As business demands for analytics rise, along with cloud costs, enterprises need to rationalize how they access and process distributed data. They cannot afford to replicate entire datasets or rewrite software every time they study data in more than one location.
ALLUXIO DAY x APAC Modern Data Stack 2022
Alluxio (www.alluxio.io) is an open-source virtual distributed file system that provides a unified data access layer for hybrid and multi-cloud deployments. It enables distributed compute engines such as Spark and Presto, and machine learning frameworks such as TensorFlow, to transparently access different persistent storage systems (including HDFS, S3, and Azure) while actively leveraging an in-memory cache to accelerate data access. Originally developed at UC Berkeley's AMPLab as the research project “Tachyon”, Alluxio has more than 1,200 contributors and is used by over 100 companies worldwide, with the largest production deployment exceeding 1,000 nodes.
This blog was originally published on the website of NetApp: https://www.netapp.com/blog/modernize-analytics-workloads-netapp-alluxio/
Imagine, as an IT leader, having the flexibility to choose any service available in the public cloud or on premises. And imagine being able to scale the storage for your data lakes with control over data locality and protection for your organization. With these goals in mind, NetApp and Alluxio are joining forces to help our customers adapt to new requirements for modernizing data architecture with low-touch operations for analytics, machine learning, and artificial intelligence workflows.
In the previous blog, we introduced Uber’s Presto use cases and how we collaborated to implement Alluxio local cache to overcome different challenges in accelerating Presto queries. The second part discusses the improvements to the local cache metadata.