Many organizations rely on the scalability and cost savings of cloud computing and cloud storage services to meet the demands of their data-driven workloads. At the same time, data is increasingly siloed and spread across many locations, so data orchestration is needed to bring the required data closer to compute. Alluxio's data orchestration platform restores data locality for your compute with in-memory and tiered data access.
Key Benefits:
• Cache data from S3 for Spark, Presto, or Hive, co-locating it on the same instance as compute
• Scale analytics workloads directly on remote, on-prem data without copying and syncing data into the cloud
• Improve performance with better data locality, and get an HDFS- and S3-compatible data access layer on AWS EMR that is automatically synced with S3
Solution briefs
International Data Corporation (IDC) reported that the global datasphere will grow from 33 zettabytes in 2018 to 175 zettabytes by 2025 [1]. This growth is compounded by the increasing variety and velocity of data, which continuously change how data is collected, stored, processed, and analyzed. New analytics solutions, including machine learning, deep learning, and artificial intelligence (AI), along with new architectures and tools, are being developed to extract and deliver value from this huge datasphere.
Intel Deep Learning (DL) Boost with BFloat16 (BF16) demonstrates benefits across deep learning training workloads while maintaining the same accuracy as 32-bit single-precision floating point (FP32). Amazon recently introduced EC2 M6i instances powered by the latest-generation Intel Xeon Scalable processors. Intel and Alluxio collaborated to measure a 20-25% price/performance improvement over the prior generation for machine learning models with PyTorch on AWS. This collaboration demonstrates data preprocessing and training at lower cost on CPUs, using Alluxio as the data access layer to cloud storage.
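For readers unfamiliar with BF16, the short Python sketch below illustrates the number format itself. It is not taken from the solution brief and does not reproduce the Intel and Alluxio measurements. BF16 keeps the sign bit and the 8 exponent bits of FP32 but only 7 of its 23 mantissa bits, so it covers the same numeric range with less precision. The helper names (fp32_to_bf16_bits, bf16_bits_to_fp32, round_trip) are hypothetical and exist only for this illustration.

```python
import struct


def fp32_to_bf16_bits(x: float) -> int:
    """Return the 16-bit BF16 pattern for x, rounding to nearest even.

    Illustration only: NaN inputs are not handled, and values outside the
    FP32 range make struct.pack raise OverflowError.
    """
    # Reinterpret the FP32 encoding of x as an unsigned 32-bit integer.
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    # BF16 is the upper 16 bits of FP32. Add a bias before truncating so
    # that the dropped lower 16 bits round to nearest, ties to even.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_bits_to_fp32(b: int) -> float:
    """Expand a 16-bit BF16 pattern back to a float by zero-filling the low bits."""
    (x,) = struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))
    return x


def round_trip(x: float) -> float:
    """Round x to the nearest BF16 value and return it as a Python float."""
    return bf16_bits_to_fp32(fp32_to_bf16_bits(x))


if __name__ == "__main__":
    for value in (1.0, 3.14159, 3.0e38):
        print(value, "->", round_trip(value))
```

Large values such as 3.0e38 stay finite after the round trip because the exponent width is unchanged, while the relative rounding error is bounded by about 2^-8 (roughly 0.4%).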