Enabling big data & AI workloads on the object store at DBS
October 14, 2019
By Vitaliy Baklikov and Dipti Borkar

The big data stack has evolved over the past few years with an explosion of data frameworks, starting with MapReduce and expanding to Apache Spark and Presto. The approach to managing and storing data has evolved as well, moving from primarily using the Hadoop Distributed File System (HDFS) to newer, cheaper, and easier-to-operate technologies like object stores. But the design of most object stores inhibits real-time big data and AI workloads from running directly on them.

Vitaliy Baklikov and Dipti Borkar explore a different architecture for analytic workloads, particularly those deployed in cloud environments. Alluxio, an open source virtual distributed file system, provides a unified data access layer for hybrid and multicloud deployments. Alluxio enables distributed compute engines like Spark and Presto, as well as machine learning frameworks like TensorFlow, to transparently access different persistent storage systems (including HDFS, S3, and Azure storage) while actively leveraging an in-memory cache to accelerate data access.

Vitaliy and Dipti dive into how DBS Bank built a modern big data analytics stack, leveraging an object store as persistent storage even for data-intensive workloads, and how it uses Alluxio to orchestrate data locality and data access for Spark workloads. In addition, deploying Alluxio as a data access layer addresses many of the challenges that arise when compute and storage are separated in cloud deployments.
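As a rough sketch of what this pattern looks like in practice, a Spark job can read data through Alluxio by using the `alluxio://` URI scheme with the Alluxio client jar on Spark's classpath, while Alluxio itself mounts the underlying object store. The hostnames, paths, and jar locations below are illustrative assumptions, not DBS's actual configuration:

```shell
# Hypothetical setup: Alluxio fronts an S3 bucket; Spark reads via alluxio://.
# Assumes an Alluxio master at alluxio-master:19998 and the client jar
# installed under /opt/alluxio/client/.

# Mount the object store into the Alluxio namespace (one-time step).
alluxio fs mount /datasets s3://example-bucket/datasets

# Submit a Spark job that reads through Alluxio instead of hitting S3 directly,
# so repeated reads are served from Alluxio's cache close to the compute nodes.
spark-submit \
  --conf spark.driver.extraClassPath=/opt/alluxio/client/alluxio-client.jar \
  --conf spark.executor.extraClassPath=/opt/alluxio/client/alluxio-client.jar \
  my_job.py alluxio://alluxio-master:19998/datasets/trades
```

Because the application only sees the `alluxio://` path, the same job can run unchanged whether the data ultimately lives in HDFS, S3, or Azure storage.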
