The hybrid cloud model, where cloud resources run Spark or Presto jobs against data stored on-premises, is an appealing way to reduce resource contention in on-premises environments while also lowering overall costs. One key drawback of the hybrid model is the overhead of transferring data between the two environments. To keep analytics jobs performing about as well as they would if the entire workload ran on-premises, the compute application needs data and metadata locality.
In this office hour, we demonstrate how a “zero-copy burst” solution speeds up Spark and Presto queries in the public cloud while eliminating the need to manually copy and synchronize data from the on-premises data lake to cloud storage. This approach lets compute frameworks decouple from on-premises data sources and scale efficiently by using Alluxio and public cloud resources such as AWS.
We will cover:
- Typical challenges of moving data to the cloud and expanding compute capacity
- Details of the “zero-copy” hybrid cloud solution for burst computing
- A demo of running Presto analytic queries on remote on-prem HDFS data with Alluxio deployed in AWS EMR
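To make the data locality idea concrete, here is a minimal sketch of a read-through cache in Python. It is a toy model only, not Alluxio's implementation or API: the class `ReadThroughCache`, the function `fake_on_prem_read`, and the path `/warehouse/orders/part-0` are all hypothetical names chosen for illustration. It shows the general pattern of fetching remote data once and serving repeated reads from a local copy.

```python
from typing import Callable, Dict


class ReadThroughCache:
    """Toy read-through cache standing in for a caching layer near compute.

    Hypothetical illustration only; not Alluxio's actual design or API.
    """

    def __init__(self, fetch_remote: Callable[[str], bytes]) -> None:
        self._fetch_remote = fetch_remote  # assumed slow read from on-prem storage
        self._cache: Dict[str, bytes] = {}
        self.remote_reads = 0  # counts how often the remote store is contacted

    def read(self, path: str) -> bytes:
        # On a miss, fetch from the remote store once and keep a local copy.
        if path not in self._cache:
            self.remote_reads += 1
            self._cache[path] = self._fetch_remote(path)
        # Hits are served locally, with no transfer between environments.
        return self._cache[path]


def fake_on_prem_read(path: str) -> bytes:
    """Stand-in for a remote read; a real one would cross the network."""
    return f"contents of {path}".encode()


cache = ReadThroughCache(fake_on_prem_read)
first = cache.read("/warehouse/orders/part-0")   # miss: goes to the remote store
second = cache.read("/warehouse/orders/part-0")  # hit: served from the local copy
print(cache.remote_reads)
```

In this sketch the second read never touches the remote store, which is the effect a caching layer in the cloud aims for when queries repeatedly scan the same on-premises data.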