Understand the data transfer and egress costs of Amazon Web Services and how to reduce them.
The Challenges with Rising Cloud Costs
Cloud storage has become widely adopted, but managing the cost of AWS or other cloud services can be a significant challenge for enterprises. While AWS does not charge for moving data into the cloud, it charges customers high fees for retrieving that data from the cloud. These charges are known as data transfer fees or data egress fees.
AWS egress costs have affected 34% of enterprises using cloud storage. According to The Information, Apple spent $50 million in egress fees in one year, while Pinterest spent over $20 million. Netflix and Airbnb also incurred charges exceeding $15 million. Adobe, Snap, and Salesforce also paid over $7 million each.
This blog explains cloud data transfer and presents five best practices, along with resources and tools, for reducing your organization's AWS and cloud costs.
What Is the Meaning of Data Transfer?
- Data transfer is a generic term that refers to any movement of data over the network. The movement can be within the same cloud or between a cloud and an external location, such as another cloud or on-premises infrastructure. Data can move either into the cloud or out of the cloud.
- Ingress, or inbound, refers to the transfer of data into a cloud.
- Egress, also known as outbound, refers to the transfer of data from a public cloud to another type of infrastructure, typically an on-premises data center.
AWS refers to egress/ingress as data transfer out/in, respectively, while Google Cloud and Microsoft Azure use the terms Egress/Ingress.
How Does AWS Price Data Transfers?
The following table shows AWS pricing for data transferred out of Amazon S3 (Amazon Simple Storage Service) to the internet. These rates can help you estimate and manage your cloud costs.
*Amazon S3 Pricing Table (by traffic per month)
- 100 GB - 10 TB: $0.09/GB
- 10 TB - 50 TB: $0.085/GB
- 50 TB - 150 TB: $0.07/GB
- 150 TB - 500 TB: $0.05/GB
- > 500 TB: Contact sales
No egress fees apply to:
- The first 100 GB per month of data transferred out to the internet, aggregated across all AWS services and Regions (except China and GovCloud)
- Data transferred between S3 buckets in the same AWS Region
- Data transferred from an Amazon S3 bucket to any AWS service within the same AWS Region as the S3 bucket (including to a different account in the same AWS Region)
- Data transferred out to Amazon CloudFront
*Note: Pricing shown is for the AWS US East (Ohio) Region as of April 14, 2023. https://aws.amazon.com/s3/pricing/
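To see how the tiers combine, here is a minimal sketch of a monthly estimate built from the rates listed above. It assumes 1 TB = 1024 GB and that the free 100 GB is consumed before the first paid tier. The function name `monthly_egress_cost` is illustrative only, and traffic above 500 TB is simply billed at the last listed rate here, although in practice it would be covered by a negotiated quote.

```python
GB_PER_TB = 1024  # assumption: binary terabytes

# (upper bound of the tier in GB, price per GB in USD), taken from the table above.
# The first entry models the 100 GB per month that is free of charge.
TIERS = [
    (100, 0.0),
    (10 * GB_PER_TB, 0.09),
    (50 * GB_PER_TB, 0.085),
    (150 * GB_PER_TB, 0.07),
    (float("inf"), 0.05),  # above 500 TB a custom quote would apply instead
]


def monthly_egress_cost(total_gb: float) -> float:
    """Estimate the monthly internet egress bill in USD for total_gb of traffic."""
    cost = 0.0
    lower = 0.0
    for upper, price_per_gb in TIERS:
        if total_gb <= lower:
            break
        # Only the slice of traffic that falls inside this tier is billed at its rate.
        billable_gb = min(total_gb, upper) - lower
        cost += billable_gb * price_per_gb
        lower = upper
    return round(cost, 2)


if __name__ == "__main__":
    for gb in (50, 1100, 20 * GB_PER_TB):
        print(f"{gb} GB -> ${monthly_egress_cost(gb)}")
```

Under these assumptions, a month with 1,100 GB of egress is billed for 1,000 GB at $0.09/GB, which comes to about $90.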
Optimizing Data Transfer Costs With Five Best Practices
Best Practice 1: Use Caching to Access Data On Demand
Caching is a very effective technique for egress cost reduction because it avoids repeatedly fetching data across the network.
An access-based caching strategy is recommended. With this approach, the cache decides what data to store and for how long based on actual access patterns, so you do not have to identify in advance exactly which data will be needed. Access-based caching can also improve performance and reduce latency by keeping frequently accessed data in an optimal storage tier close to the application.
By using caching to access cross-region data on demand, Expedia Group has reduced its egress costs on Amazon S3 by 50%.
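The idea of access-based caching can be illustrated with a small read-through cache that evicts the least recently used objects. This is only a sketch under stated assumptions: the class `ReadThroughCache`, the `fetch` callback standing in for a remote read, and the `egress_bytes` counter are all hypothetical names, and a production cache would also handle invalidation, concurrency, and persistence.

```python
from collections import OrderedDict
from typing import Callable


class ReadThroughCache:
    """Keeps recently accessed objects local so repeat reads cause no new egress."""

    def __init__(self, fetch: Callable[[str], bytes], capacity_bytes: int):
        self._fetch = fetch            # hypothetical function that reads from the remote store
        self._capacity = capacity_bytes
        self._items = OrderedDict()    # key -> data, ordered from least to most recently used
        self._size = 0
        self.egress_bytes = 0          # bytes actually pulled across the network

    def get(self, key: str) -> bytes:
        if key in self._items:
            # Cache hit: mark as most recently used, no network transfer.
            self._items.move_to_end(key)
            return self._items[key]

        # Cache miss: fetch remotely and account for the transferred bytes.
        data = self._fetch(key)
        self.egress_bytes += len(data)

        if len(data) <= self._capacity:
            self._items[key] = data
            self._size += len(data)
            # Evict least recently used entries until the cache fits again.
            while self._size > self._capacity:
                _, evicted = self._items.popitem(last=False)
                self._size -= len(evicted)
        return data


if __name__ == "__main__":
    remote = {"report.parquet": b"x" * 1000}
    cache = ReadThroughCache(remote.__getitem__, capacity_bytes=4096)
    for _ in range(5):
        cache.get("report.parquet")
    print(cache.egress_bytes)  # five reads, but only one remote fetch of 1000 bytes
```

What gets cached is determined entirely by what the application reads, which is the property the access-based strategy relies on.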
Best Practice 2: Streamline Your Data Pipeline to Minimize Replications
When designing your data pipeline, it's essential to handle data flow efficiently. Minimizing data duplication and streamlining the parts of the pipeline that generate egress traffic both help to save costs.
For replication across regions or clouds, it is advisable to replicate only the changes, known as deltas. To accomplish this, you can implement data management policies that select data by creation date or capture only updated data. This policy-driven approach enables incremental data synchronization, which lowers egress costs because the amount of data transferred is minimized.
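A date-based policy of this kind can be sketched as a filter over object metadata. The example below is an illustration under assumptions: `ObjectInfo` and `select_deltas` are hypothetical names, and the object listing is hard-coded rather than read from a real storage API.

```python
from datetime import datetime, timezone
from typing import List, NamedTuple


class ObjectInfo(NamedTuple):
    key: str
    size_bytes: int
    last_modified: datetime


def select_deltas(objects: List[ObjectInfo], last_sync: datetime) -> List[ObjectInfo]:
    """Return only the objects created or updated after the previous sync."""
    return [obj for obj in objects if obj.last_modified > last_sync]


if __name__ == "__main__":
    last_sync = datetime(2023, 4, 1, tzinfo=timezone.utc)
    listing = [
        ObjectInfo("logs/march.gz", 5_000_000, datetime(2023, 3, 31, tzinfo=timezone.utc)),
        ObjectInfo("logs/april.gz", 2_000_000, datetime(2023, 4, 10, tzinfo=timezone.utc)),
    ]
    deltas = select_deltas(listing, last_sync)
    full_bytes = sum(o.size_bytes for o in listing)
    delta_bytes = sum(o.size_bytes for o in deltas)
    print(f"replicating {delta_bytes} of {full_bytes} bytes")
```

Only the object changed since the last sync is replicated, so the transfer shrinks from the full dataset to the delta.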
To keep data flow to a minimum, it's also advisable to compress data before transferring it. Compression reduces the number of bytes sent over the network and therefore reduces egress costs.
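As a small illustration, the sketch below compresses a JSON payload with Python's standard `gzip` module before it would be sent. The helper names `compress_payload` and `decompress_payload` are hypothetical, and the achievable ratio depends heavily on the data; already-compressed formats gain little.

```python
import gzip
import json


def compress_payload(records: list) -> bytes:
    """Serialize records to JSON and gzip the result before transfer."""
    raw = json.dumps(records).encode("utf-8")
    return gzip.compress(raw)


def decompress_payload(blob: bytes) -> list:
    """Reverse compress_payload on the receiving side."""
    return json.loads(gzip.decompress(blob).decode("utf-8"))


if __name__ == "__main__":
    records = [{"id": i, "status": "ok"} for i in range(1000)]
    raw_size = len(json.dumps(records).encode("utf-8"))
    blob = compress_payload(records)
    print(f"raw: {raw_size} bytes, compressed: {len(blob)} bytes")
    assert decompress_payload(blob) == records
```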
Best Practice 3: Optimize The Data Outflow of Your Architecture
Understanding and optimizing the data outflow points in your architecture is essential to minimize cloud costs. Egress fees arise primarily when compute resources are dispersed across multiple regions or availability zones. Also, the charges vary across different regions and zones.
Keep in mind that egress costs increase as the destination moves further away. There are almost always charges between cloud regions, and the cost of transfers between availability zones within a region depends on the cloud service being used. No costs are incurred within a single availability zone, consistent with the free transfers listed in the pricing section above.
Design an infrastructure that follows the least expensive routes. Try to minimize traffic to the internet and between regions and availability zones while maximizing traffic within an availability zone, or at the very least, within a region.
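One way to express this routing preference in code is to rank candidate data sources by how far they are from the consumer. The sketch below is an assumption-laden illustration: `Location`, `transfer_rank`, and `cheapest_source` are hypothetical names, and the ranks encode only the relative ordering described above (same zone, then same region, then another region), not actual prices.

```python
from typing import List, NamedTuple


class Location(NamedTuple):
    region: str
    zone: str


def transfer_rank(source: Location, destination: Location) -> int:
    """Lower rank means a cheaper path; the values are relative, not prices."""
    if source == destination:
        return 0  # same availability zone
    if source.region == destination.region:
        return 1  # same region, different zone (charges depend on the service)
    return 2      # cross-region, almost always charged


def cheapest_source(client: Location, replicas: List[Location]) -> Location:
    """Pick the replica whose path to the client has the lowest rank."""
    return min(replicas, key=lambda replica: transfer_rank(replica, client))


if __name__ == "__main__":
    client = Location("us-east-2", "us-east-2a")
    replicas = [
        Location("us-west-2", "us-west-2a"),
        Location("us-east-2", "us-east-2b"),
    ]
    print(cheapest_source(client, replicas))
```

With these inputs the replica in `us-east-2b` is chosen, because it keeps the traffic inside the client's region.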
Best Practice 4: Monitor Your Egress Fees Regularly
Regularly monitoring egress fees is crucial to prevent unexpected bills and minimize disruptions to cloud operations and billing. Observability helps you understand where your egress costs are allocated and take appropriate action before they become a significant financial burden.
AWS provides a feature called cost allocation tags, which can be used with tools like AWS Cost Explorer to analyze egress fees stemming from instances and services. Adding these tags to your reserved instances and load balancers is an effective way to see where egress costs originate.
After filtering for data transfer costs, observe the trend over multiple months to identify any sudden increases or spikes in your monthly bill. Grouping by the Name tag can help you identify which instances or services are driving the increase.
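Once monthly data transfer totals have been exported, spotting a spike is simple arithmetic. The sketch below is a minimal example, assuming the totals are already available as (month, cost) pairs; the function name `find_cost_spikes`, the 25% threshold, and the sample figures are all made up for illustration.

```python
from typing import List, Tuple


def find_cost_spikes(
    monthly_costs: List[Tuple[str, float]], threshold: float = 0.25
) -> List[str]:
    """Return the months whose cost grew by more than threshold over the prior month."""
    spikes = []
    for (_, previous), (month, cost) in zip(monthly_costs, monthly_costs[1:]):
        # Skip months that follow a zero-cost month to avoid dividing by zero.
        if previous > 0 and (cost - previous) / previous > threshold:
            spikes.append(month)
    return spikes


if __name__ == "__main__":
    # Hypothetical monthly data transfer costs in USD.
    costs = [
        ("2023-01", 100.0),
        ("2023-02", 110.0),
        ("2023-03", 180.0),
        ("2023-04", 175.0),
    ]
    print(find_cost_spikes(costs))
```

Running the same check separately for each Name tag group narrows a spike down to the instances or services responsible for it.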
Best Practice 5: Negotiate Customized Pricing with Your Cloud Provider
If your traffic exceeds 500 TB per month, it's recommended to negotiate a customized quote with your cloud provider that is tailored to your organization.
For example, AWS offers Private Pricing Programs that allow you to commit to a minimum volume of egress traffic each month between designated regions.
Want to Learn More?
For more detailed strategies with pro tips and case studies, download this eBook:
The Ultimate Guide to Saving Data Egress Costs in the Cloud
Not using AWS? No worries! This eBook also offers techniques to reduce your Google Cloud or Azure costs. Get it now for free.