AWS Cost & Usage Report (CUR) as a service (CURAAS?)

Lokesh Taneja
6 min read · Feb 22, 2021

For those of you who’ve ever tried to decode how AWS bills, you’ve probably discovered the Cost & Usage Report (CUR), “the most comprehensive set of cost and usage data available”. For those who haven’t, a detailed description can be found here. The good news for those who are hands-on and data savvy: AWS gives you an option to set up delivery of this raw data to an S3 bucket in your account up to three times a day.

Why CUR, why not just use Cost Explorer?

AWS has tried its best to visualize the relevant data recorded in the CUR through Cost Explorer, but there are just too many use cases and details it misses. For example, one of the most notorious categories is “EC2-Other”. A quick Google search will show you the frustration users have had decoding “wth is this EC2-Other?” Here is how Cost Explorer represents it:

AWS Cost Explorer View

You can flip through all the group-bys and add filters to try to decode this, but if you have multiple instances, how do you attribute the costs? EC2-Other is essentially a grouping of things like EBS, data transfer, Elastic IPs, etc. associated with EC2. And this is only one service; AWS has hundreds of services with similarly vague and abstracted groupings, making it challenging and time-consuming to find the root cause (RCA) of cost fluctuations or even get the true cost of a resource.

In comes the CUR, a granular data source with a line item for each resource at every hour. If you know your way around AWS terminology and can write SQL, you can extract answers to almost every AWS bill related question you may have.
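For example, once the CUR is set up and queryable (covered below), a sketch along these lines pulls the hour-by-hour unblended cost of a single resource; the table name and instance id are placeholders, and it assumes line_item_usage_start_date is stored as a timestamp:

select
line_item_usage_start_date, -- the hour the usage occurred
line_item_resource_id, -- e.g. an EC2 instance id
sum(line_item_unblended_cost) as hourly_cost
from cur_table -- placeholder for your Athena CUR table
where line_item_resource_id = 'i-0123456789abcdef0' -- hypothetical instance id
and line_item_usage_start_date >= timestamp '2021-02-01 00:00:00'
group by 1, 2
order by 1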

Show me this CUR Magic

Now, before querying the CUR, you will need to set it up. We will not cover the setup in this article; AWS covers it in detail here. A few points to consider while setting it up:

  • The daily CUR is delivered up to 3 times a day and can be fairly large. To prevent creeping S3 storage costs, select the option to overwrite the previous report and set a lifecycle policy on the S3 bucket
  • If you have an AWS Organization and would like to capture data from all child accounts, set up the CUR in the payer account (the account you pay your AWS bills from)
  • If using Athena to query, select a compressed format like Parquet, since Athena charges by the amount of data scanned
  • Do not try to manually create the Athena table; let the Glue crawler create and update it. Some of the data types in the CUR are not straightforward: in a lot of places where you would expect an integer, the CUR uses a double
  • The number of columns in the CUR is dynamic, so ensure the crawler runs every now and then to pick up new columns. This can be automated by watching the manifest JSON and triggering the crawler when a delta is found
  • If you will be querying the data often, consider creating partitions or limiting the time range in the where clause to reduce the data scanned (see the sketch after this list)
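On that last point, a partition-aware query could look like the sketch below; it assumes the Glue crawler exposed year and month partition columns from the CUR’s S3 folder structure, and cur_table is a placeholder for your table name:

select
line_item_product_code,
sum(line_item_unblended_cost) as cost
from cur_table -- placeholder for your Athena CUR table
where year = '2021' -- partition columns keep Athena from scanning the whole bucket
and month = '2'
group by 1
order by cost desc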

Once Athena is all set up, you are ready to query the data. Well, not quite yet: like most of AWS, the CUR keeps evolving, and new columns and data are added frequently. AWS maintains a comprehensive data dictionary that describes every column. With frequent querying you will get accustomed to the column names and data categories, but initially it may all seem overwhelming. Putting the ramp-up curve aside, let’s see what data we can extract for EC2-Other from the CUR.
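A quick sanity check for what the crawler has picked up so far is Athena’s SHOW COLUMNS (cur_table again being a placeholder for your table name); comparing its output against the data dictionary is a reasonable way to spot newly added columns:

show columns in cur_table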

From querying this data hundreds of times, I know the columns of interest here are line_item_operation for the top-level cost contribution, line_item_usage_type for the usage details, and line_item_resource_id for pinpointing the exact resource. Let’s run the query and see what contributes to that $9.6 on Feb-15.

select
line_item_resource_id, -- id of the resource
line_item_operation, -- operation for the line item
line_item_usage_type, -- usage details for the line item
pricing_unit, -- usage units for the line item
sum(line_item_unblended_cost) as cost, -- unblended cost
sum(line_item_usage_amount) as usage -- usage amount
from {Table_Name} -- your Athena table
where line_item_product_code = 'AmazonEC2'
and date(line_item_usage_start_date) = date('2021-02-15')
group by 1, 2, 3, 4

You will get quite a few rows of output; export them as a CSV and pivot the data. The results:
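If you would rather skip the spreadsheet step, a rough in-SQL pivot using conditional aggregation is also possible; the operation values below are only illustrative and will differ by workload:

select
line_item_resource_id,
sum(case when line_item_operation = 'RunInstances' then line_item_unblended_cost else 0 end) as instance_cost,
sum(case when line_item_operation like 'CreateVolume%' then line_item_unblended_cost else 0 end) as ebs_cost,
sum(case when line_item_operation = 'NatGateway' then line_item_unblended_cost else 0 end) as nat_cost,
sum(line_item_unblended_cost) as total_cost
from {Table_Name} -- your Athena table
where line_item_product_code = 'AmazonEC2'
and date(line_item_usage_start_date) = date('2021-02-15')
group by 1
order by total_cost desc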

Pivot of the data from the Athena query

The highlighted values account for the $9.6 of EC2-Other, along with the resources and operations that contribute to the cost. Now, $9.6 may not be a value worth all this setup hassle and learning ramp-up, but if you have been building on AWS, sooner or later you will go down this path, pulling your hair out figuring out where all these vague cost inflations are coming from.

An ROI factor: when to go down this path?

Assuming you have a monthly bill of more than $1,000, you could roughly estimate the following cost of a deep dive per incident:

Data ramp-up — 1 Hr
Query construction — 1–2 Hrs
Pivoting and analyzing data — 1–2 Hrs
Total — 3–5 Hrs

The cost to avoid bill shock and nip cost anomalies in the bud could be $300–$500 per incident, assuming the dev is at $100/hr. An important factor: those 3–5 hours of RCA are taken away from the dev’s actual job and what they love doing.

“This doesn’t seem like a good ROI at all, plus the added hassle.” I agree with you, but when you hit that looping bug in Lambda or that spiraling CloudWatch usage, this effort will pay for itself.

Now, if you don’t want to get your hands dirty…

You could use Cloudshim. It provides automated analysis of all your costs and, with a double click, easily surfaces the metrics that contribute to a cost. It’s ready to use out of the box and built from the ground up on the CUR.

See the data surfaced by a double click, for the same date as above, in just a few seconds:

Cloudshim Cost Browser for same date

Forget SQLing the CUR: in the Cost Browser (we know, very original!), you can slice and dice the data as you like with custom columns and pass any data into the charts for ad-hoc analysis. Don’t want to slice and dice? Use the many predefined templates, or request one and our CUR experts 🤓 will craft the template for you.

Predefined Templates

Concluding

On average, the mean time to respond (MTTR) to cost anomalies, deep dives, finance asks, etc. through the native AWS UI is daunting, roughly 5–6X slower than with a tool built on the CUR, and the cost incurred inflates with the operational hours spent investigating.

Running lean on AWS requires proactive effort from the user, and as the services grow, so does the data. Cloudshim is built to let businesses be agile on AWS, spread cost awareness across engineering teams, and, most importantly, free up the bandwidth otherwise spent navigating the native AWS UI searching for answers.
