This article focuses on the initial configuration we do when setting up a new AWS Account with Terraform. We will not cover Terraform basics.
AWS Account users
AWS Accounts start with a single root user. Access to this user needs to be carefully managed and, following AWS best practices, the root user should not be used for day-to-day operations.
You may need to use the AWS console to set up an administrator user for yourself, get AWS Keys and configure awscli on your laptop first. Hopefully this is one of the few times we directly interact with the AWS Console as we try to codify all of our configuration to ensure reproducibility.
A future post on this blog will focus on templating Terraform code, which we use to manage our AWS Account users.
AWS Naming conventions and billing tags
To keep track of our cloud resources and allow splitting costs by tags, we should have a set of pre-defined resource tags used consistently across our cloud infrastructure. For this, we copied the tags defined in the cloudposse/terraform-null-label module for all our resources. These tags seem to suit small startups like us.
You will see us use these tags in the next section of this article.
Once you have configured awscli, download Terraform.
Note: We are using TF 0.11.x in this article. There are a few differences with 0.12.x, so be aware of that if you want to follow along …
Terraform relies heavily on a so-called state file. By default, the state file is kept locally within the project folder on your machine. For several team members to manage your cloud infrastructure, you will need to move the state file to a remote location and control access to it. Additionally, to keep TF configurations small and manageable and to reduce the blast radius of infrastructure changes, you need to split TF configurations across different state files that reference each other. Split state files are easier to share and reference when they are kept in a remote backend.
However, without any cloud infrastructure, you will have to start with a local state file until you have configured the necessary cloud resources.
Note: Terraform Cloud is a recent offering from HashiCorp which is free for small teams and removes the need for you to set up the remote state while also providing other workflow features. You may skip to the next section if you prefer to use Terraform Cloud (we will, however, not cover Terraform Cloud in this article).
At SWAT, we use Terraform itself to configure its own remote state storage. As we are using AWS, we use S3 to store the state files and DynamoDB to manage locks and checksums. We also encrypt the state files with KMS as they will often contain sensitive information.
Let’s start by defining variables for the tags mentioned in the Naming Convention section earlier as well as the AWS region we will use:
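As a minimal sketch of this step, the variables could look like the following. The variable names follow the inputs of the cloudposse/terraform-null-label module, while the default values (including the region) are placeholders we made up for illustration, not our actual settings:

```hcl
# Hypothetical defaults: replace them with values that fit your organization.
variable "namespace" {
  description = "Organization prefix used in resource names and the Namespace tag"
  default     = "swat"
}

variable "stage" {
  description = "Environment the resources belong to, used for the Stage tag"
  default     = "production"
}

variable "name" {
  description = "Name of the component, used for the Name tag"
  default     = "terraform-state"
}

variable "region" {
  description = "AWS region in which the state resources are created"
  default     = "eu-west-1"
}

# The default provider uses the region defined above.
provider "aws" {
  region = "${var.region}"
}
```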
Next, we define an S3 bucket, a KMS key for encryption and a DynamoDB table:
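The following sketch shows one way these three resources could be declared. It assumes the `namespace`, `stage` and `name` variables from the previous step are declared, and it uses the inline `versioning` and `server_side_encryption_configuration` blocks of the AWS provider versions that were current for TF 0.11.x (newer provider versions moved these into separate resources). Resource names and capacity values are illustrative assumptions:

```hcl
# Tags built from the variables defined earlier (assumed to be declared).
locals {
  tags = {
    Namespace = "${var.namespace}"
    Stage     = "${var.stage}"
    Name      = "${var.namespace}-${var.stage}-${var.name}"
  }
}

# KMS key used to encrypt the state files at rest.
resource "aws_kms_key" "terraform_state" {
  description             = "Encrypts Terraform state files"
  deletion_window_in_days = 30
  enable_key_rotation     = true
  tags                    = "${local.tags}"
}

# Versioned, private and KMS-encrypted bucket holding the state files.
resource "aws_s3_bucket" "terraform_state" {
  bucket = "${var.namespace}-${var.stage}-${var.name}"
  acl    = "private"
  tags   = "${local.tags}"

  versioning {
    enabled = true
  }

  server_side_encryption_configuration {
    rule {
      apply_server_side_encryption_by_default {
        sse_algorithm     = "aws:kms"
        kms_master_key_id = "${aws_kms_key.terraform_state.arn}"
      }
    }
  }

  # Guard against an accidental "terraform destroy" of the state bucket.
  lifecycle {
    prevent_destroy = true
  }
}

# Lock table: the S3 backend expects a string hash key named "LockID".
resource "aws_dynamodb_table" "terraform_lock" {
  name           = "${var.namespace}-${var.stage}-${var.name}-lock"
  read_capacity  = 1
  write_capacity = 1
  hash_key       = "LockID"
  tags           = "${local.tags}"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```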
To configure our Terraform clients to put and fetch the state from S3 with encryption as required, we should capture the resource identities by defining them as outputs:
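A sketch of such outputs, assuming the resources are named `terraform_state` and `terraform_lock` (hypothetical names chosen for this example):

```hcl
# Values needed later in the backend configuration of other projects.
output "state_bucket" {
  value = "${aws_s3_bucket.terraform_state.id}"
}

output "state_kms_key_arn" {
  value = "${aws_kms_key.terraform_state.arn}"
}

output "state_lock_table" {
  value = "${aws_dynamodb_table.terraform_lock.name}"
}
```

After `terraform apply`, running `terraform output` prints these values so they can be copied into the backend configuration.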
Finally, to use this in our future Terraform projects, we just need to provide the backend configuration in the terraform block:
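For illustration, a backend configuration could look roughly like this. All values are placeholders (the bucket, table, key path, region and KMS key ARN are made up), and since the backend block does not support interpolation, they have to be written out literally:

```hcl
terraform {
  backend "s3" {
    # Placeholders: use the values printed by "terraform output" instead.
    bucket         = "swat-production-terraform-state"
    key            = "account/terraform.tfstate"
    region         = "eu-west-1"
    encrypt        = true
    kms_key_id     = "arn:aws:kms:eu-west-1:111111111111:key/REPLACE-ME"
    dynamodb_table = "swat-production-terraform-state-lock"
  }
}
```

Each project should use its own `key` so that the state files stay separate within the same bucket.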
Adding a new backend configuration will force you to re-initialize your project folder with terraform init. Terraform will then recognize that you are moving the local state to the remote S3 bucket. If we add this backend configuration to our TF config above, TF will migrate its own state to the resources it manages (inception)! By keeping this TF configuration separate from all future TF configurations, we make it much less likely that somebody accidentally deletes the state files for all our cloud resources!
Infrastructure as code means version control (e.g. git) can now be used for audit and traceability purposes. We now also need to decide how we will organize our infrastructure repositories.
At SWAT we started out with an over-arching account-level repository to create billing alerts, Route53 zones, VPCs with cross-account peering, KMS keys and 3rd party services. We only have Production environments per Amazon Account and an additional Services environment on our main account. We organized all these environments in a separate monorepo per account.
We still follow this logic for new accounts, but if I were to redesign this today, I would try to follow the monorepo workflow more closely, as sub-folders can have their own independent account and state configurations. Due to our limited team size, we do not need separate repositories for purposes such as easier access control.
For a more detailed overview of Terraform project structures, you may refer to:
- Anton Babenko - Terraform project structure - 2016
- Piotr Szymański - Terraform project structure - 2019
AWS billing alerts and budgets
As your startup scales, AWS tends to provide great support through your account manager. Account managers can help provide credits for PoCs of AWS products, help arrange sessions with their solution architects as well as secure invitations to product trainings for you and your team. This is not pure altruism by AWS, but necessary as other Cloud Providers such as Google do the same and will gladly help you migrate over quickly. Do make time to meet with your account manager and help identify useful resources for your startup.
One of the first things any account manager will stress is the need to set up billing alerts and budgets (as well as locking in Reserved Instances). If you choose to ignore these warnings, you may get a very large and unpleasant AWS bill which can eat precious runway from your startup. It is annoying to set up these alerts, but the following code snippets make it super easy!
Therefore, I strongly recommend that as part of bootstrapping your AWS account, you effectively set some sensible budgets and billing alerts.
Note: For this configuration I referenced a blog post by Kyle Galbraith, updated the information to the current latest Terraform AWS provider and added Slack notifications.
First, an unavoidable manual step is required: Enable billing metrics through the AWS Billing and Cost Management Console.
We also have to note that most of this AWS functionality is only accessible through its us-east-1 region. If this is not your primary region, you can mix AWS resources across regions within the same TF config by using provider aliases as follows:
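A minimal sketch of provider aliases. The alias name `us_east_1` and the primary region are assumptions made for this example:

```hcl
# Default provider for the primary region (placeholder region).
provider "aws" {
  region = "eu-west-1"
}

# Aliased provider used only for billing-related resources.
provider "aws" {
  alias  = "us_east_1"
  region = "us-east-1"
}
```

Resources then opt into the aliased provider with `provider = "aws.us_east_1"`. Resources without a `provider` argument keep using the default one.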
Next, we define a CloudWatch alarm and an SNS topic to publish billing alerts (consuming the alerts will be covered in the last section of this article).
This will create alerts when our total bill is estimated to exceed our defined threshold. From our own experience, we identified certain services which need a closer eye, and we strongly recommend also setting up AWS Budgets for specific services.
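The alarm and topic could be sketched as follows. The threshold, names and evaluation period are illustrative assumptions, and the `aws.us_east_1` provider alias is assumed to be declared as described above:

```hcl
# Topic that receives all billing alerts.
resource "aws_sns_topic" "billing_alerts" {
  provider = "aws.us_east_1"
  name     = "billing-alerts"
}

# Fires when the estimated total charges exceed the threshold (in USD).
resource "aws_cloudwatch_metric_alarm" "total_bill" {
  provider            = "aws.us_east_1"
  alarm_name          = "estimated-charges-above-threshold"
  alarm_description   = "Estimated AWS charges exceeded the monthly threshold"
  namespace           = "AWS/Billing"
  metric_name         = "EstimatedCharges"
  statistic           = "Maximum"
  period              = "21600"                # billing metrics arrive only a few times per day
  evaluation_periods  = "1"
  comparison_operator = "GreaterThanThreshold"
  threshold           = "1000"                 # hypothetical threshold

  dimensions = {
    Currency = "USD"
  }

  alarm_actions = ["${aws_sns_topic.billing_alerts.arn}"]
}
```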
Before creating AWS Budgets, we need to grant the budgets.amazonaws.com service the right to publish alerts to the SNS topic created earlier. We do this by copying the default SNS topic policy and adding the necessary statements:
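One way to express this is sketched below: the first statement reproduces the default SNS topic policy, the second one adds publish rights for AWS Budgets. The topic name `billing_alerts` and the provider alias are assumptions carried over from the earlier steps:

```hcl
data "aws_caller_identity" "current" {}

data "aws_iam_policy_document" "billing_alerts" {
  policy_id = "__default_policy_ID"

  # Copy of the default topic policy: full access for the owning account.
  statement {
    sid    = "__default_statement_ID"
    effect = "Allow"

    actions = [
      "SNS:GetTopicAttributes",
      "SNS:SetTopicAttributes",
      "SNS:AddPermission",
      "SNS:RemovePermission",
      "SNS:DeleteTopic",
      "SNS:Subscribe",
      "SNS:ListSubscriptionsByTopic",
      "SNS:Publish",
      "SNS:Receive",
    ]

    principals {
      type        = "AWS"
      identifiers = ["*"]
    }

    condition {
      test     = "StringEquals"
      variable = "AWS:SourceOwner"
      values   = ["${data.aws_caller_identity.current.account_id}"]
    }

    resources = ["${aws_sns_topic.billing_alerts.arn}"]
  }

  # Additional statement: allow AWS Budgets to publish to the topic.
  statement {
    sid     = "AllowBudgetsPublish"
    effect  = "Allow"
    actions = ["SNS:Publish"]

    principals {
      type        = "Service"
      identifiers = ["budgets.amazonaws.com"]
    }

    resources = ["${aws_sns_topic.billing_alerts.arn}"]
  }
}

resource "aws_sns_topic_policy" "billing_alerts" {
  provider = "aws.us_east_1"
  arn      = "${aws_sns_topic.billing_alerts.arn}"
  policy   = "${data.aws_iam_policy_document.billing_alerts.json}"
}
```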
Finally, we can create budgets - for example we want to make sure our CloudWatch bill does not exceed USD1000/month (or stays below USD35/day):
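A sketch of the monthly CloudWatch budget. It assumes an AWS provider version that supports the `cost_filters` map and the `notification` block on `aws_budgets_budget` (newer provider versions changed the filter syntax). The service name `AmazonCloudWatch` is our best guess at the API name and should be verified against a CSV export from the console. The start date and budget name are placeholders:

```hcl
resource "aws_budgets_budget" "cloudwatch_monthly" {
  provider          = "aws.us_east_1"
  name              = "cloudwatch-monthly"
  budget_type       = "COST"
  limit_amount      = "1000"
  limit_unit        = "USD"
  time_unit         = "MONTHLY"
  time_period_start = "2019-01-01_00:00" # placeholder start date

  # Assumed API name of the service: verify it before relying on it.
  cost_filters = {
    Service = "AmazonCloudWatch"
  }

  # Notify the billing topic once the forecast reaches 100% of the budget.
  notification {
    comparison_operator       = "GREATER_THAN"
    threshold                 = 100
    threshold_type            = "PERCENTAGE"
    notification_type         = "FORECASTED"
    subscriber_sns_topic_arns = ["${aws_sns_topic.billing_alerts.arn}"]
  }
}
```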
Note: We noticed AWS CloudWatch bills going over USD50/day after enabling certain DataDog AWS integrations, thus we sadly experienced the need for such a budget.
Other budgets may cover EC2-Other (NAT Gateways, Data Transfer Costs, … ).
Note: The relation between the service name in the AWS Billing and Cost Management Console (e.g. EC2-other) and the actual API name (e.g. Amazon Elastic Block Store) is quite unclear. We have requested a mapping from our AWS Account manager which we will add to this article once received. For now, we create test budgets in the console and export them to CSV to get the actual API name of each service.
The advantage of defining these budgets in Terraform is that we can template and auto generate these budgets easily - we will see how to do this in a future blog post.
Slack Notifications for alerts
Slack is a very popular tool for startups, and messages there are often acted upon faster than email. This is the main reason we chose to feed the messages from the above SNS topics into relevant Slack channels using an AWS Lambda function. If you are using an alternative to Slack, the function body below should be easy to replace, and the rest of this section will remain a good reference.
For Slack notifications we can use a very basic function from the terraform-aws-modules/terraform-aws-notify-slack public module.
First, get the function body and fork it under ./functions/notify_slack.py as a good starting template. Notice that the function takes the following environment variables:
- SLACK_WEBHOOK_URL: Used as-is if it starts with http, else the function will try to use KMS to decrypt the URL first (see note on IAM permissions below)
- SLACK_CHANNEL: The Slack channel to post notifications to
- SLACK_USERNAME: The Slack user name to post notifications as
- SLACK_EMOJI: The emoji used when posting notifications to Slack
- AWS_REGION: Required if KMS encryption is used
Note: We do not use the upstream module because we re-use the same lambda function for many different SNS topics, for example SES Bounce notifications (which we will cover in a future blog post) and need to separate the lambda function from the topics we subscribe it to.
Next, we define a role for the lambda resource running the function and give it permissions to AWS Services it interacts with:
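A minimal sketch of such a role, assuming the function only needs to write its own logs to CloudWatch Logs. The role and policy names are hypothetical:

```hcl
# Trust policy: allow the Lambda service to assume this role.
data "aws_iam_policy_document" "lambda_assume_role" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["lambda.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "notify_slack" {
  name               = "notify-slack-lambda"
  assume_role_policy = "${data.aws_iam_policy_document.lambda_assume_role.json}"
}

# Permissions: let the function create and write its log streams.
data "aws_iam_policy_document" "notify_slack" {
  statement {
    sid    = "AllowWriteToCloudwatchLogs"
    effect = "Allow"

    actions = [
      "logs:CreateLogGroup",
      "logs:CreateLogStream",
      "logs:PutLogEvents",
    ]

    resources = ["arn:aws:logs:*:*:*"]
  }
}

resource "aws_iam_role_policy" "notify_slack" {
  name   = "notify-slack-lambda"
  role   = "${aws_iam_role.notify_slack.id}"
  policy = "${data.aws_iam_policy_document.notify_slack.json}"
}
```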
Note: If we encrypt the SLACK_WEBHOOK_URL (which is a good idea as it is committed to git), we should also grant KMS access in the above policy. See the repository referenced above for the exact permissions required.
Next, we define the AWS Lambda cloud resources:
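The Lambda resources could be sketched like this. The variable names, function name, runtime and timeout are assumptions for illustration (pick a runtime that Lambda currently supports), the handler name assumes the forked file keeps the upstream `lambda_handler` entry point, and the role and provider alias are assumed to be declared as described earlier. `AWS_REGION` is not set here because Lambda provides it to the function automatically:

```hcl
# Hypothetical input variables for the Slack settings.
variable "slack_webhook_url" {}

variable "slack_channel" {
  default = "aws-billing"
}

variable "slack_username" {
  default = "aws"
}

variable "slack_emoji" {
  default = ":money_with_wings:"
}

# Package the forked function body into a zip archive.
data "archive_file" "notify_slack" {
  type        = "zip"
  source_file = "${path.module}/functions/notify_slack.py"
  output_path = "${path.module}/functions/notify_slack.zip"
}

resource "aws_lambda_function" "notify_slack" {
  provider         = "aws.us_east_1"
  function_name    = "notify-slack"
  role             = "${aws_iam_role.notify_slack.arn}"
  handler          = "notify_slack.lambda_handler"
  runtime          = "python3.6"
  timeout          = 30
  filename         = "${data.archive_file.notify_slack.output_path}"
  source_code_hash = "${data.archive_file.notify_slack.output_base64sha256}"

  environment {
    variables = {
      SLACK_WEBHOOK_URL = "${var.slack_webhook_url}"
      SLACK_CHANNEL     = "${var.slack_channel}"
      SLACK_USERNAME    = "${var.slack_username}"
      SLACK_EMOJI       = "${var.slack_emoji}"
    }
  }
}
```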
And finally, with the Lambda function created, we subscribe it to the topic and give SNS the ability to invoke it:
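A sketch of the subscription and the invoke permission, assuming the topic and function are named `billing_alerts` and `notify_slack` and both live in us-east-1:

```hcl
# Deliver every message published to the topic to the Lambda function.
resource "aws_sns_topic_subscription" "billing_alerts_to_slack" {
  provider  = "aws.us_east_1"
  topic_arn = "${aws_sns_topic.billing_alerts.arn}"
  protocol  = "lambda"
  endpoint  = "${aws_lambda_function.notify_slack.arn}"
}

# Allow SNS to invoke the function, restricted to this one topic.
resource "aws_lambda_permission" "allow_billing_alerts" {
  provider      = "aws.us_east_1"
  statement_id  = "AllowExecutionFromBillingAlerts"
  action        = "lambda:InvokeFunction"
  function_name = "${aws_lambda_function.notify_slack.function_name}"
  principal     = "sns.amazonaws.com"
  source_arn    = "${aws_sns_topic.billing_alerts.arn}"
}
```

Subscribing the same function to a further topic only requires another pair of these two resources with a different `topic_arn` and `source_arn`.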
In this article we have covered basic AWS account Bootstrapping with Terraform, focused on managing the Terraform state remotely, encrypted and as part of our main AWS Account. We have also highlighted the importance of AWS billing alerts and Budgets and provided an easy way to automatically create these with alerts going into Slack from the get-go.
With our AWS Account bootstrapped, we can now focus on using AWS Products to iterate quickly on product features while keeping control of our infrastructure and our bills.