Protecting S3 Assets with EC2 Security Groups

Recently I completed a project using Tampermonkey user scripts, and I needed to host them somewhere that our team’s browsers could access.  S3 is an obvious choice to host static files but I wanted to limit access to certain IP addresses.  We maintain several existing EC2 Security Groups and I really wanted to reuse those, but—as the name suggests—EC2 Security Groups are for EC2 instances.  There is no easy way to use them to protect an S3 bucket.

In fact there seems to be no easy built-in way to put any type of HTTP/S-based authentication on an S3 bucket.  But after some research and tinkering I found a way we can add an authentication method of our choice.

Before we go any further I want to acknowledge and thank Douglas Duhaime for his excellent blog post which was a great guide to get started with the AWS side of things.  This Labs Note builds on his post because some things have moved around and changed names in the AWS console, and also I will highlight some frustrating issues I ran into to hopefully save you some time.  And finally, I’ll be adding code to do IP-based authentication using EC2 Security Groups.

Hopefully in a few years someone else will write an updated, even more helpful blog post using this one as a jumping-off point.  🙂

Overview

You can easily make files in S3 available via HTTPS, but that makes them public.  We need to shim something between S3 and the public internet to handle authentication.

CloudFront, AWS’s CDN, can sit between the public internet and your S3 bucket.  A very brief overview:  CloudFront resources are called “Distributions”, which can be configured to get content from one or more sources (called “Origins”) like S3, and optionally define actions to take when handling requests (“Behaviors”).  One of the options for Behaviors is AWS Lambda—AWS’s serverless “functions in the cloud” facility.

In Lambda you can write functions in various languages (Python, Ruby, Node.js, Go, Java, and .NET) and then hook things into those functions.  You can configure one or more Lambda “Triggers”—from both AWS and third-party services—that call your function and pass “event” data to it.  After your function does its thing, it pushes the result to the “Destinations” that you configure.  OR, instead of using Triggers/Destinations you can use another AWS feature called “Lambda@Edge”, which pushes your Lambda function into CloudFront to handle requests there.

So the browser hits CloudFront, CloudFront calls your Lambda function via Lambda@Edge, and then your function returns a result back to CloudFront telling it how to proceed.
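In Python terms, that round trip boils down to a handler that either returns the request (pass it through to S3) or returns a new response (block it).  A minimal sketch (here `is_allowed` is a placeholder for whatever check you implement; the full version appears later in this post):

```python
# Minimal sketch of a Lambda@Edge handler in Python.
# Returning the request unchanged tells CloudFront to proceed to the origin;
# returning a dict with 'status' short-circuits with that response instead.
def lambda_handler(event, context):
    request = event['Records'][0]['cf']['request']
    if is_allowed(request['clientIp']):
        return request  # pass through to S3
    return {'status': 403, 'statusDescription': 'Forbidden', 'body': 'Forbidden'}

def is_allowed(ip):
    # Placeholder policy -- replace with your own check
    return ip.startswith('10.')
```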

Seems easy enough, but if you’re unfamiliar with these AWS services there is a bit of a learning curve.  Unfortunately, there are a lot of examples floating around the internet that are completely broken, and some of AWS’s error messages are less than helpful, if not completely misleading.

So let’s dig in.

Create an S3 Bucket

This is the easy part.  Create the bucket as desired and make sure to BLOCK Public Access and DISABLE Static (public) website hosting, because it doesn’t matter what lock you use on the front door if the back door is wide open.

For this example, we’ll call our bucket “protected-bucket”.
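If you prefer to script this step, here’s a minimal boto3 sketch of the lock-down part (an assumption-laden illustration: it expects the bucket to already exist, takes a boto3 S3 client as an argument, and uses this post’s example bucket name):

```python
# Sketch: enforce "Block Public Access" on the bucket via the API.
# Pass in a boto3 S3 client, e.g. boto3.client('s3').
PUBLIC_ACCESS_BLOCK = {
    'BlockPublicAcls': True,
    'IgnorePublicAcls': True,
    'BlockPublicPolicy': True,
    'RestrictPublicBuckets': True,
}

def lock_down_bucket(s3_client, bucket='protected-bucket'):
    # Applies all four public-access blocks so nothing in the bucket can be
    # exposed publicly, regardless of ACLs or bucket policies.
    s3_client.put_public_access_block(
        Bucket=bucket,
        PublicAccessBlockConfiguration=PUBLIC_ACCESS_BLOCK,
    )
```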

Create a CloudFront Distribution

  1. Hop over to the CloudFront console and click “Create distribution”.
  2. In the “Origin” section:
    • Set “Origin domain” to your S3 bucket (something like “protected-bucket.s3.us-east-1.amazonaws.com”)
    • For “S3 bucket access” select “Yes use OAI (bucket can restrict access to only CloudFront)”
    • Click “Create new OAI” and use the suggested name
    • For “Bucket policy” select “Yes, update the bucket policy”
  3. In the “Default cache behavior” section:
    • Keep “Cache policy and origin request policy (recommended)” selected
    • Under “Cache policy”, you have a decision to make.  The Cache policy affects how CloudFront will cache the assets before refreshing them from S3.  The default policy is “CachingOptimized” which will cache your content for 24 hours.  Since the JavaScript I’m hosting is only fetched by a few dozen people and I want them to always have the latest changes, I chose “CachingDisabled”.  Choose what works best for your situation.
  4. Scroll to the bottom and click “Create distribution”.

Create an IAM Role

Go to the IAM console and click “Roles” on the left, then “Create role”.

  1. Select Lambda from the list of services that the role will use and then “Next: Permissions”
  2. Search for the “AWSLambdaExecute” policy, select it, and click “Next: Tags”
  3. Add any tags that you’d like and then click “Next: Review”
  4. Name your role something like “protected-bucket-lambda-role”
  5. Hit “Create Role”

Click the newly created Role from the list and then:

  1. Click “Add Inline Policy”
  2. Select the “JSON” tab and then add the following policy to allow this role to describe your EC2 Security Groups:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeSecurityGroups"
            ],
            "Resource": "*"
        }
    ]
}
  3. Click “Review Policy”
  4. Give it a name like “EC2-DescribeSecurityGroups” and click “Create policy”
  5. Go back to your Role’s Summary view and click the “Trust relationships” tab
  6. Click “Edit trust relationship”
  7. In the Service list, append “edgelambda.amazonaws.com” so that it looks like this:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": [
          "lambda.amazonaws.com",
          "edgelambda.amazonaws.com"
        ]
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
  8. Click “Update Trust Policy”

Create Your Lambda Function

Head over to the Lambda console and click “Create function”

  1. Use the first option: “Author from scratch”
  2. Add a name like “protected-bucket-lambda”
  3. Select Python as the Runtime
  4. Under “Change default execution role”, select “Use an existing role” and then select the IAM role that you created (something like “protected-bucket-lambda-role”)

Click on your function to open it up.  Note that under “Function overview” at the top, you will NOT have a Trigger or Destination — instead of these traditional input/output Lambda endpoints you will be using Lambda@Edge so that your function runs and responds to CloudFront requests.  More on that later.

At this point you’re just missing a small detail:  the actual code.  But before we get to copy/pasting that, here are some very helpful tips that might save you several hours of repeatedly applying your head to your desk.

Lambda and Lambda@Edge Tips and Tricks

1. You need to “Deploy” your changes before you “Test” them, so before you click the “Test” button in the console you must click the “Deploy” button.  But you also need to Save your changes before you can Deploy them.  So to test your changes, you need to do three things:

  • Command-s to Save
  • Hit the Deploy button
  • Hit the Test button

… every … single … time.  It gets old pretty quickly, but alas, it is The Way.

2. Since you’re using Lambda@Edge, the “Deploy” button does NOT do what you think it does!  To actually deploy your code to CloudFront, you need to click the “Actions” dropdown in the upper-right of the Lambda function’s screen, and then under “Capabilities” select “Deploy to Lambda@Edge”.

The FIRST time you deploy to Lambda@Edge, you must:

  • Select “Configure new CloudFront trigger”
  • Select the CloudFront Distribution from the first dropdown (using the random-looking ID)
  • Leave “Cache behavior” as “*”
  • Select “Origin request” as the “CloudFront event” — this is important!
  • You can leave “Include body” un-checked

Although you’re doing this in the Lambda console, this will create the CloudFront trigger and configure your CloudFront Distribution to trigger your Lambda function upon receiving a client’s request.

To re-deploy changes AFTER the first time you’ve deployed to Lambda@Edge:

  • Select “Use existing CloudFront trigger on this function”
  • Select the existing trigger from the dropdown

After you hit “Deploy” on the “Deploy to Lambda@Edge” modal dialog, jump over to your CloudFront Distribution’s page and you’ll see “Last modified:  (…) Deploying”.  And then you wait.

And wait.

And wait.

In my experience this can take anywhere from three to ten annoying minutes.  So you’ll definitely want to do as much testing as possible using the “Test” button in the code console, and only deploy for final testing when you’re confident things are working.

3.  To use the “Test” button, you must create Test Objects that mock CloudFront’s events.

Lambda has no idea that you’re making a function for Lambda@Edge.

When your function is triggered by CloudFront, your Python Lambda function receives two arguments:  event and context.  If you’re using Node.js it receives three:  event, context, and callback.

When you trigger your Lambda function using the “Test” button, you need to mock those arguments.

event

The `event` object is a dict (or associative array) representing the main input from the trigger.  If you were consuming a message from an SQS message queue, it would contain that message and some metadata.  Since your function is going to handle Lambda@Edge requests, your `event` in production will be the structure outlined here (https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/lambda-event-structure.html).  Bookmark that page!

Click the “Test” tab above the editor (not the button — the tab between “Code” and “Monitor”).  This is where you can define Test event objects that will be passed in as `event` when you hit that button.  You can have multiple Test events (say, one with an allowed client IP and one with a denied client IP) and then select which one you want to use by clicking the down arrow on the Test button.

The default Test Event is a very useless boilerplate object (event.key1 == "value1", event.key2 == "value2", and event.key3 == "value3").  You’ll want to replace that with something that follows the Lambda@Edge event structure.  You don’t need to mock the entire thing, but for example if you need to access the client’s public IP address you will need to at LEAST have a “Test event” structure that includes:

{
  "Records": [
    {
      "cf": {
        "request": {
          "clientIp": "1.2.3.4"
        }
      }
    }
  ]
}

Note:  While I was researching how to access the client IP, I found numerous examples saying to use `event['requestContext']['identity']['sourceIp']`.  I don’t know what that is (maybe an older version of the API?), but as far as I can tell that is completely incorrect.  The `event['Records'][0]['cf']['request']` path is what works for me as of February 2022.
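A tiny Python helper matching the mocked Test event above (purely illustrative):

```python
# Extract the client's public IP from a Lambda@Edge "Origin request" event
def client_ip(event):
    return event['Records'][0]['cf']['request']['clientIp']
```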

context

This object provides methods and properties with information about the invocation, function, and execution environment.  You can find its docs here: https://docs.aws.amazon.com/lambda/latest/dg/python-context.html.  You might want to bookmark that page, but so far I haven’t felt the need to use anything in that object.

callback (node.js only)

This is a function that you can call in non-async handlers to send a response. The callback function takes two arguments: an Error and a response.

4. Understand what your function must return to interface with CloudFront

Your handler function will return a dict (or associative array) representing either:

  • The original or modified request object (i.e., `event['Records'][0]['cf']['request']`), or
  • A new response.  

Your code may return the original, unmodified request object that was passed into it from CloudFront.  This will cause CloudFront to operate as if your Lambda function didn’t exist (my code does this to pass the request along to S3 after successfully authenticating the client).

Your code may also modify the original request object and then return that.  For example, you might rewrite the `querystring` or `uri`, add or remove `headers`, etc., before passing the request on to S3.

If you return a new response instead, it must minimally include a `status` (ex. `{ 'status': 403 }`), but you can also include headers, a body, etc. to tailor the response.
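To make both shapes concrete, here’s a hedged Python sketch (the querystring rewrite and the header name are just illustrations, not required behavior):

```python
def passthrough_with_tweaks(event):
    # Shape 1: modify the request and return it -- CloudFront forwards the
    # altered request to the origin (S3).
    request = event['Records'][0]['cf']['request']
    request['querystring'] = ''  # e.g., strip the querystring before hitting S3
    request.setdefault('headers', {})['x-edge-checked'] = [
        {'key': 'X-Edge-Checked', 'value': 'true'}  # illustrative header
    ]
    return request

def forbidden():
    # Shape 2: return a brand-new response; 'status' is the only required key
    return {
        'status': 403,
        'statusDescription': 'Forbidden',
        'body': 'Forbidden',
    }
```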

5.  “Caching” data between requests

As mentioned by Jason Byrne:

Amazon will actually re-use the same instance between requests. When Amazon retrieves a Lambda from cold storage, it creates a small instance where this function runs. There it stays in warm storage for some period of time until either the requests stop and AWS returns it to cold storage or more requests come in and it gets scaled out to multiple instances.

While you obviously can’t rely on this, it could be handy when you need to do something like fetch data from an external API and cache it for some time.
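A generic sketch of that pattern (module-level state plus a TTL; the names are illustrative, and the Security Group code later in this post uses the same trick):

```python
import time

# Module-level state survives across invocations while the Lambda instance is
# warm.  Never rely on it for correctness -- treat it purely as an optimization.
_cache = {'value': None, 'fetched_at': 0.0}
CACHE_TTL = 600  # seconds

def get_data(fetch, now=None):
    # fetch() stands in for an expensive call, e.g. an external API request
    now = time.time() if now is None else now
    if _cache['value'] is None or now - _cache['fetched_at'] > CACHE_TTL:
        _cache['value'] = fetch()
        _cache['fetched_at'] = now
    return _cache['value']
```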

6. Annoyances I Ran Into

I got really tripped up by googling for examples that didn’t work at all.  I don’t know if something changed with the Lambda API or if there are nuances that depend on how Lambda is being used.  But this was a pretty frustrating journey—especially with how long it takes to push code out to CloudFront via Lambda@Edge and how useless some of AWS’s error messages are.  So hopefully these tips will save you a lot of time:

a. Mock your `event` object so you can use the super fast “Test” button instead of having to deploy to CloudFront to test every change.  None of the blog posts I came across covered that — they just posted code examples that worked in production.

b. If you’re returning a new response, you MUST return `status` — NOT `statusCode`.  I saw `statusCode` in numerous examples, which is wrong and was annoying to track down due to lack of error details from CF/Lambda.  Thank you, StackOverflow.  https://stackoverflow.com/questions/69843453/aws-lambdaedge-viewer-request-fails-with-the-body-is-not-a-string-is-not-an-o 

c. If you’re getting an error from Lambda in your browser when hitting CloudFront that says “The Lambda function result failed validation: The body is not a string, is not an object, or exceeds the maximum size.” …this error is 100% trash.  Amazon REALLY needs to update it to say “The Lambda function result returned a bad response” and link you to the documentation that details the response object’s format.  For example, I received this error when I was using “statusCode” instead of “status”.  That has nothing to do with the “body”.

d. If you’re specifying headers in a response, you CANNOT return a simple flat dictionary with string values (like I saw in several examples); you must return an object like this:

{
    'Content-Type': [
        {
            'value': 'application/json'
        }
    ],
}

Thankfully, here Lambda has a more acceptable error message regarding headers:  “The Lambda function returned an invalid entry in the headers object: Each header entry in the headers object must be an array.”
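A small helper makes that shape hard to get wrong; a hedged sketch (`make_headers` is my own name, not an AWS API):

```python
# Build Lambda@Edge-style response headers: each header name maps to a LIST
# of dicts.  A flat {'Content-Type': 'application/json'} dict is rejected.
def make_headers(**headers):
    return {
        name.replace('_', '-').lower(): [{'value': value}]
        for name, value in headers.items()
    }
```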

e. The generic “500 error” response from CloudFront when your Lambda function raises an unhandled exception is “The Lambda function associated with the CloudFront distribution is invalid or doesn’t have the required permissions.”

f. I saw examples where the “body” value being returned was a string of a string (i.e., `"foo"` instead of `foo`).  For example:

return {
    'status': 403,
    'body': json.dumps("Forbidden").encode('utf8')
}

… you can do that, but you don’t have to.  You can just set “body” to “Forbidden” and move along.

Lambda Code Examples for Lambda@Edge

The implementation is limited only by your imagination.  You could limit access to start/stop at specific dates and times to coordinate with your marketing team, require some validation of the querystring provided by the client, make something accessible on days when there’s a full moon, or even hook into your organization’s single sign-on service.

In his blog post, Douglas Duhaime provides this node.js example for implementing HTTP Basic Authentication in front of an S3 bucket:

exports.handler = (event, context, callback) => {

  // Get the request and its headers
  const request = event.Records[0].cf.request;
  const headers = request.headers;

  // Specify the username and password to be used
  const user = 'user';
  const pw = 'password';

  // Build a Basic Authentication string
  const authString = 'Basic ' + Buffer.from(user + ':' + pw).toString('base64');

  // Challenge for auth if auth credentials are absent or incorrect
  if (typeof headers.authorization == 'undefined' || headers.authorization[0].value != authString) {
    const response = {
      status: '401',
      statusDescription: 'Unauthorized',
      body: 'Unauthorized',
      headers: {
        'www-authenticate': [{key: 'WWW-Authenticate', value:'Basic'}]
      },
    };
    return callback(null, response);
  }

  // User has authenticated
  callback(null, request);
};

This is okay for some purposes, but I really don’t like using shared secrets.  And since my desire is to only allow certain IP addresses to access my content, using our existing Security Groups makes a lot of sense.

So here’s the code I came up with:

import boto3
import datetime
import json

vpc_id = 'vpc-012345abcdef'
security_group_names = ['allow_https_from_trusted_ips']

ip_ranges = []
ip_ranges_fetched_at = datetime.datetime.fromtimestamp(0)
ip_ranges_ttl = datetime.timedelta(minutes=10)

response_headers = {
    'Content-Type': [ { 'value': 'text/plain' } ],
}

def ip_to_int(ip):
    o = map(int, ip.split('.'))
    o = list(o)
    res = (16777216 * o[0]) + (65536 * o[1]) + (256 * o[2]) + o[3]
    return res

def ip_in_subnet(ip, cidr):
    ip_network, mask_length = cidr.split('/')
    mask_length = mask_length and int(mask_length) or 32

    ip_int = ip_to_int(ip)

    mask_length_from_right = 32 - mask_length

    ip_network_int = ip_to_int(ip_network)
    bin_string = '{0:b}'.format(ip_network_int)
    
    chop_amount = 0
    for i in range(mask_length_from_right):
        if i < len(bin_string):
            chop_amount += int(bin_string[len(bin_string)-1-i]) * 2**i

    min_val = ip_network_int - chop_amount
    max_val = min_val + 2 ** mask_length_from_right - 1

    return min_val <= ip_int and ip_int <= max_val

def get_ip_ranges():
    global ip_ranges
    global ip_ranges_fetched_at

    now = datetime.datetime.now()

    if ip_ranges is not None and ip_ranges_fetched_at > now - ip_ranges_ttl:
        print('Using cached ip_ranges')
    else:
        print('Refreshing ip_ranges')
        ec2 = boto3.client('ec2')
        ip_ranges = []
        response = ec2.describe_security_groups(Filters=[
            {
                'Name': 'vpc-id',
                'Values': [ vpc_id ]
            }, {
                'Name': 'group-name',
                'Values': security_group_names
            }
        ])
        for sg_dict in response['SecurityGroups']:
            for perm_dict in sg_dict['IpPermissions']:
                # Use .get() -- rules with IpProtocol '-1' (all traffic) have no FromPort
                if perm_dict['IpProtocol'] == 'tcp' and perm_dict.get('FromPort') in (80, 443):
                    ip_ranges += perm_dict['IpRanges']

        ip_ranges_fetched_at = now

    return ip_ranges
    
    
def lambda_handler(event, context):
    try:
        request = event['Records'][0]['cf']['request']

        for ip_range in get_ip_ranges():
            if ip_in_subnet(request['clientIp'], ip_range['CidrIp']):
                print("ALLOW: Client is coming from known IP %s (%s)" % (request['clientIp'], repr(ip_range.get('Description', ''))))
                return request

        print("DENY: Client is coming from unknown IP (%s)" % request["clientIp"])

        return {
            'status': 403,
            'statusDescription': 'Forbidden',
            'body': 'Forbidden',
            'headers': response_headers
        }

    except Exception as e:
        print("ERROR: %s (%s)" % (e.__class__.__name__, str(e)))
        return {
            'status': 500,
            'statusDescription': 'Internal Server Error',
            'body': 'Internal Server Error',
            'headers': response_headers
        }

This is mostly self-explanatory, but here are a few key points:

  • `security_group_names` is the list of security group names that you want to use to determine allowed IP addresses.  You’ll obviously need to change that to the names of YOUR Security Groups.  Note that the script only adds the IP range to the allow list if it’s for TCP 80 or TCP 443.
  • As Jason Byrne points out, with Lambda it’s possible to define and re-use global vars (but that’s kind of a side effect of how Lambda works—not a feature—so you can’t rely on it).  Instead of having to hit the EC2 API on every single request to define the Security Groups we care about, we’re caching them in the `ip_ranges` global for the `ip_ranges_ttl` timedelta (10 minutes).  This speeds up successive requests a bit.
  • If the client’s IP is in ip_ranges, the original request is returned to CloudFront (and it does its normal thing: returns the resource from S3 to the client)
  • If the client’s IP is not in ip_ranges, the function returns a new response:  403 Forbidden
  • If an exception is raised, the function logs the details and returns a new response:  500 Internal Server Error
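As an aside: if you’d rather not hand-roll the CIDR math, Python’s standard-library `ipaddress` module can replace the `ip_to_int`/`ip_in_subnet` helpers above; a sketch:

```python
import ipaddress

def ip_in_subnet(ip, cidr):
    # strict=False tolerates CIDRs whose host bits are set (e.g. '10.0.0.5/24')
    return ipaddress.ip_address(ip) in ipaddress.ip_network(cidr, strict=False)
```

This also handles the edge cases (bare IPs without a mask, host bits set in the network address) that the manual bit arithmetic has to special-case.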

So that’s it! I hope you found this useful and it saves you a lot of time.

 

All third-party trademarks referenced by Cofense whether in logo form, name form or product form, or otherwise, remain the property of their respective holders, and use of these trademarks in no way indicates any relationship between Cofense and the holders of the trademarks.