Process S3 Events using AWS Lambda

The key question we are going to answer in this post is: how can you process S3 objects with AWS Lambda, without managing any servers?

Amazon Simple Storage Service (S3) lets you store, retrieve, update, and version objects in the Cloud. Not only does it guarantee durability and high availability at low cost, but it also enables event-driven systems by notifying you whenever objects are created, updated, or deleted. This lets you connect S3 to SQS queues, SNS notifications, or AWS Lambda for custom processing.

Here we will learn how to process new image files uploaded to S3 with a Lambda Function. As a simple example, we will use Python to compress the input image and generate a zipped version.

AWS Lambda is only one of three possible destinations for S3 events; the other two are SNS topics and SQS queues. A direct connection to Lambda is ideal when you need simple compute tasks executed on demand. Alternatively, you can configure S3 to simply publish a new message on an SNS topic and use this single event to fan out to multiple destinations, such as mobile push notifications, SMS messages, HTTP endpoints and SQS queues.

Here we will see a simple scenario with a single Lambda Function invoked every time a new S3 object is uploaded. Please note that the invocation model used here is non-stream-based (i.e. asynchronous).

What types of events can be triggered?

There are only three main types of events, with a few sub-types depending on the API call used.

The three main event types are:

  • ObjectCreated - Both creation and update of an S3 Object (Put, Post, Copy or CompleteMultipartUpload)
  • ObjectRemoved - Deletion, batch-deletion or versioned object marked for deletion (Delete or DeleteMarkerCreated)
  • ReducedRedundancyLostObject - RRS storage class object loss

Note that the events configuration must be set up on the S3 bucket itself, and that you won't receive notifications for failed operations.
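
For reference, here is how such a configuration might look if scripted with boto3. This is a sketch only: the bucket name and Lambda function ARN below are placeholders, and the function must already permit S3 to invoke it:

import boto3

s3 = boto3.client('s3')

# Subscribe a Lambda function to ObjectCreated events for keys
# under the images/ prefix (bucket name and ARN are placeholders).
s3.put_bucket_notification_configuration(
    Bucket='my-s3-events-demo-bucket',
    NotificationConfiguration={
        'LambdaFunctionConfigurations': [{
            'LambdaFunctionArn': 'arn:aws:lambda:us-east-1:123456789012:function:CompressImages',
            'Events': ['s3:ObjectCreated:*'],
            'Filter': {'Key': {'FilterRules': [
                {'Name': 'prefix', 'Value': 'images/'},
            ]}},
        }],
    },
)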

In the next steps, you will create a new S3 Bucket and connect its ObjectCreated events to AWS Lambda.

Create an S3 bucket

You can create an S3 bucket using the S3 management console. As with many other AWS services, you can also use the AWS API or the CLI (command line interface). This lab uses the AWS console, however.
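
If you prefer to script this step, a minimal boto3 sketch follows; the bucket name and region are placeholders, and note that bucket names must be globally unique:

import boto3

s3 = boto3.client('s3', region_name='eu-west-1')

# Create the bucket; outside us-east-1 a LocationConstraint is required.
s3.create_bucket(
    Bucket='my-s3-events-demo-bucket',
    CreateBucketConfiguration={'LocationConstraint': 'eu-west-1'},
)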

Implement a Lambda Function to process S3 events

Now that you have a bucket ready to use, you can create a new Lambda Function and implement the processing logic.

The goal is to create a compressed version of every image uploaded into the S3 bucket.

You will save the original images with the prefix images/ and the compressed objects with the prefix zip/. You can think of these prefixes as S3 folders within your bucket, and they allow you to avoid conflicts and infinite recursion. Indeed, if you listened to every creation event to generate new compressed files, you could end up in an infinite loop: each compressed file would itself trigger a new compression. If new images are always uploaded to one folder and compressed files are saved into another, the loop is avoided. In this simple case, prefixes are the best way to keep the Lambda Function simple and avoid problems (see the defensive guard sketched below).
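
As an extra safety net, you could also guard the handler itself. Here is a minimal sketch; the early return is an assumption of ours, not part of the function you will build below:

def lambda_handler(event, context):
    key = event['Records'][0]['s3']['object']['key']
    # Ignore anything outside images/ -- in particular our own
    # output under zip/ -- to rule out infinite recursion.
    if not key.startswith('images/'):
        return "SKIPPED"
    # ... continue with the compression logic ...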

Create a Function

Enter the following in the Function code:

import os
import io
import zipfile
from urllib.parse import unquote_plus

import boto3

s3 = boto3.client('s3')

def lambda_handler(event, context):

    # read bucket and key from the event data
    record = event['Records'][0]['s3']
    bucket = record['bucket']['name']
    # object keys arrive URL-encoded (e.g. spaces become '+')
    key = unquote_plus(record['object']['key'])

    # generate the new key name under the zip/ prefix
    new_key = "zip/%s.zip" % os.path.basename(key)

    # read the source object's content
    body = s3.get_object(Bucket=bucket, Key=key)['Body'].read()

    # create a new object with the compressed data
    s3.put_object(
        Body=compress(body, key),
        Key=new_key,
        Bucket=bucket,
    )

    return "OK"

def compress(body, key):
    # build the ZIP archive entirely in memory
    data = io.BytesIO()
    with zipfile.ZipFile(data, 'w', zipfile.ZIP_DEFLATED) as f:
        f.writestr(os.path.basename(key), body)
    return data.getvalue()

The lambda_handler function takes care of extracting the S3 object information from the given event data. It then reads the object's body, compresses it in memory and uploads a new S3 object with the same base name (plus a .zip extension) into the zip/ folder. The compress function is just a small Python utility that packs the input bytes into an in-memory ZIP archive.
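
If you want to exercise the handler locally before wiring up the trigger, you can feed it a hand-crafted event. Below is a minimal sketch; the bucket name and key are placeholders, and the call still reaches the real bucket through boto3:

# Trimmed-down S3 event with only the fields the handler reads.
sample_event = {
    'Records': [{
        's3': {
            'bucket': {'name': 'my-s3-events-demo-bucket'},  # placeholder
            'object': {'key': 'images/image.png'},           # placeholder
        },
    }],
}

print(lambda_handler(sample_event, None))  # prints "OK" on success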

Now add a trigger: go to the Triggers box and select S3 from the triggers list. When configuring the event, restrict it to the images/ prefix, so that the files written under zip/ cannot re-trigger the function.
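
The console takes care of granting S3 permission to invoke your function. If you ever wire the trigger up via the API instead, you would need to add that permission yourself; a hedged sketch, with placeholder names throughout:

import boto3

lambda_client = boto3.client('lambda')

# Allow the S3 service principal to invoke the function for events
# coming from the placeholder bucket.
lambda_client.add_permission(
    FunctionName='CompressImages',
    StatementId='s3-invoke',
    Action='lambda:InvokeFunction',
    Principal='s3.amazonaws.com',
    SourceArn='arn:aws:s3:::my-s3-events-demo-bucket',
)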

In the next step, you will test the trigger and verify that zipped files are created whenever a PNG file is uploaded into the images/ folder.

Test the trigger

You will now quickly test that your trigger is working properly. Switch to S3 again, click on your bucket name and create a new folder named images by clicking the + Create Folder button and entering images in the New folder field. Then click Save.

Upload an image to this new folder.
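
You can do this from the console, or with a short boto3 sketch like the one below (local file name and bucket name are placeholders):

import boto3

s3 = boto3.client('s3')
bucket = 'my-s3-events-demo-bucket'  # placeholder

# Upload a local test image under the images/ prefix...
s3.upload_file('image.png', bucket, 'images/image.png')

# ...then, after a few seconds, check that the zipped copy appeared.
response = s3.list_objects_v2(Bucket=bucket, Prefix='zip/')
for obj in response.get('Contents', []):
    print(obj['Key'])  # expect zip/image.png.zip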

The AWS Lambda invocation is completely transparent: you will not notice anything until you examine S3 again.

If everything is correctly configured, you will be able to inspect the bucket root and find a new zip folder containing the compressed file (image.png.zip).

Summary

This post showed how to process S3 events using AWS Lambda: a simple yet powerful way to perform data cleaning or pre-processing tasks whenever new objects are uploaded to S3.