Using AWS Step Functions to create a state machine

Introduction to AWS Step Functions

AWS Step Functions is a web service that enables you to coordinate the components of distributed applications and microservices using visual workflows.

You build applications from individual components that each perform a discrete function, or task, allowing you to scale and change applications quickly. Step Functions provides a reliable way to coordinate components and step through the functions of your application. Step Functions provides a graphical console for visualising the components of your application as a series of steps. It automatically triggers and tracks each step and retries when there are errors, so your application executes in order as expected, every time. Step Functions logs the state of each step, so when things do go wrong, you can diagnose and debug problems quickly.

Step Functions manages the operations and underlying infrastructure for you to ensure your application is available at any scale.

In the image below, you can see the workflow graphs that implement the example described next.

With Step Functions, you can easily coordinate a complex process composed of different tasks. For example, if you have an image and you need to convert it into multiple formats, scale it to different resolutions, and analyze it with Amazon Rekognition, you can split this process into single, atomic tasks represented by Lambda functions and execute them in parallel. After that, you can have another function that checks the result of this process.

Without this service, you would have to coordinate each Lambda function yourself and manage every kind of error at every step of this complex process.

AWS Step Functions is a useful service for breaking down complex processes into smaller and easier tasks.

In this post, you have an example real-world scenario for which you will build a solution using AWS Step Functions. You will write small, atomic, reusable Lambda functions that you can reuse in several places:

  • in different flows
  • as an endpoint behind API Gateway
  • as a single function in response to an event

The details of your specific scenario follow.

Step Function tasks

Suppose that we are developing a solution that combines Lambda and Step Functions for a fictitious gaming application. Assume that we have already developed our game using Lambda, DynamoDB and other AWS services. What is missing are several housekeeping tasks when a user completes a level or the entire game.

This process seems easy but consists of the following different tasks:

  • Generate a report for the last completed level
  • If the user ends a level, update the CompletedLevel table
  • If the user ends the entire game, update the CompletedGame table
  • Log a metric to CloudWatch

To accomplish this you need to create a state machine. All state machines in AWS are defined using the Amazon States Language.

Amazon States Language is a JSON-based language used to describe state machines declaratively.

In the AWS Console, you can build the workflow and visualize it through a flowchart. The screenshot below represents your flow:

The JSON necessary to create this chart is shown later in this lab, and you will analyze it in greater detail in the following steps.

As you can see this is not a linear flow. Thankfully, AWS Step Functions provides different kinds of steps.

Explaining what we are going to build with Step Functions

For our example scenario we have 3 main tasks to perform:

  • Generate a report
  • Update DynamoDB tables
  • If everything goes right, put a metric on CloudWatch.

The first 2 are completely independent tasks, whereas the last one must be executed only when the previous tasks are completed. In order to minimize the amount of time needed to complete the entire process, you will generate the report and update the database in parallel. Of course, parallel tasks are possible here because the tasks are not dependent upon each other.

Create a parallel Step Type

Let’s start describing the Parallel task.

The JSON snippet below shows how you can define a task of type Parallel along with its branches:

{
     "Comment": "An example of the Amazon States Language using a parallel state to execute two branches at the same time.",
     "StartAt": "StartTask",
     "States": {
        "StartTask": {
            "Type": "Parallel",
            "Next": "CWMetric",
            "Branches": [...]
        }
    }
}

The JSON template to define a flow is composed of different fields and each one defines a different aspect of the flow that you are building.

The StartAt field defines the first state where the flow begins and the States object contains the definitions of all reachable states.

Going back to our example, the flow starts from the StartTask state. This is a Parallel state (as you can see from its Type property) and for this reason it is composed of branches. This is a very powerful feature of Step Functions because here you can define a sub-flow of tasks that can include any kind of task you need. Also note that every branch has both the StartAt and the States property. In the former, you declare where the branch starts, and in the latter, every state of the branch.

Let’s start implementing our first task: Gen Report. It is a very simple one; in fact, its Type is Task. This means that it represents a Lambda function, and for this reason you have to specify the function’s ARN in its Resource field.

You have added another piece to your template, the branch object:

{
    "StartTask": {
        "Type": "Parallel",
        "Next": "CWMetric",
        "Branches": [
            {
                "StartAt": "Gen Report",
                "States": {
                    """ Here you can define all your branches """
                    "Gen Report": {
                        "Type": "Task",
                        "Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:FUNCTION_NAME"
                    }
                }
            }
            """ UpdateDB branch goes here """
        ]
    }
}

In order to implement your flow you have to create a Lambda Function that executes this task. You will go to the Lambda Console next and finally write some code!

Create your Lambda Function: GenerateReport

Note: This Lambda function will use an S3 bucket. Before creating the Lambda function, you need to know which S3 bucket to use. Go to the S3 console and read the bucket name.
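If you prefer the command line, you can also list the buckets in your account with a short boto3 sketch like the one below and pick out the lab bucket from the output:

import boto3

# Print the name of every bucket visible to your credentials so you can
# identify the bucket to use as BUCKET_NAME in the function below.
s3_client = boto3.client('s3')
for bucket in s3_client.list_buckets()['Buckets']:
    print(bucket['Name'])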

Go to Lambda and create a function:

Select Author from scratch and fill in all fields as explained below:

  • Name: GenerateReport
  • Runtime: Python 2.7
  • Role: Choose existing role
  • Existing role: lambda_execution_role

Use the following code for the function:

import boto3
bucket_name = "BUCKET_NAME"
s3_client = boto3.client('s3')

def lambda_handler(event, context):
    level = event.get('level')
    user_id = event.get('user_id')
    score = event.get('score')
    max_score = event.get('max_score')

    report = 'Completed Level: %s\nMy Score: %s/%s\n' % (level, score, max_score)

    s3_client.put_object(
        ACL='public-read',
        Bucket=bucket_name,
        Key="%s_report_%s.txt" % (user_id, level),
        Body=report
    )

    return event

The first thing you need to consider is: what is this function going to receive as the event parameter? In general, each function receives as input what the previous one returned as output. However, the first function of the flow (as in this case) receives the event provided when the flow starts. At the end of the lab, you will see that when you start an execution you can provide a start event.

Looking at your code, you can see that this Lambda function is very simple and only calls the s3.put_object API to upload your simple report.

Create a conditional Step Type

In the other branch of your parallel task, you need to update the DynamoDB tables.

In many scenarios, the flow is dependent upon a choice. In order to fulfill this requirement AWS Step Functions provides a Choice step.

Your second branch starts with the UpdateDB task and its type is Choice. This means that when the flow arrives here a choice has to be made, and AWS Step Functions makes that choice based on the rules you declare. The next step that gets executed depends on the result of that choice.

In the Choices field shown in the template below, you have to declare a series of comparisons. In your case, there are only two possibilities, so you need to declare one comparison and use the Default field, which specifies the task to run when no condition is satisfied. Your conditional step template is shown below:

{
    "StartAt": "UpdateDB",
    "States": {
        "UpdateDB": {
            "Type": "Choice",
            "Choices": [{
                "Variable": "$.level",
                "StringEquals": "latest",
                "Next": "Last Level"
            }],
            "Default": "Simple Level"
        },
        "Last Level": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-west-2:ACCOUNT_ID:function:EndsLastLevel",
            "End": true
        },
        "Simple Level": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-west-2:ACCOUNT_ID:function:EndsSimpleLevel",
            "End": true
        }
    }
}

A choice is composed of at least 3 fields:

  • Variable: the variable that needs to be evaluated, referenced using a JSONPath expression
  • Kind of comparison: AWS provides different kinds of comparisons, based on the type of the variable
  • Next: the name of the task that will be executed in case the comparison result is true.

Remember that only the first comparison that evaluates to true determines the next state.

In your case, you evaluate the $.level variable and, with the StringEquals comparison, check whether its value is “latest”. If it is, the Last Level task will be executed; otherwise, the Simple Level task will be.

A very important point is that to handle this kind of flow you only need to declare it in your template: no code and no Lambda functions are required, thanks to AWS Step Functions.

Now you have to implement your functions to handle database updates. In this example you want to be completely serverless, so you use DynamoDB to store your information.
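This lab assumes the CompletedLevel and CompletedGame tables already exist, keyed on user_id. If you need to create them yourself, a minimal boto3 sketch could look like the following (the on-demand billing mode is an assumption for the lab, not a requirement):

import boto3

dynamodb_client = boto3.client('dynamodb')

# Both tables use the user_id string attribute as their partition key.
for table_name in ('CompletedLevel', 'CompletedGame'):
    dynamodb_client.create_table(
        TableName=table_name,
        KeySchema=[{'AttributeName': 'user_id', 'KeyType': 'HASH'}],
        AttributeDefinitions=[{'AttributeName': 'user_id', 'AttributeType': 'S'}],
        BillingMode='PAY_PER_REQUEST'
    )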

Create Lambda Functions: EndsSimpleLevel and EndsLastLevel

Next you are going to create two Lambda functions:

  • EndsSimpleLevel - This function will be called when the user ends a simple level, not the latest level.
  • EndsLastLevel - This function will be called when the user completes the last level.

Let’s start by creating the Lambda function: EndsSimpleLevel:

import time
import boto3

dynamodb_client = boto3.client('dynamodb')
table_name = 'CompletedLevel'

def lambda_handler(event, context):
    user_id = event.get('user_id')
    level = event.get('level')

    update_params = {
        "TableName": table_name,
        "Key": {
            "user_id": {
                "S": user_id
            }
        },
        "AttributeUpdates": {
            "last_level": {
                "Value": {
                    "S": level
                },
                "Action": "PUT"
            },
            "timestamp": {
                "Value": {
                    "S": str(time.time())
                },
                "Action": "PUT"
            }
        }
    }

    dynamodb_client.update_item(**update_params)
    return event

As you can see, the code is pretty simple. It calls the update_item API on the CompletedLevel table and terminates by returning the event it received. As mentioned before, the result of each function is very important because it is either what the next function will receive as input, or the result of the entire execution.

Now the EndsLastLevel Lambda function:

import boto3
import time

table_name = 'CompletedGame'
dynamodb_client = boto3.client('dynamodb')

def lambda_handler(event, context):
    user_id = event.get('user_id')
    total_score = event.get('total_score')

    put_params = {
        "TableName": table_name,
        "Item": {
            "user_id": {
                "S": user_id
            },
            "completed": {
                "BOOL": True
            },
            "timestamp": {
                "S": str(time.time())
            },
            "total_score": {
                "N": str(total_score)
            }
        }
    }

    dynamodb_client.put_item(**put_params)
    return event

This code is quite simple too. It calls the DynamoDB put_item API on the other table (CompletedGame) and terminates returning the event received.

Logging results to CloudWatch

The last Lambda function you need to create is the simplest one: it updates your CompleteLevel metric on CloudWatch. To handle this, you create a task that runs after the parallel state, once both branches (including the conditional step) have completed.

This is the representation of this task:

"CWMetric": {
    "Type": "Task",
    "Resource": "arn:aws:lambda:us-west-2:ACCOUNT_ID:function:PutMetric",
    "End": true
}

There is nothing special here, but the End field set to true indicates that after this function the flow ends.

Create Lambda Function: PutMetric

Create a new Lambda function that will be executed after both branches of your parallel step complete. Because it comes after a parallel step, this function will receive an array loaded with the results of all the previous functions.
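Because each branch’s final Lambda function simply returns the event it received, the array this function gets should contain roughly one copy of the start input per branch. With the input used later in this lab, it would look something like this sketch:

[
    {"user_id": "12345678901234567890", "level": "latest", "score": 10,
     "max_score": 100, "time_played": 9885983982, "total_score": 10000},
    {"user_id": "12345678901234567890", "level": "latest", "score": 10,
     "max_score": 100, "time_played": 9885983982, "total_score": 10000}
]

The function below reads the time_played value from the first element of this array and publishes it as the metric value: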

import boto3

cw_client = boto3.client('cloudwatch')

metric_name = 'CompleteLevel'

def lambda_handler(event, context):
    # After a Parallel state, the input event is an array of branch results,
    # so take the first branch's output as the source of the metric value.
    source_event = {}
    if len(event) > 0:
        source_event = event[0]
    time_played = source_event.get('time_played', 0)

    cw_client.put_metric_data(
        Namespace='FLOW',
        MetricData=[{
            "MetricName": metric_name,
            "Value": time_played,
            "Unit": "Seconds"
        }]
    )

    return True

Remember that the result of this function is going to be the result of your entire execution. Up until this point, each Lambda function returned the event parameter (which is used as input to the next step); this last step simply returns True, and that will be the result of the entire execution.

In the next step, you will finally see how to both create and test this flow.

Here are the 4 Lambda functions that we created so far:

Creating a State Machine

You have created all the Lambda functions required and now you are ready to create your state machine.

Under Find services in the AWS Console, select Step Functions. Expand the left menu and click on State machines. Click on Create state machine.

AWS provides several different kinds of blueprints to help you to get started. After the lab, it is recommended you tinker around with several different blueprints and see what happens. In this lab, you have covered only a few types of steps.

The following JSON is entered to create the state machine:

{
  "Comment": "An example of the Amazon States Language using a Parallel and a Choice state to execute two branches at the same time.",
  "StartAt": "StartTask",
  "States": {
    "StartTask": {
      "Type": "Parallel",
      "Next": "CWMetric",
      "Branches": [
        {
          "StartAt": "Gen Report",
          "States": {
            "Gen Report": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:us-west-2:ACCOUNT_ID:function:GenerateReport",
              "End": true
            }
          }
        },
        {
          "StartAt": "UpdateDB",
          "States": {
            "UpdateDB": {
              "Type": "Choice",
              "Choices": [
                {
                  "Variable": "$.level",
                  "StringEquals": "latest",
                  "Next": "Last Level"
                }
              ],
              "Default": "Simple Level"
            },
            "Last Level": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:us-west-2:ACCOUNT_ID:function:EndsLastLevel",
              "End": true
            },
            "Simple Level": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:us-west-2:ACCOUNT_ID:function:EndsSimpleLevel",
              "End": true
            }
          }
        }
      ]
    },
    "CWMetric": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-west-2:ACCOUNT_ID:function:PutMetric",
      "End": true
    }
  }
}

Configure Service Role for AWS Step Function Service

The last thing you need to do is choose a name and configure the IAM Role that the AWS Step Functions service will assume to execute your state machine.

This role is a service role that will be assumed by the AWS Step Functions service; the only permission it needs is lambda:InvokeFunction.
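The console can create this role for you. For reference, a rough boto3 sketch of creating an equivalent role yourself might look like the following (the role and policy names are made up for illustration; you may also want to scope the Resource to your four function ARNs instead of *):

import json
import boto3

iam_client = boto3.client('iam')

# Trust policy that lets the Step Functions service assume the role.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "states.amazonaws.com"},
        "Action": "sts:AssumeRole"
    }]
}

iam_client.create_role(
    RoleName='StepFunctionsGameFlowRole',
    AssumeRolePolicyDocument=json.dumps(trust_policy)
)

# Inline policy granting the only permission the state machine needs.
iam_client.put_role_policy(
    RoleName='StepFunctionsGameFlowRole',
    PolicyName='InvokeLambda',
    PolicyDocument=json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": "lambda:InvokeFunction",
            "Resource": "*"
        }]
    })
)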

Execute the State Machine

Finally, you have created your first state machine and now it’s time to run it and verify it’s working correctly.

As mentioned before, each execution needs an input that will be passed to the first task. Click the Start execution button and paste the JSON shown below. Note: it’s OK to keep the random execution ID that AWS generates for you.

This JSON contains all the parameters that your execution needs: the information about the user, which level has been finished, and the related scores.

{
    "user_id": "12345678901234567890",
    "level": "latest",
    "score": 10,
    "max_score": 100,
    "time_played": 9885983982,
    "total_score": 10000
}
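This lab uses the console, but the same execution could also be started programmatically with boto3. A rough sketch (the state machine ARN is a placeholder) might look like this:

import json
import boto3

sfn_client = boto3.client('stepfunctions')

response = sfn_client.start_execution(
    stateMachineArn='arn:aws:states:us-west-2:ACCOUNT_ID:stateMachine:STATE_MACHINE_NAME',
    input=json.dumps({
        "user_id": "12345678901234567890",
        "level": "latest",
        "score": 10,
        "max_score": 100,
        "time_played": 9885983982,
        "total_score": 10000
    })
)

# Once the flow has finished, describe_execution returns the status and output.
result = sfn_client.describe_execution(executionArn=response['executionArn'])
print(result['status'])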

Click the Start Execution button at the lower-right of your console. In the upper section, you can see the overall Execution Details. In the boxes below, you can see a flowchart that indicates which state the process is in, along with the Step details. By expanding the Input, Output, and Exception sections, you can see details about the status of each task and the inputs and outputs used during execution. Error information can also be displayed here, which is really useful if any debugging is needed.

Here is an example screenshot of the AWS console after successful execution of your state machine:

At the bottom of the page is a list of all transactions and events of this execution. You can expand or collapse each transaction by clicking the arrow adjacent to the ID number. Of course, every Lambda call is logged to CloudWatch as well.