My first experience with AWS Step Functions
AWS Step Functions is a service for coordinating the components of a distributed application. Each component performs a well-defined, separate task, and its output becomes the input of other tasks. Step Functions workflows are defined as state machines.
Amazon States Language
Amazon States Language (ASL) is a JSON-based language for defining the state machines used by Step Functions. A state machine is a collection of states that leads execution from the start state (there can be only one) to a final state (there can be many). An example of ASL:
{
  "Comment": "First state machine",
  "StartAt": "HelloSt",
  "States": {
    "HelloSt": {
      "Type": "Pass",
      "Result": "Hello",
      "Next": "WorldSt"
    },
    "WorldSt": {
      "Type": "Pass",
      "Result": "World",
      "End": true
    }
  }
}
The state machine above starts in the HelloSt state and passes its result, "Hello", as the input of the WorldSt state. Below is the graphic representation of this ASL:
Conditional branching is also supported, via the Choice state:
{
  "Comment": "First state machine",
  "StartAt": "CheckName",
  "States": {
    "CheckName": {
      "Type": "Choice",
      "Choices": [
        {
          "Not": {
            "Variable": "$.name",
            "StringEquals": ""
          },
          "Next": "HelloName"
        },
        {
          "Variable": "$.name",
          "StringEquals": "",
          "Next": "HelloWorld"
        }
      ]
    },
    "HelloName": {
      "Type": "Pass",
      "Result": "Hello",
      "End": true
    },
    "HelloWorld": {
      "Type": "Pass",
      "Result": "World",
      "End": true
    }
  }
}
The first state, CheckName, decides, based on the name input parameter, which state the flow should be directed to next. The code above produces the following state machine:
For more information, see the official Amazon States Language documentation.
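To get a feel for how execution moves through Pass and Choice states, here is a tiny local simulator sketched in Ruby. It is not the Step Functions runtime and not an AWS API: `simulate`, `rule_matches?` and `read_path` are hypothetical helpers that only understand Pass states and Choice rules built from StringEquals and Not, and they ignore InputPath, ResultPath, OutputPath and Parameters. The two machines are the examples above rewritten as Ruby hashes.

```ruby
# Hypothetical local simulator, NOT the Step Functions runtime: it supports only
# Pass states and Choice rules made of StringEquals / Not, and ignores
# InputPath, ResultPath, OutputPath and Parameters.

# Resolve a simple "$.field" path against a Hash input (missing fields read as nil here).
def read_path(input, path)
  input.is_a?(Hash) ? input[path.delete_prefix('$.')] : nil
end

# Evaluate one Choice rule: a StringEquals comparison, optionally wrapped in Not.
def rule_matches?(rule, input)
  return !rule_matches?(rule['Not'], input) if rule.key?('Not')

  read_path(input, rule['Variable']) == rule['StringEquals']
end

# Walk the machine from StartAt; return the visited state names and the final output.
def simulate(definition, input)
  states = definition.fetch('States')
  current = definition.fetch('StartAt')
  visited = []
  loop do
    visited << current
    state = states.fetch(current) # a Next pointing to an unknown state raises KeyError
    case state['Type']
    when 'Pass'
      # A Pass state with Result outputs that Result; otherwise it passes its input through.
      input = state['Result'] if state.key?('Result')
      return [visited, input] if state['End']

      current = state.fetch('Next')
    when 'Choice'
      # The first matching rule wins; without a match, Default is used (KeyError if absent).
      rule = state['Choices'].find { |r| rule_matches?(r, input) }
      current = rule ? rule['Next'] : state.fetch('Default')
    else
      raise ArgumentError, "unsupported state type: #{state['Type']}"
    end
  end
end

PASS_MACHINE = {
  'StartAt' => 'HelloSt',
  'States' => {
    'HelloSt' => { 'Type' => 'Pass', 'Result' => 'Hello', 'Next' => 'WorldSt' },
    'WorldSt' => { 'Type' => 'Pass', 'Result' => 'World', 'End' => true }
  }
}

CHOICE_MACHINE = {
  'StartAt' => 'CheckName',
  'States' => {
    'CheckName' => {
      'Type' => 'Choice',
      'Choices' => [
        { 'Not' => { 'Variable' => '$.name', 'StringEquals' => '' }, 'Next' => 'HelloName' },
        { 'Variable' => '$.name', 'StringEquals' => '', 'Next' => 'HelloWorld' }
      ]
    },
    'HelloName' => { 'Type' => 'Pass', 'Result' => 'Hello', 'End' => true },
    'HelloWorld' => { 'Type' => 'Pass', 'Result' => 'World', 'End' => true }
  }
}

p simulate(PASS_MACHINE, {})                    # [["HelloSt", "WorldSt"], "World"]
p simulate(CHOICE_MACHINE, { 'name' => 'Ada' }) # [["CheckName", "HelloName"], "Hello"]
p simulate(CHOICE_MACHINE, { 'name' => '' })    # [["CheckName", "HelloWorld"], "World"]
```

A nice side effect of using `fetch` for state lookup: a `Next` that names a state which does not exist fails loudly instead of silently stopping.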
The use case I was most interested in
I was most interested in a use case like this one:
Send an email to the customer once a defined amount of time has passed since some event occurred.
I see a nice use case for the Wait state type here. A Wait state delays the state machine from continuing for a specified time: a fixed number of seconds (Seconds) or an absolute timestamp (Timestamp), and either value can also be read from the execution input (SecondsPath, TimestampPath). The state machine is very simple:
{
  "Comment": "Send an incentive email",
  "StartAt": "Wait",
  "States": {
    "Wait": {
      "Type": "Wait",
      "TimestampPath": "$.scheduledAt",
      "Next": "SendEmail"
    },
    "SendEmail": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:eu-central-1:074085690123:function:send-incentive-email",
      "End": true
    }
  }
}
I chose to provide the exact time at which the Wait state should pass execution on to the SendEmail state, using the TimestampPath attribute in the state definition. Now I can start the state machine with an example event:
{
  "scheduledAt": "2019-12-31T08:13:00Z",
  "locale": "en",
  "airline": "Lufthansa",
  "departure_airport": "Frankfurt",
  "arrival_airport": "Gdańsk",
  "compensation": "600"
}
Here scheduledAt is the moment the execution moves on from Wait to SendEmail.
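Because the delay comes straight from the input, it can be handy to check locally how long a given event will wait before starting an execution. This is a sketch under my own assumptions: `wait_seconds` is a hypothetical helper (Step Functions does the real waiting), it only reads `scheduledAt` as an ISO 8601 UTC timestamp, and clamping past timestamps to 0 is this sketch's choice, not a documented service rule.

```ruby
require 'json'
require 'time'

# Hypothetical helper: how long would the Wait state (TimestampPath: "$.scheduledAt")
# pause for this execution input? A malformed timestamp raises ArgumentError here,
# which makes it a cheap sanity check before calling start_execution.
def wait_seconds(event_json, now: Time.now.utc)
  event = JSON.parse(event_json)
  scheduled_at = Time.iso8601(event.fetch('scheduledAt')) # e.g. "2019-12-31T08:13:00Z"
  # Past timestamps are clamped to 0 by this sketch.
  [scheduled_at - now, 0].max
end

event = { scheduledAt: '2019-12-31T08:13:00Z', locale: 'en', airline: 'Lufthansa' }.to_json
puts wait_seconds(event, now: Time.iso8601('2019-12-31T08:00:00Z')) # 780.0 (13 minutes)
```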
An execution can be triggered with the official AWS SDK for Ruby, for example:
require 'aws-sdk-states'
require 'json'
# credentials and region come from the usual AWS SDK configuration (env vars, profile, ...)
client = Aws::States::Client.new
# assuming there is only one state machine defined
state_machine_arn = client.list_state_machines.state_machines[0].state_machine_arn
client.start_execution({
  state_machine_arn: state_machine_arn,
  # input must be a JSON string; scheduledAt is one minute from now, in UTC
  input: {
    scheduledAt: (Time.now.utc + 60).strftime('%Y-%m-%dT%H:%M:%SZ'),
    locale: "en",
    airline: "Lufthansa",
    departure_airport: "Frankfurt",
    arrival_airport: "Gdańsk",
    compensation: 600
  }.to_json
})
An execution can be given a name, which must be unique within the scope of the state machine. That sounds like a good way to achieve exactly-once execution.
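One way to use this, sketched under assumptions: derive the name deterministically from whatever identifies "one email for one fact", so a duplicated trigger reuses the same name. As far as I understand the StartExecution documentation, for Standard workflows a repeated name on the same state machine does not start a second run. `execution_name`, `claim_id` and `template` are hypothetical (they are not part of the example event), and the 80-character limit is the documented maximum length of an execution name.

```ruby
require 'digest'

# Hypothetical helper: the same (claim_id, template) pair always yields the same name,
# so starting the "same" execution twice collides on the name instead of sending two emails.
def execution_name(claim_id:, template:)
  digest = Digest::SHA256.hexdigest("#{claim_id}:#{template}")
  "incentive-#{digest}" # 10 + 64 = 74 characters, under the 80-character limit
end

name = execution_name(claim_id: 'LH-1234', template: 'incentive-v1')
puts name.length # 74

# With the SDK client from the snippet above, the name would be passed like this:
# client.start_execution(state_machine_arn: state_machine_arn, name: name, input: input_json)
```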
The SendEmail state can be extended with Retry (maximum attempts and exponential backoff are supported!) and Catch attributes, both of which can be limited to particular error names via ErrorEquals.
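A sketch of what SendEmail could look like with both attributes, written as a Ruby hash and rendered to ASL JSON. NotifyFailure is a hypothetical fallback state (it would have to exist in the same machine), and the retry numbers and TimeoutSeconds value are illustrative, not recommendations. `retry_delays` is my own helper, not an SDK call; it applies the ASL rule that the wait before retry attempt n is IntervalSeconds * BackoffRate^(n-1).

```ruby
require 'json'

SEND_EMAIL_STATE = {
  'Type' => 'Task',
  'Resource' => 'arn:aws:lambda:eu-central-1:074085690123:function:send-incentive-email',
  'TimeoutSeconds' => 30, # illustrative limit for a single Lambda invocation
  'Retry' => [
    {
      # States.TaskFailed does not cover timeouts, so States.Timeout is listed too.
      'ErrorEquals' => ['States.TaskFailed', 'States.Timeout'],
      'IntervalSeconds' => 2,
      'MaxAttempts' => 3,
      'BackoffRate' => 2.0
    }
  ],
  'Catch' => [
    # Once retries are exhausted, any error moves the flow to the (hypothetical) fallback.
    { 'ErrorEquals' => ['States.ALL'], 'Next' => 'NotifyFailure' }
  ],
  'End' => true
}

# Delay before each retry attempt: IntervalSeconds * BackoffRate ** (attempt - 1).
def retry_delays(retrier)
  (0...retrier['MaxAttempts']).map do |i|
    retrier['IntervalSeconds'] * retrier['BackoffRate']**i
  end
end

puts JSON.pretty_generate(SEND_EMAIL_STATE)
p retry_delays(SEND_EMAIL_STATE['Retry'].first) # [2.0, 4.0, 8.0]
```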
Pricing
The Free Tier includes 4,000 state transitions per month. To put it simply, a state transition count is the number of edges between states that an execution passes through. Sending the delayed email took three state transitions. Every additional 1,000 state transitions costs $0.025.
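To get a rough monthly bill from those numbers, a back-of-the-envelope sketch. The defaults are the figures quoted above; prices differ by region and change over time, so treat them as assumptions, and `monthly_cost` is a hypothetical helper.

```ruby
# Rough monthly cost for Standard workflows: free transitions first,
# then price_per_1000 for every 1,000 billable transitions.
def monthly_cost(executions, transitions_per_execution, free: 4_000, price_per_1000: 0.025)
  billable = [executions * transitions_per_execution - free, 0].max
  billable / 1000.0 * price_per_1000
end

# 10,000 delayed emails * 3 transitions = 30,000 transitions, 26,000 of them billable.
puts format('$%.2f', monthly_cost(10_000, 3)) # $0.65
```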
Summary
Step Functions look quite interesting. They were introduced in December 2016, and I hadn't played with them until today. Step Functions may look very limited and simple at first, but I can imagine big state machines orchestrating complex execution logic.
A state machine can be started
- via an API action (as in the example above)
- by CloudWatch Events (haven't tried)
- through Amazon API Gateway (an abstraction over an abstraction?)
- from another state machine (we need to go deeper …)
What a state machine can work with
- Lambda functions (as in the example above)
- DynamoDB (read & write)
- SNS (publishing a message)
- SQS (sending a message to a queue)
- some other AWS services I haven't heard of
Where I find it useful
- data processing and ETLs
- delaying Lambda execution (as in the example above)
- some kind of continuous integration with Activities?
What I haven’t touched
Step Functions also offers the Parallel and Map state types. The first can execute several branches at once, for example sending an email and sending an SMS at the same time (duration savings). The second can execute the same steps for every item of an input array concurrently (duration and state transition savings).
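A sketch of how the two state types could be declared, again as Ruby hashes rendered to ASL JSON. The function ARNs (placeholder account 123456789012), state names and the `$.customers` input field are hypothetical. `Iterator` is the Map field name from when Map was introduced; newer AWS documentation calls it `ItemProcessor`, so it is worth checking which one your tooling expects.

```ruby
require 'json'

LAMBDA_PREFIX = 'arn:aws:lambda:eu-central-1:123456789012:function:'.freeze

# A single-state sub-machine: one Task that ends its branch or iteration.
def single_task(state_name, function_name)
  {
    'StartAt' => state_name,
    'States' => {
      state_name => { 'Type' => 'Task', 'Resource' => LAMBDA_PREFIX + function_name, 'End' => true }
    }
  }
end

MACHINE = {
  'StartAt' => 'NotifyCustomer',
  'States' => {
    # Parallel: both branches start at the same time. The state's result is an array
    # of branch outputs, stored under $.notifications so that $.customers survives.
    'NotifyCustomer' => {
      'Type' => 'Parallel',
      'Branches' => [single_task('SendEmail', 'send-email'), single_task('SendSms', 'send-sms')],
      'ResultPath' => '$.notifications',
      'Next' => 'EmailAllCustomers'
    },
    # Map: runs the Iterator once per element of the $.customers array,
    # with at most 10 iterations in flight at once.
    'EmailAllCustomers' => {
      'Type' => 'Map',
      'ItemsPath' => '$.customers',
      'MaxConcurrency' => 10,
      'Iterator' => single_task('EmailCustomer', 'send-email'),
      'End' => true
    }
  }
}.freeze

puts JSON.pretty_generate(MACHINE)
```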
All of the “real executors” (Task, Parallel and Map) offer the Catch and Retry options. In my opinion, Catch should be used as a workflow-level fallback in case of an error, instead of handling errors inside the Lambda function. Retry is … well, retry.
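Every step in a workflow, and the workflow itself, can have a time limit (TimeoutSeconds). When a limit is exceeded, the step or the whole execution fails with a States.Timeout error and produces no output; a step's timeout can also be caught and routed to a defined fallback state.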
Any logging capabilities for workflow executions.
Activities: a kind of task whose work is performed by your own worker code, which can run on Amazon EC2, Amazon Elastic Container Service, a mobile device, …
Pros
- relatively easy to start with
- Terraform support
- Standard and Express workflows
- a workflow can run for up to 1 year
- Express workflows look fast: according to the documentation, they support event rates greater than 100,000 events per second
- the tooling inside the AWS Console is nice: it lets you start executions and check when each state was executed, with its inputs and outputs
- Lambda functions can stay independent: no dependencies on each other, no references to other resources, just pure input and output
Cons
- pricing for complex and long-running workflows
- can't be triggered by an SNS event
- the workflow source code lives inside Terraform
- while there is a linter for Amazon States Language, I find the missing testing tool a drawback
- no built-in scheduling, compared with scheduled Lambda functions