This content is part of the Conference Coverage: Your guide to AWS re:Invent 2017 news and analysis

Properly match an AWS job scheduler to a project

AWS Step Functions, AWS Batch and Amazon Simple Workflow Service manage sequential jobs. Which fits your enterprise needs? Determine project priorities before choosing a service.

Developers working in AWS face a variety of challenges when they must create and manage multiple jobs in sequence.

AWS offers three services for scheduling and coordinating sequential jobs: AWS Step Functions, Amazon Simple Workflow Service and AWS Batch. Each has pros and cons, so enterprise IT teams must determine which one best fits each project.

The AWS Step Functions service enables developers to coordinate complex operations that run through multiple functions in sequence. Step Functions links AWS Lambda functions together with custom logic, such as retry rules for any given step, so that an asynchronous request can be processed reliably from one step to the next.
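Step Functions workflows are defined as state machines in the JSON-based Amazon States Language. The Python sketch below builds a minimal definition that chains Lambda functions in order, with each Task state handing its output to the next. The function ARNs and the `build_sequential_definition` helper are hypothetical; the resulting JSON string is what you would pass as the `definition` argument of the Step Functions `CreateStateMachine` API (`create_state_machine` in boto3), along with an IAM role that Step Functions can assume.

```python
import json


def build_sequential_definition(lambda_arns):
    """Chain Lambda functions into one Amazon States Language definition.

    Each ARN becomes a Task state whose output becomes the next state's input.
    """
    if not lambda_arns:
        raise ValueError("need at least one Lambda ARN")
    names = [f"Step{i + 1}" for i in range(len(lambda_arns))]
    states = {}
    for i, (name, arn) in enumerate(zip(names, lambda_arns)):
        state = {"Type": "Task", "Resource": arn}
        if i + 1 < len(names):
            state["Next"] = names[i + 1]  # pass this step's output to the next step
        else:
            state["End"] = True  # the last step ends the execution
        states[name] = state
    return {
        "Comment": "Run Lambda functions one after another",
        "StartAt": names[0],
        "States": states,
    }


# Hypothetical ARNs, for illustration only.
arns = [
    "arn:aws:lambda:us-east-1:123456789012:function:run-search",
    "arn:aws:lambda:us-east-1:123456789012:function:export-results",
]
print(json.dumps(build_sequential_definition(arns), indent=2))
```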

A similar service, Amazon Simple Workflow Service (SWF), also processes several functions in sequence. SWF manages individual jobs, tracks the execution state and handles retry logic. Instead of running code on Lambda, as Step Functions does, SWF relies on worker processes that developers run on standard Elastic Compute Cloud (EC2) instances or in any other location they choose. SWF keeps track of the state, but developers must manage the workers.

With AWS Batch, developers can run compute-intensive jobs across multiple instances. Unlike Step Functions or SWF, Batch automatically provisions EC2 instances and runs each job in a Docker container on them. Developers can configure AWS Batch to use Spot Instances or On-Demand Instances, depending on need.

As a rule of thumb, large batch-processing tasks of the kind often run on Amazon Elastic MapReduce work best on AWS Batch, while most other workloads can run on Step Functions. Some developers, however, prefer the more familiar and controllable environment that SWF provides.

Using Step Functions

AWS Step Functions largely coordinates Lambda functions, so developers familiar with Lambda should feel right at home with the service. A common use for Step Functions is error handling. For example, suppose a user requests an export of a search result. With Lambda alone, this would be an asynchronous call to a Lambda function, which would then email the search results to the user. If the search didn't finish quickly enough, or if the user lacked the permissions required to perform the export, the Lambda function would fail and the user would receive no message at all.
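To see why the user hears nothing back, consider what the asynchronous call looks like. This sketch assumes a boto3 Lambda client and a hypothetical `export-search-results` function. With the `Event` invocation type, Lambda queues the request and immediately returns HTTP status 202; the function's eventual result or error never comes back to the caller.

```python
import json


def request_export_async(lambda_client, function_name, search_id, email):
    """Fire-and-forget invocation of a (hypothetical) export function.

    InvocationType="Event" makes Lambda accept the request and return 202
    right away, before the function has run, so failures are invisible here.
    """
    response = lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="Event",
        Payload=json.dumps({"searchId": search_id, "email": email}).encode("utf-8"),
    )
    # 202 only means "accepted", not "exported successfully".
    return response["StatusCode"]
```

In practice you would pass `boto3.client("lambda")` as `lambda_client`; the returned status code confirms only that Lambda accepted the request.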

Developers can use AWS Step Functions to manage the request and to handle any errors from the Lambda function that performs the export. When an error occurs, Step Functions applies rules the developer defines to distinguish a user error, such as invalid input or missing permissions, which shouldn't be retried, from a system error, which should be retried later. For system errors, Step Functions retries the step with exponential backoff, waiting progressively longer between attempts; if the export still can't complete, the workflow can send an error message to a developer.
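In the Amazon States Language, that split between user errors and system errors is expressed with `Retry` and `Catch` fields on a Task state. The definition below is a sketch, not a prescribed configuration: the ARN, the custom error names `PermissionDenied` and `InvalidInput` (which assume the export function raises exceptions with those names), and the placeholder `NotifyUser` and `NotifyDeveloper` states are hypothetical. A real workflow would likely replace the placeholders with tasks that send an email or publish a notification. The `retry_delays` helper shows the backoff arithmetic: the wait before retry n is `IntervalSeconds * BackoffRate ** (n - 1)`.

```python
import json

# Hypothetical ARN, for illustration only.
EXPORT_ARN = "arn:aws:lambda:us-east-1:123456789012:function:export-results"

definition = {
    "StartAt": "ExportResults",
    "States": {
        "ExportResults": {
            "Type": "Task",
            "Resource": EXPORT_ARN,
            # System errors: retry with exponential backoff (waits of 2s, 4s, 8s).
            "Retry": [{
                "ErrorEquals": ["States.Timeout", "Lambda.ServiceException"],
                "IntervalSeconds": 2,
                "MaxAttempts": 3,
                "BackoffRate": 2.0,
            }],
            # User errors are not retried; anything else that survives the
            # retries falls through to the catch-all, which must come last.
            "Catch": [
                {"ErrorEquals": ["PermissionDenied", "InvalidInput"], "Next": "NotifyUser"},
                {"ErrorEquals": ["States.ALL"], "Next": "NotifyDeveloper"},
            ],
            "End": True,
        },
        # Placeholders: a real workflow would send a message here.
        "NotifyUser": {"Type": "Pass", "Result": "export rejected", "End": True},
        "NotifyDeveloper": {"Type": "Fail", "Error": "ExportFailed", "Cause": "retries exhausted"},
    },
}


def retry_delays(interval_seconds, backoff_rate, max_attempts):
    """Wait before each retry n = 1..max_attempts: interval * rate ** (n - 1)."""
    return [interval_seconds * backoff_rate ** n for n in range(max_attempts)]


print(retry_delays(2, 2.0, 3))  # [2.0, 4.0, 8.0]
```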

Using SWF

Amazon Simple Workflow Service works much like Amazon Simple Queue Service (SQS). SWF organizes work into domains, each of which can contain multiple activity types. SWF places activity tasks on task lists, which act as queues that SWF manages, and developers run any number of workers that poll those task lists to process requests. Work is handed to workers much as it would be through SQS, except that SWF also tracks the wait for each worker's response and supports retry logic and error handling.
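An SWF worker is essentially a polling loop. The sketch below assumes a boto3 SWF client, whose `poll_for_activity_task`, `respond_activity_task_completed` and `respond_activity_task_failed` methods correspond to the SWF API actions of the same names; the domain, task list name and `handler` function are placeholders you would supply. When a long poll finds no work, SWF returns a response with an empty task token, so the loop simply polls again.

```python
def run_activity_worker(swf, domain, task_list, handler, identity="worker-1", max_polls=None):
    """Poll an SWF task list and report each activity's outcome.

    `swf` is a boto3 SWF client (or any object with the same methods).
    `handler` takes the task's input string and returns a result string.
    Returns the number of polls made.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        polls += 1
        # Long poll: SWF holds the request open for up to 60 seconds.
        task = swf.poll_for_activity_task(
            domain=domain, taskList={"name": task_list}, identity=identity
        )
        token = task.get("taskToken")
        if not token:
            continue  # no work arrived before the poll timed out
        try:
            result = handler(task.get("input", ""))
        except Exception as exc:
            # Report the failure so the decider can choose to retry.
            swf.respond_activity_task_failed(
                taskToken=token,
                reason=type(exc).__name__,
                details=str(exc)[:32000],  # keep details under SWF's size limit
            )
        else:
            swf.respond_activity_task_completed(taskToken=token, result=result)
    return polls
```

Passing `max_polls` keeps the loop finite for testing; a production worker would typically poll indefinitely, with as many worker processes running as the downstream systems can tolerate.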

SWF's biggest advantage over Step Functions is that it can run code that can't run on Lambda. This includes machine learning tasks that depend on large external libraries and processes that run longer than Lambda's maximum execution time, which would cause a Lambda function to time out. SWF also lets developers scale workers more directly, so they can throttle requests to avoid overloading the systems that consume the output.

SWF workers can run in EC2 Auto Scaling groups, so the system can add capacity dynamically as load increases. Workers can also run on Spot Instances for tasks that aren't mission-critical.

The downside of SWF is that it isn't serverless, so developers still need to manage EC2 instances. They must also write and run a separate coordinator, which SWF calls a decider, to dispatch jobs and implement the step-linking logic that Step Functions provides out of the box.
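To make the decider's job concrete, here is a minimal sketch of decision logic written as a pure Python function. It reads the workflow history that SWF returns with a decision task (the `events` list from `PollForDecisionTask`, oldest first, with the caller responsible for collecting every history page) and produces the `decisions` list you would send back with `RespondDecisionTaskCompleted`. It assumes each activity type was registered with a default task list and timeouts, counts failures across the whole workflow rather than per step, and ignores cancellations and other edge cases. The `decide` name, the step list and the `max_failures` limit are illustrative.

```python
def decide(events, steps, max_failures=3):
    """Return SWF decisions that run `steps` one after another.

    `events` is the full workflow history, oldest event first.
    `steps` is a list of (activity_name, version) pairs to run in order.
    """
    scheduled = completed = failed = 0
    last_output = None  # becomes the input of the next step
    for event in events:
        kind = event["eventType"]
        if kind == "WorkflowExecutionStarted":
            last_output = event["workflowExecutionStartedEventAttributes"].get("input")
        elif kind == "ActivityTaskScheduled":
            scheduled += 1
        elif kind == "ActivityTaskCompleted":
            completed += 1
            last_output = event["activityTaskCompletedEventAttributes"].get("result")
        elif kind in ("ActivityTaskFailed", "ActivityTaskTimedOut"):
            failed += 1  # total failures across the whole workflow

    if scheduled > completed + failed:
        return []  # an activity is still running; wait for it to finish
    if failed >= max_failures:
        return [{
            "decisionType": "FailWorkflowExecution",
            "failWorkflowExecutionDecisionAttributes": {"reason": "too many activity failures"},
        }]
    if completed == len(steps):
        return [{
            "decisionType": "CompleteWorkflowExecution",
            "completeWorkflowExecutionDecisionAttributes": {"result": last_output or ""},
        }]
    # Schedule the next unfinished step; after a failure this retries the same step.
    name, version = steps[completed]
    attrs = {
        "activityType": {"name": name, "version": version},
        "activityId": f"{name}-{scheduled + 1}",  # a distinct ID for every attempt
    }
    if last_output is not None:
        attrs["input"] = last_output
    return [{"decisionType": "ScheduleActivityTask", "scheduleActivityTaskDecisionAttributes": attrs}]
```

Everything in this function is logic that a Step Functions state machine definition expresses declaratively, which is the trade-off described above.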

Using AWS Batch

AWS Batch is designed for long, compute-heavy operations, such as financial forecasting or genome sequencing. Setting up AWS Batch is similar to setting up Auto Scaling for EC2 instances: developers define a compute environment with minimum and maximum capacity. AWS Batch then manages the EC2 instances and runs jobs submitted to a job queue, much as consumers read messages from SQS. This eliminates the need to manage infrastructure, and jobs can be chained so that each one runs only after the previous one finishes.
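Sequential processing in AWS Batch is expressed through job dependencies. This sketch assumes a boto3 Batch client and submits a chain of jobs in which each job lists the previous job's ID in `dependsOn`, so Batch holds it until that job completes successfully. The queue name, job definition and job names used in the usage line are hypothetical.

```python
def submit_chain(batch, job_queue, job_definition, job_names):
    """Submit jobs so that each one waits for the previous one to succeed.

    `batch` is a boto3 Batch client (or any object with a compatible
    submit_job method). Returns the job IDs in submission order.
    """
    job_ids = []
    for name in job_names:
        kwargs = {"jobName": name, "jobQueue": job_queue, "jobDefinition": job_definition}
        if job_ids:
            # Depend on the job submitted just before this one.
            kwargs["dependsOn"] = [{"jobId": job_ids[-1]}]
        response = batch.submit_job(**kwargs)
        job_ids.append(response["jobId"])
    return job_ids
```

For example, `submit_chain(boto3.client("batch"), "genomics-queue", "aligner:1", ["align", "call-variants", "report"])` would submit three jobs that run one after another.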

AWS Batch also makes it easy to run jobs on Spot Instances, so low-priority work runs when spare EC2 capacity is available at a discount. This can reduce costs for enterprises that need to run less critical tasks.
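A Spot-based setup comes down to two requests: a managed compute environment whose resources are of type `SPOT`, and a job queue that sends work to it. The sketch below only builds the request parameters, as dictionaries you could pass to boto3's `create_compute_environment` and `create_job_queue`. The names, subnets, security groups and instance role are placeholders, and the capacity-optimized allocation strategy is one reasonable choice among several.

```python
def compute_environment_request(name, subnets, security_groups, instance_role, max_vcpus=64):
    """Parameters for a managed AWS Batch compute environment on Spot Instances."""
    return {
        "computeEnvironmentName": f"{name}-spot",
        "type": "MANAGED",  # AWS Batch launches and terminates the instances
        "computeResources": {
            "type": "SPOT",
            "allocationStrategy": "SPOT_CAPACITY_OPTIMIZED",
            "minvCpus": 0,  # scale down to zero when no jobs are waiting
            "maxvCpus": max_vcpus,
            "instanceTypes": ["optimal"],  # let Batch choose instance sizes
            "subnets": subnets,
            "securityGroupIds": security_groups,
            "instanceRole": instance_role,  # instance profile for the container hosts
        },
    }


def job_queue_request(name, compute_environment_arn, priority=1):
    """Parameters for a low-priority job queue that feeds the Spot environment."""
    return {
        "jobQueueName": f"{name}-low-priority",
        "state": "ENABLED",
        "priority": priority,  # queues with higher values are evaluated first
        "computeEnvironmentOrder": [
            {"order": 1, "computeEnvironment": compute_environment_arn},
        ],
    }
```

A typical order is to create the compute environment first, wait for it to reach the `VALID` status, and then build the queue request with the ARN returned in `computeEnvironmentArn`.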

Developers specify how many virtual CPUs (vCPUs) and how much memory each job's Docker container needs, just as with Amazon Elastic Container Service (ECS), and AWS Batch can automatically choose an appropriately sized instance for the job. This makes AWS Batch a good fit for tasks that run longer or require more resources than a Lambda function can handle.
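In an AWS Batch job definition, those sizing requirements go in the container properties. This sketch builds parameters for boto3's `register_job_definition`, using the `resourceRequirements` list rather than the older `vcpus` and `memory` fields. The image URI, command and sizes are hypothetical; memory is given in MiB, and both values are passed as strings, as that field expects.

```python
def job_definition_request(name, image, vcpus, memory_mib, command):
    """Parameters for registering a single-container AWS Batch job definition."""
    return {
        "jobDefinitionName": name,
        "type": "container",
        "containerProperties": {
            "image": image,
            "command": command,
            # Batch places the container on an instance with at least these resources.
            "resourceRequirements": [
                {"type": "VCPU", "value": str(vcpus)},
                {"type": "MEMORY", "value": str(memory_mib)},  # MiB
            ],
        },
    }


# Hypothetical image and command, for illustration only.
request = job_definition_request(
    "forecast",
    "123456789012.dkr.ecr.us-east-1.amazonaws.com/forecast:latest",
    vcpus=4,
    memory_mib=16384,
    command=["python", "run_forecast.py"],
)
```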
