The Easy Way to Run Cron Jobs on AWS
Scheduling recurring tasks through code is a very common requirement in software development, and it is typically done by configuring a cron job.
Cron will be familiar to anyone who has used a Unix-like operating system such as Linux or macOS. It is a program that runs scheduled tasks, known as cron jobs, which are entered as commands in crontab files such as the system-wide /etc/crontab.
Cron jobs can be set up directly by system administrators, but it is often the case that software developers create them through code, typically through a library which abstracts away the underlying implementation and hides it behind a more programmer-friendly interface.
Common Use Cases
There are lots of reasons you might want to use a cron job.
Maybe you are running an online news service which hides subscriber-only content behind a paywall. A regular cron job might run in order to send reminder emails to users whose subscriptions are due to expire soon.
Another example might be an ecommerce business which would like regular updates on different item categories' sales performance. A daily cron job might run queries against the sales database, extract the data, compile the resulting data into a neatly formatted report, and then email the report to management. The same data might also be persisted to another database table as a materialised view or snapshot.
A third example might be a company offering media content to users. Each item, be it audio, video or otherwise, might need to be encoded in different formats to accommodate different screens and platforms, and these encoding jobs could be run as cron jobs.
Different programming languages and frameworks typically have at least one widely used library for cron jobs.
Two Cron Libraries
The whenever gem for Ruby (commonly used with Rails) simplifies task scheduling by letting developers define scheduled jobs in a single schedule.rb file. What is particularly useful here is that its runner syntax lets developers call methods on their Rails models directly.
A simple example using the whenever gem:
every 1.day, at: ['4:30 am', '6:00 pm'] do
  runner "MyModel.task_to_run_twice_a_day"
end
Meteor.js, a client-server web framework running on Node.js, has the SyncedCron library, which is itself built on the later library for parsing schedules. Under the hood, SyncedCron records job runs in a MongoDB collection, which is how it keeps a job from running on more than one server at a time.
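A simple example using the Meteor SyncedCron package:
SyncedCron.add({
  name: 'Crunch some important numbers for the marketing department',
  schedule: function(parser) {
    // parser is a later.parse object
    return parser.text('every 2 hours');
  },
  job: function() {
    // crunchSomeNumbers() stands in for your own application code
    var numbersCrunched = crunchSomeNumbers();
    return numbersCrunched;
  }
});
// Jobs only begin running once the scheduler has been started, e.g. on server startup
SyncedCron.start();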
Both of these libraries are perfectly adequate solutions for their respective frameworks, although personally I find whenever's scheduling syntax slightly more intuitive.
Cron Cloud Challenges
Challenges can arise when task scheduling moves to a cloud environment where machines are running at scale. When there are a number of different servers in operation, how can we ensure that a given cron job is only executed on one of those servers instead of being run concurrently on all of them?
The problem is compounded by the fact that earlier solutions have been rendered obsolete by changes in cloud services; the whenever-elasticbeanstalk gem, for example, no longer works.
Nonetheless, there are of course solutions to the problem.
In an AWS environment, a simple shell script can be written to ensure that only one server in an Auto Scaling group executes the cron job.
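The heart of such a script is a deterministic rule that every server can evaluate on its own and that always picks exactly one winner. Below is a minimal sketch of that rule in TypeScript rather than shell, assuming each server already knows its own instance ID and the IDs of the healthy instances in its Auto Scaling group (a real script would fetch these from the instance metadata service and the Auto Scaling API, which is not shown). The server whose ID sorts first runs the job and every other server skips it; shouldRunCronJob is a hypothetical name.

```typescript
// Hypothetical helper: decides whether *this* server should run the cron job.
// Assumes every server sees the same list of healthy instance IDs, so they all
// agree on a single "leader" without having to talk to each other.
function shouldRunCronJob(selfId: string, groupInstanceIds: string[]): boolean {
  if (groupInstanceIds.length === 0) {
    return false; // no healthy instances reported: safest to do nothing
  }
  // Sort a copy so the caller's array is left untouched; the first ID wins.
  const leader = [...groupInstanceIds].sort()[0];
  return selfId === leader;
}

const group = ["i-0c3", "i-0a1", "i-0b2"];
for (const id of group) {
  console.log(`${id} runs the job: ${shouldRunCronJob(id, group)}`);
}
```

The weakness of this approach shows up in the sketch itself: if instances are added or removed between two servers' checks, they may briefly see different lists and disagree about who the leader is, which is one reason this kind of script needs ongoing care.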
There are, however, some downsides to any solution using shell scripting:
- It assumes a certain familiarity with shell scripting and the Linux command line, which your team may not have, particularly if you are a startup with little sysadmin or DevOps expertise
- It assumes a certain familiarity with the underlying cloud infrastructure, which you may not have or need if you are using a Platform as a Service (PaaS) offering like Heroku or Elastic Beanstalk
- It introduces operational and maintenance overhead, because the cloud APIs the script calls may themselves change in future
Luckily, there is an AWS service which makes scheduling cron jobs much easier for us.
AWS EventBridge
EventBridge is the new name for the service formerly called CloudWatch Events. The service is designed to link together different AWS services. If an event occurs in one service, EventBridge can be configured to respond to that event in various ways.
Quick Clarification: By "event", we mean any particular activity in an AWS environment or custom application. In practice, an event might be a signal from CloudWatch that a metric has entered the "alarm" state, or alternatively it might be an API call to reboot an EC2 instance.
EventBridge lets users define rules, each of which has a trigger (either an event pattern or a schedule) and one or more targets, which are the destinations that process the event information. Uses for EventBridge rules could include any of the following:
- Relaying the event information to another service in the AWS cloud
- Triggering a notification through Simple Notification Service or Simple Email Service
- Triggering an AWS API call
- Triggering an AWS Lambda function
It is the last item on this list to which we turn our attention. It turns out that EventBridge can run rules on a schedule, thereby making them self-triggering and entirely autonomous. What makes this so powerful is that it allows developers to run Lambda functions on a schedule, which can either be rate-based or cron-based.
Side Note: For readers who aren't familiar with AWS Lambda, it is a Function as a Service (FaaS) product which allows developers to write snippets of code which can be run on the cloud without the need to understand or even provision the underlying infrastructure.
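When a scheduled rule fires with a Lambda function as its target, the function receives a small JSON event describing the invocation. The sketch below types that event in TypeScript, with field names following the event AWS documents for scheduled rules. The ScheduledEvent interface, the isScheduledEvent type guard and the handler are our own hypothetical names, and the sample values are placeholders.

```typescript
// Shape of the event EventBridge sends to a target when a scheduled rule fires.
interface ScheduledEvent {
  version: string;
  id: string;
  "detail-type": "Scheduled Event";
  source: "aws.events";
  account: string;
  time: string; // ISO 8601 timestamp of the scheduled invocation
  region: string;
  resources: string[]; // ARN of the rule that fired
  detail: Record<string, never>; // always empty for scheduled events
}

// Narrow an unknown payload to a scheduled event before trusting its fields.
function isScheduledEvent(event: unknown): event is ScheduledEvent {
  const e = event as Partial<ScheduledEvent> | null;
  return (
    typeof e === "object" &&
    e !== null &&
    e.source === "aws.events" &&
    e["detail-type"] === "Scheduled Event"
  );
}

// Hypothetical handler: does the "cron job" work and reports which rule triggered it.
async function handler(event: unknown): Promise<string> {
  if (!isScheduledEvent(event)) {
    throw new Error("Expected an EventBridge scheduled event");
  }
  const rule = event.resources[0] ?? "unknown rule";
  return `Job run at ${event.time}, triggered by ${rule}`;
}

// Placeholder payload in the documented format.
const sample: unknown = {
  version: "0",
  id: "00000000-0000-0000-0000-000000000000",
  "detail-type": "Scheduled Event",
  source: "aws.events",
  account: "123456789012",
  time: "2024-01-01T13:00:00Z",
  region: "us-east-1",
  resources: ["arn:aws:events:us-east-1:123456789012:rule/example-rule"],
  detail: {},
};
handler(sample).then(console.log);
```

If a rule target supplies its own input (for example via RuleTargetInput.fromObject in the CDK), the function receives that custom payload instead of this event, so a type guard like this one would reject it.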
EventBridge Scheduling
Rate Expressions
Rate-based scheduling expressions are easy to grasp: they simply define a fixed invocation interval for an EventBridge rule. For example, the following two rate expressions can be read in plain English as "run every day" and "run every 10 minutes". Note that the schedule starts when the rule is created and continues at that interval from then on.
rate(1 day) // Run every day
rate(10 minutes) // Run every 10 minutes
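One detail is easy to trip over: EventBridge requires a singular unit when the value is 1 and a plural unit otherwise, which is why the examples read rate(1 day) but rate(10 minutes). Expressions such as rate(1 hours) or rate(5 hour) are rejected. Below is a minimal sketch of a helper that applies this rule; rateExpression is a hypothetical name, and only the minute, hour and day units are assumed.

```typescript
type RateUnit = "minute" | "hour" | "day";

// Hypothetical helper that builds an EventBridge rate expression,
// using the singular unit for a value of 1 and the plural otherwise.
function rateExpression(value: number, unit: RateUnit): string {
  if (!Number.isInteger(value) || value < 1) {
    throw new Error(`Rate value must be a positive integer, got ${value}`);
  }
  const unitText = value === 1 ? unit : `${unit}s`;
  return `rate(${value} ${unitText})`;
}

console.log(rateExpression(1, "day")); // rate(1 day)
console.log(rateExpression(10, "minute")); // rate(10 minutes)
```

If you use the CDK, events.Schedule.rate() accepts a Duration and should render the expression for you, so a helper like this is mainly useful when writing templates or calling the API by hand.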
Cron Expressions
Cron expressions are likely familiar to most software developers. In EventBridge, however, they consist of six required fields rather than the five used by traditional Unix cron, each of which is a unit of time: minutes, hours, day-of-month, month, day-of-week and year. All times are in UTC.
The valid range of values for each of these fields is largely self-explanatory, and ranges can be written with names as well as numbers: JUN-AUG in the month field, for instance, might indicate the summer months. Likewise, a value of SAT-SUN in the day-of-week field would indicate "weekend days".
Quick Point: The dash (-) character in the JUN-AUG and SAT-SUN examples above is one of several special characters that can be used in cron expressions. Others include:
- The asterisk (*) is a wildcard denoting "all values" of a field.
- The comma (,) allows multiple values, as in OCT,NOV,DEC for the month field.
- The slash (/) specifies increments: 0/15 in the minutes field means "every 15 minutes, starting at minute 0".
- The hash (#) is specific to the day-of-week field, where it denotes a particular instance of a weekday within the month. Days are numbered from 1 (Sunday) to 7 (Saturday), so 4#1 means "the first Wednesday of the month".
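To make the # notation concrete, the sketch below works out which calendar date an expression like 4#1 or 6#2 refers to in a given month, using EventBridge's numbering of 1 (Sunday) to 7 (Saturday). EventBridge does this calculation itself; nthWeekdayOfMonth is a hypothetical helper meant only to illustrate what the notation means.

```typescript
// Hypothetical helper illustrating the "#" wildcard: returns the day of the month
// matched by `awsDay#n` (e.g. 6#2 = second Friday), or null if the month has none.
// awsDay follows EventBridge's numbering: 1 = Sunday ... 7 = Saturday.
function nthWeekdayOfMonth(year: number, month: number, awsDay: number, n: number): number | null {
  if (awsDay < 1 || awsDay > 7 || n < 1 || n > 5) {
    throw new RangeError("awsDay must be 1-7 and n must be 1-5");
  }
  // JavaScript numbers weekdays 0 = Sunday ... 6 = Saturday, so shift by one.
  const targetJsDay = awsDay - 1;
  // Weekday of the 1st of the month (month is 1-based here, 0-based in Date.UTC).
  const firstJsDay = new Date(Date.UTC(year, month - 1, 1)).getUTCDay();
  // Days from the 1st to the first occurrence of the target weekday.
  const offset = (targetJsDay - firstJsDay + 7) % 7;
  const day = 1 + offset + 7 * (n - 1);
  // Day 0 of the next month is the last day of this month.
  const daysInMonth = new Date(Date.UTC(year, month, 0)).getUTCDate();
  return day <= daysInMonth ? day : null;
}

console.log(nthWeekdayOfMonth(2024, 6, 4, 1)); // 5 (4#1: first Wednesday of June 2024)
console.log(nthWeekdayOfMonth(2024, 6, 6, 2)); // 14 (6#2: second Friday)
console.log(nthWeekdayOfMonth(2024, 6, 6, 5)); // null (June 2024 has only four Fridays)
```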
Sample cron expressions:
cron(0 13 * * ? *) // Run at 13:00 (UTC) every day
cron(0 9 ? * 6#2 *) // Run on the second Friday of every month at 09:00 (UTC)
cron(0/15 * ? * MON-FRI *) // Run every 15 minutes every weekday (Monday to Friday)
cron(0 9 1 * ? *) // Run at 09:00 (UTC) on the 1st day of every month
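In the examples above, a question mark (?) indicates that the value of a field does not matter. EventBridge does not allow day-of-month and day-of-week to be set in the same expression, so whenever one of them holds a value or a *, the other must be ?. In the final expression, for instance, the day-of-month is 1 and the question mark indicates that the day-of-week does not matter.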
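EventBridge rejects malformed expressions when a rule is created or updated, but it can be handy to catch mistakes earlier in code. The following is a minimal sketch, not a full validator: parseCron is a hypothetical helper that checks only that there are six space-separated fields and that exactly one of day-of-month and day-of-week is ?. It does not validate the individual field values.

```typescript
// Names of the six fields in an EventBridge cron expression, in order.
const FIELD_NAMES = ["minutes", "hours", "dayOfMonth", "month", "dayOfWeek", "year"] as const;
type FieldName = (typeof FIELD_NAMES)[number];
type CronFields = Record<FieldName, string>;

// Hypothetical helper: splits "cron(...)" into named fields and enforces two
// structural rules only (six fields; exactly one of the two day fields is "?").
function parseCron(expression: string): CronFields {
  const match = /^cron\((.*)\)$/.exec(expression.trim());
  if (!match) {
    throw new Error(`Not a cron(...) expression: ${expression}`);
  }
  const parts = match[1].trim().split(/\s+/);
  if (parts.length !== FIELD_NAMES.length) {
    throw new Error(`Expected 6 space-separated fields, got ${parts.length}`);
  }
  const fields = {} as CronFields;
  FIELD_NAMES.forEach((name, i) => {
    fields[name] = parts[i];
  });
  // Both "?" or neither "?" is invalid.
  if ((fields.dayOfMonth === "?") === (fields.dayOfWeek === "?")) {
    throw new Error("Exactly one of day-of-month and day-of-week must be '?'");
  }
  return fields;
}

console.log(parseCron("cron(0 9 ? * 6#2 *)"));
```

A comma-separated expression such as cron(0,13,*,*,?,*) fails the field-count check, and cron(0 13 * * * *) fails the question-mark rule.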
Cron expressions are both flexible and precise, with the obvious benefit that they do not require much code to define.
CloudFormation Example
The following is an example of a simple CloudFormation template, written in YAML, which defines a Lambda function invoked by a cron-based EventBridge rule. In the example, the Lambda function will be invoked every day at 13:00 (UTC).
This is a very simple template consisting of just four resources: an IAM role for the Lambda function; the Lambda function itself, which simply returns "Hello World!" as a string; an EventBridge rule with a cron expression; and a Lambda permission (a resource-based policy) which allows EventBridge to invoke the Lambda function.
CloudFormation template for the EventBridge cron job:
Resources:
  HelloWorldLambdaRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: HelloWorldLambdaRole
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Principal:
              Service: lambda.amazonaws.com
            Action: sts:AssumeRole
      # Lets the function write its logs to CloudWatch Logs
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
  HelloWorldLambdaFunction:
    Type: AWS::Lambda::Function
    Properties:
      FunctionName: HelloWorldLambdaFunction
      Role: !GetAtt HelloWorldLambdaRole.Arn
      # python3.7 has been deprecated by Lambda, so use a currently supported runtime
      Runtime: python3.12
      Handler: index.hello_world_handler
      Code:
        ZipFile: |
          def hello_world_handler(event, context):
              message = 'Hello World!'
              return message
  ScheduledRule:
    Type: AWS::Events::Rule
    Properties:
      Description: "Invocation schedule for the Hello World Lambda. It will execute the function at 13:00 UTC every day"
      ScheduleExpression: "cron(0 13 * * ? *)"
      State: "ENABLED"
      Targets:
        - Arn:
            Fn::GetAtt:
              - "HelloWorldLambdaFunction"
              - "Arn"
          Id: "TargetFunctionV1"
  PermissionForEventsToInvokeLambda:
    Type: AWS::Lambda::Permission
    Properties:
      FunctionName: !Ref "HelloWorldLambdaFunction"
      Action: "lambda:InvokeFunction"
      Principal: "events.amazonaws.com"
      SourceArn:
        Fn::GetAtt:
          - "ScheduledRule"
          - "Arn"
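Note that the template is self-contained: the Lambda function's code is supplied inline through the ZipFile property, which CloudFormation saves as a file named index (hence the index. prefix in the Handler). For anything beyond a few lines of code, you would normally package the function separately and point the Code property at an S3 bucket instead.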
Cloud Development Kit (CDK) Example
This is another example of how we might automate the creation of a Lambda function invoked on a schedule defined in an EventBridge rule. This time the function and rule are created using the Cloud Development Kit in the TypeScript programming language.
The CDK example is functionally equivalent to the CloudFormation one above in terms of outputs. In both cases, the outputs are an EventBridge rule which has a basic "Hello World" Lambda function as its target.
First comes the EventBridge rule, which uses a schedule expression equivalent to "at 13:00 (UTC) every day".
constructs/event-construct.ts
import { Construct } from "constructs";
import * as events from "aws-cdk-lib/aws-events";
export class EventConstruct extends Construct {
  public eventRule: events.Rule;
  constructor(scope: Construct, id: string) {
    super(scope, id);
    this.eventRule = new events.Rule(this, "ruleForDailyAt1300UTC", {
      schedule: events.Schedule.cron({ minute: "0", hour: "13" }),
    });
  }
}
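Notice that the rule sets only the minute and hour. Our understanding is that CDK's Schedule.cron fills in the remaining fields, using * for day-of-month, month and year and ? for day-of-week (or ? for day-of-month when a weekDay is given). This rule should therefore render as cron(0 13 * * ? *), the same expression as in the CloudFormation template. The hypothetical cronFromOptions function below re-creates that defaulting for illustration only; it is not CDK's own code. Running cdk synth and checking the ScheduleExpression in the generated template is the authoritative check.

```typescript
// Hypothetical re-creation of how unspecified cron fields get filled in.
interface CronOptions {
  minute?: string;
  hour?: string;
  day?: string;
  month?: string;
  weekDay?: string;
  year?: string;
}

function cronFromOptions(options: CronOptions): string {
  // Day-of-month and day-of-week cannot both be set (CDK is believed to reject this too).
  if (options.day !== undefined && options.weekDay !== undefined) {
    throw new Error("Specify at most one of day and weekDay");
  }
  const minute = options.minute ?? "*";
  const hour = options.hour ?? "*";
  const month = options.month ?? "*";
  const year = options.year ?? "*";
  // Whichever day field is left out becomes "?"; with neither given,
  // day-of-month defaults to "*" and day-of-week to "?".
  const day = options.day ?? (options.weekDay !== undefined ? "?" : "*");
  const weekDay = options.weekDay ?? "?";
  return `cron(${minute} ${hour} ${day} ${month} ${weekDay} ${year})`;
}

console.log(cronFromOptions({ minute: "0", hour: "13" })); // cron(0 13 * * ? *)
console.log(cronFromOptions({ minute: "0/15", weekDay: "MON-FRI" })); // cron(0/15 * ? * MON-FRI *)
```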
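Next comes the construct for the Lambda function, which loads its code from the lib/lambdas directory:
constructs/lambda-construct.ts
import * as lambda from "aws-cdk-lib/aws-lambda";
import { Construct } from "constructs";
export class HelloWorldLambdaConstruct extends Construct {
  public lambda: lambda.Function;
  constructor(scope: Construct, id: string) {
    super(scope, id);
    this.lambda = new lambda.Function(this, "helloWorldLambdaFunction", {
      // Node.js 14 is no longer supported by Lambda, so use a current runtime
      runtime: lambda.Runtime.NODEJS_20_X,
      // Path is relative to the directory where the cdk command is run
      code: lambda.Code.fromAsset("lib/lambdas"),
      // "<file name without extension>.<exported function name>"
      handler: "hello-world.handler",
    });
  }
}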
Finally, a stack brings the two constructs together by adding the Lambda function as the rule's target:
scheduled-lambda-stack.ts
import * as cdk from "aws-cdk-lib";
import * as targets from "aws-cdk-lib/aws-events-targets";
import * as events from "aws-cdk-lib/aws-events";
import { Construct } from "constructs";
import { EventConstruct } from "./constructs/event-construct";
import { HelloWorldLambdaConstruct } from "./constructs/lambda-construct";
export class ScheduledLambdaStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);
    const lambdaConstruct = new HelloWorldLambdaConstruct(this, "HelloWorldLambdaConstruct");
    const eventConstruct = new EventConstruct(this, "EventConstruct");
    // Add the "hello world" Lambda function as a target of the EventBridge rule.
    // RuleTargetInput.fromObject replaces the default scheduled event with this custom payload.
    // The LambdaFunction target also grants EventBridge permission to invoke the function,
    // so no separate addLambdaPermission call is needed.
    eventConstruct.eventRule.addTarget(
      new targets.LambdaFunction(lambdaConstruct.lambda, {
        event: events.RuleTargetInput.fromObject({ message: "'Hello World!' Lambda Function" }),
      })
    );
  }
}
Next we come to the code for the Lambda function itself, written in JavaScript for the Node.js runtime.
lambdas/hello-world.js
exports.handler = async function (event) {
  return {
    statusCode: 200,
    headers: { "Content-Type": "text/plain" },
    body: `Hello World!`,
  };
};
To turn this TypeScript code into a usable CloudFormation template, we can run cdk synth.
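We can verify that this has worked by checking the cdk.out directory in the project, where the generated template is written. Assuming the template has been created successfully, the command cdk deploy will deploy the CloudFormation stack. Because the Lambda code is uploaded as an asset, the target account and region need to have been bootstrapped once with cdk bootstrap before the first deployment.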
Summary
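EventBridge is a very effective cloud-native solution to the task scheduling difficulties that can arise when cron jobs run across multiple servers. Because the schedule lives in EventBridge rather than on the servers themselves, each job is triggered centrally instead of once per server, and because EventBridge integrates so easily with other parts of your cloud infrastructure, it is an attractive option for cron jobs in the cloud.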