🛡️ Laconia Batch: reads a large number of records without the Lambda time limit.
AWS Lambda limits the execution duration of each request (300 seconds at the
time of writing), so a single invocation cannot run a long-running task to
completion. laconia-batch handles your batch processing needs by providing a
beautifully designed API that abstracts the time limitation problem.
Check out the FAQ.
Install laconia-batch using yarn:
yarn add laconia-batch
Or via npm:
npm install --save laconia-batch
These are the currently supported input sources:
- DynamoDB
- S3
Example of batch processing by scanning a DynamoDB table:
const laconiaBatch = require("laconia-batch");

// processItem is your own function that handles a single record
module.exports.handler = laconiaBatch(
  _ =>
    laconiaBatch.dynamoDb({
      operation: "SCAN",
      dynamoDbParams: { TableName: "Music" }
    }),
  { itemsPerSecond: 2 }
).on("item", ({ event }, item) => processItem(event, item));
Rate limiting is supported out of the box by setting the batchOptions.itemsPerSecond
option.
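To make the meaning of the rate concrete, it can be converted into a pause between two consecutive items. The following is only a minimal sketch of that arithmetic, not laconia-batch's implementation; `delayBetweenItemsMs` is a hypothetical helper introduced for illustration.

```javascript
// Hypothetical helper: converts an itemsPerSecond rate into the pause
// (in milliseconds) between two consecutive items.
function delayBetweenItemsMs(itemsPerSecond) {
  // No rate limit configured: do not wait at all.
  if (itemsPerSecond === undefined) return 0;
  return 1000 / itemsPerSecond;
}

console.log(delayBetweenItemsMs(2)); // 500
console.log(delayBetweenItemsMs(0.5)); // 2000
```

With `itemsPerSecond: 2`, as in the example above, one item would be handled roughly every half second.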
laconia-batch works around Lambda's time limitation by using recursion.
It automatically recurses when the Lambda timeout is about to happen, then
resumes from where it left off in the new invocation.
Imagine you are about to process the array [1, 2, 3, 4, 5] and each request can only handle two items. The following will happen:
- request 1: Process 1
- request 1: Process 2
- request 1: Not enough time, recursing with current cursor
- request 2: Process 3
- request 2: Process 4
- request 2: Not enough time, recursing with current cursor
- request 3: Process 5
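The walkthrough above can be reproduced with a small in-process simulation. This is a sketch under stated assumptions: each "request" stands in for one Lambda invocation, a fixed item budget stands in for the approaching timeout, and `processInRequests` is a hypothetical name. A real implementation would re-invoke the Lambda rather than call a function recursively.

```javascript
// Simulates resuming work across requests by passing a cursor along.
function processInRequests(items, itemsPerRequest) {
  const log = [];

  function handleRequest(requestNumber, cursor) {
    let processed = 0;
    while (cursor < items.length) {
      if (processed === itemsPerRequest) {
        // Stand-in for "the timeout is near": hand the cursor to a new request.
        log.push(`request ${requestNumber}: Not enough time, recursing with current cursor`);
        return handleRequest(requestNumber + 1, cursor);
      }
      log.push(`request ${requestNumber}: Process ${items[cursor]}`);
      cursor += 1;
      processed += 1;
    }
  }

  handleRequest(1, 0);
  return log;
}

console.log(processInRequests([1, 2, 3, 4, 5], 2).join("\n"));
```

The only state carried from one request to the next is the cursor, which is what allows the new invocation to resume where the previous one stopped.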
- readerFn(laconiaContext)
  - This function is called when your Lambda is invoked
  - The function must return a reader object, e.g. dynamoDb(), s3()
  - Will be called with the laconiaContext object, which can be destructured to {event, context}
- batchOptions
  - itemsPerSecond
    - Optional
    - Rate limit will not be applied if the value is not set
    - Can be set to a decimal, e.g. 0.5 equates to 1 item per 2 seconds
  - timeNeededToRecurseInMillis
    - Optional
    - The value set here is used to check whether the current execution should be stopped
    - If item processing is very slow, the batch processor might not have enough time to recurse and your Lambda execution might time out. Increase this value to increase the chance of the recursion happening
Example:
// Use all default batch options (No rate limiting)
laconiaBatch(_ => dynamoDb());
// Customise batch options
laconiaBatch(_ => dynamoDb(), {
  itemsPerSecond: 2,
  timeNeededToRecurseInMillis: 10000
});
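One plausible way to picture how timeNeededToRecurseInMillis is used is as a threshold compared against the time left in the current invocation. The sketch below is an assumption about that check, not the library's code; `shouldStop` and `fakeContext` are hypothetical, while `getRemainingTimeInMillis()` is the method the Lambda Node.js runtime exposes on the context object.

```javascript
// Hypothetical check: stop processing once the time left in this invocation
// drops to the time reserved for recursing.
function shouldStop(context, timeNeededToRecurseInMillis) {
  return context.getRemainingTimeInMillis() <= timeNeededToRecurseInMillis;
}

// Fake Lambda context, for illustration only.
const fakeContext = remaining => ({ getRemainingTimeInMillis: () => remaining });

console.log(shouldStop(fakeContext(30000), 10000)); // false
console.log(shouldStop(fakeContext(8000), 10000)); // true
```

Under this reading, a larger value makes the processor stop earlier, leaving more headroom for a slow item to finish before the recursion has to happen.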
There are events that you can listen to when laconia-batch is working.
- item: laconiaContext, item
  - Fired on every item read.
  - item is an object found during the read
  - laconiaContext can be destructured to {event, context}
- start: laconiaContext
  - Fired when the batch process is started for the very first time
  - laconiaContext can be destructured to {event, context}
- stop: laconiaContext, cursor
  - Fired when the current execution is timing out and about to be recursed
  - cursor contains the information of how the last item is being read
  - laconiaContext can be destructured to {event, context}
- end: laconiaContext
  - Fired when the batch processor can no longer find any more records
  - laconiaContext can be destructured to {event, context}
Example:
laconiaBatch({ ... })
.on('start', (laconiaContext) => ... )
.on('item', (laconiaContext, item) => ... )
.on('stop', (laconiaContext, cursor) => ... )
.on('end', (laconiaContext) => ... )
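As a fuller illustration of what the listeners might do, the sketch below attaches simple progress logging to any object with a chainable `.on`, which is how the handler returned by laconiaBatch is used in the examples above. `attachProgressLogging` is a hypothetical helper, and a plain Node.js EventEmitter stands in for the batch handler so that the snippet runs without AWS.

```javascript
const { EventEmitter } = require("events");

// Hypothetical helper: records each event and counts the items seen.
function attachProgressLogging(batch, log) {
  let itemCount = 0;
  return batch
    .on("start", () => log.push("started"))
    .on("item", (laconiaContext, item) => {
      itemCount += 1;
      log.push(`item ${itemCount}: ${JSON.stringify(item)}`);
    })
    .on("stop", (laconiaContext, cursor) => log.push(`stopping at ${JSON.stringify(cursor)}`))
    .on("end", () => log.push(`finished after ${itemCount} item(s)`));
}

// The events are emitted by hand here, in the order described above.
const fakeBatch = new EventEmitter();
const log = [];
attachProgressLogging(fakeBatch, log);
const laconiaContext = { event: {}, context: {} };
fakeBatch.emit("start", laconiaContext);
fakeBatch.emit("item", laconiaContext, { Artist: "Foo" });
fakeBatch.emit("end", laconiaContext);
console.log(log);
```

The counter in this sketch lives in memory, so it is not guaranteed to carry over to the next invocation after a recursion.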
Creates a reader for a DynamoDB table.
- operation
  - Mandatory
  - Valid values are: 'SCAN' and 'QUERY'
- dynamoDbParams
  - Mandatory
  - This parameter is used when documentClient's operations are called
  - The ExclusiveStartKey param can't be used, as it will be overridden during processing!
- documentClient = new AWS.DynamoDB.DocumentClient()
  - Optional
  - Set this option if there's a need to customise the AWS.DynamoDB.DocumentClient instantiation
  - Used for DynamoDB operations
Example:
// Scans the entire Music table
dynamoDb({
  operation: "SCAN",
  dynamoDbParams: { TableName: "Music" }
});
// Queries the Music table with more complicated DynamoDB parameters
dynamoDb({
  operation: "QUERY",
  dynamoDbParams: {
    TableName: "Music",
    Limit: 1,
    ExpressionAttributeValues: {
      ":a": "Bar"
    },
    FilterExpression: "Artist = :a"
  }
});
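The note about ExclusiveStartKey is easier to follow with DynamoDB pagination in mind: each response may carry a LastEvaluatedKey, which is passed back as ExclusiveStartKey to fetch the next page. The sketch below shows a reader that owns that key. It is an illustration, not laconia-batch's implementation: `readPage`, `readAll`, `songs` and `fakeClient` are hypothetical, and the fake client only mimics the `scan(params).promise()` shape of the AWS SDK v2 DocumentClient.

```javascript
// Reads one page. The cursor always replaces any ExclusiveStartKey found
// in dynamoDbParams, mirroring the note above.
async function readPage(documentClient, operation, dynamoDbParams, cursor) {
  const params = { ...dynamoDbParams };
  delete params.ExclusiveStartKey;
  if (cursor !== undefined) params.ExclusiveStartKey = cursor;
  const method = operation === "QUERY" ? "query" : "scan";
  const result = await documentClient[method](params).promise();
  // LastEvaluatedKey is absent once the last page has been read.
  return { items: result.Items, nextCursor: result.LastEvaluatedKey };
}

// In-memory fake returning one item per page, so the sketch runs without AWS.
const songs = [{ Artist: "Foo" }, { Artist: "Bar" }, { Artist: "Baz" }];
const fakeClient = {
  scan: params => ({
    promise: async () => {
      const start = params.ExclusiveStartKey ? params.ExclusiveStartKey.index : 0;
      const next = start + 1;
      return {
        Items: songs.slice(start, next),
        LastEvaluatedKey: next < songs.length ? { index: next } : undefined
      };
    }
  })
};

// Follows the cursor until no pages remain.
async function readAll() {
  const all = [];
  let cursor;
  do {
    const page = await readPage(fakeClient, "SCAN", { TableName: "Music" }, cursor);
    all.push(...page.items);
    cursor = page.nextCursor;
  } while (cursor !== undefined);
  return all;
}

readAll().then(all => console.log(all.length)); // 3
```

Because the reader sets the start key on every page, a value supplied by the caller would have no lasting effect, which matches the restriction described above.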
Creates a reader for an array stored in S3.
- path
  - Mandatory
  - The path to the array to be processed
  - Set to '.' if the object stored in S3 is the array
  - Set to a path if an object is stored in S3 and the array is a property of the object
    - lodash.get is used to retrieve the array
- s3Params
  - Mandatory
  - This parameter is used when s3.getObject is called to retrieve the array stored in S3
- s3 = new AWS.S3()
  - Optional
  - Set this option if there's a need to customise the AWS.S3 instantiation
  - Used for S3 operations
Example:
// Reads an array from array.json in MyBucket
s3({
  path: ".",
  s3Params: {
    Bucket: "MyBucket",
    Key: "array.json"
  }
});
// Reads the array retrieved at database.music[0]["category"].list from object.json in MyBucket
s3({
  path: 'database.music[0]["category"].list',
  s3Params: {
    Bucket: "MyBucket",
    Key: "object.json"
  }
});
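To illustrate how the path option selects the array, the sketch below resolves both forms of path against a JSON document body. It rests on assumptions: `extractArray` is a hypothetical helper, and `getByPath` is a simplified stand-in for lodash.get, written out so the snippet has no dependencies. It does not cover every path syntax that lodash.get accepts.

```javascript
// Simplified stand-in for lodash.get: splits the path on dots, brackets and
// quotes, then walks the object key by key.
function getByPath(object, path) {
  const keys = path.match(/[^.[\]"']+/g) || [];
  return keys.reduce((value, key) => (value == null ? undefined : value[key]), object);
}

// Hypothetical helper: "." means the stored JSON document is itself the array.
function extractArray(jsonBody, path) {
  const parsed = JSON.parse(jsonBody);
  return path === "." ? parsed : getByPath(parsed, path);
}

const objectJson = JSON.stringify({
  database: { music: [{ category: { list: ["a", "b"] } }] }
});

console.log(extractArray("[1, 2, 3]", ".")); // [ 1, 2, 3 ]
console.log(extractArray(objectJson, 'database.music[0]["category"].list')); // [ 'a', 'b' ]
```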