Question

I'm planning on the following for a use case scenario for AWS Lambda and want to make sure I'm headed in the right direction and that there's not some better/easier solution out there.

The database is in AWS RDS (SQL Server), and the web application runs in AWS ECS Fargate (Dockerized). The database has ~8,000 records in it. The language will be .NET Core 3.1 / EF Core 3.1 (once Lambda supports it, hopefully shortly).

Every day, something needs to wake up and get the IDs of the records that are due for an update today (those that have not been updated in a month). Each of those IDs is sent to a remote API, and the results are used to update the original database record.

I'm thinking of using two Lambda functions for this.

The AWS docs say you can run up to 1,000 concurrent Lambda executions at a time (and request an increase if you need to), so that should be sufficient for my needs at the moment.

Function 1

  • Wakes up and gets the list of IDs
  • For each ID, spawn Function 2 and pass along the ID
  • Exit
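Function 1's fan-out step could be sketched as below, again in Python for brevity. The actual invocation is passed in as a callable so the sketch runs without AWS. In a deployed function it could wrap boto3's Lambda client `invoke` with `InvocationType="Event"`, which queues an asynchronous invocation, so Function 1 does not wait for each Function 2 to finish. The function name `"Function2"` and the `{"id": ...}` payload shape are assumptions.

```python
import json
from typing import Callable, Iterable

# `invoke` is injected so this runs anywhere. In a deployed Function 1 it
# might be (hypothetical function name "Function2"):
#   client = boto3.client("lambda")
#   invoke = lambda payload: client.invoke(FunctionName="Function2",
#                                          InvocationType="Event",
#                                          Payload=payload)
Invoker = Callable[[bytes], object]


def fan_out(record_ids: Iterable[int], invoke: Invoker) -> int:
    """Send one Function 2 invocation per ID; return how many were sent."""
    count = 0
    for record_id in record_ids:
        # Assumed payload shape: a small JSON object carrying just the ID.
        payload = json.dumps({"id": record_id}).encode("utf-8")
        invoke(payload)
        count += 1
    return count
```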

Function 2

  • Receive the ID of the record to update
  • Make a call to the remote service to get the updated data for ID
  • If success update the database fields and LastUpdated date for ID
  • If error mark record as error/retry
  • Exit
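Function 2's success and error paths could be sketched as follows (Python for brevity). It assumes a plain dict as a stand-in for the RDS table and a `fetch_remote` callable that raises on failure. The `status`, `last_error`, and `last_updated` field names and the `process_record` name are hypothetical. On error, `last_updated` is deliberately left unchanged, so the record stays "stale" and is picked up again by the next daily run.

```python
from datetime import datetime, timezone
from typing import Callable, Dict


def process_record(record_id: int,
                   table: Dict[int, dict],
                   fetch_remote: Callable[[int], dict],
                   now: Callable[[], datetime] = lambda: datetime.now(timezone.utc)
                   ) -> bool:
    """Update one record from the remote API; return True on success."""
    row = table[record_id]  # hypothetical in-memory stand-in for the DB row
    try:
        data = fetch_remote(record_id)  # the call to the remote service
    except Exception as exc:
        # Error path: flag the row; last_updated is untouched so it is retried.
        row["status"] = "error"
        row["last_error"] = str(exc)
        return False
    # Success path: copy the new fields and stamp the update time.
    row.update(data)
    row["status"] = "ok"
    row["last_updated"] = now()
    return True
```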

To me this seems like exactly what Lambda was made for. Am I wrong, or am I overlooking something basic that will bite me when I start?


Solution

You should probably look over the AWS Lambda best practices. The first bullet is something you should definitely consider:

Initialize SDK clients and database connections outside of the function handler, and cache static assets locally in the /tmp directory. Subsequent invocations processed by the same instance of your function can reuse these resources. This saves execution time and cost.

Initializing a DB connection is typically one of the most expensive parts of working with a DB. The above is good advice, but if you are running lots of concurrent instances, I would expect you to need a separate connection for each one (e.g. connections are typically not thread-safe in Java). A thousand connections all working on the same tables could cause contention on the DB and actually result in a slower overall execution time. This is what I call the 'Stooges' problem: when all three Stooges try to go through the door at the same time, the throughput is lower than if they go through one at a time.
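The "initialize connections outside the handler" advice could be sketched like this in Python. The connection is created on first use and then cached at module scope, so warm invocations of the same instance reuse it. `sqlite3` is only a stand-in so the sketch runs anywhere; against RDS SQL Server you would use your own driver, and the query is a placeholder. Each concurrent instance still holds its own connection, which is the contention concern described above.

```python
import sqlite3

# Module scope lives as long as the execution environment, so this cache
# survives across warm invocations handled by the same instance.
_connection = None


def get_connection():
    """Create the connection once per instance, then reuse it."""
    global _connection
    if _connection is None:
        _connection = sqlite3.connect(":memory:")  # stand-in for SQL Server
    return _connection


def handler(event, context):
    conn = get_connection()
    # Placeholder query; a real Function 2 would update the record here.
    (value,) = conn.execute("SELECT ?", (event["id"],)).fetchone()
    return value
```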

A middle path would be to pass in a list of IDs up to a certain limit, or to split the IDs across a fixed number of instances. This lets you control the amount of concurrency.
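Both options for this middle path can be sketched in a few lines of Python. `chunk_by_size` caps the number of IDs per invocation, while `split_across` deals the IDs round-robin over a fixed number of workers. Both function names are hypothetical, and the batch size or worker count is an assumption you would tune against how much load the database tolerates.

```python
from typing import List, Sequence


def chunk_by_size(ids: Sequence[int], max_batch: int) -> List[List[int]]:
    """Split ids into consecutive batches of at most max_batch items."""
    if max_batch < 1:
        raise ValueError("max_batch must be >= 1")
    return [list(ids[i:i + max_batch]) for i in range(0, len(ids), max_batch)]


def split_across(ids: Sequence[int], workers: int) -> List[List[int]]:
    """Deal ids round-robin across a fixed number of workers."""
    if workers < 1:
        raise ValueError("workers must be >= 1")
    return [list(ids[w::workers]) for w in range(workers)]
```

For example, with ~8,000 IDs and an assumed batch size of 100, Function 1 would send 80 invocations instead of 8,000. With `split_across(ids, 10)`, at most 10 instances (and so 10 DB connections) would be working at once.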

Licensed under: CC-BY-SA with attribution