Are AWS Lambda functions a good fit for fetching a huge number of records from a database on user request?

softwareengineering.stackexchange https://softwareengineering.stackexchange.com/questions/396672

Question

We have a use case where an AWS Lambda function is called from AWS API Gateway on a user request from the browser. The function fetches data from the database and returns it to the client.

The amount of time taken to retrieve the data or the amount of data depends on what filters the user has selected. Do you think the Lambda function way is the right fit for this kind of use case?

Additional questions (if the answer is yes, a Lambda function can be a good fit):

  • How do we estimate the memory requirement for the Lambda function for such a use case?
  • The Lambda function (for the above use case) waits until data is fetched from the database. Is there a more cost-effective way?
    • Would hosting the web app on a dedicated AWS EC2 instance be a better option?

Solution

Sounds like a possible use case, but it really depends on the type of database, how much data you're returning, and whether the hard limits on AWS services prohibit the use of API Gateway/Lambda.

SQL Limitations

If you're running lots of long SQL queries, it's probably not going to make sense to run them directly from Lambda, or you risk hitting your database too hard. You're also paying for Lambda compute time that does nothing but wait for a response from the SQL database. There's also the issue of getting all the response data back and passing it to the user in a format that makes sense.

You can handle a decent amount of data using Lambda requests and responses, but if you're getting into hundreds of megabytes of data you'll want to think about another solution. Some API Gateway limits are stricter than Lambda's (its integration timeout, for example, is far shorter than Lambda's maximum execution time), so be sure to look at both.

Other DB Options

For your use case (filters and searches by the user), if you're using SQL the query may or may not be well optimized, and your database likely has connection limits and potential issues with concurrent queries. If you need to use a database like this, you could implement some caching mechanisms or design the application so that it:

  1. Sends a query into the system, validates it, and returns a query ID to look up later.
  2. Queues the query with something like SQS.
  3. Executes the queries in order, either one at a time or in parallel with some limit to prevent hammering the DB. This could be done in Lambda, but it might also make sense to have a limited set of query executors in ECS.
  4. Writes the result to something like S3 and updates the query ID's record so that the result can be referenced.
  5. Has the frontend, which holds the query ID from step 1, periodically check for the result.
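The steps above can be sketched roughly as follows. This is only an in-memory illustration, not an AWS implementation: the queue, status table, and result store are plain Python stand-ins for something like SQS, a status table, and S3, and every name here (`submit_query`, `run_worker`, `poll`, `ALLOWED_FILTERS`, `fake_execute`) is hypothetical.

```python
import json
import queue
import uuid

# In-memory stand-ins (assumptions): a real system might use SQS for the
# queue, S3 for result storage, and a small table for query status.
query_queue = queue.Queue()
query_status = {}   # query_id -> "PENDING" | "DONE" | "FAILED"
result_store = {}   # query_id -> serialized result (stand-in for an S3 object)

ALLOWED_FILTERS = {"country", "status"}  # hypothetical filter names


def submit_query(filters):
    """Steps 1-2: validate the query, enqueue it, and return a query ID."""
    unknown = set(filters) - ALLOWED_FILTERS
    if unknown:
        raise ValueError(f"unsupported filters: {sorted(unknown)}")
    query_id = str(uuid.uuid4())
    query_status[query_id] = "PENDING"
    query_queue.put((query_id, filters))
    return query_id


def run_worker(execute, max_queries=None):
    """Steps 3-4: run queued queries one at a time and store each result."""
    processed = 0
    while not query_queue.empty():
        if max_queries is not None and processed >= max_queries:
            break  # cap the work done per worker run to protect the DB
        query_id, filters = query_queue.get()
        try:
            rows = execute(filters)  # the real database call would go here
            result_store[query_id] = json.dumps(rows)
            query_status[query_id] = "DONE"
        except Exception:
            query_status[query_id] = "FAILED"
        processed += 1
    return processed


def poll(query_id):
    """Step 5: the frontend calls this periodically with its query ID."""
    status = query_status.get(query_id, "UNKNOWN")
    rows = json.loads(result_store[query_id]) if status == "DONE" else None
    return {"status": status, "rows": rows}


def fake_execute(filters):
    """Placeholder for a real SQL query against the database."""
    data = [{"id": 1, "country": "DE"}, {"id": 2, "country": "FR"}]
    return [r for r in data if all(r.get(k) == v for k, v in filters.items())]
```

A caller would invoke `submit_query({"country": "DE"})`, keep the returned ID, and call `poll` with it until the status changes from `PENDING`. The `max_queries` cap is one simple way to bound how much work a single worker run puts on the database.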

This is just one possible solution of many for the SQL use case and might not be needed at all if your database performs well and doesn't have exceptional amounts of queries.

If you can pick a database that supports search more effectively, you might want to take a look at something like Elasticsearch if you haven't already, since it handles search queries more easily. Or, if you want a search tool that's managed by someone else and needs less maintenance, you could look at something like Algolia. I've had some success with Algolia/Lambda/API Gateway for search in some of my projects.

Memory Requirements and EC2

For function memory requirements, you'll need to measure how much data your typical queries return and potentially push Lambda memory up to support that. You could also stream query data through Lambda into a file in S3 and return that location to your application.
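One rough way to approach that measurement is to serialize a sample of rows, extrapolate to the expected row count, and decide from there whether the result can be returned inline or should go to S3. The sketch below makes several assumptions that are not from the answer: results are serialized as JSON, the `overhead_factor` of 3 and the 128 MB baseline are guesses to be replaced with measured values, and all function names are hypothetical. The 6 MB figure is Lambda's synchronous invocation payload limit, and 10,240 MB is the maximum configurable Lambda memory.

```python
import json
import math

LAMBDA_SYNC_PAYLOAD_LIMIT = 6 * 1024 * 1024  # synchronous request/response limit
LAMBDA_MAX_MEMORY_MB = 10240                 # largest configurable memory size


def estimate_result_bytes(sample_rows, expected_row_count):
    """Extrapolate the serialized result size from a sample of rows."""
    if not sample_rows:
        return 0
    sample_bytes = sum(len(json.dumps(r).encode("utf-8")) for r in sample_rows)
    return int(sample_bytes / len(sample_rows) * expected_row_count)


def suggest_memory_mb(result_bytes, overhead_factor=3.0, baseline_mb=128):
    """Very rough memory suggestion.

    overhead_factor (assumed) covers holding driver row objects plus the
    serialized copy at the same time; baseline_mb (assumed) covers the
    runtime and libraries. Measure real usage before relying on either.
    """
    needed_mb = baseline_mb + result_bytes * overhead_factor / (1024 * 1024)
    rounded = math.ceil(needed_mb / 64) * 64  # round up to a 64 MB step
    return min(rounded, LAMBDA_MAX_MEMORY_MB)


def choose_delivery(result_bytes):
    """Return results inline only if they fit the synchronous payload limit."""
    return "inline" if result_bytes < LAMBDA_SYNC_PAYLOAD_LIMIT else "s3"
```

Comparing the suggestion against the maximum memory reported in the function's own logs for real queries is a better guide than any fixed factor.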

As for EC2, you could have an EC2 or ECS query executor as part of your application that is long running. But you'd also probably want to look at how your database or your application handles the query queuing like I mentioned above.

OTHER TIPS

I believe there is a 6 MB limit on the request/response payload, which would make it unsuitable for returning large result sets directly.

https://docs.aws.amazon.com/lambda/latest/dg/limits.html

Ideally you want a resumable stream of data for large stuff.
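One common way to get something resumable is cursor-based paging, where each response stays under a byte budget and carries a cursor the client sends back to continue. The sketch below is a minimal illustration under assumptions: rows live in an in-memory list (a real handler would run a keyset query such as "rows with id greater than the cursor"), the 5 MB default budget is an arbitrary choice below the 6 MB limit, and `fetch_page` and `fetch_all` are hypothetical names.

```python
import json


def fetch_page(rows, cursor=0, max_bytes=5 * 1024 * 1024):
    """Return the rows from `cursor` that fit in `max_bytes`, plus the next cursor."""
    page = []
    size = 2  # bytes for the enclosing "[]" of the JSON array
    i = cursor
    while i < len(rows):
        # +2 approximates the ", " separator json.dumps puts between items
        row_bytes = len(json.dumps(rows[i]).encode("utf-8")) + 2
        # Always accept at least one row so the client makes progress.
        if page and size + row_bytes > max_bytes:
            break
        page.append(rows[i])
        size += row_bytes
        i += 1
    next_cursor = i if i < len(rows) else None  # None means no more data
    return {"rows": page, "next_cursor": next_cursor}


def fetch_all(rows, max_bytes):
    """Client-side loop: keep requesting pages until the cursor runs out."""
    collected = []
    cursor = 0
    while cursor is not None:
        response = fetch_page(rows, cursor, max_bytes)
        collected.extend(response["rows"])
        cursor = response["next_cursor"]
    return collected
```

If a request fails partway through, the client can retry from the last cursor it received instead of starting the whole query again.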

Licensed under: CC-BY-SA with attribution