Question

My ingestion pipeline is IoT -> Kinesis Firehose -> S3, and I want to discard JSON records that do not comply with my schema.

I have to discard them before they reach S3, because I'm using a Glue crawler to build a schema from the data, and dissimilar JSON records can cause issues later in queries and processing.

One way to do it is with a transformation Lambda, with a maximum buffer of 3 MB (which seems wasteful: our data rate is huge, so it will cause a large number of Lambda invocations).

Even so, I don't want to hard-code the schema into the Lambda.
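For reference, here is a minimal sketch of what such a transformation Lambda could look like. It assumes the standard Firehose transformation contract (each incoming record has a `recordId` and base64-encoded `data`, and each returned record carries a `result` of `Ok`, `Dropped`, or `ProcessingFailed`). The schema format (field name mapped to a Python type), the `EXAMPLE_SCHEMA` fields, and the `conforms` and `transform` helpers are hypothetical stand-ins, not a real JSON Schema validator; the schema is passed in as a parameter so it does not have to live in the handler code.

```python
import base64
import json

# Hypothetical schema for this sketch: field name -> expected Python type.
# In practice this would be loaded from outside the function, not defined here.
EXAMPLE_SCHEMA = {"device_id": str, "temperature": float, "ts": int}


def conforms(payload, schema):
    """Return True if payload is an object with every schema field of the expected type."""
    if not isinstance(payload, dict):
        return False
    for field, expected in schema.items():
        value = payload.get(field)
        # bool is a subclass of int in Python, so reject it for non-bool fields.
        if isinstance(value, bool) and expected is not bool:
            return False
        # Accept whole numbers where a float is expected (JSON does not distinguish).
        if expected is float and isinstance(value, int):
            continue
        if not isinstance(value, expected):
            return False
    return True


def transform(event, schema):
    """Mark each Firehose record as Ok or Dropped depending on schema conformance."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            ok = conforms(payload, schema)
        except (ValueError, TypeError):
            # Covers invalid base64, invalid UTF-8, and malformed JSON.
            ok = False
        output.append(
            {
                "recordId": record["recordId"],
                "result": "Ok" if ok else "Dropped",
                "data": record["data"],  # passed through unchanged
            }
        )
    return {"records": output}


def lambda_handler(event, context):
    # EXAMPLE_SCHEMA is a placeholder; swap in a schema fetched at runtime.
    return transform(event, EXAMPLE_SCHEMA)
```

Records marked `Dropped` are not delivered to S3, so the crawler would only ever see conforming objects. Keeping `transform` separate from `lambda_handler` also makes the validation testable without AWS.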

So, if there's no other decent option besides a transformation Lambda, where should I keep the schema? Should I use AWS AppConfig, then request the schema at runtime and validate each JSON record against it?
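To make the AppConfig idea concrete, here is a rough sketch of fetching the schema at runtime and caching it across warm invocations, so the lookup does not happen on every batch. It assumes boto3's `appconfigdata` client (`start_configuration_session` and `get_latest_configuration`) and a configuration profile whose content is a JSON document; the `CachedSchema` class, the five-minute TTL, and the identifier arguments are hypothetical choices for illustration.

```python
import json
import time


def fetch_schema_from_appconfig(application, environment, profile):
    """Fetch the schema document from AppConfig (assumes it is stored as JSON)."""
    import boto3  # imported lazily so the cache below can be tested without AWS

    client = boto3.client("appconfigdata")
    session = client.start_configuration_session(
        ApplicationIdentifier=application,
        EnvironmentIdentifier=environment,
        ConfigurationProfileIdentifier=profile,
    )
    response = client.get_latest_configuration(
        ConfigurationToken=session["InitialConfigurationToken"]
    )
    return json.loads(response["Configuration"].read())


class CachedSchema:
    """Hold a schema in memory and refetch it only after ttl_seconds have passed."""

    def __init__(self, fetch, ttl_seconds=300.0, clock=time.monotonic):
        self._fetch = fetch          # zero-argument callable returning the schema
        self._ttl = ttl_seconds
        self._clock = clock
        self._schema = None
        self._loaded_at = 0.0

    def get(self):
        now = self._clock()
        if self._schema is None or now - self._loaded_at >= self._ttl:
            self._schema = self._fetch()
            self._loaded_at = now
        return self._schema
```

Creating one `CachedSchema` at module level (outside the handler) would let warm Lambda containers reuse the schema between invocations. Opening a new session on every refresh is the simplest approach but not the cheapest; reusing the poll token, or the AppConfig Lambda extension, may be a better fit at a high data rate.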

There's also the Glue Schema Registry, but I'm not sure whether it can be easily integrated into a plain Lambda.

Any other ideas would be appreciated.

No correct solution

Licensed under: CC-BY-SA with attribution