Question

I'm trying to implement a system that allows a user to upload files over HTTP, saves each file to object storage, and saves any metadata that already exists about the file (e.g. the groups who have access to it, its name, its size) to a NoSQL database. The file is then sent to a document parser, which performs some operations on it to glean more metadata; that metadata is saved back to the NoSQL database and also indexed into Elasticsearch.
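To make the flow concrete, here is a minimal sketch of the metadata side of that pipeline. It assumes MongoDB as the NoSQL store and the official `pymongo` and `elasticsearch` Python clients; the `files` collection, the index name, and the function names are hypothetical stand-ins for whatever parser and naming you actually use.

```python
from pymongo import MongoClient
from elasticsearch import Elasticsearch

# Assumed self-hosted services; adjust hosts/ports to your deployment.
mongo = MongoClient("mongodb://localhost:27017")
files = mongo["ingest"]["files"]          # hypothetical collection name
es = Elasticsearch("http://localhost:9200")

def save_initial_metadata(file_id: str, name: str, size: int, groups: list[str]) -> None:
    """Persist the metadata that exists before parsing."""
    files.insert_one({
        "_id": file_id,
        "name": name,
        "size": size,
        "groups": groups,
        "parsed": False,
    })

def save_parsed_metadata(file_id: str, extracted: dict) -> None:
    """Merge the parser's output into the NoSQL record, then index it."""
    files.update_one({"_id": file_id}, {"$set": {"parsed": True, **extracted}})
    doc = files.find_one({"_id": file_id})
    doc.pop("_id")  # _id is a reserved metadata field in Elasticsearch
    es.index(index="files", id=file_id, document=doc)
```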

I thought initially that this was not an uncommon architecture. However, the object-storage tools I'm finding don't fit easily with the technical capabilities I have at the file's entry point into the backend (I can go into more detail if needed, but it's fairly low-level).

That leads me to think that maybe I'm not taking the best approach. The diagram below shows the general flow of the file and its metadata into the system.

My question is: is this a "bad" architecture? Is it uncommon? If so, what are some alternatives? If not, what are some object-storage tools that might better suit it?

[Diagram: File Ingestion Architecture]

I would like this system to be able to be hosted offline using Docker containers (each box in the architecture being a container, aside from the web UI). This obviously means that a cloud-only service like S3 is out of the question.
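For the offline requirement, a self-hostable, S3-compatible store such as MinIO keeps each box in the diagram as its own container. As a rough illustration only (service names, images, and credentials here are assumptions, not a tested deployment):

```yaml
# docker-compose.yml -- illustrative service layout, not a hardened config
services:
  minio:                      # S3-compatible object storage, self-hosted
    image: minio/minio
    command: server /data
    environment:
      MINIO_ROOT_USER: admin
      MINIO_ROOT_PASSWORD: change-me
    ports: ["9000:9000"]
  mongo:                      # NoSQL metadata store (assumed MongoDB)
    image: mongo:7
    ports: ["27017:27017"]
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.13.0
    environment:
      discovery.type: single-node
      xpack.security.enabled: "false"
    ports: ["9200:9200"]
  # The API servers and the document parser would be your own images.
```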

I would also like to avoid storing files on the API servers themselves, and to avoid loading entire files into memory on those servers (for obvious reasons).
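On that last point: S3-compatible stores support streaming multipart uploads, so the API server can pipe the request body straight through to object storage without ever buffering the whole file. A minimal sketch, assuming Flask and the `minio` client (the bucket name, endpoint, and credentials are placeholders):

```python
import uuid
from flask import Flask, request, jsonify
from minio import Minio

app = Flask(__name__)
# Assumed MinIO container from the compose sketch above.
storage = Minio("localhost:9000", access_key="admin",
                secret_key="change-me", secure=False)

@app.post("/files")
def upload():
    file_id = str(uuid.uuid4())
    # length=-1 plus part_size makes minio-py stream the body in
    # 10 MiB multipart chunks, so the file is never held fully in
    # memory or written to the API server's disk.
    storage.put_object("uploads", file_id, request.stream,
                       length=-1, part_size=10 * 1024 * 1024)
    return jsonify({"id": file_id}), 201
```

This assumes the raw request body is the file itself; a multipart/form-data upload would need a streaming form parser instead.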
