Question

I recently started moving a monolithic application to a microservices architecture using Docker containers. The general idea of the app is:

scrape the data -> format the data -> save the data to MySQL -> serve the data via a REST API.

I want to split each of these steps into a separate service. I think I have two choices; what is the best practice in a microservices architecture here?

Option one
Scraper service - scrapes and publishes to Kafka
Formatter service - consumes messages from Kafka and formats the data
API service - consumes Kafka messages, updates MySQL, and exposes a REST API
Drawback: if I'm not wrong, Docker containers should preferably run only one process per container

Option two
Scraper service - scrapes and publishes to Kafka
Formatter service - consumes messages from Kafka and formats the data
Saving to DB service - receives the formatted information and just updates MySQL (runs as a Python process)
API service - exposes a REST API that serves requests with Python Flask.
Drawback: two services connect to the same DB, which is supposedly not recommended because they would not be decoupled

What is the best practice here? Should I go with option one and run the Flask server and the Kafka listener in the same container?
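
Concretely, that would mean something like the following in one container (a rough sketch using kafka-python and Flask; the topic, broker, and the in-memory list standing in for MySQL are all placeholders):

    # Sketch: Kafka listener and Flask API in one process (option one).
    import threading

    from flask import Flask, jsonify
    from kafka import KafkaConsumer  # kafka-python

    app = Flask(__name__)
    items = []  # stand-in for MySQL writes

    def consume():
        # Blocks forever, so it runs in a background thread next to Flask.
        consumer = KafkaConsumer("formatted-data", bootstrap_servers="kafka:9092")
        for message in consumer:
            items.append(message.value.decode("utf-8"))

    @app.route("/items")
    def list_items():
        return jsonify(items)

    if __name__ == "__main__":
        threading.Thread(target=consume, daemon=True).start()
        app.run(host="0.0.0.0", port=5000)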

Thanks!


Solution

I would suggest something along the following lines.

  • Scraper: scrapes the data and publishes to Kafka
  • Formatter/Persistence: reads from Kafka and sends the data to the storage layer (sketched below)
  • Storage: one "real" database where you perform writes. Replicate this DB to as many read-only copies as you need.
  • API: accesses only the read-only replicas to serve the data.
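
A minimal sketch of the Formatter/Persistence piece, assuming kafka-python and mysql-connector-python; the topic, host names, credentials, and table are placeholders:

    # Formatter/Persistence sketch: consume raw scraped messages, format
    # them, and write to the primary (writable) MySQL instance.
    import json

    import mysql.connector  # mysql-connector-python
    from kafka import KafkaConsumer  # kafka-python

    consumer = KafkaConsumer(
        "scraped-data",                  # placeholder topic
        bootstrap_servers="kafka:9092",  # placeholder broker
        value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    )
    db = mysql.connector.connect(host="mysql-primary", user="writer",
                                 password="secret", database="app")

    for message in consumer:
        title = message.value["title"].strip().lower()  # example formatting step
        cursor = db.cursor()
        cursor.execute("INSERT INTO items (title) VALUES (%s)", (title,))
        db.commit()
        cursor.close()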

The concept of eventual consistency comes into play here. You can spin up as many replicas and API containers as you need to meet demand, at the cost of them sometimes returning stale data. At some point the replica DBs get refreshed and the API starts serving the newest data. This way, writing new data doesn't bottleneck the response times of your reads.
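
And a matching sketch of the API side, which talks only to a read replica (again, the host, credentials, and table are assumptions):

    # API sketch: Flask serves reads from a read-only MySQL replica,
    # never the primary.
    from flask import Flask, jsonify
    import mysql.connector  # mysql-connector-python

    app = Flask(__name__)

    def replica_connection():
        # Replication lag means results may briefly be stale
        # (eventual consistency), which is the accepted trade-off.
        return mysql.connector.connect(host="mysql-replica", user="reader",
                                       password="secret", database="app")

    @app.route("/items")
    def list_items():
        db = replica_connection()
        cursor = db.cursor(dictionary=True)
        cursor.execute("SELECT id, title FROM items ORDER BY id DESC LIMIT 100")
        rows = cursor.fetchall()
        cursor.close()
        db.close()
        return jsonify(rows)

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=5000)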

OTHER TIPS

Without any doubt it's option two, and the drawback you mention applies equally to option one, since you would have one service ("API service") with two very different responsibilities (save to DB + expose via API) grouped in one deployment package.

These two services (save to DB and expose via API) could still share a common DAO layer, duplicated in both services. Or, since the "expose via API" service is read-only, they would be fully independent services even though they interact with the same DB.
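
To illustrate, each service can carry its own small DAO copy, with the API-side one kept read-only (a sketch; the table and class names are made up):

    # Duplicated DAOs: the "save to DB" service owns the write path, the
    # API service carries its own read-only copy. Both target the same
    # MySQL database.
    class ItemWriteDAO:
        """Lives only in the persistence service."""
        def __init__(self, conn):
            self.conn = conn  # a mysql-connector-python connection

        def insert(self, title):
            cur = self.conn.cursor()
            cur.execute("INSERT INTO items (title) VALUES (%s)", (title,))
            self.conn.commit()
            cur.close()

    class ItemReadDAO:
        """Duplicated in the API service; deliberately exposes no writes."""
        def __init__(self, conn):
            self.conn = conn

        def list_recent(self, limit=100):
            cur = self.conn.cursor(dictionary=True)
            cur.execute("SELECT id, title FROM items "
                        "ORDER BY id DESC LIMIT %s", (limit,))
            rows = cur.fetchall()
            cur.close()
            return rows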

UPDATE: in case you want confirmation that sharing a database between two microservices is not necessarily an anti-pattern: http://microservices.io/patterns/data/shared-database.html

Licensed under: CC-BY-SA with attribution