How to access large amounts of data for machine learning in a microservices architecture

https://softwareengineering.stackexchange.com/questions/270185

06-10-2020

Question

Imagine you are building a product recommendation algorithm for an e-commerce application built on a microservices architecture, with separate services for users and products. The algorithm should be exposed as a recommendation service which, given a user ID, returns a list of recommended products based on that user's buying history.

Here's the problem: assume the recommendation algorithm is offline, so it runs in batches and requires every user's buying history for each run. How do you get this data from the users service? In a monolithic architecture you would just read directly from a reporting copy of the database, which allows large, complex queries without impacting production performance.

The obvious answer to me seems to be to make one huge request to the users service on each run and make sure you have the capacity to handle that request. Are there better solutions?

Solution

Keep a copy of the users' purchasing history within your recommendations service. Then, for each new batch run, it only has to request from the users service the users whose history has changed since the previous run (a delta update, if you want to call it that).

You will need one big initial load to bootstrap the recommendations service, and possibly a full resynchronization from time to time, but you do not need to request the whole purchasing history on every run.
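
As a rough illustration of the idea, here is a minimal Python sketch of a local copy that is kept current with delta updates. Everything in it is an assumption made for the example, not part of any real API: the `purchases_since` call, the per-purchase sequence number `seq` used as a cursor, and the in-memory stand-in for the users service. A real users service might expose changes through a timestamp filter or an event stream instead.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol


@dataclass(frozen=True)
class Purchase:
    seq: int         # hypothetical change number, assumed to increase with every purchase
    user_id: str
    product_id: str


class UsersService(Protocol):
    """Hypothetical interface: return all purchases with seq greater than cursor."""
    def purchases_since(self, cursor: int) -> List[Purchase]: ...


class InMemoryUsersService:
    """Stand-in for the real users service, for illustration only."""

    def __init__(self) -> None:
        self._log: List[Purchase] = []

    def record(self, user_id: str, product_id: str) -> None:
        self._log.append(Purchase(len(self._log) + 1, user_id, product_id))

    def purchases_since(self, cursor: int) -> List[Purchase]:
        return [p for p in self._log if p.seq > cursor]


class PurchaseHistoryCopy:
    """Copy of the purchasing history owned by the recommendations service."""

    def __init__(self) -> None:
        self.history: Dict[str, List[str]] = {}
        self.cursor = 0  # highest seq already applied to the local copy

    def sync(self, users: UsersService) -> int:
        """Delta update: fetch and apply only what changed since the last sync."""
        changes = users.purchases_since(self.cursor)
        for p in changes:
            self.history.setdefault(p.user_id, []).append(p.product_id)
            self.cursor = max(self.cursor, p.seq)
        return len(changes)

    def full_resync(self, users: UsersService) -> int:
        """Occasional big synchronization: discard the copy and reload everything."""
        self.history.clear()
        self.cursor = 0
        return self.sync(users)


users = InMemoryUsersService()
users.record("u1", "p1")
users.record("u2", "p2")

local = PurchaseHistoryCopy()
initial = local.sync(users)   # first run: the big initial load (2 purchases)

users.record("u1", "p3")
delta = local.sync(users)     # later run: only the 1 new purchase is transferred
```

With a cursor like this, the first `sync` call doubles as the initial load, and each batch run afterwards transfers only what was added since the previous run. The batch job then reads from `local.history` rather than querying the users service.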

Licensed under: CC-BY-SA with attribution

Not affiliated with softwareengineering.stackexchange