Aggregating data between microservices in a Twitter-like application

https://softwareengineering.stackexchange.com/questions/402474

05-03-2021

Question

I'm developing a Twitter-like app and I have some doubts about my service-oriented architecture.

I have a User Service with a REST endpoint POST /users/{userId}/follow, so that the "connected" user starts following the user userId. After that, this service publishes a UserStartedFollowing event with a payload like this:

{ "followerId": "1234",  "followedId": "77638" }

Then, I have a Timeline Service that listens for these events and builds the timeline (the comments, aka tweets) asynchronously. This service exposes an endpoint GET /messages that returns all the messages (aka tweets) of the people you are following (previously posted using another endpoint, POST /messages). The response looks like this:

[{
  userId: "786387",
  content: "This is an example of a message"
}]

Here is the problem. In the app, I need to show the message author's name, but the Timeline Service only has the author's ID, not the name. What is the best way to do this?

I have thought about fetching each user's name by ID from the User Service (from some kind of API gateway) and caching the data. But maybe that is too heavy.

Another option could be to include the followed user name in the UserStartedFollowing event and store it in the Timeline Service, too. So then it will be included in the /messages response.
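
A minimal sketch of this second option, assuming the event gains a hypothetical followedName field and the Timeline Service keeps names in a simple in-memory map. Both the field name and the storage are illustrative assumptions, not part of the current design:

```python
# Hypothetical Timeline Service state: followed user ID -> display name.
user_names: dict[str, str] = {}


def on_user_started_following(event: dict) -> None:
    """Handle UserStartedFollowing; `followedName` is an assumed extra field."""
    user_names[event["followedId"]] = event["followedName"]


def get_messages(stored_messages: list[dict]) -> list[dict]:
    """Return messages with the author's name attached where it is known."""
    return [
        {**m, "userName": user_names.get(m["userId"])}
        for m in stored_messages
    ]


# "Alice" is made-up sample data.
on_user_started_following(
    {"followerId": "1234", "followedId": "77638", "followedName": "Alice"}
)
print(get_messages([{"userId": "77638", "content": "This is an example of a message"}]))
```

Messages from authors whose name was never received get a userName of None here, so the caller would still need a fallback for that case.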

What do you think?

Solution

What you have here is not bad at all. You will find, however, that the user interface has different needs than a proper API. There are several ways to address this. The one that Facebook chose was GraphQL, which, combined with React, lets you assemble all this information together. See also: Intro Tutorial.

What GraphQL allows the front end to do is package up its needs into one query to the GraphQL server. With intelligent design in how you implement that for your application, the infrastructure can consolidate the requests. Let's say you have a feed that looks like this:

[{
  userId: "786387",
  content: "This is an example of a message"
},{
  userId: "786387",
  content: "The same user said something else"
}]

You can create a query that gets everything you need, and a well-designed GraphQL server (one whose resolvers batch and deduplicate lookups within a request) will request the user information for user ID 786387 only once. That reduces the load on the back-end and lets you think about where you would implement caching, should you need it.
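
To make the "one request per user" idea concrete, here is a rough sketch in plain Python. It is not GraphQL itself; it only illustrates the deduplication a batching layer would perform on the feed above. The names enrich_feed and fake_fetch_users are hypothetical, and fake_fetch_users stands in for a real call to the User Service:

```python
from typing import Callable


def enrich_feed(
    feed: list[dict],
    fetch_users: Callable[[list[str]], dict[str, dict]],
) -> list[dict]:
    """Attach author names to a feed using one batched user lookup."""
    # Collect each distinct author once, preserving first-seen order.
    unique_ids = list(dict.fromkeys(m["userId"] for m in feed))
    users = fetch_users(unique_ids)
    return [{**m, "userName": users[m["userId"]]["name"]} for m in feed]


calls: list[list[str]] = []  # records every batch sent to the fake service


def fake_fetch_users(ids: list[str]) -> dict[str, dict]:
    """Stand-in for a batched User Service call (hypothetical)."""
    calls.append(ids)
    return {i: {"id": i, "name": f"user-{i}"} for i in ids}


feed = [
    {"userId": "786387", "content": "This is an example of a message"},
    {"userId": "786387", "content": "The same user said something else"},
]
enriched = enrich_feed(feed, fake_fetch_users)
print(enriched)
print(calls)  # one batch containing the shared user ID a single time
```

Because the lookup takes a list of IDs, the same function is also a natural place to add a cache later without changing the callers.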

Another benefit here is that GraphQL does allow you to customize the size and shape of your data. If you only care about displaying the user's name on screen, you can specify that in your request and you won't receive any other user information.

GraphQL is not the only solution, but it is a pretty decent generic solution for problems like this. Another alternative is to create your own custom federation service that acts as an interface between the front-end and the microservices. Falcor was created by Netflix with many of the same use cases as GraphQL. I remember hearing that Netflix has since moved to GraphQL, but my sources could be wrong.

As a dissenting opinion, I have another blog article that shows that GraphQL is not a panacea. The first part of the article actually paints a really good picture of why you would want to use it:

Schema stitching allows each microservice to have a GraphQL endpoint, and a master to tie it all together
It's strongly typed
There are bindings for almost every language you use
You can use multiple front-end frameworks, so you aren't stuck with React
Schemas support versioning

But then it also points out some deficiencies as well:

Queries can get complex
It's harder to rate-limit because not every query has the same impact
It can be harder to apply caching to the GraphQL layer (but you can still apply it to the backing REST calls)

At the end of the day, it's worth a look to see if it solves the needs you have. The alternative is to create your own custom layer that is specific to your website. That would be quicker to get up and running because the learning curve is lower, but you lose the advantages of decoupling your back-end from your front-end code.

Other tips

I think you have a conflict of priorities of normalized vs. denormalized data. Simply put, for storage and consistency you want to have a users table (userid and name) and a messages table (messageid, userid and message) and a user to user table (userid, userid followed) where you track who is following whom. You don't want to permanently store 100 copies of a message when a user has 100 subscribers.

(I know you have SOA and not one database; stay with me, this is more of a conceptual view.)
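
As a conceptual illustration only, the three normalized tables could look like the sketch below, using Python's built-in sqlite3 module with an in-memory database. The table and column names follow the description above; the sample rows and the timeline helper are made up for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (userid TEXT PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE messages (
        messageid INTEGER PRIMARY KEY,
        userid TEXT NOT NULL REFERENCES users(userid),
        message TEXT NOT NULL
    );
    CREATE TABLE follows (
        userid TEXT NOT NULL REFERENCES users(userid),
        followed_userid TEXT NOT NULL REFERENCES users(userid),
        PRIMARY KEY (userid, followed_userid)
    );
    """
)
# Sample data (invented): user 1234 follows user 77638.
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("1234", "Follower"), ("77638", "Author")],
)
conn.execute("INSERT INTO follows VALUES (?, ?)", ("1234", "77638"))
conn.execute(
    "INSERT INTO messages (userid, message) VALUES (?, ?)",
    ("77638", "This is an example of a message"),
)


def timeline(follower_id: str) -> list[tuple[str, str]]:
    """Return (author name, message) rows for everyone follower_id follows."""
    return conn.execute(
        """
        SELECT u.name, m.message
        FROM follows f
        JOIN messages m ON m.userid = f.followed_userid
        JOIN users u ON u.userid = m.userid
        WHERE f.userid = ?
        ORDER BY m.messageid DESC
        """,
        (follower_id,),
    ).fetchall()


print(timeline("1234"))
```

Each message is stored once no matter how many followers the author has; the author's name is joined in at read time.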

For presentation, you do want a denormalized view of the most recent data, for example the last day's feed for each user, or even just the user's first page load. Most services of this kind have an infinite-scrolling feature that lets the user load more data, but that path is likely less efficient on the back-end. Few users dig days into their news feed, so the designers can afford to do less caching or retrieve older data less efficiently.
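
One possible shape for such a denormalized view is a small, bounded "first page" per follower that already contains render-ready entries. This is only a sketch under assumptions: PAGE_SIZE, first_page and on_message_posted are hypothetical names, and an in-memory dictionary stands in for whatever cache a real service would use:

```python
from collections import defaultdict, deque

PAGE_SIZE = 3  # assumed size of the precomputed first page

# follower ID -> newest-first, render-ready entries (denormalized copies)
first_page: dict[str, deque] = defaultdict(lambda: deque(maxlen=PAGE_SIZE))


def on_message_posted(author_name: str, content: str, follower_ids: list[str]) -> None:
    """Push a render-ready entry onto each follower's bounded first page."""
    entry = {"userName": author_name, "content": content}
    for follower_id in follower_ids:
        # With maxlen set, appendleft drops the oldest entry once the page is full.
        first_page[follower_id].appendleft(entry)


for n in range(1, 5):
    on_message_posted("Author", f"message {n}", ["1234"])

print(list(first_page["1234"]))
```

Only the newest PAGE_SIZE entries are copied per follower, so the duplication stays bounded; anything older would be read from the normalized store when the user scrolls further.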

Licensed under: CC-BY-SA with attribution

Not affiliated with softwareengineering.stackexchange