Fetching data from 30 APIs, where each call depends on data returned by earlier calls

https://softwareengineering.stackexchange.com/questions/392496

25-02-2021

Question

I have to write a server (.NET Core) that reads data from 30 different remote APIs. I am supposed to run a kind of decision tree: if I find something in one API, I go on to another API; otherwise I go to a different one, and so on.

It goes something like this:

if apiA.response.data has 'some info' {
    // read apiB
    if apiB.response.data has 'some info' {
        // read apiC
        ....
    } else {
        // read apiJ
        ....
    }
} else {
    // read apiQ
    ....
}

So each cycle can end up making 20 to 30 API calls, depending on the results and the use case.

What I thought of doing is to put some sort of queue management (RabbitMQ) in place and let it handle the flow.

So ApiAHandler will queue a task to run AnotherApiHandler based on the result.
AnotherApiHandler will queue a task to run SomeOtherApiHandler based on the result.

and so on.

Each handler will persist its result to the database, assuming that each handler knows what to do with the result and how to persist it.

What do you think? Is it efficient to handle it like that?

A little more about the rules: they are not very complex. After we extract the data, we run a few if statements and make a decision. The decision is which API handler to trigger and what data context to send to that handler. A decision can also bring the task to an end.

It is hard to predict how often the rules will change; I think not that often. Deploying a new version whenever we need a change should be feasible.

The app needs to be testable, with both system tests and unit tests.

Answer

When we discuss message-driven designs, it's important to distinguish between the features of queuing in general (including local, in-memory queues) and the features of distributed messaging. It used to be that one of the big benefits of messaging systems was that they helped with the scarcity of computing resources, specifically memory. Before 64-bit systems were ubiquitous, you effectively had an upper bound of 4GB of memory on a machine. Messaging allowed you to scale horizontally to handle large volumes of transactions without big iron. Distributed messaging allows you to spread your logic across many systems in a loosely coupled way. I used to work at a company where third-quarter volume was much larger than in the rest of the year. The message-driven approach allowed us to add more servers for that period and remove them later.

Now, that aspect of distributed messaging isn't as interesting. Running a machine with 32GB of RAM isn't anything special. The question, then, is what distributed messaging buys us that in-memory queuing doesn't. The main thing, I would argue, is that being able to put work on different machines/containers/functions is still useful, especially in a microservices-type environment where different parts of the process are managed by different teams, potentially with different languages and platforms. There is also a level of fault tolerance that you get with messaging, but I would caution you against relying solely on messaging platforms to 'guarantee' delivery. There are so many ways that can go wrong, regardless of the quality of the platform or your installation of it.
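For contrast, in-memory queuing inside a single process can look like the following minimal sketch, which uses Python's standard `queue.Queue` and a few threads. `run_workers` and its parameters are hypothetical names chosen for illustration, and Python is used only for brevity. Nothing here survives a process crash or spreads across machines, which is the part a distributed broker would add.

```python
import queue
import threading


def run_workers(tasks, work, num_workers=4):
    """Process tasks through an in-memory queue with a fixed pool of threads.

    Returns a list of (task, outcome) pairs in completion order. If work(task)
    raises, the exception object is stored as the outcome.
    Assumption: None is reserved as a shutdown sentinel, so tasks must not be None.
    """
    q = queue.Queue()
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            task = q.get()
            if task is None:  # sentinel: this worker is done
                q.task_done()
                return
            try:
                outcome = work(task)
            except Exception as exc:  # keep the worker alive on failure
                outcome = exc
            with lock:
                results.append((task, outcome))
            q.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for task in tasks:
        q.put(task)
    for _ in threads:
        q.put(None)  # one sentinel per worker
    q.join()
    for t in threads:
        t.join()
    return results
```

The queue smooths out bursts of work within the process, but every task is lost if the process dies, so it fits work that can simply be redone.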

If you are thinking of using messaging to manage synchronous request-response calls: just don't. You are only adding a bunch of problems and getting less than what HTTP provides. It's a needless complication. In general, using distributed queues for anything synchronous is going to be problematic. The main point of a queue is to even out your load so that spikes can be handled. The corollary is that when you have spikes, some transactions are going to wait. With the system I mentioned above, we had transactions that would not get a response for weeks. If your clients are timing out and re-sending requests during periods of high load, nothing good will come of it. You could end up in a massive downward spiral and run out of resources.
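As a minimal sketch of the simpler alternative, the decision chain from the question can run in-process as plain function calls, with no broker in between. Everything named here (`Decision`, `run_chain`, the handler functions, and the `fetch` callable standing in for an HTTP client) is hypothetical and chosen for illustration. Python is used only to keep the sketch short; the same shape should carry over to .NET Core.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical stand-in for an HTTP client: takes an API name and a context
# dict, returns that API's response data as a dict.
Fetch = Callable[[str, dict], dict]


@dataclass
class Decision:
    """Result of one handler: which handler runs next (None ends the chain)."""
    next_handler: Optional[str]
    context: dict = field(default_factory=dict)


def handle_a(fetch: Fetch, ctx: dict) -> Decision:
    data = fetch("apiA", ctx)
    # The rule is an ordinary if statement, so it is easy to unit test.
    if "some info" in data:
        return Decision("apiB", {"from_a": data["some info"]})
    return Decision("apiQ")


def handle_b(fetch: Fetch, ctx: dict) -> Decision:
    data = fetch("apiB", ctx)
    return Decision("apiC" if "some info" in data else "apiJ", ctx)


def handle_leaf(name: str):
    def handler(fetch: Fetch, ctx: dict) -> Decision:
        fetch(name, ctx)
        return Decision(None)  # a decision can also end the task
    return handler


HANDLERS = {
    "apiA": handle_a,
    "apiB": handle_b,
    "apiC": handle_leaf("apiC"),
    "apiJ": handle_leaf("apiJ"),
    "apiQ": handle_leaf("apiQ"),
}


def run_chain(fetch: Fetch, start: str = "apiA", max_steps: int = 30) -> list:
    """Follow decisions until a handler ends the chain; return the path taken."""
    path, name, ctx = [], start, {}
    for _ in range(max_steps):  # guard against accidental cycles
        path.append(name)
        decision = HANDLERS[name](fetch, ctx)
        if decision.next_handler is None:
            return path
        name, ctx = decision.next_handler, decision.context
    raise RuntimeError("chain exceeded max_steps")
```

In a unit test, `fetch` can be a stub that returns canned dicts, so each rule is checked without network access; a system test can pass a real HTTP-backed `fetch` instead.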

The approach that is valid for queuing (local or distributed) is that your transaction moves through the bus, and at each 'stop' different actions are taken and the message is transformed for processing at the next step. However, it can also create a lot of new challenges. You now need to deal with poison messages, and you need to make sure your transactions are coordinated across reads and writes, because reads from a queue are destructive. What do you do when you write to a DB and your commit on it succeeds but your commit on the queue fails? If you really want that to be correct, you now need some sort of XA (distributed, two-phase commit) transaction.
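A rough sketch of two of those concerns, using an in-memory `collections.deque` in place of a real broker: a message that keeps failing is parked in a dead-letter list after a fixed number of attempts, and processed message IDs are recorded so that a redelivery (for example, after the database commit succeeded but the queue acknowledgement did not) does not write twice. The names `process_queue` and `MAX_ATTEMPTS` and the message shape are assumptions made for illustration, not any broker's API.

```python
from collections import deque

MAX_ATTEMPTS = 3  # assumed retry budget before a message counts as poison


def process_queue(messages, handler, store, seen_ids):
    """Drain messages; return the list of dead-lettered (poison) messages.

    messages: iterable of dicts like {"id": "m1", "body": ...}
    handler:  callable(body) -> result; may raise on bad input
    store:    list standing in for a database table
    seen_ids: set of message IDs already persisted (the idempotency record)
    """
    queue = deque({"msg": m, "attempts": 0} for m in messages)
    dead_letters = []
    while queue:
        item = queue.popleft()  # the read is destructive: the item is gone
        msg = item["msg"]
        if msg["id"] in seen_ids:
            continue  # duplicate delivery: already persisted, skip the write
        try:
            result = handler(msg["body"])
        except Exception:
            item["attempts"] += 1
            if item["attempts"] >= MAX_ATTEMPTS:
                dead_letters.append(msg)  # stop retrying a poison message
            else:
                queue.append(item)  # requeue at the back for another attempt
            continue
        # In a real system these two steps belong in one DB transaction.
        store.append((msg["id"], result))
        seen_ids.add(msg["id"])
    return dead_letters
```

Making the write idempotent sidesteps the need for an XA transaction in this narrow case, at the cost of keeping the ID record alongside the data.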

The upshot: if you can't articulate exactly what problems a distributed messaging platform solves for you, then it's highly unlikely that it's a good choice. Just because you 'could' use one is not a good reason to do so. For scaling, there are many other options that are far simpler to get right and that offer similar, if not greater, advantages.

Licensed under: CC-BY-SA with attribution

Not affiliated with softwareengineering.stackexchange