Question

Since MADDPG already uses a centralized critic for training, why not simply treat all cooperating agents as a single meta-agent with a concatenated observation space and a concatenated action space? In my view, MADDPG is already quite centralized, so it shouldn't hurt to go one step further.


Solution

MADDPG is designed to produce agents that, after training, operate with limited observation and communication capabilities, which is an interesting and useful real-world scenario.

why not simply treat all cooperating agents as a single meta-agent with a concatenated observation space and a concatenated action space?

Treating the team as one meta-agent means a central controller must collect every agent's observation and dispatch every agent's action at execution time, not just during training. Any real-world implementation will then require resources to provide and manage that overview, which may not be practical or desirable in all cases.
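To make that concrete, here is a minimal sketch of what the meta-agent approach implies at run time. The environment interface and all names (`MetaAgentWrapper`, `env.step`, `act_dims`) are assumptions for illustration, not any specific library's API. The point is that `step()` must gather every agent's observation and dispatch every agent's action through one central place at every time step.

```python
import numpy as np

class MetaAgentWrapper:
    """Treats N cooperating agents as one meta-agent by concatenating their
    observation and action spaces. Note that step() needs every agent's
    observation and must dispatch every agent's action at run time, so a
    central collector/controller is required during execution as well as
    during training. (Hypothetical environment API, for illustration only.)"""

    def __init__(self, env, act_dims):
        self.env = env            # underlying multi-agent env (assumed API)
        self.act_dims = act_dims  # list of per-agent action dimensions

    def reset(self):
        obs_list = self.env.reset()      # one observation vector per agent
        return np.concatenate(obs_list)  # single concatenated observation

    def step(self, joint_action):
        # Split the joint action back into per-agent actions.
        actions, start = [], 0
        for d in self.act_dims:
            actions.append(joint_action[start:start + d])
            start += d
        obs_list, rewards, done, info = self.env.step(actions)
        # The meta-agent sees everything and optimises one summed reward.
        return np.concatenate(obs_list), sum(rewards), done, info
```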

There is no single answer here; it is an open area of research. Whether to invest in better communication and central processing, or in better autonomy for individual agents, is likely to depend on the problem and on the current technological limits of each approach.

MADDPG confines central processing to training time: the centralized critics evaluate each agent's actions using global information (the joint observations, actions and reward signal), while the policies themselves never use it. A minimal sketch after the list below illustrates the split. That means:

  • Each agent's policy works with local signals only, and with simpler observation and action spaces as a result. Only the training-time value estimation is handled externally.

  • Trained agents can theoretically be used in environments where a central processor is not available.
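Here is a minimal sketch of that split, assuming small PyTorch networks and purely illustrative dimensions (3 agents, 8-dimensional local observations, 2-dimensional continuous actions; none of these values come from the question). Each actor consumes only its own observation, while the centralized critic, needed only during training, consumes the concatenation of all observations and actions.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Decentralized policy: maps one agent's local observation to its action."""
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim), nn.Tanh(),
        )

    def forward(self, obs):
        return self.net(obs)

class CentralizedCritic(nn.Module):
    """Training-time critic: sees the joint observations and joint actions
    of all agents, so it can assess one agent's action in global context."""
    def __init__(self, total_obs_dim, total_act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(total_obs_dim + total_act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, joint_obs, joint_act):
        return self.net(torch.cat([joint_obs, joint_act], dim=-1))

# Illustrative sizes: 3 agents, 8-dim local observations, 2-dim actions.
obs_dims, act_dims = [8, 8, 8], [2, 2, 2]
actors = [Actor(o, a) for o, a in zip(obs_dims, act_dims)]
critics = [CentralizedCritic(sum(obs_dims), sum(act_dims)) for _ in act_dims]

# Execution (deployment): each actor needs only its own local observation.
local_obs = [torch.randn(1, o) for o in obs_dims]
actions = [actor(o) for actor, o in zip(actors, local_obs)]

# Training: the critics additionally need everything concatenated, which is
# why central processing is only required while training.
q_value = critics[0](torch.cat(local_obs, dim=-1), torch.cat(actions, dim=-1))
```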

So, for example, agents can be trained in simulation with all the oversight that allows, or in a carefully instrumented environment with high-bandwidth connections between the agents and central processing. They can then be deployed into matching environments where that central oversight is unavailable or too costly.
