Computer scientists are frequently confronted with problems that are relevant to real-world scenarios. For example, “multiagent problems,” which are defined by multi-stage decision-making by multiple decision-makers or “agents,” have applications in search-and-rescue missions, firefighting, and emergency response.

Multiagent problems are frequently solved using the machine learning technique known as reinforcement learning (RL), which is concerned with how intelligent agents make decisions in an unfamiliar environment. Policy iteration (PI) is a common approach used in such endeavors, which begins with a ‘base policy’ and then improves on it to generate a ‘rollout policy’ (with the process of the generation called a rollout). The rollout is straightforward, dependable, and well-suited for online, model-free implementation.

Multiagent problems are often solved using a machine learning technique known as reinforcement learning (RL), which concerns itself with how intelligent agents make decisions in an environment unfamiliar to them.

However, there is a serious problem. “The amount of total computation in a standard rollout algorithm grows exponentially with the number of agents. Even for a small number of agents, this can make computations prohibitively expensive “Prof. Dimitri Bertsekas of the Massachusetts Institute of Technology and Arizona State University in the United States, who studies large-scale computation and communication and control optimization, explains.

PI is, in essence, a repeated application of rollout in which the rollout policy at each iteration becomes the base policy for the next iteration. In a typical multiagent rollout policy, all agents are allowed to influence the rollout algorithm at the same time (“all-agents-at-once” policy). Prof. Bertsekas has developed an approach that could change the game in a new study published in the IEEE/CAA Journal of Automatica Sinica.

The recent trend toward automation by machine learning and artificial intelligence systems raises old questions about the role of humans in human–machine systems in a new light. The difficulties that come with sharing control with computer systems are well known to industrial psychologists and systems analysts who study the human operators of complex machines.

Prof. Bertsekas’ paper focused on applying PI to problems involving multiple-component control, with each component chosen by a different agent. He assumed that all agents had perfect state information and shared it with one another. He then reformulated the problem by balancing control space and state space complexity. Furthermore, rather than an all-agents-at-once policy, he implemented an agent-by-agent policy in which only one agent was allowed to execute a rollout algorithm at a time, with coordinating information provided by the other agents.

The outcome was impressive. Instead of exponentially increasing complexity, Prof. Bertsekas discovered only a linear increase in computation with the number of agents, resulting in a significant reduction in computation cost. Furthermore, the computational simplification did not degrade the quality of the improved policy, which performed similarly to the standard rollout algorithm.

Prof. Bertsekas then investigated exact and approximate PI algorithms using the new version of agent-by-agent policy improvement and repeated rollout applications. He investigated the use of neural networks to encode successive rollout policies and to precompute signaling policies that coordinate the parallel computations of different agents for highly complex problems.

Overall, Prof. Bertsekas is upbeat about his findings and the prospects for his approach in the future. “The concept of agent-by-agent rollout can be applied to difficult multidimensional control problems, as well as deterministic discrete/combinatorial optimization problems involving constraints that couple the controls of different stages,” he observes. He has written two books on RL, one of which, titled “Rollout, Policy Iteration, and Distributed Reinforcement Learning,” will be published by Tsinghua Press in China and will go into great detail about the subject of his research.

The new approach to multiagent systems might very well revolutionize how complex sequential decision problems are solved.