How to Train Your AI Soldier Robots (and the Humans Who Command Them)

Print Friendly, PDF & Email
Image courtesy of Devin Rumbaugh/DVIDS.

This article was originally published by War on the Rocks on 21 February 2020.

Artificial intelligence (AI) is often portrayed as a single omnipotent force — the computer as God. Often the AI is evil, or at least misguided. According to Hollywood, humans can outwit the computer (“2001: A Space Odyssey”), reason with it (“Wargames”), blow it up (“Star Wars: The Phantom Menace”), or be defeated by it (“Dr. Strangelove”). Sometimes the AI is an automated version of a human, perhaps a human fighter’s faithful companion (the robot R2-D2 in “Star Wars”).

These science fiction tropes are legitimate models for military discussion — and many are being discussed. But there are­ other possibilities. In particular, machine learning may give rise to new forms of intelligence; not natural, but not really “artificial” if the term implies having been designed in detail by a person. Such new forms of intelligence may resemble that of humans or other animals, and we will discuss them using language associated with humans, but we are not discussing robots that have been deliberately programmed to emulate human intelligence. Through machine learning they have been programmed by their own experiences. We speculate that some of the characteristics that humans have evolved over millennia will also evolve in future AI, characteristics that have evolved purely for their success in a wide range of situations that are real, for humans, or simulated, for robots.

As the capabilities of AI-enabled robots increase, and in particular as behaviors emerge that are both complex and outside past human experience, how will we organize, train, and command them — and the humans who will supervise and maintain them? Existing methods and structures, such as military ranks and doctrine, that have evolved over millennia to manage the complexity of human behavior will likely be necessary. But because robots will evolve new behaviors we cannot yet imagine, they are unlikely to be sufficient. Instead, the military and its partners will need to learn new types of organization and new approaches to training. It is impossible to predict what these will be but very possible they will differ greatly from approaches that have worked in the past. Ongoing experimentation will be essential.

How to Respond to AI Advances

The development of AI, especially machine learning, will lead to unpredictable new types of robots. Advances in AI suggest that humans will have the ability to create many types of robots, ­­of different shapes, sizes, or degrees of independence or autonomy. It is conceivable that humans may one day be able to design tiny AI bullets to pierce only designated targets, automated aircraft to fly as loyal wingmen alongside human pilots, or thousands of AI fish to swim up an enemy’s river. Or we could design AI not as a device but as a global grid that analyzes vast amounts of diverse data. Multiple programs funded by the Department of Defense are on their way to developing robots with varying degrees of autonomy.

In science fiction, robots are often depicted as behaving in groups (like the robot dogs in “Metalhead”). Researchers inspired by animal behaviors have developed AI concepts such as “swarms,” in which relatively simple rules for each robot can result in complex emergent phenomena on a larger scale. This is a legitimate and important area of investigation. Nevertheless, simply imitating the known behaviors of animals has its limits. After observing the genocidal nature of military operations among ants, biologists Bert Holldobler and E. O. Wilson wrote, “If ants had nuclear weapons, they would probably end the world in a week.” Nor would we want to limit AI to imitating human behavior. In any case, a major point of machine learning is the possibility of uncovering new behaviors or strategies. Some of these will be very different from all past experience; human, animal, and automated. We will likely encounter behaviors that, although not human, are so complex that some human language, such as “personality,” may seem appropriately descriptive. Robots with new, sophisticated patterns of behavior may require new forms of organization.

Military structure and scheme of maneuver is key to victory. Groups often fight best when they don’t simply swarm but execute sophisticated maneuvers in hierarchical structures. Modern military tactics were honed over centuries of experimentation and testing. This was a lengthy, expensive, and bloody process.

The development of appropriate organizations and tactics for AI systems will also likely be expensive, although one can hope that through the use of simulation it will not be bloody. But it may happen quickly. The competitive international environment creates pressure to use machine learning to develop AI organizational structure and tactics, techniques, and procedures as fast as possible.

Despite our considerable experience organizing humans, when dealing with robots with new, unfamiliar, and likely rapidly-evolving personalities we confront something of a blank slate. But we must think beyond established paradigms, beyond the computer as all-powerful or the computer as loyal sidekick.

Humans fight in a hierarchy of groups, each soldier in a squad or each battalion in a brigade exercising a combination of obedience and autonomy. Decisions are constantly made at all levels of the organization. Deciding what decisions can be made at what levels is itself an important decision. In an effective organization, decision-makers at all levels have a good idea of how others will act, even when direct communication is not possible.

Imagine an operation in which several hundred underwater robots are swimming up a river to accomplish a mission. They are spotted and attacked. A decision must be made: Should they retreat? Who decides? Communications will likely be imperfect. Some mid-level commander, likely one of the robot swimmers, will decide based on limited information. The decision will likely be difficult and depend on the intelligence, experience, and judgment of the robot commander. It is essential that the swimmers know who or what is issuing legitimate orders. That is, there will have to be some structure, some hierarchy.

The optimal unit structure will be worked out through experience. Achieving as much experience as possible in peacetime is essential. That means training.

Training Robot Warriors

Robots with AI-enabled technologies will have to be exercised regularly, partly to test them and understand their capabilities and partly to provide them with the opportunity to learn from recreating combat. This doesn’t mean that each individual hardware item has to be trained, but that the software has to develop by learning from its mistakes in virtual testbeds and, to the extent that they are feasible, realistic field tests. People learn best from the most realistic training possible. There is no reason to expect machines to be any different in that regard. Furthermore, as capabilities, threats, and missions evolve, robots will need to be continuously trained and tested to maintain effectiveness.

Training may seem a strange word for machine learning in a simulated operational environment. But then, conventional training is human learning in a controlled environment. Robots, like humans, will need to learn what to expect from their comrades. And as they train and learn highly complex patterns, it may make sense to think of such patterns as “personalities” and “memories.” At least, the patterns may appear that way to the humans interacting with them. The point of such anthropomorphic language is not that the machines have become human, but that their complexity is such that it is helpful to think in these terms.

One big difference between people and machines is that, in theory at least, the products of machine learning, the code for these “memories” or “personalities,” can be uploaded directly from one very experienced robot to any number of others. If all robots are given identical training and the same coded memories, we might end up with a uniformity among a unit’s members that, in the aggregate, is less than optimal for the unit as a whole.

Diversity of perspective is accepted as a valuable aid to human teamwork. Groupthink is widely understood to be a threat. It’s reasonable to assume that diversity will also be beneficial to teams of robots. It may be desirable to create a library of many different “personalities” or “memories” that could be assigned to different robots for particular missions. Different personalities could be deliberately created by using somewhat different sets of training testbeds to develop software for the same mission.

If AI can create autonomous robots with human-like characteristics, what is the ideal personality mix for each mission? Again, we are using the anthropomorphic term “personality” for the details of the robot’s behavior patterns. One could call it a robot’s “programming” if that did not suggest the existence of an intentional programmer. The robot’s “personalities” have evolved from the robot’s participation in a very large number of simulations. It is unlikely that any human will fully understand a given personality or be able to fully predict all aspects of a robot’s behavior.

In a simple case, there may be one optimum personality for all the robots of one type. In more complicated situations, where robots will interact with each other, having robots that respond differently to the same stimuli could make a unit more robust. These are things that military planners can hope to learn through testing and training. Of course, attributes of personality that may have evolved for one set of situations may be less than optimal, or positively dangerous, in another. We talk a lot about “artificial intelligence.” We don’t discuss “artificial mental illness.” But there is no reason to rule it out.

Of course, humans will need to be trained to interact with the machines. Machine learning systems already often exhibit sophisticated behaviors that are difficult to describe. It’s unclear how future AI-enabled robots will behave in combat. Humans, and other robots, will need experience to know what to expect — and to deal with any unexpected behaviors that may emerge. Planners need experience to know which plans might work.

But the human-robot relationship might turn out to be something completely different. For all of human history, generals have had to learn their soldiers’ capabilities. They knew best exactly what their troops could do. They could judge the psychological state of their subordinates. They might even know when they were being lied to. But today’s commanders do not know, yet, what their AI might prove capable of. In a sense, it is the AI troops that will have to train their commanders.

In traditional military services, the primary peacetime occupation of the combat unit is training. Every single servicemember has to be trained up to the standard necessary for wartime proficiency. This is a huge task. In a robot unit, planners, maintainers, and logisticians will have to be trained to train and maintain the machines but may spend little time working on their hardware except during deployment.

What would the units look like? What is the optimal unit rank structure? How does the human rank structure relate to the robot rank structure? There are a million questions as we enter uncharted territory. The way to find out is to put robot units out onto test ranges where they can operate continuously, test software, and improve machine learning. AI units working together can learn and teach each other — and humans.


AI-enabled robots will need to be organized, trained, and maintained. While these systems will have human-like characteristics, they will likely “develop” distinct personalities. The military will need an extensive training program to inform new doctrines and concepts to manage this powerful, but unprecedented, capability.

It’s unclear what structures will prove effective to manage AI robots. Only by continuous experimentation can people, including computer scientists and military operators, understand the developing world of multi-unit human and robot forces. We must hope that experiments lead to correct solutions. There is no guarantee that we will get it right. But there is every reason to believe that as technology enables the development of new and more complex patterns of robot behavior, new types of military organizations will emerge.

About the Author

Thomas Hamilton is a Senior Physical Scientist at the nonprofit, nonpartisan RAND Corporation.

For more information on issues and events that shape our world, please visit the CSS website.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.