A team at Los Alamos National Laboratory has developed a novel approach for comparing neural networks that looks inside the “black box” of artificial intelligence to help researchers understand neural network behavior. Neural networks recognize patterns in datasets and are used in applications such as virtual assistants, facial recognition systems, and self-driving cars.
“The artificial intelligence research community does not necessarily have a complete understanding of what neural networks are doing; they give us good results, but we don’t know how or why,” said Haydn Jones, a researcher in Los Alamos’ Advanced Research in Cyber Systems group. “Our new method compares neural networks more effectively, which is an important step toward better understanding the mathematics behind AI.”
Jones is the lead author of the paper “If You’ve Trained One You’ve Trained Them All: Inter-Architecture Similarity Increases With Robustness,” which was presented recently at the Conference on Uncertainty in Artificial Intelligence. In addition to studying network similarity, the paper is a crucial step toward characterizing the behavior of robust neural networks.
High-performance neural networks are fragile. Self-driving cars, for example, use neural networks to detect signs. When conditions are ideal, they perform admirably. However, even minor imperfections, such as a sticker on a stop sign, can cause the neural network to misidentify the sign, so the car never stops.
Researchers are therefore looking for ways to make neural networks more robust. One cutting-edge method involves “attacking” networks during their training process. Researchers purposefully introduce anomalies and train the AI to ignore them. This is known as adversarial training, and it makes the networks more difficult to fool.
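The core idea can be sketched in a few lines. The example below uses the fast gradient sign method (FGSM, Goodfellow et al.) on a toy logistic-regression model; the specific model, weights, and epsilon value are illustrative assumptions, not details from the Los Alamos paper. Adversarial training would then include such perturbed inputs alongside clean ones during training.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_perturb(x, y, w, b, eps):
    """Perturb input x by at most eps per coordinate to increase the loss.

    For a logistic model p = sigmoid(w.x + b) with cross-entropy loss,
    the gradient of the loss with respect to the input is (p - y) * w.
    FGSM steps in the sign of that gradient.
    """
    p = sigmoid(x @ w + b)
    grad_x = (p - y) * w
    return x + eps * np.sign(grad_x)

# Hypothetical toy model and input (illustrative only).
rng = np.random.default_rng(0)
w = rng.normal(size=4)
b = 0.0
x = rng.normal(size=4)
y = 1.0  # true label

x_adv = fgsm_perturb(x, y, w, b, eps=0.1)

# The perturbation is tiny (bounded by eps per coordinate), yet it
# strictly increases the model's loss on this example.
loss = lambda v: -np.log(sigmoid(v @ w + b))  # cross-entropy for y = 1
```

The stop-sign sticker in the article plays the same role as the epsilon-bounded perturbation here: a small, targeted change that pushes the model toward a wrong answer.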
Jones, Los Alamos collaborators Jacob Springer and Garrett Kenyon, and Jones’ mentor Juston Moore, applied their new metric of network similarity to adversarially trained neural networks, and found, surprisingly, that adversarial training causes neural networks in the computer vision domain to converge to very similar data representations, regardless of network architecture, as the magnitude of the attack increases.
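Comparing “data representations” means comparing the activations two networks produce on the same inputs. The paper uses its own similarity metric; as an illustrative stand-in, the sketch below implements linear centered kernel alignment (CKA, Kornblith et al. 2019), a widely used representation-similarity measure. The activation matrices are random placeholders, not real network outputs.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA similarity in [0, 1] between two activation matrices.

    X and Y have shape (n_samples, n_features); feature counts may differ.
    A value of 1 means the representations are identical up to rotation.
    """
    X = X - X.mean(axis=0)  # center each feature
    Y = Y - Y.mean(axis=0)
    num = np.linalg.norm(Y.T @ X, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return num / den

# Hypothetical activations from two "networks" on the same 100 inputs.
rng = np.random.default_rng(0)
A = rng.normal(size=(100, 16))   # network 1's layer activations
C = rng.normal(size=(100, 16))   # unrelated activations (network 2)

same = linear_cka(A, A)   # identical representations score 1
diff = linear_cka(A, C)   # independent random activations score low
```

Under this kind of measure, the paper's finding is that as adversarial attack strength during training grows, the pairwise similarity between differently architected networks rises.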
“We found that when we train neural networks to be robust against adversarial attacks, they begin to do the same things,” Jones said.
There has been considerable effort in industry and academia to find the “right architecture” for neural networks, but the Los Alamos team’s findings indicate that the addition of adversarial training significantly narrows this search space. As a result, the AI research community may not need to spend as much time exploring new architectures, knowing that adversarial training causes diverse architectures to converge to similar solutions.
“We are making it easier to understand how robust AI might work by discovering that robust neural networks are similar to one another. We may even be gaining insight into how humans and other animals perceive,” Jones explained.