New Delhi: Despite their rapid advances, artificial intelligence (AI) systems continue to fall short in understanding the social dynamics and context necessary for interacting effectively with people, according to new research from Johns Hopkins University.
The study revealed that humans outperform current AI models in interpreting social interactions within moving scenes—a critical skill for technologies like self-driving cars, assistive robots, and other AI-driven systems designed to navigate the complexities of the real world.
“AI for a self-driving car, for example, needs to recognize the intentions, goals, and actions of drivers and pedestrians. You’d want it to predict whether a pedestrian is about to step into the street or if two people are simply engaged in conversation,” explained lead author Leyla Isik, assistant professor of cognitive science at Johns Hopkins University. “Whenever an AI system interacts with humans, it must recognize human behavior—and right now, these systems can’t do that effectively.”
To evaluate how AI models compare to human perception, the researchers asked participants to watch three-second video clips and rate, on a scale of one to five, features crucial for understanding social interactions. The clips showed people in different scenarios: interacting with each other, performing parallel activities, or acting independently.
The researchers then tasked more than 350 AI models—including language, video, and image-based systems—with predicting how humans would judge these scenes and how their brains might respond while watching. Large language models were also asked to interpret human-written captions for the clips.
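To make the comparison concrete, the sketch below illustrates what "predicting how humans would judge these scenes" can mean in practice. It is a hypothetical example, not the study's actual code or data: it correlates made-up model predictions with made-up human ratings, since rank correlation is a standard way to score this kind of human-model alignment.

```python
# A minimal, hypothetical sketch of scoring an AI model against human
# judgments in a study like this one. Humans rate each clip on a 1-5
# scale for some social feature, a model predicts those ratings, and
# the correlation between the two measures their alignment.
# All numbers below are invented for illustration.
import numpy as np
from scipy.stats import spearmanr

# Hypothetical mean human ratings (1-5) for ten three-second clips
human_ratings = np.array([4.6, 1.2, 3.8, 2.1, 4.9, 1.5, 3.3, 2.8, 4.1, 1.9])

# Hypothetical predictions from one of the evaluated models
model_predictions = np.array([4.1, 2.0, 3.5, 2.9, 4.4, 1.1, 2.6, 3.0, 3.9, 2.5])

# Rank correlation: does the model order the clips the way people do?
rho, p_value = spearmanr(human_ratings, model_predictions)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
```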
The findings highlighted a significant gap between AI performance and human perception, especially in dynamic situations, even though AI has shown strong capabilities in analyzing still images.
“It’s not enough for AI to recognize objects and faces in static images—that was just the first step and it got us far,” said Kathy Garcia, a doctoral student in Isik’s lab. “But real life isn’t static. For AI to truly function in the real world, it needs to understand unfolding stories, relationships, and social dynamics. This research points to a blind spot in current AI development.”
The researchers suggest that this shortcoming may be rooted in the architecture of AI neural networks. These systems are modeled after the part of the human brain responsible for processing static images, whereas the ability to interpret dynamic social scenes is governed by a different region.
The study emphasizes the need for a shift in AI design to better mimic the brain’s capacity for processing complex social interactions, paving the way for more advanced, human-aware AI systems. (Source: IANS)