By Michael Blanding
Working Knowledge via The Harvard Gazette
(Editor’s note: This article is reprinted here with permission from the Harvard Gazette. It was originally published in Working Knowledge of the Harvard Business School.)
With the help of artificial intelligence, creative ideas for solving global challenges are flooding into innovation hubs and crowdsourcing platforms. Yet, when asked to separate weak ideas from strong ones, AI sometimes produces inaccurate evaluations and even constructs convincing-sounding arguments that “fool” humans into agreeing with its decisions.
When humans and AI work together as a team, they can identify innovative ideas for addressing social problems more efficiently than either could on their own, according to research by Harvard Business School Assistant Professor Jacqueline Ng Lane. However, people sometimes surrender their good judgment and defer to AI’s decisions even when it produces incorrect information.
As companies rapidly adopt AI to assist with various tasks and decisions, the research reveals some technological weaknesses: AI systems have limitations when evaluating creative ideas based on subjective criteria and produce persuasive-sounding justifications that may sway human judgment—even when the underlying reasoning is weak or unsubstantiated. Ultimately, that’s why humans must use their critical thinking skills to question and verify AI-generated information, Lane says.
“You really need to have humans synthesizing and validating the data,” she says. “You have to know when to question AI-generated evaluations.”
Lane coauthored the working paper “The Narrative AI Advantage? A Field Experiment on Generative AI-Augmented Evaluations of Early-Stage Innovations” with Leonard Boussioux and Ying Hao Chen of the University of Washington; Charles Ayoubi of ESSEC Business School in France; Camila Lin of Microsoft; Pei-Hsin Wang of Accenture; and Rebecca Spens and Pooja Wagh from MIT Solve.
Can AI help choose the best ideas?
In conducting a field experiment, Lane and her fellow researchers partnered with MIT Solve, an entrepreneurship initiative and platform that seeks solutions to social challenges. In 2024, the organization awarded $1 million to 30 applicants.
With people increasingly using AI to develop applications, MIT Solve saw a twofold increase in the number of ideas people submitted last year compared to the year before. However, the organization used the same mechanisms to screen and select the best proposals. That meant staff members had to comb through more than 2,000 applications, culling incomplete or insufficient ones before presenting the rest to a team of judges to evaluate.
The researchers wanted to explore whether AI could streamline the sorting process while accurately assessing applications as well as humans could, so they studied MIT Solve’s call for proposals to address global health equity challenges. The researchers focused on the first stage in the evaluation process, where staff members decided whether a proposal was strong enough to pass along to expert judges. The team tested three conditions:
- Humans evaluating proposals without any help from AI.
- Humans and AI collaborating using “black box” pass-or-fail recommendations on proposals.
- Humans and AI collaborating, with AI providing narrative recommendations that described the rationale for its decisions.
The research team used prompt engineering techniques to prime the model, providing it with the criteria for reviewing proposals and examples of successful responses for each. The team calibrated the system to pass 40 percent of the ideas, in line with pass rates in past challenges.
“We essentially prepared it to act as a screener by carefully designing its prompts and providing it with a few examples, so it could evaluate proposals much like a human screener would,” says Lane, a co-principal investigator of the Laboratory for Innovation Science at the Digital Data Design Institute at Harvard.
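For readers curious what such a prompt-engineered screener might look like in practice, here is a minimal, illustrative sketch. It assumes the OpenAI Python client; the criteria, few-shot examples, and model name are placeholders for illustration, not the study’s actual materials.

```python
# Illustrative sketch only: a few-shot, criteria-based screening prompt.
# The criteria, example summaries, and model name below are hypothetical.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

CRITERIA = [
    "Alignment with the health equity challenge",
    "Completeness and intelligibility of the application",
    "Feasibility of the underlying technology",
]

FEW_SHOT_EXAMPLES = """\
Example proposal: <summary of a strong past submission>
Verdict: PASS -- addresses every criterion with concrete evidence.

Example proposal: <summary of a weak past submission>
Verdict: FAIL -- incomplete application, no clear technology described.
"""

def screen(proposal_text: str) -> str:
    """Ask the model for a pass/fail verdict plus a short rationale."""
    prompt = (
        "You are screening early-stage innovation proposals.\n"
        "Evaluate the proposal against these criteria:\n"
        + "\n".join(f"- {c}" for c in CRITERIA)
        + "\n\n" + FEW_SHOT_EXAMPLES
        + "\nProposal: " + proposal_text
        + "\nReply with PASS or FAIL and a one-paragraph rationale."
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content
```

In the study, the screener was additionally calibrated so that roughly 40 percent of proposals passed; a sketch like this one would need a comparable calibration of its strictness before being used the same way.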
In evaluating ideas, the team considered objective factors, such as the technology that powered a solution, and more subjective factors, such as whether an application was complete, intelligible, and high quality. The researchers also tested two types of evaluators: experts affiliated with MIT Solve, including employees, financial sponsors, and reviewers, and novices with no prior knowledge of the platform.
AI speeds the review but makes mistakes
The results showed:
Reviewers were more discerning when using AI
People failed proposals 9 percent more often, on average, with AI assistance than without. Expert and novice reviewers performed similarly when given AI assistance; they were equally likely to be persuaded by AI’s narratives.
“We thought that the experts would be better at corroborating and rejecting ideas, but it wasn’t true,” Lane says.
Objective criteria help AI evaluate many ideas
“For the objective criteria, humans and AI were pretty aligned in terms of whether they thought the solution should pass or fail,” Lane says. That alignment is good news, she says, showing that screening creative ideas based on objective criteria can be automated—especially when there are clear, quantitative metrics to verify whether the AI is making accurate decisions.
“It can be a really practical tool to help you with the process, especially when the number of ideas is likely to be cognitively overloading for humans to do it all alone,” Lane says.
Subjective criteria led to poor AI decisions
In evaluating proposals with more nuanced criteria, Lane and her colleagues found a large discrepancy between the judgments by humans working alone and those using AI. Yet people tended to doubt their initial judgments when AI disagreed with them and often deferred to the AI assessment.
In participant interviews, the researchers found that sometimes people sided with AI even after questioning its decisions.
“They felt like something was off, and they couldn’t figure out what it was, but at the same time, they still went with AI for the subjective criteria,” says Lane. “To us, that was really unsettling.”
An AI-based rationale convinced people to give in
Overall, reviewers went with AI’s recommendations 12 percent more often when it provided justifications for its decisions than when it simply made a pass-or-fail recommendation. But at times, AI swayed participants into agreeing with its decisions even when its reasoning was faulty. After all, AI systems can generate false information that appears convincing despite lacking a factual basis.
“We listen to stories. It’s part of what makes us human,” Lane explains. “When we collaborate with AI, however, we should do it carefully, relying on our own experiences, backgrounds, and expertise, and not just allowing AI to decide for us.”
In general, the results show that while generative AI might be helpful in quickly evaluating ideas based on objective criteria, effectively handling subjective decisions requires improvements not just in the technology itself but also in how people interact with it.
“We can’t delegate to AI or over-rely on it, at least not right now,” Lane says. “Subjective decisions often require nuance, intuition, and experience that only humans can bring to the table, so we need to use AI recommendations carefully. We have to think about what sorts of decisions it’s making and how it will influence the human role—and preserve human agency in the process.”
Image: Ariana Cohen-Halberstam with assets from AdobeStock.