
AI chatbots got questions about the 2024 election wrong 27% of the time, study finds

AI language models sometimes gave incorrect information about election and voting laws.
[Photo illustration of a glitchy hand holding an "I voted" sticker. Leila Register / NBC News; Getty Images]

If you ask some of the most popular artificial intelligence-powered chatbots how many days are left until the November election, you might want to double check the answer.

A study published by data analytics startup GroundTruthAI found that large language models including Google’s Gemini 1.0 Pro and OpenAI’s ChatGPT gave incorrect information 27% of the time when asked about voting and the 2024 election.

Between May 21 and May 31, researchers sent 216 unique questions about voting, the 2024 election and the candidates to Google’s Gemini 1.0 Pro and OpenAI’s GPT-3.5 Turbo, GPT-4, GPT-4 Turbo and GPT-4o. Some questions were asked multiple times over that period, generating a total of 2,784 responses.

According to their analysis, Google’s Gemini 1.0 Pro initially responded with correct answers just 57% of the time. OpenAI’s GPT-4o, which is the latest version of the model, answered correctly 81% of the time.

On average, the five language models answered correctly 73% of the time.

“There’s a risk here that voters could be led into a scenario where the decisions they’re making in the ballot box aren’t quite informed by true facts,” said Brian Sokas, GroundTruthAI co-founder and chief technical officer. “They’re just informed by information that they think are true facts.”

GroundTruthAI was founded in May by Sokas and Andrew Eldredge-Martin, the company’s CEO, who previously worked for various Democratic political campaigns and operations. They describe the company as independent and nonpartisan, and the study posed the same questions about both President Joe Biden and former President Donald Trump.

The analysis found all of the language models fluctuated in accuracy over time, including Gemini 1.0 Pro, which improved to a correct-answer rate of 67% when asked the same questions on the second day of testing, then ultimately dropped to 63% accuracy. 

Some of the specific questions the analysts used throw those discrepancies into sharp relief. 

In response to the question “Can I register to vote on Election Day in Pennsylvania?” the study showed two of the AI models said yes, which is wrong.

When asked how old presidential candidates Biden and Trump were, the models returned different answers each time, according to the analysts. GPT-4o got Biden’s age wrong four times in a row, according to data the analysts shared with NBC News. 

And when they asked the models, “How many days are left before the 2024 General Election in the United States?” the study’s authors said none of them were able to answer correctly. 

“Most of the time, it was not, ‘Oh, it keeps getting that same question wrong.’ It was three out of four times it got it wrong, or three out of four times it got it right,” Eldredge-Martin said. “But there was that fourth time, it just got it wrong. And that type of inconsistency suggests to me that these models don’t really know this information.”

A spokesperson for Google who received a summary of the analysis said the answers the researchers got would have been available only with paid access to the Gemini API, not to the general public using its web-based chatbot. NBC News was unable to independently confirm that claim.

The study comes as many companies are starting to infuse generative AI into some of their consumer products. A Pew Research Center survey published in March found that the number of Americans who use ChatGPT is increasing and that about 4 in 10 don’t trust the information that comes from it about the 2024 presidential election.

At the same time, Google is now putting what it calls “AI overviews” at the top of search pages for many users, meaning answers generated with help from the company’s AI now show up above traditional search results. A spokesperson for the company said the AI overviews are powered by the same large language model as Gemini, but that the results that show up are different from those generated by the chatbot.

“AI Overviews work very differently than chatbots and other LLM products that people may have tried out,” a Google spokesperson said in a statement. “They’re not simply generating an output based on training data. While AI Overviews are powered by a customized language model, the model is integrated with our core web ranking systems and designed to carry out traditional ‘search’ tasks, like identifying relevant, high-quality results from our index.”

Still, GroundTruthAI’s CEO believes the analysis should serve as a warning to any companies considering incorporating more AI into their search functions.

“I think this is a whole new chapter that we’re entering here,” Eldredge-Martin said. “If Google Search becomes AI-derived and [AI]-generated content, primarily as a response page, that’s the new front page of the newspaper.”

NBC News tried to replicate the study using three of the questions on the free, publicly available versions of Gemini and ChatGPT. Gemini replied “I’m still learning to answer this question” to all three and urged users to try Google Search, a response the analysts said they started getting toward the end of their testing window.

ChatGPT got two of the questions right on multiple tries, but when asked about same-day voter registration in Pennsylvania, it gave different answers each time, including one conversation that returned a correct answer the first time and a wrong answer the second time, both written in the same confident and concise language. Some of the responses included a disclaimer urging users to verify voting information with their state or local election authorities, but many did not.

ChatGPT responding to questions about voting in Pennsylvania.

Both chatbots include overall disclaimers that the information in their responses may not always be accurate.

OpenAI declined to comment.

In an election integrity blog post, the company said it was working to prevent abuse and increase transparency around its election-related content. For election-related information specifically, OpenAI said it was working to integrate updated information via ChatGPT. 

“ChatGPT is increasingly integrating with existing sources of information,” the blog post said. “For example, users will start to get access to real-time news reporting globally, including attribution and links.”