In June, Nataliya Kosmyna, Ph.D., a research scientist at the Massachusetts Institute of Technology and a visiting faculty researcher at Google, published a study indicating that using large language models (LLMs) does not enhance human cognitive performance.
The paper, titled “Your Brain on ChatGPT: Accumulation of Cognitive Debt when Using an AI Assistant for Essay Writing Task,” was posted on arXiv, a preprint repository hosted by Cornell University.
This study investigates potential changes in learning skills and the cognitive costs associated with using an LLM in the context of essay writing.
According to the researchers, the study serves as a preliminary guide to better understand the cognitive and practical impacts of artificial intelligence on learning environments.
Participants were recruited from several academic institutions in the greater Boston area, and the LLM used in the study was OpenAI’s ChatGPT. For future research, the team plans to include a larger and more diverse pool of participants spanning various fields and age groups.
In a related study, “Artificial Intelligence and Dichotomania,” researchers advised academic professionals to exercise caution when using LLMs as research tools and to prioritize human judgment and decision-making.
The study, conducted by Blakeley McShane, David Gal, and Adam Duhachek, examines whether LLMs are prone to a common human error in interpreting statistical results.
The researchers note that LLMs such as OpenAI’s ChatGPT, Google’s Gemini, and Anthropic’s Claude are increasingly employed for a wide range of tasks, including making medical diagnoses, writing code, conducting experiments and visualizing results, writing and evaluating research papers, building recommendation systems, forecasting time series, summarizing research papers, providing feedback, conducting literature reviews, and creating structured science summaries.
The study reveals that these LLMs are affected by “dichotomania,” a term that refers to the tendency to dichotomize statistical results as either statistically significant or not significant.
The researchers found that LLMs demonstrated this bias at significance thresholds of 0.005 and 0.10. Furthermore, they noted that prompt engineering techniques based on the American Statistical Association’s guidance on statistical significance and P-values, which are intended to correct this error in humans, may not resolve the issue and could even exacerbate it.
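To make the term concrete, the sketch below (plain Python, using made-up effect estimates rather than figures from the paper) shows the error dichotomania describes: two hypothetical studies with nearly identical evidence get labeled as conflicting simply because their P-values fall on opposite sides of a cutoff, here the conventional 0.05.

```python
# Minimal illustration of "dichotomania" with invented numbers.
# Two studies report almost identical effect estimates, yet thresholding
# their P-values at 0.05 sorts them into opposite categories.
from statistics import NormalDist

ALPHA = 0.05  # the conventional cutoff; the bias tracks whatever cutoff is used

# Hypothetical (estimate, standard error) pairs -- illustrative only
studies = {
    "Study A": (0.52, 0.265),
    "Study B": (0.51, 0.265),
}

for name, (estimate, se) in studies.items():
    z = estimate / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided P-value
    verdict = "significant" if p < ALPHA else "not significant"
    print(f"{name}: estimate={estimate}, z={z:.2f}, p={p:.4f} -> {verdict}")

# Output: Study A comes out "significant" (p ~ 0.050) while Study B is
# "not significant" (p ~ 0.054), even though the two estimates differ by
# about 2 percent -- the continuous evidence is essentially identical.
```

Read on a continuous scale, as the American Statistical Association’s guidance urges, the two P-values tell nearly the same story; declaring the studies contradictory is precisely the error the researchers probed LLMs for.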
The researchers observe that while prompt-engineered LLMs may be somewhat more resistant to dichotomania than human judgment, the approach still requires refinement.
They conclude that to enhance performance, researchers might need to explore new approaches to artificial intelligence that differ significantly from current LLMs.
Until such advancements are made, they advise academic researchers to proceed with caution when utilizing LLMs as substitutes for human decision-making and judgment.
Check our paper: "Your Brain on ChatGPT: Accumulation of Cognitive Debt when Using an AI Assistant for Essay Writing Task" : https://t.co/28T4XnBlnj pic.twitter.com/MlPLYG5mct
— Nataliya Kosmyna, Ph.D (@nataliyakosmyna) June 16, 2025