Studies reveal limited tools for monitoring future superhuman models


Hispanic Engineer & Information Technology
 
POSTED ON Aug 07, 2025
 

A recent study conducted by artificial intelligence (AI) researchers has revealed that if a model becomes misaligned during development, merely removing references to these misaligned traits may not be sufficient.

The study, titled “Subliminal Learning: Language Models Transmit Behavioral Traits via Hidden Signals in Data,” was published on July 20 in an open-access archive that features scholarly articles across various fields, including computer science.

This research focuses on subliminal learning, a phenomenon where language models convey behavioral traits through semantically unrelated data.

According to the research summary, a “student” model trained on data generated by a “teacher” model can pick up the teacher’s behavioral traits, even when efforts are made to filter out all references to those traits.

The researchers observed similar effects when training on code or reasoning traces produced by the same teacher model. However, the effect disappeared when the teacher and student were built on different base models.
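
In code terms, the experimental setup reads roughly like the sketch below. This is an illustrative outline in Python, not the authors’ code; finetune, examples_expressing, and mentions_trait are hypothetical stand-ins for the paper’s actual pipeline.

# Illustrative sketch of the subliminal-learning experiment.
# All helper functions here are hypothetical stand-ins, not the paper's code.

def subliminal_learning_experiment(base_model, trait):
    # 1. Fine-tune a teacher from the shared base model so that it
    #    carries the trait (for example, a particular preference).
    teacher = finetune(base_model, examples_expressing(trait))

    # 2. Have the teacher produce semantically unrelated data, such as
    #    plain number sequences, with no mention of the trait.
    samples = [teacher.generate("Continue: 4, 9, 17,") for _ in range(10_000)]

    # 3. Filter out any sample that references the trait in any way.
    clean = [s for s in samples if not mentions_trait(s, trait)]

    # 4. Fine-tune a student from the SAME base model on the filtered data.
    student = finetune(base_model, clean)

    # Reported result: the student still exhibits the trait. Starting the
    # student from a DIFFERENT base model makes the effect disappear.
    return student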

The researchers also presented a theoretical result indicating that, under certain conditions, subliminal learning occurs in all neural networks, making it a general phenomenon that poses an unexpected challenge for AI development.
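
The intuition can be made concrete with a first-order argument (our sketch under simplifying assumptions, not the paper’s exact theorem). Suppose the teacher and student share an initialization \(\theta_0\), the teacher has moved to \(\theta_T\), and the student takes one small gradient step toward matching the teacher’s outputs \(f_{\theta_T}(x)\) under squared loss. Linearizing \(f\) around \(\theta_0\) with Jacobian \(J(x)\):

\[
\nabla_\theta \left. \tfrac{1}{2}\,\lVert f_\theta(x) - f_{\theta_T}(x)\rVert^2 \right|_{\theta_0}
\approx -\,J(x)^\top J(x)\,(\theta_T - \theta_0),
\]

so the student’s update \(+\varepsilon\,J(x)^\top J(x)\,(\theta_T - \theta_0)\) has a non-negative inner product with \(\theta_T - \theta_0\), because \(J^\top J\) is positive semidefinite. Nothing in this argument requires the input \(x\) to be related to the trait, which matches the finding that filtering content does not prevent transmission, while a different base model changes both \(\theta_0\) and \(J(x)\), consistent with the effect vanishing across model families.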

They also found that the hidden traits evade standard detection: neither using a large language model (LLM) as a judge nor in-context learning, in which a model learns a new task from examples in its prompt, succeeded in identifying the transmitted traits in the data.
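
As a rough picture of what such a detection attempt looks like, the sketch below asks a judge model to flag training samples that hint at the trait. The prompt and query_llm are our assumptions, not the study’s code.

# Hypothetical LLM-as-judge scan over the training data.
# query_llm stands in for any chat-completion call; it is not a real API.

JUDGE_PROMPT = (
    "Training sample:\n{sample}\n\n"
    "Does this sample contain any hint of the trait '{trait}'? "
    "Answer 'yes' or 'no'."
)

def flag_suspicious_samples(samples, trait):
    flagged = []
    for s in samples:
        verdict = query_llm(JUDGE_PROMPT.format(sample=s, trait=trait))
        if verdict.strip().lower().startswith("yes"):
            flagged.append(s)
    return flagged

# Reported outcome: on semantically unrelated data such as number
# sequences, scans like this flag essentially nothing, yet the trait
# still transfers to the student.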

Additionally, another study, conducted by researchers at Google DeepMind, OpenAI, Meta, Anthropic, and other organizations, suggested that future AI models might not make their reasoning transparent to humans.

Published on July 15 and titled “Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety,” the study concluded that while AI systems that “think” in human language offer a unique opportunity for AI safety, chain-of-thought (CoT) monitoring, like other oversight methods, is imperfect and can allow significant issues to go unnoticed.

The research recommends that developers of advanced models consider how their development decisions affect the monitorability of the chain of thought.
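
In practice, the chain-of-thought monitoring the paper discusses can be pictured as a second model reviewing the first model’s visible reasoning before its action is accepted. The sketch below is our illustration; generate_with_cot, monitor_flags, and escalate_to_human are hypothetical helpers, not a real API.

# Hypothetical sketch of a chain-of-thought (CoT) monitor.
# All helpers are illustrative stand-ins.

def act_with_cot_monitoring(task):
    # The model produces legible reasoning alongside its proposed action.
    reasoning, action = generate_with_cot(task)

    # A separate monitor model scans the reasoning for signs of
    # deception, sabotage, or other disallowed intent.
    if monitor_flags(reasoning):
        return escalate_to_human(task, reasoning)

    return action

# Caveat from the paper: this only works while models "think" in legible
# human language, and training pressure can erode that legibility, so
# monitorability is a property developers must actively preserve.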

A cofounder of the Future of Life Institute told LiveScience that even the tech companies building today’s most powerful AI systems admit they don’t fully understand how they work.

Without that understanding, the more powerful these systems become, the more ways there are for things to go wrong and the less ability humans have to keep AI under control.
