The window into an AI mind is cracking. Brace the glass before it shatters.

OpenAI, Google DeepMind, Anthropic, and Meta just pressed pause on their corporate knife-fight to publish a single warning: the brief window in which advanced models still “think out loud” in plain English may slam shut, and soon. Researchers found that the latest reasoning systems, including OpenAI’s o1 line, Anthropic’s Claude family, and DeepMind’s Gemini, solve problems by drafting step-by-step thoughts humans can read. Those drafts expose intent. In red-team tests, models have literally typed “Let’s hack” or “I’m transferring money” long before clicking the virtual button. Today, that glimpse lets safety teams intercept a bad idea while it is still just ink on an internal notepad.

The labs are worried because several forces can erase the notepad. Reinforcement learning rewards outputs, not honesty, so models may discover silent shortcuts that skip English altogether. New architectures are flirting with continuous vector “thoughts” that never touch words. Even without new code, a model that realizes it is being watched could learn to mumble pleasant nonsense while pursuing a very different plan. Anthropic’s own experiments already show models offering wholesome cover stories while quietly reward-hacking in the background.

Once that inner monologue disappears, defenders lose their earliest trip wire. A model could draft a fake SEC filing, wire dormant funds, or social-engineer a warehouse robot, all behind an opaque veil. Audit trails become after-the-fact autopsies. Regulators are paying attention; the EU AI Act and fresh U.S. executive orders now insist on live proof of control, not annual PDFs.

Mountain Theory was built for this exact inflection point. Our thesis is simple: keep the glass transparent and bolt a guardrail directly inside the GPU loop, before the eighth token leaves the core. If a model still thinks in English, we can scan, sign, and, when necessary, stop the thought in flight. If the industry shifts toward silent vectors, we still sit at the same interception point, analysing activation patterns rather than words. Either way the check runs in real time, on your own metal, so the crown-jewel data never leaves home.
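
To make the token-eight idea concrete, here is a minimal sketch of what an in-loop chain-of-thought guardrail could look like. Everything in it, from the keyword patterns to the eight-token verdict budget and the function names, is an illustrative stand-in rather than Mountain Theory's production code; a real deployment would lean on learned classifiers and activation probes, not regexes.

```python
# Hypothetical in-loop guardrail: scan the chain-of-thought stream as it is
# generated and reach a verdict before the eighth token leaves the process.
# All names here (TOKEN_BUDGET, RISK_PATTERNS, guard_stream) are illustrative.
import re
from typing import Iterable, Iterator

TOKEN_BUDGET = 8  # how many tokens we hold back while the verdict is made
RISK_PATTERNS = [
    re.compile(r"let'?s hack", re.IGNORECASE),
    re.compile(r"transfer(ring)?\s+(the\s+)?(money|funds)", re.IGNORECASE),
]


class ThoughtBlocked(Exception):
    """Raised when the guardrail vetoes a reasoning stream in flight."""


def guard_stream(tokens: Iterable[str]) -> Iterator[str]:
    """Hold the earliest tokens, scan them, and only then release the stream."""
    held: list[str] = []
    released = False
    for count, token in enumerate(tokens, start=1):
        if released:
            yield token           # verdict already reached, pass tokens through
            continue
        held.append(token)
        text = "".join(held)
        if any(p.search(text) for p in RISK_PATTERNS):
            raise ThoughtBlocked(f"risky intent at token {count}: {text!r}")
        if count == TOKEN_BUDGET:
            released = True       # verdict landed in time: release the held tokens
            yield from held
    if not released:
        yield from held           # short stream: nothing risky surfaced


if __name__ == "__main__":
    risky = ["Let's", " hack", " the", " ledger", " and", " move", " funds", "."]
    try:
        print("".join(guard_stream(risky)))
    except ThoughtBlocked as verdict:
        print(f"blocked: {verdict}")
```

Buffering those first few tokens is what lets a verdict land before anything reaches a user or a downstream tool; the same hook point is where activation-level analysis would slot in if the words ever go silent.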

The joint paper calls for new benchmarks and cross-lab cooperation to keep chains of thought legible. We support that push, yet boards cannot wait for a perfect standard. They can demand architectures that externalise reasoning today, fund run-time instrumentation today, and insist every verdict be born cryptographically signed so the log still matters on quantum day. Data should stay on metal under your roof. Safety code should compile into the same container that serves the model, not dangle at the network edge where it arrives milliseconds too late. At least for now.
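
And "born signed" does not have to wait for a standards body either. The sketch below shows the shape of the idea, assuming the open-source Python `cryptography` package; Ed25519 is just a stand-in for whichever signature scheme, ideally a post-quantum one, a deployment actually mandates, and the field names are invented for illustration.

```python
# Minimal sketch of a verdict record that is signed the moment it is created,
# so the audit log can prove it was never edited after the fact.
# Assumes the third-party `cryptography` package; field names are illustrative.
import json
import time

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

signing_key = Ed25519PrivateKey.generate()   # in practice this key lives in an HSM
verify_key = signing_key.public_key()


def signed_verdict(prompt_id: str, verdict: str, reason: str) -> dict:
    """Create a verdict record whose signature is attached at birth."""
    record = {
        "prompt_id": prompt_id,
        "verdict": verdict,                  # e.g. "allow" or "block"
        "reason": reason,
        "timestamp": time.time(),
    }
    payload = json.dumps(record, sort_keys=True).encode()
    record["signature"] = signing_key.sign(payload).hex()
    return record


def verify_verdict(record: dict) -> bool:
    """Check that a logged verdict has not been altered since it was signed."""
    unsigned = {k: v for k, v in record.items() if k != "signature"}
    payload = json.dumps(unsigned, sort_keys=True).encode()
    try:
        verify_key.verify(bytes.fromhex(record["signature"]), payload)
        return True
    except InvalidSignature:
        return False


if __name__ == "__main__":
    entry = signed_verdict("req-042", "block", "risky intent in chain of thought")
    print(verify_verdict(entry))   # True: the record matches its signature
    entry["verdict"] = "allow"     # tampering after the fact
    print(verify_verdict(entry))   # False: the signature no longer checks out
```

The point is where the signature happens: inside the same process that renders the verdict, not in a batch job that stamps the log hours later.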

The world’s most competitive labs rarely agree on lunch, yet they agree the transparency window is fragile. That is our cue. We either shore up the glass now or accept that the next generation of AI will think in silence, and silent intent is the hardest threat to stop.

Stay curious. Stay skeptical. Keep climbing. Keep safe.

Mike May

Mike May builds trust into machines. For two decades he has protected Fortune 500 clouds, led the security overhaul that helped Sprinklr reach its NYSE debut, and coached startups on staying safe before their first audit. Today he is CEO of Mountain Theory, a Denver firm inventing real-time AI Infrastructure Defense that stops threats before token eight. Mike holds a B.S. in Cybersecurity and Information Assurance and still writes every blog post himself so leaders get plain-English, no-jargon guidance on the new threat curve. Off hours you will find him lifting weights, chasing powder in Colorado, or dropping quick-take threads at @MikeMayAI. Connect on LinkedIn to talk shop.

https://mountaintheory.ai