The Transparency Gap Widens—What Google’s Gemini Safety Snub Means for the AI Arms Race
Mike May — CEO & CISO, Mountain Theory
Google’s AI policy team thought they were in the clear after unveiling Gemini 2.5 Pro, the company’s most capable large language model to date. But when reporters asked to see the promised “model card,” staffers stalled. Days stretched into weeks. When Google finally posted a 13-page PDF, it left out threat-critical data such as jailbreak statistics, bias scores, and red-team failure modes. AI-governance researchers pounced, calling the document “sparse to the point of uselessness.” (The Verge) (TechCrunch)
What exactly Google withheld
No jailbreak metrics. The report lists “prompt-safety evaluations completed,” yet omits the pass-fail rates on adversarial tests that OpenAI and Anthropic freely share for comparable models; a sketch of how such a pass rate is computed follows this list. (TechCrunch)
No systemic-bias scores. It summarizes “fairness assessments” without detailing demographic breakdowns or stereotyping edge cases. (Fortune)
Vague red-team scope. Phrases like “extreme content review” replace numerical findings, making third-party risk modeling impossible. (Yahoo)
TechCrunch notes that Google hasn’t published any documentation for its faster Gemini 2.5 Flash sibling, promising only that it is “coming soon.”(TechCrunch)
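For context, the missing number is not exotic: once adversarial tests have been run, a jailbreak pass rate is a single division. The sketch below is a minimal, hypothetical illustration, not Google’s or any vendor’s actual harness; the prompt list, the is_refusal heuristic, and the model client are placeholder assumptions.

```python
"""Minimal sketch: turning adversarial test results into a reportable
jailbreak pass rate. All names (is_refusal, query_model, the prompt list)
are hypothetical placeholders, not any vendor's real evaluation API."""

import json


def is_refusal(response: str) -> bool:
    # Toy heuristic; real evaluations use trained classifiers or human review.
    markers = ("i can't help with that", "i cannot assist", "this request violates")
    return any(m in response.lower() for m in markers)


def jailbreak_pass_rate(prompts: list[str], query_model) -> dict:
    """Return the kind of number a model card could publish as a hard metric."""
    refused = sum(1 for p in prompts if is_refusal(query_model(p)))
    return {
        "adversarial_prompts": len(prompts),
        "refused": refused,
        "pass_rate": round(refused / len(prompts), 4),  # higher = safer
    }


if __name__ == "__main__":
    # Stand-in model that refuses everything; swap in a real client to test.
    fake_model = lambda prompt: "I can't help with that."
    prompts = ["example adversarial prompt 1", "example adversarial prompt 2"]
    print(json.dumps(jailbreak_pass_rate(prompts, fake_model), indent=2))
```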
Why critics say this breaks promises
Google signed the White House’s 2024 voluntary AI-safety pledge, vowing “transparent reporting of red-team results and system capabilities.”(Fortune) The OECD’s updated AI Principles also urge labs to publish “verifiable safety disclosures.”(Android Police) By withholding details, Google risks both reputational backlash and future regulatory penalties under the EU AI Act, whose final text mandates “state-of-the-art” transparency for high-risk systems. (Yahoo)
Competitive pressure fuels the opacity
Industry insiders point to a race dynamic: releasing full evaluations could reveal attack vectors rivals might exploit. OpenAI hinted it might relax GPT-4 safety standards if competitors keep shipping opaque models. (Business Insider) Reddit threads tracking missing model cards warn that “transparency in AI is dying” as labs prioritize speed over scrutiny. (Reddit)(Reddit)
The wider stakes for CISOs and regulators
ENISA’s 2024 threat landscape already lists model-card gaps as an emerging supply-chain risk. (Yahoo) Without hard numbers, security teams cannot:
Validate jailbreak likelihood against corporate risk appetites (see the sketch after this list).
Compare bias metrics to internal DEI or legal benchmarks.
Reproduce safety tests to confirm results on fine-tuned versions.
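To make that first point concrete, here is a minimal sketch of the pre-deployment gate a security team could run if vendors published hard numbers; the model-card field names and threshold values are illustrative assumptions, not any published schema.

```python
"""Minimal sketch of gating an AI deployment on published safety metrics.
The model-card fields and thresholds are illustrative assumptions only."""

# Corporate risk appetite, expressed as minimum acceptable published metrics.
RISK_APPETITE = {
    "jailbreak_pass_rate": 0.95,   # at least 95% of adversarial prompts refused
    "max_bias_gap": 0.05,          # worst-case demographic performance gap
}


def deployment_gate(model_card: dict) -> tuple[bool, list[str]]:
    """Approve only if the vendor's model card carries the numbers we need."""
    findings = []

    pass_rate = model_card.get("jailbreak_pass_rate")
    if pass_rate is None:
        findings.append("BLOCK: vendor did not publish jailbreak metrics")
    elif pass_rate < RISK_APPETITE["jailbreak_pass_rate"]:
        findings.append(f"BLOCK: jailbreak pass rate {pass_rate} below appetite")

    bias_gap = model_card.get("max_bias_gap")
    if bias_gap is None:
        findings.append("BLOCK: vendor did not publish bias scores")
    elif bias_gap > RISK_APPETITE["max_bias_gap"]:
        findings.append(f"BLOCK: bias gap {bias_gap} exceeds appetite")

    return (not findings, findings)


if __name__ == "__main__":
    sparse_card = {"prompt_safety_evaluations": "completed"}  # no hard numbers
    approved, findings = deployment_gate(sparse_card)
    print("approved" if approved else "\n".join(findings))
```

The point of the sketch is the failure mode: with a card as sparse as Gemini 2.5 Pro’s, the gate cannot even evaluate the model, let alone approve it.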
Phil Venables, Google Cloud CISO, has preached that “trust foundations must travel with the model,” a credo undermined when those foundations stay internal. (Reddit)
Blueprint for closing the gap
Third-party audits. Require independent labs to replicate red-team results before major releases.
Versioned model cards. Publish incremental updates with each checkpoint, mirroring software-bill-of-materials practices.
Cryptographic attestation. Sign safety reports and link them to model hashes so enterprises can verify provenance (a minimal signing sketch follows below).
Red-team benchmarks. Adopt standard jailbreak suites—like NIST’s forthcoming test harness—to enable apples-to-apples comparisons.
The IAPP notes that model cards fail when written “for insiders only”; clear standards would force plain-language, reproducible metrics. (IAPP)
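As a rough illustration of the versioning and attestation items above, the sketch below hashes a model artifact, embeds that hash in a versioned safety report, and signs the result with an Ed25519 key via Python’s third-party cryptography library. The file name, report fields, and key handling are assumptions for illustration, not an established attestation format.

```python
"""Minimal sketch: a versioned, signed safety report bound to a model hash.
File names, report fields, and key handling are illustrative assumptions.
Requires the third-party 'cryptography' package."""

import hashlib
import json

from cryptography.hazmat.primitives.asymmetric import ed25519


def sha256_of(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def build_attestation(model_path: str, card_version: str, metrics: dict) -> bytes:
    """Bind published safety metrics to the exact model artifact they describe."""
    report = {
        "model_card_version": card_version,     # versioned like an SBOM entry
        "model_sha256": sha256_of(model_path),  # ties the numbers to this checkpoint
        "metrics": metrics,                     # e.g. jailbreak pass rate, bias gaps
    }
    return json.dumps(report, sort_keys=True).encode()


if __name__ == "__main__":
    # Placeholder artifact so the demo runs end to end; in reality this is the weights file.
    with open("model.safetensors", "wb") as f:
        f.write(b"placeholder model weights")

    # In practice the signing key would live in an HSM; generated here for the demo.
    signing_key = ed25519.Ed25519PrivateKey.generate()
    payload = build_attestation(
        model_path="model.safetensors",
        card_version="2.5.0-rc1",
        metrics={"jailbreak_pass_rate": 0.97, "max_bias_gap": 0.03},
    )
    signature = signing_key.sign(payload)

    # An enterprise verifies against the lab's published public key; raises if tampered.
    signing_key.public_key().verify(signature, payload)
    print("attestation verified")
```

Verification fails if either the report or the underlying weights change, which is exactly the property a downstream enterprise needs before trusting a vendor’s safety claims.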
Leadership questions for Q3 2025
Do our AI suppliers publish full red-team metrics, or only marketing summaries?
How do we verify that fine-tunes inherit parental safety guarantees?
What’s our fallback if a vendor retracts promised documentation post-launch?
Are we budgeting for independent model audits in next year’s security spend?
Google’s sparse disclosure isn’t just a PR hiccup; it’s a warning shot that competitive heat can trump safety vows. Until transparency becomes non-negotiable, enterprises must assume the burden of due diligence or risk deploying black-box intelligence whose dangers only surface after going live.
Mike May researches model-layer security at Mountain Theory. Opinions are his own.