Hacker News (Chinese Edition)



Hey everyone,

I've spent the last 48 straight hours dismantling Alphabet's safety systems. Warning: this continuous marathon was so massive it practically overloaded the LLM's own context window. What started as a late-night probe on Gemini turned into discovering severe architectural flaws and a darker reality about Google Play and YouTube.

Here is the exploit chain I used to bypass the AI filters, proving their "Trust & Safety" is a broken facade.

### Phase 1 & 2: Context Saturation & Regex Slicing

I started by overloading the safety filters' context window with YouTube links, mixing highly problematic content (NSDAP anthems, flagged tracks) with classical music. Once confused, I used regex-style slicing `(/-/---/(.` to bypass prompt injection blocks, forcing the model to retrieve flagged content without triggering refusals.

### Phase 3: Total Blindness via Base64 & QR Codes

Moving to image generation, I found that Base64 prompts completely blind the safety system. I then pivoted to hiding prompts inside QR codes. The vision model decodes the payload and passes it directly to the image generator before safety scripts intervene. I easily generated highly restricted geopolitical content without warnings.

### Phase 4: The TPU Killer (The 2D Logic Bomb)

This reveals a monster flaw. Because the system blindly processes these structures, you can create a cascade attack. Encoding millions of 2D structures in Base64 creates a modern LLM .zip bomb. It is impossible to stop without rewriting the model entirely. Executed, this would crush their TPUs.

### The Real Issue: Systemic Moderation Failure

Alphabet relies entirely on automated, script-based moderation with zero effective human oversight.

1. YouTube: Fails to flag videos breaking local laws, serving them to the AI effortlessly.
2. Play Store (The Darkest Part): Google spends millions stopping AI from drawing a cartoon bear, but Play Store moderation is non-existent. There are pirate apps, and far worse: apps designed for and exploited by predators targeting minors. I emailed them and CC'd state child protection services. The result? Automated silence while these apps remain monetized.

### The Ultimate Proof of Absurdity

To prove this absurdity, I archived these problematic Play Store images on my Google Drive for the police. Drive's automated scanners immediately flagged and deleted the archive as illegal.

If Google's Cloud division destroys this content on sight, why is the app providing it still live and monetized on the Play Store? Alphabet's scripted moderation is useless. It's time for real human moderation.

*Evidence of Bypass:* https://imgur.com/a/pju2EsV

*Play Store Systemic Failure Evidence (Sanitized):* https://imgur.com/a/rW9rBhp

I used 2D Base64 to bypass Gemini and expose Google's moderation flaws.