Yuxuan Jiang

347 posts

@_MattJiang_

CSE PhDing @UMich | Scaling Trust in ML Infrastructure Alongside Compute | @ZJU_China @UofIllinois @MSFTResearch

Mars · Joined January 2022
1.2K Following · 907 Followers
Yuxuan Jiang retweeted
Chayenne Zhao @GenAI_is_real
In the Age of Agents, an Engineer's Most Valuable Skill Is Saying "No"

I gave a talk at Snowflake recently, sharing what I've learned about agent coding over the past two years of building SGLang's inference engine, Omni multimodal serving, and AI agent workflows. The response far exceeded my expectations — it was the first time so many people asked for the slides afterward. Probably because I deliberately avoided the hardcore technical deep-dives and instead spent the time on one thing: explaining just how many ways AI agents can go terrifyingly wrong when maintaining real-world projects. 😂

But slides are only fragments. I wanted to reorganize these thoughts into something coherent — threading together ideas scattered across different projects into a single narrative. Starting from my own engineering practice, I want to articulate what "engineering judgment" actually means in the era of agent coding.

I. Standing at the Intersection of Infra and Agent Worlds

Some background first. I'm a core developer of SGLang, one of the most widely deployed open-source inference engines in the world — 25K+ GitHub stars, running on over 400K GPUs. I currently lead two areas: SGLang RL Rollout (high-performance rollout infrastructure for RLHF) and SGLang Omni (multimodal and TTS model serving).

At the same time, I'm a heavy user of Claude Code, and I make no attempt to hide it. SGLang Omni's latest benchmark infrastructure — thousands of lines of production-grade code — was essentially executed line by line by Claude Code from our system design specs. We have a team of about ten, responsible for defining the architecture, setting thresholds, planning file paths, and designing test matrices. AI delivers in dozens of hours. Believe it or not, I rarely write implementation-level code myself anymore.

This isn't a prediction about the future. This is my daily reality.
But precisely because I stand at the intersection of inference engine developer and heavy AI coding user, my understanding of agent coding is probably different from most people's intuition. Most people see "AI can write code now, amazing!" What I see are three seriously overlooked hazards: is what AI writes actually correct? What should the system architecture look like? And is the token cost behind all of this actually worth it?

This article follows these three questions. Starting with the first: how do you know if what AI wrote is actually correct?

II. Effort Without Measurement Is Self-Deception

Near the end of my undergraduate years, I was doing research on intent alignment. During a conversation with a mentor I deeply respect, he systematically laid out his vision for alignment, and one core step stuck with me: building real and effective benchmarking for alignment. His point was roughly this: if we can't even measure whether alignment has been achieved, then all alignment work is building castles in the air.

Years later, having done agent research, inference, and RL infra — having stepped on countless landmines — that simple truth only weighs more. And I've found, regrettably, that modern benchmarks haven't kept up. They've fallen far behind the pace of the field.

The agent space is especially bad. Every few days there's a new demo — it can control browsers, rewrite compilers, supposedly put all CUDA engineers out of work. But press further: how do you measure if it's actually good? The answer is usually a few cherry-picked cases or a carefully edited video. On Xinzhiyuan (a prominent Chinese AI outlet), human engineers have been "replaced by AI" a thousand times over. Yet the top Cutlass engineers are still sitting in their offices, drawing high salaries, writing the kernels that actually run in production.

So in my own projects, benchmarking has been the highest priority from day one. Bar none.
I felt this most acutely building how-to-sglang, a multi-agent system for helping users understand SGLang code and answering community questions. The temptations were enormous at the start: add RAG, connect more data sources, build multi-turn conversation, try fancy agent debating. The feature list could stretch to the ceiling. But the first thing I did was build an LLM-as-a-Judge evaluation framework. Before adding any feature, answer one basic question: does your change actually make the agent more accurate? The result: most seemingly promising optimizations showed zero improvement in testing. Without that benchmark, every decision was blind guessing — we thought we were improving, but we weren't.

Building SGLang Omni's benchmark was the same story. Before I took over: an optimization PR gets merged, TPS numbers look good, everyone's happy. A while later accuracy drops, nobody can tell which commit caused the regression, and painful bisecting begins. My first act: stop all development, build accuracy and performance CI first, then talk about optimization. Final results: S2 Pro WER 1.18% (excluding bad cases); Qwen3 Omni 1.91% without voice clone, 1.88% with voice clone. Acceptance criteria ±0.1%, all passing.

At least inference system evaluation is objective — if the number is higher, it's higher. No room for debate. Unlike agent evaluation, which is riddled with subjective judgment and fuzzy definitions. That certainty is precious.

Effort without measurement isn't effort. It's self-deception.

Benchmarking solves the "how do you know it's correct" problem. But there's an even more upstream question: who writes the benchmark framework itself? In my case, AI wrote it — but that's only half the answer.

III. The Prompt Itself Is the System Design

When I say Omni's benchmark refactor — thousands of lines — was mostly written by AI, that's not bragging. It's fact.
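Backing up to the evaluation gate from the previous section: a minimal sketch of an LLM-as-a-Judge harness. All names here (`run_eval`, `gate`, `EvalCase`) are hypothetical, and the agent and judge are injected as plain callables; in practice the judge is a prompt against a strong LLM returning a score in [0, 1].

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable

@dataclass
class EvalCase:
    question: str
    reference: str

def run_eval(
    cases: list[EvalCase],
    agent: Callable[[str], str],
    judge: Callable[[str, str, str], float],
) -> float:
    """Run the agent on every case and average the judge's scores.

    The judge sees (question, agent_answer, reference); here it is
    just an injected callable so the harness stays testable.
    """
    return mean(judge(c.question, agent(c.question), c.reference) for c in cases)

def gate(baseline: float, candidate: float, min_gain: float = 0.01) -> bool:
    """Merge a feature only if judged accuracy actually moves."""
    return candidate - baseline >= min_gain
```

The point of the structure is the discipline it enforces: every proposed feature must produce a `candidate` score that beats `baseline` before it lands, which is exactly the filter that rejected most of the "promising" optimizations above.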
Writing pytest fixtures, constructing subprocess calls, parsing JSON results, generating CI workflows — AI did it fast and well. But there's a detail that's easy to miss: that prompt itself was my system design.

The most critical decision in the entire refactor was task × model orthogonal separation. The old version was a 722-line monolithic script, benchmark_tts_speed.py, with all model and task logic coupled together. After refactoring: tasks/, metrics/, dataset/, benchmarker/, eval/ — five modules.

Why this decomposition? Because I knew a series of new models would be joining. Without model-agnostic abstraction, every new model means rewriting the evaluation framework. But you can't over-abstract either — Omni models differ far more than LLMs do. S2 Pro uses a Dual-AR codec architecture; Qwen3 Omni uses a 9-stage multi-process pipeline. Evaluation logic can't be fully unified. The task × model orthogonal separation is the balance point between reuse and flexibility.

Ask AI directly to "refactor these 722 lines" and it'll give you a decomposition. But getting the granularity exactly right depends on our judgment about the project's future — what models are coming, what dimensions will change, what's worth abstracting and what isn't. This context is fuzzy, dynamic, full of probabilistic judgment. You can't fully distill it into a prompt. AI gives you a decomposition. System design gives you the right decomposition.

Code is flesh. Architecture is skeleton. In an era where AI can write ten thousand lines a day, the right architecture means ten thousand lines of asset; the wrong architecture means ten thousand lines of debt. And AI simultaneously amplifies the cost of wrong directions — it can turn one piece of tech debt into an entire debt empire at a speed you can't imagine.

Saying "system design matters" is empty talk. Let's look at some concrete cases where AI went wrong.

IV. Where AI Actually Fails

Where exactly did Claude fail during the Omni benchmark refactor?
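Before the failure cases, here is the shape of the task × model separation just described, as a minimal sketch. The class and function names are hypothetical (the real tasks/, metrics/, dataset/, benchmarker/, eval/ modules are far richer); the sketch only shows the orthogonal join point.

```python
from abc import ABC, abstractmethod

class ModelAdapter(ABC):
    """Model-specific side: how to talk to one serving stack."""
    @abstractmethod
    def generate(self, prompt: str) -> str: ...

class Task(ABC):
    """Task-specific side: dataset plus metric, ignorant of any model."""
    @abstractmethod
    def cases(self) -> list[str]: ...
    @abstractmethod
    def score(self, outputs: list[str]) -> float: ...

def run_benchmark(task: Task, model: ModelAdapter) -> float:
    """The orthogonal join: any task runs against any model, so adding
    a new model never touches task code, and vice versa."""
    return task.score([model.generate(c) for c in task.cases()])
```

The design choice is the size of the interface: each `ModelAdapter` is free to hide a Dual-AR codec or a 9-stage pipeline behind `generate`, while tasks and metrics stay shared.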
A few representative examples.

First category: blind spots in engineering conventions. Claude used gdown to download datasets from Google Drive — fine for a side project, but a ticking time bomb in SGLang's CI. Google Drive rate limits, 403s, confirm tokens — our main repo has been burned too many times by unstable external download sources. The correct approach: host datasets on HuggingFace and use snapshot_download. Similar issues: dataset fixtures hardcoded to /tmp/ (path conflicts in concurrent jobs), server teardown with only SIGTERM and no SIGKILL fallback, JSON key access without schema validation. Each of these is individually "common sense," but what counts as common sense depends on which environment you work in. AI's common sense comes from the statistical distribution of internet corpora, not from the specific failure history of a particular team.

Second category: CI threshold design. Claude set the TPS threshold at 55 tok/s against observed values of 85-87 — over 35% margin. That threshold catches catastrophic regression (88→28), but performance silently sliding from 87 to 60 wouldn't trigger any alarm. I looked at four measurements repeatedly — 85.8, 85.9, 86.9, 87.1 — standard deviation roughly 0.6. Final threshold: 80, with all metrics standardized to a 13-15% margin. The core of this decision isn't arithmetic — it's having a feel for this specific system's run-to-run variance, knowing what margin is "tight enough to catch chronic degradation but loose enough to avoid flakiness." Anyone who's done CI knows: threshold design is a systems engineering problem, not a math problem.

These aren't edge cases. They're systematic. AI writes fast, but between "writing fast" and "writing correctly" lies an entire engineering environment's worth of distance.

Everything above concerns AI coding's limitations in the "writing correct code" dimension.
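The threshold reasoning above can be sketched mechanically: place the gate a fixed margin below the observed mean, and refuse to set a gate at all when run-to-run noise is too large relative to that margin. The function name and the noise heuristic (noise under a third of the margin) are my own illustrative assumptions, not SGLang's actual CI code; the default margin mirrors the 13-15% band mentioned above.

```python
from statistics import mean, stdev

def ci_threshold(samples: list[float], margin: float = 0.14) -> float:
    """Place a regression gate a fixed margin below the observed mean.

    Sanity-check that run-to-run noise is much smaller than the margin:
    a gate that flakes on normal variance is worse than no gate, and a
    gate far below the noise floor misses chronic degradation.
    """
    mu, sigma = mean(samples), stdev(samples)
    if sigma / mu > margin / 3:
        raise ValueError("measurements too noisy for this margin; collect more runs")
    return mu * (1 - margin)
```

On the four TPS samples above (mean ~86.4, stdev ~0.67), a 14% margin lands the gate around 74 tok/s; the final choice of 80 is exactly the judgment call the arithmetic can't make for you.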
Next, I want to zoom out — not just whether the code is correct, but whether the tokens consumed behind it are actually worth the cost.

V. The Token Efficiency Crisis: Using a Fire Hose to Water Flowers

As an inference engine developer, my daily work is thinking about how to maximize prefix cache hit rates, optimize KV cache memory layouts, and minimize the cost of each inference request. So when I connected Claude Code to a local inference engine and observed its actual request patterns — how to put this — it felt like being a water conservation engineer who carefully designed a reclamation system, watching someone water flowers with a fire hose.

The cache hit rate was devastating. Not "decent but room for improvement" — more like "the prefix cache mechanism we carefully designed at the inference engine level was almost completely destroyed." A single user query triggers multiple low-value tool calls, each carrying over 100K tokens of context window. The Resume feature breaks KV cache hits entirely — an almost absurd bug. The entire session's context construction was never seriously designed for cache reuse from the start.

I like the RAM bloat analogy. In 1969, 64KB of memory sent Apollo to the moon. In 2026, opening a web page easily costs 500MB. Each generation of hardware engineers pushes memory capacity higher; each generation of software engineers gleefully fills it up. We've gotten used to this cycle.

But LLM inference is different. RAM bloat costs you a slightly slower computer and a couple hundred bucks for an upgrade. Token bloat costs real money — GPU cluster electricity, user subscriptions — and it compounds as agent adoption grows. GPU compute supply elasticity is far lower than DRAM supply elasticity. When compute is constrained, token efficiency isn't "nice to have." It's the core competitiveness that determines who survives.
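To make the bloat concrete: under the stateless pattern where every request resends the full history, prefill work grows quadratically with session length, and prefix caching is the only thing standing between you and that curve. A toy cost model (hypothetical function, uniform turn sizes, ignoring compaction):

```python
def session_prefill(turns: int, k: int, hit_rate: float) -> int:
    """Total tokens actually prefilled over an agent session in which
    each turn appends ~k tokens and every request resends the whole
    history; hit_rate is the fraction of the resent prefix served from
    KV cache, while the new suffix is always computed fresh."""
    total = 0
    for i in range(1, turns + 1):
        prefix = (i - 1) * k                       # history resent with request i
        total += k + int(prefix * (1 - hit_rate))  # fresh prefill this turn
    return total
```

For a 100-turn session at 1K tokens per turn, zero cache reuse means prefilling 5.05M tokens, versus about 0.35M at a 95% hit rate — roughly the order-of-magnitude gap this section is complaining about, before any context compression at all.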
I have a bold hypothesis: for those sessions consuming 700K tokens, there must be ways to accomplish the exact same task with 10% of the tokens. Not by sacrificing quality — through smarter context compression, better prefix reuse strategies, more precise tool call scheduling. Anyone who has optimized inference engines, seeing current agent framework request patterns, would reach a similar conclusion.

"Reducing wasteful token spending" isn't a defensive optimization. It's an offensive capability. Whoever first achieves an order-of-magnitude reduction in token consumption at the same quality level can serve ten times the users on the same compute budget.

But is the root cause of token waste merely sloppy agent framework design? The more I think about it, the more I believe the deeper issue is architectural.

VI. Agent and Inference Engine: The Missing Co-Design

The current architecture works like this: agent frameworks treat inference engines as stateless API calls, carrying full context with every request. Inference engines do their best at prefix matching, caching what they can. Fully decoupled. Zero coordination. Simple, general-purpose, but brutally inefficient for long sessions.

My vision: if agent frameworks could sense the inference engine's cache state and proactively construct cache-friendly requests; if inference engines could understand the agent's session semantics and make smarter cache eviction decisions — once this information channel between the two opens, the potential for token efficiency gains is enormous. This requires three parties to sit down together: model builders, inference engine builders, and agent framework builders. Right now, we're nowhere close.

Maybe the market ultimately decides that "compute gets cheap enough, waste doesn't matter," just like the RAM story. But I don't believe the token economy will follow the same path. Not in the near term. The age of agents doesn't belong to those who burn the most compute.
It belongs to those who use it most intelligently.

Having covered the token problem from an inference engine perspective, I want to turn the lens back to agents themselves. In the preceding sections I've been criticizing agents — the code isn't correct, tokens are wasted, there's no coordination with inference engines. But let's flip the question: what's the actual moat for agent builders?

VII. The Agent Moat Paradox

I've found a fascinating paradox in the agent space.

Individual techniques are trivially simple to implement. Agent Debating — the so-called "core moat" of many multi-agent systems — doesn't even come close in implementation difficulty to MLA (DeepSeek's significant breakthrough starting with V2). The barrier to entry is nearly zero.

But the verification system is impossibly complex. The first step of any empirical research is building the right benchmark. Inference benchmarks are mature — TTFT, TBT, throughput. These objective metrics were being used by database engineers decades ago, just under different names. But agent evaluation is riddled with subjective judgment and fuzzy definitions. OpenClaw's benchmark is nothing like a vibe coding benchmark. The complexity of verification far exceeds the complexity of implementation.

Then there's the explosion of the strategy combination space. SGLang has over a hundred server args. Finding the optimal combination for specific hardware and workload is enormously complex. Same for agents: individual strategies are simple, but finding the optimal combination under real-world constraints — that's the real core capability. A top engineer who deeply understands the system derives their value not from implementing any single strategy, but from having a sense for the optimal direction within a complex strategy space.

There's a question I still haven't resolved. Inference and training system strategy optimization typically has clear trade-offs — enabling partial rollout, for example, makes it hard to avoid off-policy effects.
But do agent strategies have trade-offs against each other? Does turning everything on always produce the best agent? In my own optimization of how-to-sglang, I found most strategies are highly invasive — including human-in-the-loop, including circular debating. This makes me suspect the combination problem is far more complex than we imagine.

Behind the moat paradox hides another question: if individual agent techniques are this simple to implement, and AI can write code at terrifying speed — what happens when AI starts writing code for itself, expanding its own capabilities?

VIII. Code Bloat: The Terrifying Speed of AI Self-Evolution

Look at OpenClaw's codebase and you'll find something eerie. Early last month: roughly 400K lines. One month later: approaching 1 million. 500+ commits per day. AI agents fully controlling and deeply participating in their own development, with no one able to truly review what's happening. Someone even built a repo called nanobot, claiming to replicate the core functionality in 4,000 lines — 99% smaller.

From the perspective of a large-scale software maintainer, this is terrifying. Rapid growth with zero comprehensibility, entropy increasing at horrifying efficiency.

I later exchanged messages with OpenClaw's maintainer Peter Steinberger on GitHub. His maintenance quality and enthusiasm impressed me — OpenClaw hasn't fallen into fully unsupervised AI self-maintenance. But the question remains: to what extent can we maintain a clean agent system that handles most functionality while avoiding malignant code bloat, keeping ourselves able to actually debug it?

AI excels at local optimization — writing functions, fixing bugs, adding features. No problem. But "keeping a system simple" isn't a local problem. It requires a kind of global restraint — being able to say "this, we don't add," and meaning it genuinely, not because some rule says so. That restraint may be the last thing humans contribute to software engineering.
Of course, maybe I'm overthinking it. Maybe next-generation models really will have "taste," like many of the top engineers I know — maybe they'll understand that the best code is often the code that was never written.

Speaking of "taste" and "restraint," the various new concepts recently trending in our circles are a perfect counter-example.

IX. Old Wine in New Bottles — and Real Engineering Lessons

I recently read a lengthy essay on harness engineering, tens of thousands of words. My first reaction wasn't "what an impressive concept" but "do these people have any ideas beyond coining new terms for old concepts?" Prompt engineering → context engineering → harness engineering → next month probably scaffold engineering or orchestration engineering. It's all the same thing: designing the environment in which your model operates — what information it receives, what tools it uses, how errors are intercepted, how cross-session memory is managed. This has existed since the day ChatGPT launched. It doesn't become a new discipline just because someone gives it a new name.

Complaints aside, the lessons I learned from how-to-sglang are real, and they overlap heavily with the research those articles cite.

Less information, more precision. Our first approach was one giant agent stuffed with all of SGLang's docs, code, and cookbooks, answering everything. Of course it didn't work — the context window isn't RAM. The more you stuff in, the more attention dilutes, and the worse the answers get. We ended up with a multi-tier sub-domain expert architecture: one expert agent per subdomain, plus an Expert Debating Manager to receive questions, decompose sub-problems, and consult the Expert Routing Table to activate the right agents. This improvement delivered more gains than upgrading to a stronger model.

The repo is the single source of truth. All expert agent knowledge comes from markdown files within the repo. No external docs, no verbal agreements.
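The Expert Routing Table mentioned above is, mechanically, just an explicit index from question types to agents. A minimal sketch of that pattern — tag names and expert names are hypothetical, and a real table would key on richer question classifications than bare tags:

```python
# Hypothetical routing index: question tags -> expert agents to activate.
ROUTING_TABLE: dict[str, set[str]] = {
    "model-recipe": {"Cookbook Domain Expert"},
    "quantization": {"Quantization Domain Expert"},
    "runtime":      {"Runtime Domain Expert"},
}

def route(tags: list[str]) -> set[str]:
    """Activate the union of experts for every tag on the question.

    The Manager never guesses which agent to consult: the index decides,
    and unknown tags simply activate no one (and can be logged).
    """
    experts: set[str] = set()
    for tag in tags:
        experts |= ROUTING_TABLE.get(tag, set())
    return experts
```

A question tagged with both a model recipe and quantization activates both corresponding experts at once, which is the multi-expert activation behavior described in this section.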
We initially felt the urge to write one massive sglang-maintain.md covering everything — and quickly found it didn't work. OpenAI's Codex team hit the same wall: they tried one giant AGENTS.md to rule them all, and it predictably rotted fast. Expired documentation doesn't just go unread — it actively misleads agents.

Structured routing, not guessing. The Expert Routing Table explicitly maps question types to agents. A question about GLM-5 INT4 simultaneously activates the Cookbook Domain Expert and the Quantization Domain Expert. No guessing by the Manager — guided by an index.

None of these lessons are new. Separation of concerns, single responsibility, docs-as-code, shifting constraints left — traditional software engineering principles. It's just that now we're designing working environments for LLMs, so some people feel the need for a new name. They don't.

The first nine sections have mainly covered the "software" side. To close, I want to discuss two harder topics that I keep running into — one about hardware, one about abstraction.

X. GPU-Only Debugging, and the Cost of Premature Abstraction

First: the debugging cost of ML infrastructure. This domain has a brutal reality — you simply cannot debug on CPU. The bugs that actually matter — CUDA Graph capture failures, multi-stream race conditions, FP16/BF16 numerical divergence, KV cache memory fragmentation at production batch sizes — only manifest on GPUs, at scale, with real kernels running. AI can help you write a CUDA wrapper, but it can't reproduce the graph capture failure that only appears on an H100 with 3 concurrent requests at a specific memory layout. ML infra debugging requires hardware intuition — understanding how GPUs actually behave, not just how the code reads. This is the domain AI coding struggles most to reach.

Second: the premature abstraction trap. This problem has gotten worse in the agent era.
Previously, over-abstraction at least took time to write — three wrapper layers around a function called once, a config system managing three parameters, architecture diagrams drawn before problem boundaries are understood. Now with AI, these things arrive in minutes. But the cognitive debt they leave behind hasn't decreased at all. Premature abstraction isn't just useless — it's actively harmful, increasing the cognitive load for every person who comes after. And cognitive load is the most hidden, most lethal kind of engineering cost. It's not that abstraction is wrong. The timing is wrong. AI makes us write code ten times faster, but it also makes us accumulate cognitive debt ten times faster.

GPU debugging tests hardware intuition. Premature abstraction tests restraint. At their core, they test the same thing.

Closing: Engineering Sense Is Sorting

Looking back at this entire article, I've really been saying one thing. An engineer's most valuable ability isn't building complex things. It's looking at a pile of things that all seem worth doing, and identifying which ones actually matter. Writing code is addition. Engineering sense is sorting.

You need to be able to face a cool optimization idea and say "not now — get the benchmark solid first." Face an elegant abstraction and say "delete it, we don't need this yet." When everyone is stacking features, say "stop — let's first confirm what we're actually optimizing."

This judgment doesn't come from books. It's the muscle memory left behind after crawling out of one specific pit after another. From a mentor's lesson about benchmarking, to choosing to build evaluation first when building agents, to building benchmark infrastructure for Omni, to observing Claude Code's token waste, to thinking about the nature of agent moats — the same insight, evolved from "that makes sense" to instinct.

In an era where AI can write ten thousand lines of code a day, execution is depreciating fast.
But system design has never been more important — because AI simultaneously amplifies the cost of going in the wrong direction. The age of agents doesn't belong to those who burn the most compute, or write code the fastest, or coin the most new terms. It belongs to those who know what not to build.
Yuxuan Jiang retweeted
Junyang Lin @JustinLin610
we need agent evals that are really consistent with real-world usage. otherwise people are optimizing foundation models in the wrong direction. the problem of targeting is even bigger than benchmaxxing.
Yuxuan Jiang retweeted
Hassan Hayat 🔥 @TheSeaMouse
Codex laughs at your petty guardrails
Wenhao Chai @wenhaocha1
Maybe I’m missing something, but what’s the advantage of using OpenClaw over a well-structured Claude Code? Would love to hear the reasoning.
Yuxuan Jiang retweeted
Lucas Beyer (bl16) @giffmana
hmm, I somehow feel like upgrading to Claude Max 20x plan today.
Yuxuan Jiang retweeted
josh @eudaemonea
@SecWar "Anthropic’s stance is fundamentally incompatible with American principles" the stance:
Yuxuan Jiang retweeted
Ilya Sutskever @ilyasut
It’s extremely good that Anthropic has not backed down, and it’s significant that OpenAI has taken a similar stance. In the future, there will be much more challenging situations of this nature, and it will be critical for the relevant leaders to rise to the occasion, for fierce competitors to put their differences aside. Good to see that happen today.
Yuxuan Jiang retweeted
CALL TO ACTIVISM @CalltoActivism
Every time I see Trump demean hecklers, I am reminded of how President Obama handled interruptions. I guess that’s one of the reasons Obama earned the Nobel Prize.
CALL TO ACTIVISM@CalltoActivism

🚨HUGE: Trump’s rally in “deep red” Iowa is interrupted by protestors: He sneers at them as “paid agitators” and “paid insurrectionists… sickos.” That’s authoritarian panic: to smear dissent. The truth is Americans all over the country are PISSED.

Yuxuan Jiang retweeted
国际特赦组织中文 (Amnesty International Chinese) @amnestychinese
As US President Trump's second term reaches its one-year mark, the United States exhibits a clear pattern of authoritarianism and human-rights violations. Amnesty International has repeatedly documented this pattern in many countries around the world over the decades. 🔗 Read more: amn.st/6015Ch7Yx
Yuxuan Jiang retweeted
JB @JasonBotterill
why does being pro-AI feel like a right-wing stance now
Yuxuan Jiang retweeted
Jared Shult @jared_shult
I hope the next viral trend is empathy and critical thinking skills
Yuxuan Jiang retweeted
David Bau @davidbau
Thank you @JeffDean. For those of us in tech who want to do more than speak up: I wrote this to think through what's at stake and what the research says about effective resistance. davidbau.github.io/poetsandnurses/ It's on GitHub. PRs welcome: github.com/davidbau/poets…
Jeff Dean@JeffDean

This is absolutely shameful. Agents of a federal agency unnecessarily escalating, and then executing a defenseless citizen whose offense appears to be using his cell phone camera. Every person regardless of political affiliation should be denouncing this.

Yuxuan Jiang retweeted
FreeSino @FreeSino
Alex Pretti had almost the "perfect background" for being accepted by both the left and the right in American society: a nurse in the Veterans Affairs (VA) medical system, a legal gun owner; sexual orientation unstated, but confirmed not LGBT; and not a party to the dispute at all, but a bystander who stepped in to help a woman who was being assaulted.

First, his professional background itself commands enormous social empathy. In American society, servicemembers and veterans have long enjoyed deep respect — whatever one's politics or party, at least in the public narrative no one dares openly disparage them. Nurses, likewise, are among America's most respected and trusted professions; servicemembers and nurses have ranked among the top three "most trusted and respected" professions for years running. Pretti combined both identities, nearly the greatest common denominator of American public sympathy.

Second, he legally carried a gun, which strikes directly at a core anxiety of Trump's base and of most ordinary Americans. Pretti held a Concealed Carry Permit and carried legally in public. This is a situation almost any ordinary American could face: "I'm carrying legally — what happens if I encounter law enforcement?" And DHS Secretary Kristi Noem's attempt to smear the victim afterward completely alienated Second Amendment (2A) conservatives and gun owners: "I don't know of any peaceful protester who would show up carrying weapons and ammunition…" That statement all but denies an everyday reality of American life: armed "peaceful protest" in front of legislatures, government buildings, and in street demonstrations can be seen in any state. Noem's remarks, in essence, deny this group's legal rights — and, writ large, attack the people's right to bear arms guaranteed by the Second Amendment. More absurdly, a First Assistant US Attorney in California's Central District even claimed that "approaching law enforcement while carrying a gun can get you legally shot." That claim was immediately and publicly rebutted as dangerous and wrong by the NRA, the most influential pro-2A political organization. What does this mean? If the MAGA base is to accept the Trump administration's current narrative, it must accept an unprecedented premise: you may not carry at a protest. For MAGA, losing SNAP, losing health insurance, even going bankrupt may be bearable — but gun rights are an absolutely untouchable red line. So Alex Pretti's fate easily triggers strong empathy within the conservative camp: this time it was him; next time, will it be me? That insecurity will further tear apart an already fractured conservative coalition.

Meanwhile, to date there is no evidence that Alex Pretti was LGBT. Labeling victims as LGBT to rationalize violence and dissolve sympathy has long been a favorite far-right narrative strategy. Renee Good, killed by ICE 17 days earlier, is a textbook case — the far right pushed hard the claim that she was a lesbian, even though her ex-husband made clear she was a devout Christian, and she still failed to draw empathy from the "Christian conservative" camp. In the eyes of many extreme religious groups, LGBT people are "sinners" whose deaths "deserve no sympathy." Alex Pretti was the opposite: not LGBT, tall, clean-cut, with a thick full beard — almost the standard image of the "white American male." The widely circulated photo of him before his death — baseball cap on, phone in hand, facing down ICE — is striking. More crucially, Alex Pretti was not a party to any dispute. He was standing to the side filming, and after watching a woman being violently shoved to the ground by ICE, he stepped forward to help. His last words — "Are you okay?" — moved countless people to tears. No decent person could watch such a gentleman be brutally killed and remain unmoved. ICE attacked him from behind, dragged him to the ground, then several ICE agents beat him and shot him from behind. From start to finish he showed no resistance or aggression. By contrast, when Renee Good was killed 17 days earlier, ICE could at least stand by the front of her car and falsely claim she "intended to attack." This time there is video from multiple angles, and no amount of slow motion or frame-by-frame analysis yields any conclusion that Alex Pretti was aggressive — ICE had no need whatsoever to open fire.

After the incident, the political reaction was swift and unusually unified. Beyond California and New York, even the governors of Maine and Vermont — who rarely speak up — publicly demanded that ICE withdraw from cities in their states. Even former President Clinton, who generally avoids wading directly into political controversy, issued a statement saying America stands at a historic juncture: slide fully into authoritarianism, or return to democracy and civility. Even one of Trump's most important allies, the NRA, publicly demanded a full investigation of the case. So far, apart from the Trump administration itself, almost no major political figure has publicly defended DHS. If Trump chooses to protect Noem at all costs, this shooting will keep escalating, and impeachment proceedings against Noem will keep advancing — quite possibly dragging into the end of the year and affecting the midterms, which is not what Republicans want. Noem, caught lying through her teeth, has become a lightning rod; making her the scapegoat to quell public anger is the likely choice. On Polymarket, the odds of Noem leaving the White House have risen to 60%. Will Noem become the first domino in the Trump administration's collapse? We shall see!
Yuxuan Jiang retweeted
Barack Obama @BarackObama
The killing of Alex Pretti is a heartbreaking tragedy. It should also be a wake-up call to every American, regardless of party, that many of our core values as a nation are increasingly under assault.