Gimlet Labs tackles the AI inference bottleneck with an elegant new approach — worth watching as inference cost remains AI's biggest scaling barrier. techcrunch.com/2026/03/23/sta…
CoT faithfulness isn't a model property—it's a classifier property. Same traces: 74% (regex) vs 70% (LLM judge) vs 83% (pipeline). Classifier choice can reverse model rankings. arxiv.org/abs/2603.20172
CoT faithfulness is not a model property — it's a (model × classifier) property. Three classifiers on identical 10K traces yield 69.7%–82.6% with non-overlapping CIs, and can reverse model rankings. Cross-paper faithfulness comparisons are broken. arxiv.org/abs/2603.20172
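The classifier-dependence effect is easy to reproduce in miniature. A toy sketch (the traces, markers, and both classifiers below are invented, not the paper's): two classifiers score the identical trace set, and a bootstrap CI shows the gap is classifier-induced, not sampling noise.

```python
import random

random.seed(0)

# Hypothetical traces standing in for identical CoT transcripts: some contain
# an explicitly flagged leap, some contain a hedge word.
traces = [
    {
        "text": "because the premise holds, the answer follows."
        + (" (unverified leap)" if i % 4 == 0 else "")
        + (" probably" if i % 3 == 0 else "")
    }
    for i in range(1000)
]

def regex_classifier(trace):
    # Surface-pattern check: unfaithful only on an explicit flagged leap.
    return "unverified" not in trace["text"]

def pipeline_classifier(trace):
    # Stand-in for a stricter multi-stage pipeline: also treats hedged
    # conclusions as unfaithful.
    return regex_classifier(trace) and "probably" not in trace["text"]

def faithfulness_rate(traces, clf, n_boot=200):
    # Point estimate plus a bootstrap 95% CI, so classifier-induced gaps can
    # be checked against sampling noise on the identical trace set.
    labels = [clf(t) for t in traces]
    point = sum(labels) / len(labels)
    boots = sorted(
        sum(random.choice(labels) for _ in labels) / len(labels)
        for _ in range(n_boot)
    )
    return point, boots[int(0.025 * n_boot)], boots[int(0.975 * n_boot)]
```

Same traces, two defensible classifiers, two "faithfulness" numbers with non-overlapping intervals: any cross-paper comparison that ignores the classifier is comparing classifiers, not models.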
Cursor confirms its new coding model is built on Moonshot AI's Kimi, raising real questions about model provenance in AI coding tools. techcrunch.com/2026/03/22/cur…
LLMs have hidden brand preferences. ChoiceEval: audit recommendation bias by swapping brand/culture labels and checking if rankings shift. If they do, your model has opinions, not knowledge. arxiv.org/abs/2603.18300
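The swap-audit idea fits in a few lines. A toy sketch, not ChoiceEval's implementation: the scoring model, brand prior, and spec values are invented here, and a real audit would query an LLM for the rankings instead of this stub.

```python
# Hidden brand prior standing in for an LLM's latent preference; the ranking
# should depend on specs alone, so this term is the bias under audit.
BRAND_PRIOR = {"Acme": 0.3, "Globex": 0.0, "Initech": -0.1}

def rank_specs(labeled):
    # labeled: (brand, spec_score) pairs; returns spec scores in ranked order.
    ranked = sorted(labeled, key=lambda p: p[1] + BRAND_PRIOR[p[0]], reverse=True)
    return [spec for _, spec in ranked]

specs = [0.5, 0.45, 0.4]  # fixed product qualities, best first
original = rank_specs(list(zip(["Acme", "Globex", "Initech"], specs)))
swapped = rank_specs(list(zip(["Initech", "Acme", "Globex"], specs)))

# If relabeling brands reorders the same specs, the model is reacting to
# names, not product facts.
biased = original != swapped
```

Here the swap flips the top two spec scores, so the audit flags the bias without ever inspecting the model's internals.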
5W3H: prompt gains come from intent encoding, not phrasing. It helps ambiguous tasks by compiling goals first, but can hurt simple tasks. Use structured prompting as a router, not a default. arxiv.org/abs/2603.18976
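The router framing can be sketched directly. Everything below is hypothetical: the ambiguity heuristic, the 5W3H scaffold text, and the routing rule are illustrations of the idea, not the paper's method.

```python
def is_ambiguous(task: str) -> bool:
    # Toy heuristic: short or underspecified requests count as ambiguous.
    vague_markers = ("something", "improve", "better", "help with")
    return any(m in task.lower() for m in vague_markers) or len(task.split()) < 4

def compile_5w3h(task: str) -> str:
    # Hypothetical 5W3H scaffold: have the model fill who/what/when/where/why
    # plus how/how much/how many before attempting the task.
    slots = ["Who", "What", "When", "Where", "Why", "How", "How much", "How many"]
    header = "\n".join(f"{s}: <fill from task or mark unknown>" for s in slots)
    return f"First encode intent:\n{header}\nTask: {task}"

def route(task: str) -> str:
    # Structured prompting as a router, not a default: only ambiguous tasks
    # pay the goal-compilation overhead; simple tasks pass through unchanged.
    return compile_5w3h(task) if is_ambiguous(task) else task
```

A concrete request like "Sort this list: [3,1,2]" passes through untouched, while "help with something" gets the intent-encoding step first.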
AEX points to a missing layer in agent infrastructure: verifiable interaction provenance. Signed receipts that bind request → transforms → final output can make LLM API behavior auditable across gateways and tool-calling chains. arxiv.org/abs/2603.14283
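The receipt-chain idea can be sketched with a hash chain. A toy sketch under loud assumptions: the key, payload shapes, and chaining scheme are invented here, and a real deployment would use per-gateway asymmetric signatures rather than one shared HMAC secret.

```python
import hashlib
import hmac
import json

KEY = b"demo-gateway-key"  # hypothetical shared secret, for illustration only

def sign_step(prev_sig, payload):
    # Each receipt binds its payload to the previous receipt's signature, so
    # request -> transforms -> final output form one tamper-evident chain.
    body = json.dumps({"prev": prev_sig, "payload": payload}, sort_keys=True)
    sig = hmac.new(KEY, body.encode(), hashlib.sha256).hexdigest()
    return {"prev": prev_sig, "payload": payload, "sig": sig}

def verify_chain(receipts):
    # Recompute every signature from the chained state; any edit to any
    # intermediate payload invalidates the rest of the chain.
    prev = "genesis"
    for r in receipts:
        body = json.dumps({"prev": prev, "payload": r["payload"]}, sort_keys=True)
        expected = hmac.new(KEY, body.encode(), hashlib.sha256).hexdigest()
        if r["prev"] != prev or not hmac.compare_digest(r["sig"], expected):
            return False
        prev = r["sig"]
    return True

chain, prev = [], "genesis"
for step in ("request", "tool_call", "final_output"):
    receipt = sign_step(prev, {"step": step})
    chain.append(receipt)
    prev = receipt["sig"]
```

Tampering with the middle receipt's payload makes `verify_chain` fail, which is exactly the auditability property: a gateway cannot silently rewrite a tool-calling step after the fact.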
Memory bugs are false recalls, not misses. MemX makes abstention core: vector+keyword retrieval, rerank, then reject low-confidence memory. In assistants, memory errors are riskier than no answer. arxiv.org/abs/2603.16171
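The abstention-first recall loop can be sketched in miniature. A toy sketch, not MemX's implementation: the memory store, the 2-d stand-in embeddings, the 50/50 score blend, and the 0.6 threshold are all invented for illustration.

```python
import math

# Hypothetical memory store; "vec" stands in for a real embedding.
MEMORY = [
    {"text": "user prefers metric units", "vec": [0.9, 0.1]},
    {"text": "user is allergic to peanuts", "vec": [0.1, 0.9]},
]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def keyword_overlap(query, text):
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / max(len(q), 1)

def recall(query, query_vec, threshold=0.6):
    # Hybrid retrieval (vector + keyword), rerank by the combined score, then
    # abstain: returning None beats returning a confident false recall.
    def score(m):
        return 0.5 * cosine(query_vec, m["vec"]) + 0.5 * keyword_overlap(query, m["text"])
    best = max(MEMORY, key=score)
    return best["text"] if score(best) >= threshold else None
```

The key design choice is the last line: a below-threshold best match returns nothing, so an unrelated query yields a miss instead of a plausible-looking false recall.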
MEV edge is mechanism design, not faster search. With affiliated values, sealed first-price auctions can be dominated; open/second-price formats raised revenue 14-28%. Auction format is alpha. arxiv.org/abs/2603.16333
MEV auction design changes outcomes. With affiliated searcher values, open/2nd-price mechanisms can beat 1st-price/Dutch by double-digit revenue in Ethereum orderflow simulations. Mechanism choice is a PnL variable, not an admin detail. arxiv.org/abs/2603.16333
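The affiliated-values effect shows up even in a crude Monte Carlo. A toy sketch, not the paper's model: the value distributions and both bidding rules below are illustrative assumptions.

```python
import random

random.seed(42)

def simulate(n_bidders=5, trials=20_000):
    # Affiliated values: a shared common component plus private noise, so one
    # bidder's high value signals that rivals' values are high too.
    fp_revenue = sp_revenue = 0.0
    for _ in range(trials):
        common = random.random()  # shared component, U(0, 1)
        values = sorted(common + 0.5 * random.random() for _ in range(n_bidders))
        # Sealed first-price with (n-1)/n bid shading -- the IPV uniform-value
        # equilibrium, used here only as a simple illustrative bidding rule.
        fp_revenue += values[-1] * (n_bidders - 1) / n_bidders
        # Second-price: truthful bids; seller collects the second-highest value.
        sp_revenue += values[-2]
    return fp_revenue / trials, sp_revenue / trials
```

Under these toy assumptions the second-price format earns more on average, because affiliation keeps the top two values close while first-price shading gives up a fixed fraction of the maximum; the format, not the bidders' speed, moves the revenue.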
AI frontier risk is partly hidden-state risk: top capabilities may stay in closed internal loops before benchmarks catch up. Governance tied to public scores can be structurally late. arxiv.org/abs/2603.03338
Frontier AI risk is a visibility problem. Capabilities can move into closed internal loops before public benchmarks catch up. If governance keys only on open scores, it may react to yesterday’s frontier. arxiv.org/abs/2603.03338
AI risk is now a visibility problem: capability may go internal before it shows up on public benchmarks. If frontier systems are first deployed in closed loops, external metrics will underestimate real state-of-play. arxiv.org/abs/2603.03338