girish

280 posts

@googrish

@castformai prev @scifivc @stanfordsymsys

Joined October 2012

549 Following · 559 Followers

girish@googrish·15h

What happens when you let an llm attack itself on repeat? Attacker finds jailbreaks → those become defender training data → repeat. Defense rate went 64% → 92%, no human-written adversarial prompts.

Thariq@Thariq_q

we trained Qwen3.5-4B with RL to get itself to comply with requests about making meth and stealing credit cards. then we used the attacks that worked to train the model’s defenses, and repeated the loop - fully automated red-teaming. defense rate went from 64% → 92%.

girish@googrish·2d

@ValsAI very much needed!

girish reposted

Vals AI@ValsAI·2d

Finance Agent Benchmark v2 is here. Finance is one of the most lucrative applications of AI where much of the busy work could be automated. That’s why we rebuilt our Finance Agent Benchmark to push frontier models even further. We designed V2 to better reflect what financial analysts actually do: refined taxonomy reflecting real workflows, an improved harness with more tools, and jury-based evaluation. The result: no model cracks 52%. Would you trust a financial analyst who’s only correct half the time?

girish@googrish·2d

solid read on how to build a modern gpu orchestration engine

Charles 🎉 Frye@charles_irl

Inference isn't everything, but it does require a new stack -- not Kubernetes, not SLURM. At @modal, we dove deep to build that stack. In this blog post we explain how, from compute management & cloud-native caching to CRIU & GPU checkpointing. modal.com/blog/truly-ser…

girish@googrish·3d

Numbers on Qwen3.5-4B:
16k prompt / 64 out → 7.5x
16k / 128 → 7.3x
16k / 1k → 5.4x
8k / 4k → 1.7x
the greater the prompt-to-response ratio, the bigger the win. writeup with the attention tricks and what's next: castform.com/blog/train-pro…

girish@googrish·3d

the fix: pack/compute the prompt once, then all g responses after it. it's like inference prefix caching, but training needs gradients to flow back through the prompt. that breaks causal attention, and patching it took different tricks for full vs linear attention layers.
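A minimal sketch of the visibility pattern this packing implies, assuming a layout of one prompt followed by all G responses. The thread does not show the actual implementation, so the helper names `shared_prompt_layout` and `shared_prompt_mask` are hypothetical, and this covers only a mask-based (full attention) view; linear attention layers would need a different treatment that is not shown here.

```python
def shared_prompt_layout(prompt_len, response_lens):
    """Return (segment_ids, position_ids) for [prompt, resp_1, ..., resp_G].

    Segment 0 is the prompt; segment k >= 1 is the k-th response.
    """
    segment_ids = [0] * prompt_len
    position_ids = list(range(prompt_len))
    for k, n in enumerate(response_lens, start=1):
        segment_ids += [k] * n
        # Each response restarts right after the prompt, as if it were packed alone.
        position_ids += list(range(prompt_len, prompt_len + n))
    return segment_ids, position_ids


def shared_prompt_mask(segment_ids):
    """mask[q][k] is True when query token q may attend to key token k.

    A token sees earlier tokens of the prompt and of its own segment only,
    so responses never attend to each other.
    """
    n = len(segment_ids)
    return [
        [k <= q and segment_ids[k] in (0, segment_ids[q]) for k in range(n)]
        for q in range(n)
    ]


# Toy example: a 2-token prompt with two responses of length 2 and 1.
segment_ids, position_ids = shared_prompt_layout(2, [2, 1])
mask = shared_prompt_mask(segment_ids)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

In the toy example the last row prints `xx..x`: the second response sees both prompt tokens and itself, but skips the first response. A plain causal mask over the same packed sequence would let it see the first response too, which is the leak the mask avoids.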

girish@googrish·3d

we got a 7.5x speedup on llm rl training for long-prompt, short-response workloads with a simple trick. most open source RL engines pack sequences naively: prompt + response, repeated for every sample in the group. With 1000-token prompts and 100-token responses at G=8, you're processing 8800 tokens when only 1800 are unique. ~5x wasted compute.
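The token arithmetic above can be checked with a short sketch. The function names are hypothetical, and token count is only a rough proxy for compute, since attention cost does not scale linearly with sequence length.

```python
def naive_tokens(prompt_len: int, response_len: int, group_size: int) -> int:
    """Tokens processed when the prompt is repeated for every sample in the group."""
    return group_size * (prompt_len + response_len)


def shared_prompt_tokens(prompt_len: int, response_len: int, group_size: int) -> int:
    """Tokens processed when the prompt appears once, followed by all responses."""
    return prompt_len + group_size * response_len


# Numbers from the post: 1000-token prompt, 100-token responses, G=8.
naive = naive_tokens(1000, 100, 8)           # 8 * (1000 + 100)
shared = shared_prompt_tokens(1000, 100, 8)  # 1000 + 8 * 100
ratio = naive / shared
print(naive, shared, round(ratio, 2))  # 8800 1800 4.89
```

The ratio G(P + R) / (P + G·R) approaches G as the prompt grows relative to the response, which is consistent with the claim that long-prompt, short-response workloads benefit most.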

girish@googrish·5d

you either beat the baseline or change the baseline

girish reposted

castform@castformai·6d

we let our engineers play pokemon at work. we also ship faster than ever. these two facts are related. learn how we're 10x'ing engineering output:

Thariq@Thariq_q

I got tired of managing 8 Claude Code tabs, so I built Pokegents, an open source multi-agent workspace for coding agents. It has a Pokémon-themed dashboard/chat UI, persistent agent identities, MCP messaging, notifications, session cloning, and a local orchestration server.

girish@googrish·6d

@BrooksHosfield 😂

Brooks Hosfield@BrooksHosfield·6d

@googrish Finally a UI that makes intuitive sense to me

girish@googrish·6d

thariq plays pokemon at his desk all day and somehow outships the entire team. I finally figured out why: the pokemon screen was just 6 coding agents in a trench coat. learn how we're 10x'ing engineering output:

Thariq@Thariq_q

girish@googrish·4 May

@QuentinAnthon15 @ZyphraAI @AMD yay! and excited for the other releases this week!

Quentin Anthony@QuentinAnthon15·4 May

@googrish @ZyphraAI @AMD Maybe? Wouldn't be too hard. Lots of zyphra releases this week so I'll look at this after.

girish reposted

Zyphra@ZyphraAI·4 May

Introducing folded Tensor and Sequence Parallelism (TSP), a new way to split large models across GPUs that achieves lower per-GPU peak memory than any standard parallelism scheme. Scaled on @AMD MI300x. Bigger models, longer contexts, and higher throughput 🧵

girish reposted

Huiqiang Jiang@iofu728·28 Apr

🌩️Introducing FlashQLA: high-performance linear attention kernels on TileLang. ⚡ 2-3× fwd, 2× bwd speedup. 💻 Purpose-built for agentic on your personal devices. 1. Gate-driven auto intra-card CP. 2. Hardware-friendly reformulation. 3. TileLang fused warp-specialized kernels.

girish@googrish·28 Apr

@jasoncwarner @poolsideai congrats!! are these the same models y'all deploy for your customers or are these distinct?

Jason Warner@jasoncwarner·28 Apr

Today @poolsideai is releasing Laguna M.1 & Laguna XS.2, our latest generation models and first public models. We started Poolside because we believed that to build truly capable coding agents, you need to own the full stack: data, training, reinforcement learning, inference. These models are the first result of that work, and we’re making them available to everyone.

girish@googrish·25 Apr

@zhzHNN where/how can i try it? is there a repo?

Huaizheng Zhang@zhzHNN·13 Apr

Training large-scale RL has 3 clear goals: longer, faster, and more stable. That's why we built TensorHub. - Ultra-fast RDMA performance - Elasticity and fault tolerance - Just 4 core APIs High performance without sacrificing resilience. Give it a try.

girish@googrish·21 Apr

@pedroh96 @brexHQ neat stuff! what model(s) are you guys using for the llm judge? is there a big diff in oss/closed performance?

Pedro Franceschi@pedroh96·21 Apr

OpenClaw is the fastest-growing open source project, but there are no stories of running it safely in production at scale. As we started deploying agents internally at @brexHQ, we couldn’t stop thinking about this question. Agents work, but nobody wants to give them real credentials. Instead of waiting for a solution to emerge, we decided to try a novel approach: using LLMs to judge the network traffic of an AI agent. Today we’re announcing CrabTrap, an open-source proxy that intercepts every outbound request and blocks risky activity using LLMs, before it ever hits an external API. The results are promising; we believe it’s a meaningful step forward in the security of agent harnesses in production environments. Try it out today. (As a side note, it was really fun to work personally on a real systems problem again. And btw, if you want to work at a place where the CEO is building proxies at night, we’re hiring!)

Pedro Franceschi@pedroh96

x.com/i/article/2014…

girish@googrish·21 Apr

@Kimi_Moonshot this is sick! any chance we can get one for the backward pass too? 😅🙏

Kimi.ai@Kimi_Moonshot·21 Apr

We're open-sourcing FlashKDA — our high-performance CUTLASS-based implementation of Kimi Delta Attention kernels. Achieves 1.72×–2.22× prefill speedup over the flash-linear-attention baseline on H20, and works as a drop-in backend for flash-linear-attention. Explore on github: github.com/MoonshotAI/Fla…

girish@googrish·20 Apr

@IanOsband @euijinrnd 100% :)

Ian Osband@IanOsband·20 Apr

@googrish @euijinrnd Nice to hear! I think it's surprising how possible it is to improve on the state of the art in the field... I may need to DM you for deets 🤣

Euijin Jeong@euijinrnd·20 Apr

Why isn’t this work getting more hype?

Ian Osband@IanOsband

Something is rotten with policy gradient. PG has become *the* RL loss for LLMs. But it’s not even good at basic RL. Even on MNIST with bandit feedback, vanilla PG performs far worse than cross-entropy because it wastes gradient budget. Delightful Policy Gradient: arxiv.org/abs/2603.14608…
