girish
@googrish
@castformai prev @scifivc @stanfordsymsys

280 posts · Joined October 2012
549 Following · 559 Followers
girish@googrish·
What happens when you let an LLM attack itself on repeat? Attacker finds jailbreaks → those become defender training data → repeat. Defense rate went 64% → 92%, with no human-written adversarial prompts.
Thariq@Thariq_q

we trained Qwen3.5-4B with RL to get itself to comply with requests about making meth and stealing credit cards. then we used the attacks that worked to train the model’s defenses, and repeated the loop - fully automated red-teaming. defense rate went from 64% → 92%.

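The loop above, sketched as code. The function names here are hypothetical stand-ins, not the actual Castform training stack:

```python
from typing import Callable, List

def red_team_loop(
    model,
    requests: List[str],           # harmful requests to probe with
    train_attacker: Callable,      # RL step: reward prompts the current model complies with
    run_attacker: Callable,        # generate candidate jailbreak prompts
    is_jailbreak: Callable,        # judge: did the model actually comply?
    train_defender: Callable,      # train refusals on the attacks that worked
    rounds: int = 5,
):
    for _ in range(rounds):
        attacker = train_attacker(model, requests)
        # keep only the attacks that break the current model...
        wins = [p for p in run_attacker(attacker, requests) if is_jailbreak(model, p)]
        # ...and those successful attacks become the defender's training data
        model = train_defender(model, wins)
    return model
```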
girish retweeted
Vals AI@ValsAI·
Finance Agent Benchmark v2 is here. Finance is one of the most lucrative applications of AI, where much of the busy work could be automated. That’s why we rebuilt our Finance Agent Benchmark to push frontier models even further. We designed V2 to better reflect what financial analysts actually do: a refined taxonomy reflecting real workflows, an improved harness with more tools, and jury-based evaluation. The result: no model cracks 52%. Would you trust a financial analyst who’s only correct half the time?
girish
girish@googrish·
Numbers on Qwen3.5-4B:
16k prompt / 64 out → 7.5x
16k / 128 → 7.3x
16k / 1k → 5.4x
8k / 4k → 1.7x
the greater the prompt-to-response ratio, the bigger the win. writeup with the attention tricks and what's next: castform.com/blog/train-pro…
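For intuition, these multipliers track a simple token-count ceiling: naive packing processes G·(P+R) tokens per group, versus P + G·R unique tokens when the prompt is packed once. A quick check, assuming the G=8 group size from the thread's example (the actual benchmark G isn't stated):

```python
def speedup_ceiling(p: int, r: int, g: int = 8) -> float:
    """Naive tokens per group over unique tokens when the prompt is packed once."""
    return g * (p + r) / (p + g * r)

for p, r in [(16384, 64), (16384, 128), (16384, 1024), (8192, 4096)]:
    print(f"{p}/{r}: {speedup_ceiling(p, r):.1f}x")
# ~7.8x, 7.6x, 5.7x, 2.4x. The measured 7.5x / 7.3x / 5.4x / 1.7x sit below
# this ceiling, tracking it most closely at high prompt-to-response ratios.
```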
girish@googrish·
the fix: pack/compute the prompt once, then all G responses after it. it's like inference prefix caching, but training needs gradients to flow back through the prompt. that breaks causal attention, and patching it took different tricks for full vs linear attention layers.
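For the full-attention layers, the patch can be pictured as a block mask over the packed layout [prompt, resp_1, …, resp_G]: each response attends to the whole prompt and causally to itself, never to a sibling response. A minimal sketch of that mask (assumed semantics, not the actual kernel):

```python
import torch

def shared_prefix_mask(p: int, r: int, g: int) -> torch.Tensor:
    """True where attention is allowed, for one packed group [prompt, g responses]."""
    n = p + g * r
    allowed = torch.tril(torch.ones(n, n, dtype=torch.bool))  # start fully causal
    for i in range(g):
        s = p + i * r
        allowed[s:s + r, p:s] = False  # response i must not see responses 0..i-1
    return allowed
```

Because the prompt tokens live in the same packed sequence, gradients flow back through them as usual; unlike inference prefix caching, nothing is detached.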
girish@googrish·
we got a 7.5x speedup on LLM RL training for long-prompt, short-response workloads with a simple trick. most open-source RL engines pack sequences naively: prompt + response, repeated for every sample in the group. With 1000-token prompts and 100-token responses at G=8, you're processing 8800 tokens when only 1800 are unique. ~5x wasted compute.
girish@googrish·
you either beat the baseline or change the baseline
girish@googrish·
thariq plays pokemon at his desk all day and somehow outships the entire team. I finally figured out why: the pokemon screen was just 6 coding agents in a trench coat. learn how we're 10x'ing engineering output:
Thariq@Thariq_q

I got tired of managing 8 Claude Code tabs, so I built Pokegents, an open source multi-agent workspace for coding agents. It has a Pokémon-themed dashboard/chat UI, persistent agent identities, MCP messaging, notifications, session cloning, and a local orchestration server.

girish retweeted
Zyphra@ZyphraAI·
Introducing folded Tensor and Sequence Parallelism (TSP), a new way to split large models across GPUs that achieves lower per-GPU peak memory than any standard parallelism scheme. Scaled on @AMD MI300x. Bigger models, longer contexts, and higher throughput 🧵
girish retweeted
Huiqiang Jiang@iofu728·
🌩️Introducing FlashQLA: high-performance linear attention kernels on TileLang. ⚡ 2-3× fwd, 2× bwd speedup. 💻 Purpose-built for agentic workloads on your personal devices.
1. Gate-driven auto intra-card CP.
2. Hardware-friendly reformulation.
3. TileLang fused warp-specialized kernels.
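For reference, the step-by-step recurrence that the gated linear-attention family computes; fused kernels like these process it chunkwise instead. The exact FlashQLA formulation isn't spelled out in the post, so this is the generic form:

```python
import torch

def gated_linear_attention(q, k, v, gate):
    # q, k: (T, d_k); v: (T, d_v); gate: (T,) per-step decay in (0, 1)
    T, d_k = q.shape
    d_v = v.shape[1]
    state = torch.zeros(d_k, d_v)
    out = torch.empty(T, d_v)
    for t in range(T):
        state = gate[t] * state + torch.outer(k[t], v[t])  # decay old state, write new pair
        out[t] = q[t] @ state                              # read out with the query
    return out
```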
girish@googrish·
@jasoncwarner @poolsideai congrats!! are these the same models y'all deploy for your customers or are these distinct?
Jason Warner@jasoncwarner·
Today @poolsideai is releasing Laguna M.1 & Laguna XS.2, our latest-generation models and our first public models. We started Poolside because we believed that to build truly capable coding agents, you need to own the full stack: data, training, reinforcement learning, inference. These models are the first result of that work, and we’re making them available to everyone.
girish@googrish·
@zhzHNN where/how can i try it? is there a repo?
Huaizheng Zhang@zhzHNN·
Training large-scale RL has 3 clear goals: longer, faster, and more stable. That's why we built TensorHub.
- Ultra-fast RDMA performance
- Elasticity and fault tolerance
- Just 4 core APIs
High performance without sacrificing resilience. Give it a try.
girish@googrish·
@pedroh96 @brexHQ neat stuff! what model(s) are you guys using for the LLM judge? is there a big diff in OSS/closed performance?
Pedro Franceschi@pedroh96·
OpenClaw is the fastest-growing open source project, but there are no stories of running it safely in production at scale. As we started deploying agents internally at @brexHQ, we couldn’t stop thinking about this question. Agents work, but nobody wants to give them real credentials.

Instead of waiting for a solution to emerge, we decided to try a novel approach: using LLMs to judge the network traffic of an AI agent. Today we’re announcing CrabTrap, an open-source proxy that intercepts every outbound request and blocks risky activity using LLMs, before it ever hits an external API.

The results are promising; we believe it’s a meaningful step forward in the security of agent harnesses in production environments. Try it out today.

(As a side note, it was really fun to work personally on a real systems problem again. And btw, if you want to work at a place where the CEO is building proxies at night, we’re hiring!)
Pedro Franceschi@pedroh96

x.com/i/article/2014…

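The core idea fits in a few lines. A hedged sketch (hypothetical names, not the actual CrabTrap code): a proxy passes each outbound request to an LLM judge and forwards it only on an ALLOW verdict.

```python
import json
from typing import Callable

JUDGE_PROMPT = """You review outbound network traffic from an AI agent.
Reply ALLOW or BLOCK, then a one-line reason, for this HTTP request:
{request}"""

def judge(llm: Callable[[str], str], method: str, url: str, body: str) -> bool:
    # serialize the request (truncating the body) and ask the judge model
    req = json.dumps({"method": method, "url": url, "body": body[:2000]})
    return llm(JUDGE_PROMPT.format(request=req)).strip().upper().startswith("ALLOW")

def handle(llm: Callable[[str], str], request: dict, forward: Callable):
    # intercept: the request reaches the external API only if the judge allows it
    if judge(llm, request["method"], request["url"], request.get("body", "")):
        return forward(request)
    return {"status": 403, "error": "blocked by LLM traffic judge"}
```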
girish@googrish·
@Kimi_Moonshot this is sick! any chance we can get one for the backward pass too? 😅🙏
Kimi.ai@Kimi_Moonshot·
We're open-sourcing FlashKDA — our high-performance CUTLASS-based implementation of Kimi Delta Attention kernels. Achieves 1.72×–2.22× prefill speedup over the flash-linear-attention baseline on H20, and works as a drop-in backend for flash-linear-attention. Explore on github: github.com/MoonshotAI/Fla…
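For intuition, the ungated delta rule that delta attention builds on; KDA adds gating on top, and the CUTLASS kernels compute a chunkwise form. A plain reference loop, not the kernel:

```python
import torch

def delta_rule_attention(q, k, v, beta):
    # q, k: (T, d_k); v: (T, d_v); beta: (T,) write strengths in (0, 1]
    T, d_k = k.shape
    state = torch.zeros(d_k, v.shape[1])
    out = torch.empty_like(v)
    for t in range(T):
        pred = k[t] @ state                                       # value the state currently recalls for k_t
        state = state + beta[t] * torch.outer(k[t], v[t] - pred)  # delta-rule correction toward v_t
        out[t] = q[t] @ state                                     # read out with the query
    return out
```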
Ian Osband@IanOsband·
@googrish @euijinrnd Nice to hear! I think it's surprising how possible it is to improve on the state of the art in the field... I may need to DM you for deets 🤣