John deVadoss

6 posts

@john_devadoss

co-Founder NeuralFabric acq. by @Cisco | co-Founder @IntWorkAll | Board @GBBC_io | General Manager @Microsoft | PhD RL research @UMassAmherst

Joined June 2019
2K Following · 9.6K Followers
John deVadoss retweeted
Ian Osband @IanOsband
Scaling up distributed RL is the big challenge in AI. At its core the issue is that the actor != learner. The standard fix is importance weighting p_learn/p_act. It kind of works if you tune/clip... but not very well. Delightful Policy Gradient solves it. arxiv.org/abs/2603.20521
6 replies · 15 reposts · 244 likes · 67.3K views
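The "standard fix" named in the post above, importance weighting by p_learn/p_act with clipping, can be sketched in a few lines. This is only an illustration of the clipped-ratio baseline, not the Delightful Policy Gradient method from the linked paper; the function names and the clip value of 2.0 are assumptions made for the example.

```python
import math

def clipped_importance_weights(logp_learner, logp_actor, clip=2.0):
    """Return min(p_learn / p_act, clip) for each sampled action.

    Inputs are log-probabilities of the same actions under the learner's
    current policy and under the (stale) actor policy that sampled them.
    """
    weights = []
    for lp_learn, lp_act in zip(logp_learner, logp_actor):
        ratio = math.exp(lp_learn - lp_act)  # p_learn / p_act
        weights.append(min(ratio, clip))     # truncate large ratios
    return weights

def off_policy_pg_loss(logp_learner, logp_actor, advantages, clip=2.0):
    """Surrogate loss -mean(w * A * log p_learn).

    In an autodiff framework the weight w would be treated as a constant
    (no gradient flows through it); here we only compute the scalar value.
    """
    weights = clipped_importance_weights(logp_learner, logp_actor, clip)
    terms = [w * a * lp for w, a, lp in zip(weights, advantages, logp_learner)]
    return -sum(terms) / len(terms)
```

The clip threshold is the tuning knob the post alludes to: a low value reduces variance but biases the gradient, while a high value does the opposite.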
John deVadoss retweeted
Kimi.ai @Kimi_Moonshot
Introducing 𝑨𝒕𝒕𝒆𝒏𝒕𝒊𝒐𝒏 𝑹𝒆𝒔𝒊𝒅𝒖𝒂𝒍𝒔: Rethinking depth-wise aggregation.
Residual connections have long relied on fixed, uniform accumulation. Inspired by the duality of time and depth, we introduce Attention Residuals, replacing standard depth-wise recurrence with learned, input-dependent attention over preceding layers.
🔹 Enables networks to selectively retrieve past representations, naturally mitigating dilution and hidden-state growth.
🔹 Introduces Block AttnRes, partitioning layers into compressed blocks to make cross-layer attention practical at scale.
🔹 Serves as an efficient drop-in replacement, demonstrating a 1.25x compute advantage with negligible (<2%) inference latency overhead.
🔹 Validated on the Kimi Linear architecture (48B total, 3B activated parameters), delivering consistent downstream performance gains.
🔗 Full report: github.com/MoonshotAI/Att…
334 replies · 2.1K reposts · 13.6K likes · 4.9M views
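The core idea in the post above, replacing uniform residual accumulation with input-dependent attention over earlier layers' outputs, can be illustrated roughly as follows. This is a toy sketch under assumptions, not the implementation from the Moonshot report: the `attention_residual` name, the use of earlier outputs directly as keys and values, and the externally supplied `query` vector are all hypothetical simplifications, and Block AttnRes is not modeled.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention_residual(prev_outputs, query):
    """Mix earlier layers' outputs with softmax attention weights.

    prev_outputs: one hidden vector per preceding layer.
    query: a vector assumed to be derived from the current input, which
    makes the mixing weights input-dependent rather than fixed.
    A plain residual stream would instead sum prev_outputs uniformly.
    """
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, h) / scale for h in prev_outputs])
    dim = len(prev_outputs[0])
    mixed = [sum(w * h[i] for w, h in zip(weights, prev_outputs))
             for i in range(dim)]
    return mixed, weights
```

With a zero query the weights are uniform, which corresponds to averaging the earlier layers; a query aligned with one layer's output shifts weight toward that layer.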
John deVadoss retweeted
Ai2 @allen_ai
Introducing Olmo Hybrid, a 7B fully open model combining transformer and linear RNN layers. It decisively outperforms Olmo 3 7B across evals, w/ new theory & scaling experiments explaining why. 🧵
17 replies · 129 reposts · 785 likes · 168.4K views
John deVadoss retweeted
Felix Rieseberg @felixrieseberg
A software genie in a lamp is hard to explain. The better the models get, the more you can just ask for what you want - and if no specific tool exists, they’ll often just build it. That’s why Cowork gives Claude a VM: it can write software on the fly to do whatever you need. But as an industry, I think we haven’t figured out how to teach users outside the bubble that apps like Claude Code or Cowork can handle a huge range of work without a dedicated “do X” button. Especially since precisely stating what you want has always been hard, AI or not.
Chris @chatgpt21

Claude Cowork was making a spreadsheet for me in Google Sheets; it realized taking screenshots and trying to edit on the screen was too slow. Went into some JavaScript - don’t even remember what it was > needed my Google permissions > coded the whole thing on the backend > invisible layers I can’t even see. Flawless, beautiful spreadsheet. Didn’t need too much hand-holding and was as efficient as I would be.

12 replies · 4 reposts · 94 likes · 15.7K views