John deVadoss

13 posts

@john_devadoss

co-Founder NeuralFabric acq. by @Cisco | co-Founder @IntWorkAll | Board @GBBC_io | General Manager @Microsoft | PhD RL research @UMassAmherst

Joined June 2019
2K Following · 9.6K Followers
John deVadoss retweeted
Souradip Chakraborty @SOURADIPCHAKR18
🚨Typical RL algorithms and on-policy distillation methods are blind samplers: they use privileged info to score rollouts, but not to *find* them. We ask: can we use privileged info to *actively sample* the rollouts RL wishes it can stumble upon with compute? ⤵️ Pedagogical RL
Replies 15 · Reposts 85 · Likes 475 · Views 109.1K
John deVadoss retweeted
Nathan Lambert @natolambert
Work led by @jacobcares showed that little compute for building an LLM is actually in the final runs. The vast majority of compute goes to developing a recipe. Creating the recipe openly is a huge lever in making sure the research community's compute pushes to new knowledge.
Quoting Ai2 @allen_ai:

Today we’re bringing new NSF OMAI compute online with NVIDIA Blackwell Ultra-powered systems, turning a $152M national investment from @NSF & @NVIDIA into a foundation for truly open AI research. 🧵

Replies 5 · Reposts 17 · Likes 114 · Views 19.6K
John deVadoss retweeted
Xiaomi MiMo @XiaomiMiMo
Xiaomi MiMo-V2.5 is now officially open-sourced! MIT License, supporting commercial deployment, continued training, and fine-tuning - no additional authorization required.

Two models, both supporting a 1M-token context window:
• MiMo-V2.5-Pro: built for complex agent and coding tasks, ranking No.1 among open-source models on GDPVal-AA and ClawEval
• MiMo-V2.5: a native omni-modal model with strong agent capabilities

A model's value isn't measured by rankings alone; it's measured by the problems it solves. Let's build with MiMo now!

🤗 Weights: huggingface.co/collections/Xi…
📄 Blog: mimo.xiaomi.com/index#blog
Replies 144 · Reposts 463 · Likes 3.4K · Views 773.6K
John deVadoss retweeted
alex zhang @a1zhang
New mini experiment + blogpost + trajectories!

tl;dr: we boost performance of RLM(GPT-5.2) to double the best performing number (38.7% --> 65.6%) on LongCoT-mini without any training! An example of the mismanaged geniuses hypothesis (MGH) we (@zli11010, @lateinteraction) proposed earlier this month.

The LongCoT benchmark showed that frontier LMs and RLMs struggled to solve difficult compositional reasoning tasks. The paper generally attributes this to the RLMs' inability to perform task decomposition, but we argue this is more our fault in how we prompt them; this capability is fully available to GPT-5.2 with an RLM harness!

Building on @raw_works's insightful blogpost and @sumeetrm / @CharlieLondon02 et al.'s incredibly useful benchmark, where they originally found RLMs to be incapable of solving the MATH and CS splits altogether.

We did not train anything since the release of the initial benchmark. To be fully transparent, these results are not meant to be added to their leaderboard either; benchmarks measure isolated capabilities, and we focus on showing (through different, rather specific prompting) that the capabilities required to solve these tasks are available to the models without additional training! It also has implications about how we would go about training these systems.

Full blog below, it's a nice read :)
Replies 18 · Reposts 65 · Likes 490 · Views 41.2K
John deVadoss retweeted
DeepSeek @deepseek_ai
🚀 DeepSeek-V4 Preview is officially live & open-sourced! Welcome to the era of cost-effective 1M context length.

🔹 DeepSeek-V4-Pro: 1.6T total / 49B active params. Performance rivaling the world's top closed-source models.
🔹 DeepSeek-V4-Flash: 284B total / 13B active params. Your fast, efficient, and economical choice.

Try it now at chat.deepseek.com via Expert Mode / Instant Mode. API is updated & available today!

📄 Tech Report: huggingface.co/deepseek-ai/De…
🤗 Open Weights: huggingface.co/collections/de…

1/n
Replies 1.6K · Reposts 7.7K · Likes 45.4K · Views 9.7M
John deVadoss retweeted
Qwen @Alibaba_Qwen
🚀 Meet Qwen3.6-27B, our latest dense, open-source model, packing flagship-level coding power! Yes, 27B, and Qwen3.6-27B punches way above its weight. 👇

What's new:
🧠 Outstanding agentic coding: surpasses Qwen3.5-397B-A17B across all major coding benchmarks
💡 Strong reasoning across text & multimodal tasks
🔄 Supports thinking & non-thinking modes
✅ Apache 2.0: fully open, fully yours

Smaller model. Bigger results. Community's favorite. ❤️

We can't wait to see what you build with Qwen3.6-27B! 👀

🔗👇
Blog: qwen.ai/blog?id=qwen3.…
Qwen Studio: chat.qwen.ai/?models=qwen3.…
Github: github.com/QwenLM/Qwen3.6
Hugging Face: huggingface.co/Qwen/Qwen3.6-2… huggingface.co/Qwen/Qwen3.6-2…
ModelScope: modelscope.cn/models/Qwen/Qw… modelscope.cn/models/Qwen/Qw…
Replies 544 · Reposts 1.7K · Likes 12.5K · Views 3.7M
John deVadoss retweeted
Arthur Douillard @Ar_Douillard
The DiLoCo team at Google DeepMind and Google Research is proud to release Decoupled DiLoCo, the next frontier for resilient AI pre-training. Decoupled DiLoCo enables training with datacenters across the world, using heterogeneous hardware, and never halting the system despite hardware failures.
Replies 33 · Reposts 86 · Likes 609 · Views 2.7M
John deVadoss retweeted
Hayden Prairie @hayden_prairie
We’ve been thinking a lot about scaling laws, wondering if there is a more effective way to scale FLOPs without increasing parameters. Turns out the answer is YES – by looping blocks of layers during training. We find that predictable scaling laws exist for layer looping, allowing us to use looping to achieve the quality of a Transformer twice the size. Our scaling laws suggest that for a fixed parameter budget, data and looping should be increased in tandem! 🧵👇
Replies 41 · Reposts 179 · Likes 1.3K · Views 292.1K
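The layer-looping idea in the post above can be illustrated with a toy sketch. It assumes the simplest possible form of looping (the same stack of blocks is applied several times with shared weights); the post does not say how the authors loop in practice, and every name here (`looped_forward`, `param_count`, `layer_applications`) is hypothetical. It only shows why compute grows with the loop count while the parameter count stays fixed.

```python
def looped_forward(x, blocks, loops):
    """Apply the same list of (weight, bias) blocks `loops` times.

    Each block is a toy scalar layer: relu(weight * x + bias).
    Weights are shared across loops, so no parameters are added.
    """
    for _ in range(loops):
        for weight, bias in blocks:
            x = max(0.0, weight * x + bias)
    return x


def param_count(blocks):
    """Two parameters (weight, bias) per block, independent of loop count."""
    return 2 * len(blocks)


def layer_applications(blocks, loops):
    """Number of layer evaluations, a rough proxy for forward FLOPs."""
    return len(blocks) * loops


# Hypothetical toy parameters, chosen only for illustration.
blocks = [(0.5, 1.0), (2.0, -1.0)]
for loops in (1, 2, 4):
    print(loops, param_count(blocks), layer_applications(blocks, loops),
          looped_forward(1.0, blocks, loops))
```

Doubling `loops` doubles the layer applications while `param_count` stays at 4, which is the trade-off the post describes at a much larger scale.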
John deVadoss retweeted
Ian Osband @IanOsband
Scaling up distributed RL is the big challenge in AI. At its core the issue is that the actor != learner. The standard fix is importance weighting p_learn/p_act. It kind of works if you tune/clip... but not very well. Delightful Policy Gradient solves it. arxiv.org/abs/2603.20521
Replies 6 · Reposts 15 · Likes 245 · Views 68K
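For readers unfamiliar with the "standard fix" mentioned in the post above, here is a minimal sketch of importance weighting p_learn/p_act with clipping. It uses the PPO-style clipped surrogate as one common variant; this is an assumption for illustration and is not the Delightful Policy Gradient method from the linked paper. The function names and the `clip=0.2` default are hypothetical.

```python
import math


def clipped_importance_weight(logp_learner, logp_actor, clip=0.2):
    """Return (raw, clipped) importance ratio p_learn / p_act for one action.

    Inputs are log-probabilities of the sampled action under the learner's
    current policy and under the (possibly stale) actor policy.
    """
    ratio = math.exp(logp_learner - logp_actor)
    clipped = min(max(ratio, 1.0 - clip), 1.0 + clip)
    return ratio, clipped


def surrogate_loss(logp_learner, logp_actor, advantage, clip=0.2):
    """PPO-style loss: negative of the more pessimistic weighted advantage."""
    ratio, clipped = clipped_importance_weight(logp_learner, logp_actor, clip)
    return -min(ratio * advantage, clipped * advantage)


# The learner assigns twice the actor's probability to the sampled action.
raw, clipped = clipped_importance_weight(math.log(2.0), 0.0)
print(raw, clipped, surrogate_loss(math.log(2.0), 0.0, advantage=1.0))
```

With a positive advantage, the clip caps how much credit a large ratio can claim; with a negative advantage, the unclipped term is kept, so the penalty is not softened. The clip range is one of the tuning knobs the post alludes to.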