Gavin Bains

46 posts

@thegavinbains

rivendell · Joined March 2025
166 Following · 122 Followers
Gavin Bains reposted
Daanish Khazi @bertgodel
We're releasing early results from training Kos-1 Experimental, a Kimi K2.5 checkpoint post-trained on the same medical RL data we used for Kos-1 Lite. As clinical workloads become more agentic, we wanted a model that pairs medical domain knowledge with tool-calling know-how.
4 replies · 16 reposts · 47 likes · 3.3K views
Rohan Bansal @polyphilz
A few weeks and 550 million gpt-5.4/5.5 tokens later... I ported the entirety of VS Code/Cursor's git diff view to Lua.

Introducing glance: a standalone TUI for reviewing and acting on git diffs.

Glance supports:
- a side-by-side diff view
- stage, unstage, discard, or commit staged changes from the file tree
- a config layer to customize everything from glance's keymaps to its theme
- a 3-way merge view for conflict resolution
...and more 👇
3 replies · 5 reposts · 14 likes · 438 views
Gavin Bains reposted
Harish Kamath @kamath_harish
We've open sourced the HeadVis code! We'd love to see more interesting examples of attention biology in open source models. To start, we've prepared a demo of HeadVis on all the heads of Gemma 3 1B. There's probably a lot of low-hanging fruit that people haven't discovered yet, and we think that learning more about attention biology can help us answer some of the harder questions we have about attention head polysemanticity and superposition.
github.com/anthropics/hea…
transformer-circuits.pub/2026/headvis/g…
1 reply · 5 reposts · 27 likes · 924 views
Gavin Bains @thegavinbains
@ProulxKerem @kylebhiro Security is one of the few vectors that will survive ASI. How do y'all think about red teaming, containment, alignment, monitoring, and governance?
1 reply · 0 reposts · 2 likes · 80 views
Gavin Bains reposted
Kerem Proulx ⌘ @ProulxKerem
Our autonomous pentesting agent just outperformed the two most popular open source offensive security agents on a benchmark of 60 modern, defense-enabled web apps. Battle-tested in production against our customers' environments, from startups to financial institutions, Apex consistently finds and exploits critical vulnerabilities other agents and humans miss. Today we're releasing it open source alongside our internal benchmarks.
48 replies · 68 reposts · 299 likes · 2M views
Gavin Bains reposted
Daanish Khazi @bertgodel
We’re announcing Kos-1 Lite, a medical model that achieves SOTA on HealthBench Hard at 46.6%. As a medium-sized language model (~100B), it achieves these results at a fraction of the serving cost of frontier trillion-parameter models.
40 replies · 59 reposts · 321 likes · 27K views
Gavin Bains reposted
Daanish Khazi @bertgodel
We're excited to partner with @perplexity_ai on their latest release. We were impressed with how performant their new Deep Research product was on early benchmarks and are thrilled that this work is being open sourced.
Aravind Srinivas @AravSrinivas

Today, we're rolling out an Advanced version of Perplexity Deep Research, achieving state-of-the-art performance on external and internal benchmarks, beating every other deep research tool on accuracy, usability, and reliability across all verticals.

31 replies · 28 reposts · 136 likes · 9.5K views
Resolve AI @resolveai
ATFQ = Ask the f****** question. This is something we often say internally. Ask the question, push the understanding, take a deeper dive. Join us next Thursday, Jan 29, at our SF HQ for a night of dev chats featuring a brief AI-for-prod working session with insights and takeaways from special guest Duncan Winn, VP of Engineering and SRE Lead at Zscaler. Without a doubt, there will be time to ATFQ. 📅 Sign up on Luma: ordnl.link/9mcm1BA
7 replies · 5 reposts · 30 likes · 375.7K views
Gavin Bains reposted
Tomas Hernando Kofman @tomas_hk
Today we are releasing a new Prompt Optimization algorithm with significant accuracy and efficiency gains, generally available now through our API.
5 replies · 14 reposts · 37 likes · 5.7K views
Gavin Bains reposted
Annalise Krueger @ketcholito
“My investment thesis”
0 replies · 1 repost · 4 likes · 413 views
Gavin Bains reposted
sohan choudhury @hungrysohan
We raised $15M to personalize learning with AI. Education is a massive industry. Edtech? Tiny. Why? Educational technology hasn’t meaningfully changed how students learn. Scantrons have become radio buttons, but it’s static content all the way down. Flint is changing that. We’re already personalizing learning with AI for 400,000 students worldwide, and are growing faster than we can handle. Join us :)
Flint @FlintK12_

Education has promised personalization for decades. We’re making it real. Flint raised $15M in Series A funding (co-led by @BasisSet & @patronfund) to bring AI-powered personalized learning to every classroom. Every student deserves learning made for them.

24 replies · 14 reposts · 144 likes · 28.4K views
Gavin Bains reposted
Daanish Khazi @bertgodel
(1/5) New post: "Mismatch Praxis: Rollout Settings and IS Corrections". We pressure-tested solutions for inference/training mismatch.

Inference/training mismatch in modern RL frameworks creates a hidden off-policy problem. To resolve the mismatch, various engineering (e.g., FP16 unification, deterministic kernels) and algorithmic (e.g., importance sampling) fixes have been proposed. In this work, we examine how rollout settings (temp, top-p, and top-k) affect mismatch, and how importance sampling corrections bear out in practice.

We find that while Sequence-TIS is theoretically optimal, it can succumb to catastrophic variance in long-horizon contexts. Additionally, non-standard rollout settings create subtle mismatch patterns that require careful engineering fixes. Token-TIS with default rollout settings proved to be the most robust setting for long-horizon training.
8 replies · 43 reposts · 137 likes · 30.5K views
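A minimal Python sketch of the two corrections named in the post. It assumes the common truncated importance sampling (TIS) form: each ratio is pi_train / pi_rollout, computed from the trainer's and the inference engine's log-probabilities of the same sampled tokens and clipped at a cap (2.0 here, an assumed value). The function names and the independent Gaussian noise model for per-token mismatch are hypothetical illustrations, not the post's implementation.

```python
import math
import random
import statistics
from typing import Sequence


def token_tis_weights(
    train_logprobs: Sequence[float],
    rollout_logprobs: Sequence[float],
    cap: float = 2.0,  # assumed truncation cap
) -> list[float]:
    """Token-TIS: one truncated ratio per token, w_t = min(pi_train / pi_rollout, cap)."""
    if len(train_logprobs) != len(rollout_logprobs):
        raise ValueError("logprob sequences must have the same length")
    return [
        min(math.exp(lp_train - lp_rollout), cap)
        for lp_train, lp_rollout in zip(train_logprobs, rollout_logprobs)
    ]


def sequence_tis_weight(
    train_logprobs: Sequence[float],
    rollout_logprobs: Sequence[float],
    cap: float = 2.0,  # assumed truncation cap
) -> float:
    """Sequence-TIS: one truncated ratio for the whole rollout, w = min(prod_t ratio_t, cap)."""
    if len(train_logprobs) != len(rollout_logprobs):
        raise ValueError("logprob sequences must have the same length")
    # Work in log space: the product of per-token ratios is exp(sum of log-ratios),
    # and comparing against log(cap) first avoids overflow on long rollouts.
    log_ratio = sum(lt - lr for lt, lr in zip(train_logprobs, rollout_logprobs))
    if log_ratio >= math.log(cap):
        return cap
    return math.exp(log_ratio)


def log_seq_ratio_spread(
    horizon: int, noise: float = 0.05, trials: int = 500, seed: int = 0
) -> float:
    """Std. dev. of the sequence log-ratio when each token's log-ratio is N(0, noise^2).

    Toy model only: assumes independent per-token mismatch noise.
    """
    rng = random.Random(seed)
    totals = [
        sum(rng.gauss(0.0, noise) for _ in range(horizon)) for _ in range(trials)
    ]
    return statistics.pstdev(totals)


if __name__ == "__main__":
    train = [-1.20, -0.35, -2.10]    # hypothetical trainer logprobs
    rollout = [-1.25, -0.30, -2.00]  # hypothetical inference-engine logprobs
    print("token weights:", token_tis_weights(train, rollout))
    print("sequence weight:", sequence_tis_weight(train, rollout))
    for horizon in (10, 100, 1000):
        print(horizon, round(log_seq_ratio_spread(horizon), 3))
```

Under this toy model the spread of the sequence log-ratio grows roughly with the square root of the horizon, so on long rollouts a single sequence-level weight tends to be either clipped at the cap or driven toward zero, while Token-TIS bounds each token's weight separately. That is one intuition for the variance problem the post describes, not a reproduction of its experiments.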
Umesh Khanna 🇨🇦🇺🇸 @forwarddeploy
Today, we're launching Dropbox Dash! “Where’s that file?” Four words that might be the most expensive sentence in a business. Millions wasted rewriting and rebuilding what we already know, just because context is lost. The cost isn’t just money. It’s momentum. Dropbox Dash fixes this: Dash is your AI teammate that surfaces the content and context you need to stay focused and on track.
36 replies · 28 reposts · 211 likes · 1.4M views
Kasey Zhang @_WEEXIAO
We've raised $7M to help companies build AI agents that actually learn and work. @Osmosis_AI is a platform for companies to fine-tune models that outperform foundation models with reinforcement learning. Better, faster, and cheaper.
138 replies · 90 reposts · 636 likes · 1.2M views