Gavin Bains (@thegavinbains) - Twitter Profile | Zamantika Mersobahis Locabet

Pinned Tweet

Gavin Bains@thegavinbains·4 Mar

data gf 🤝 compute bf

We’re announcing Kos-1 Lite, a medical model that achieves SOTA on HealthBench Hard at 46.6%. As a medium sized language model (~100B), it achieves these results at a fraction of the serving cost of frontier trillion-parameter models.

3

1

17

535

Gavin Bains retweeted

Daanish Khazi@bertgodel·4d

We're releasing early results from training Kos-1 Experimental, a Kimi K2.5 checkpoint post-trained on the same medical RL data we used for Kos-1 Lite. As clinical workloads become more agentic, we wanted a model that pairs medical domain knowledge with tool-calling knowhow.

4

16

47

3.3K

Gavin Bains retweeted

Nick Khami@skeptrune·14 May

wasted years of my life getting good at rust pre-ai

Joel 🇦🇺@ptr_to_joel

holy wow they merged it

30

24

1.1K

86.1K

Gavin Bains@thegavinbains·11 May

@polyphilz Using the heck out of this!

0

1

33

Rohan Bansal@polyphilz·11 May

A few weeks and 550 million gpt-5.4/5.5 tokens later... I ported the entirety of VS Code/Cursor's git diff view to Lua

Introducing glance - a standalone TUI for reviewing and actioning on git diffs

Glance supports:
- a side-by-side diff view
- stage, unstage, discard or commit staged changes from the filetree
- a config layer to customize everything from glance's keymaps to its theme
- a 3-way merge view for conflict resolution

...and more 👇

3

5

14

438

Gavin Bains retweeted

Harish Kamath@kamath_harish·6 May

We've open sourced the HeadVis code! We'd love to see more interesting examples of attention biology in open source models. To start, we've prepared a demo of HeadVis on all the heads of Gemma 3 1B. There's probably a lot of low-hanging fruit that people haven't discovered yet, and we think that learning more about attention biology can help us answer some of the harder questions we have about attention head polysemanticity and superposition. github.com/anthropics/hea… transformer-circuits.pub/2026/headvis/g…

1

5

27

924

Gavin Bains@thegavinbains·9 Apr

@ArfurRock @fleet_ai @andrewthezhou best in the game

0

1

5

3.7K

Arfur Rock@ArfurRock·9 Apr

Fleet closed an unannounced $45M Series A at $725M. Led by insiders Sequoia, Bain, Menlo, SVA. RR grew from $1M 6 months ago → $63M RR now → $160M next Q. Congrats @fleet_ai!

Nicolai Ouporov@nicoup

We are hiring a lot of former founders at @fleet_ai! So much, that we have our own application. just hit me up if you are looking for your next big swing fleetai.com/careers/former…

16

13

438

165.2K

Gavin Bains@thegavinbains·19 Mar

@ProulxKerem @kylebhiro Security is one of the few vectors that will survive ASI. How do yall think about red teaming, containment, alignment, monitoring, and governance?

1

0

2

80

Gavin Bains retweeted

Kerem Proulx ⌘@ProulxKerem·19 Mar

Our autonomous pentesting agent just outperformed the two most popular open source offensive security agents on a benchmark of 60 modern, defense-enabled web apps. Battle-tested in production against our customers' environments from startups to financial institutions, Apex consistently finds and exploits critical vulnerabilities other agents and humans miss. Today we're releasing it open source alongside our internal benchmarks.

48

68

299

2M

Gavin Bains retweeted

Daanish Khazi@bertgodel·4 Mar

We’re announcing Kos-1 Lite, a medical model that achieves SOTA on HealthBench Hard at 46.6%. As a medium sized language model (~100B), it achieves these results at a fraction of the serving cost of frontier trillion-parameter models.

40

59

321

27K

Gavin Bains@thegavinbains·23 Feb

@hungrysohan could not agree more

0

1

22

sohan choudhury@hungrysohan·22 Feb

it is an SF rite of passage to have stayed in this condo

Ryan Hoover@rrhoover

It's that time of year... Selling my SF condo. Open house tomorrow (2/22).

• 2 bd, 2 bath penthouse
• 1,688 sq ft w/ rooftop patio
• Rincon hill, near the water
• 2 parking spots

Link with details below. :)

2

0

8

1.5K

Gavin Bains retweeted

Daanish Khazi@bertgodel·5 Feb

We're excited to partner with @perplexity_ai on their latest release. We were impressed with how performant their new Deep Research product was on early benchmarks and are thrilled that this work is being open sourced.

Aravind Srinivas@AravSrinivas

Today, we're rolling out an Advanced version of Perplexity Deep Research, achieving state-of-the-art performance on external and internal benchmarks, beating every other deep research tool on accuracy, usability, and reliability across all verticals.

31

28

136

9.5K

Gavin Bains@thegavinbains·27 Jan

@rheejust @porterdotrun @FirstMarkCap @ycombinator @daltonc @ROWGHANI LFG

0

2

83

Gavin Bains@thegavinbains·23 Jan

@resolveai #ATFQ

0

2

79

Resolve AI@resolveai·22 Jan

ATFQ = Ask the f****** question. This is something we often say internally. Ask the question, push the understanding, take a deeper dive. Join us next Thursday, Jan 29 at our SF HQ for a night of dev chats featuring a brief AI for prod working session with insights and takeaways from special guest Duncan Winn, VP of Engineering and SRE Lead at Zscaler. Without a doubt, there will be time to ATFQ. 📅 Sign up on Luma: ordnl.link/9mcm1BA

7

5

30

375.7K

Gavin Bains retweeted

Tomas Hernando Kofman@tomas_hk·20 Jan

Today we are releasing a new Prompt Optimization algorithm with significant accuracy and efficiency gains, generally available now through our API.

5

14

37

5.7K

Gavin Bains retweeted

Annalise Krueger@ketcholito·8 Dec

“My investment thesis”

0

1

4

413

Gavin Bains retweeted

sohan choudhury@hungrysohan·6 Nov

We raised $15M to personalize learning with AI. Education is a massive industry. Edtech? Tiny. Why? Educational technology hasn’t meaningfully changed how students learn. Scantrons have become radio buttons, but it’s static content all the way down. Flint is changing that. We’re already personalizing learning with AI for 400,000 students worldwide, and are growing faster than we can handle. Join us :)

Flint@FlintK12_

Education has promised personalization for decades. We’re making it real. Flint raised $15M in Series A funding (co-led by @BasisSet & @patronfund) to bring AI-powered personalized learning to every classroom. Every student deserves learning made for them.

24

14

144

28.4K

Gavin Bains@thegavinbains·5 Dec

@rishub_nahar @bertgodel thank you king!

0

38

Rishub Nahar@rishub_nahar·3 Dec

This is really well done @bertgodel @thegavinbains

Daanish Khazi@bertgodel

(1/5) New post: "Mismatch Praxis: Rollout Settings and IS Corrections". We pressure-tested solutions for inference/training mismatch. Inference/training mismatch in modern RL frameworks creates a hidden off-policy problem. To resolve the mismatch, various engineering (e.g., FP16 unification, deterministic kernels) and algorithmic (e.g., importance sampling) fixes have been proposed. In this work, we examine how rollout settings (temp, top-p, and top-k) affect mismatch, and how importance sampling corrections bear out in practice. We find that while Sequence-TIS is theoretically optimal, it can succumb to catastrophic variance in long-horizon contexts. Additionally, non-standard rollout settings create subtle mismatch patterns that require careful engineering fixes. Token-TIS with default rollout settings proved to be the most robust setting for long-horizon training.

1

0

3

144

Gavin Bains retweeted

Daanish Khazi@bertgodel·3 Dec

(1/5) New post: "Mismatch Praxis: Rollout Settings and IS Corrections". We pressure-tested solutions for inference/training mismatch. Inference/training mismatch in modern RL frameworks creates a hidden off-policy problem. To resolve the mismatch, various engineering (e.g., FP16 unification, deterministic kernels) and algorithmic (e.g., importance sampling) fixes have been proposed. In this work, we examine how rollout settings (temp, top-p, and top-k) affect mismatch, and how importance sampling corrections bear out in practice. We find that while Sequence-TIS is theoretically optimal, it can succumb to catastrophic variance in long-horizon contexts. Additionally, non-standard rollout settings create subtle mismatch patterns that require careful engineering fixes. Token-TIS with default rollout settings proved to be the most robust setting for long-horizon training.

8

43

137

30.5K

Gavin Bains@thegavinbains·29 Oct

@forwarddeploy LFG

0

60

Umesh Khanna 🇨🇦🇺🇸@forwarddeploy·28 Oct

Today, we're launching Dropbox Dash! “Where’s that file?” Four words that might be the most expensive sentence in a business. Millions wasted rewriting and rebuilding what we already know-just because context is lost. The cost isn’t just money. It’s momentum. Dropbox Dash fixes this - Dash is your AI teammate that surfaces the content and context you need to stay focused and on track.

36

28

211

1.4M

Gavin Bains@thegavinbains·15 Oct

@_WEEXIAO @Osmosis_AI LFG 🐐🐐

0

4

77

Kasey Zhang@_WEEXIAO·15 Oct

We've raised $7M to help companies build AI agents that actually learn and work. @Osmosis_AI is a platform for companies to fine-tune models that outperform foundation models with reinforcement learning. Better, faster, and cheaper.

138

90

636

1.2M
