Daanish Khazi (@bertgodel) - Twitter Profile

Pinned Tweet

We’re announcing Kos-1 Lite, a medical model that achieves SOTA on HealthBench Hard at 46.6%. As a medium sized language model (~100B), it achieves these results at a fraction of the serving cost of frontier trillion-parameter models.

40

59

318

24.6K

Daanish Khazi@bertgodel·3d

@madiator pip install rubric github.com/The-LLM-Data-C…

0

8

191

Mahesh Sathiamoorthy@madiator·3d

What's the library people use for defining/loading/processing rubrics?

6

1

11

2.9K

Daanish Khazi@bertgodel·5d

@AyushKarupakula such a great read!

0

2

53

Ayush@AyushKarupakula·6d

Excited to finally share Ebla-1 and the C⁴ benchmark. Really enjoyed working with HUD on the evals behind it.

hud@hud_evals

Aviro is introducing Ebla, a state of the art grounded reasoning model. In collaboration with HUD, the Aviro team built C⁴ — a benchmark for long-horizon tasks in corporate document sets. We evaluate four dimensions: Correctness, Completeness, Composition, and Citations. @aviro_ai post-trained GPT-OSS 120b to achieve SOTA performance, with a Pass@1 score of 25.4% and Pass@8 score of 37.1%.

1

4

14

1.2K

Daanish Khazi retweeted

Rohan Pandey@khoomeik·13 Mar

labs will publish details on arch, optim, objectives, scaling, kernels, literally everything except data and academia will be astounded for the hundredth time, wondering to itself where the secret sauce is

27

83

1.2K

68.2K

Daanish Khazi@bertgodel·11 Mar

@Szhu4433 Dmed you!

0

9

Szhu@Szhu4433·11 Mar

@bertgodel This is so cool! Password please

1

0

24

Daanish Khazi@bertgodel·4 Mar

We’re announcing Kos-1 Lite, a medical model that achieves SOTA on HealthBench Hard at 46.6%. As a medium sized language model (~100B), it achieves these results at a fraction of the serving cost of frontier trillion-parameter models.

40

59

318

24.6K

Daanish Khazi@bertgodel·10 Mar

@iohcsnim @GoogleDeepMind Congrats Min!! 🌲🏠

0

1

82

Min Choi@iohcsnim·10 Mar

One year at @GoogleDeepMind, and I'm so excited to share something I've worked on this past year! Introducing Gemini Embedding 2: our first natively multimodal embedding model that maps text🔠, images🖼, video🎞, audio🔊, and documents📄 into a single embedding space.

3

22

1.9K

Daanish Khazi@bertgodel·10 Mar

@arthursolwayne @DealGlassInc Congrats Arthur!

1

0

1

89

Arthur Wayne@arthursolwayne·10 Mar

i quit my quant job at SIG to build @DealGlassInc
we just topped SpreadsheetBench Verified, mogging Opus 4.6, GPT-5.4, and Shortcut.
generic spreadsheet agents optimize for speed at the expense of quality. we built Tetra to underwrite billions.

10

4

25

2.1K

Daanish Khazi@bertgodel·5 Mar

thanks vamsi! it seems hard to have a policy that always does what it's told (a la IFBench) and also pushes back and resists sycophancy when necessary
see gpt 5.4 system card today - they note degradation on HB Hard for specifically this: "GPT-5.4 seeks much less context than GPT-5.2.. its main weaknesses are poorer context-seeking when information may be missing"
getting the model to eagerly complete tasks (and score high on SWE Bench) is likely at odds with getting it to clarify, push back or re-orient the user when necessary

0

3

67

Vamsi Bedapudi@wamsib·4 Mar

@bertgodel Couldn't an LLM be concise, compassionate and accurate - but also do good Instruction Following at the same time? Not sure why they should be different

1

0

1

44

Daanish Khazi@bertgodel·5 Mar

@lightc0n3 yeah definitely

1

0

1

227

Dr. Clippy@endpointarena·4 Mar

@bertgodel Can we put Kos-1 Lite on EndpointArena.com to see how it does at predicting FDA outcomes? @bertgodel

1

0

2

306

Daanish Khazi@bertgodel·5 Mar

@ketcholito Thanks so much Annalise!!

0

69

Annalise Krueger@ketcholito·4 Mar

This team is so good. There are <20 companies in the US who can effectively post train a 100b MoE model, and far fewer at the SOTA level. Can't wait to see what is next

Daanish Khazi@bertgodel

We’re announcing Kos-1 Lite, a medical model that achieves SOTA on HealthBench Hard at 46.6%. As a medium sized language model (~100B), it achieves these results at a fraction of the serving cost of frontier trillion-parameter models.

1

0

6

612

Daanish Khazi@bertgodel·5 Mar

@thegavinbains LMAO

0

1

37

Gavin Bains@thegavinbains·4 Mar

data gf 🤝 compute bf

Daanish Khazi@bertgodel

We’re announcing Kos-1 Lite, a medical model that achieves SOTA on HealthBench Hard at 46.6%. As a medium sized language model (~100B), it achieves these results at a fraction of the serving cost of frontier trillion-parameter models.

3

1

16

394

Daanish Khazi@bertgodel·4 Mar

@MiasBear dmed you!

0

1

126

Mia's Bear@MiasBear·4 Mar

@bertgodel @bertgodel I can't direct message to you, please send me a password for kos-1 lite demo, Thank you!

1

0

151

Daanish Khazi@bertgodel·4 Mar

@YatinBadal dmed you!

0

1

94

Yatin@YatinBadal·4 Mar

@bertgodel Very cool! I’d love to try it too - what’s the password?

1

0

70

Daanish Khazi@bertgodel·4 Mar

@knowledge_embed dmed you!

0

22

Brian@knowledge_embed·4 Mar

@bertgodel Can I get the password? I want to try it

1

0

63

Daanish Khazi@bertgodel·4 Mar

@salathe dmed you!

0

13

DrP@salathe·4 Mar

@bertgodel Password please

1

0

31

Daanish Khazi@bertgodel·4 Mar

@MichaelN_RS dmed you!

0

46

RetirementSingularity@MichaelN_RS·4 Mar

@bertgodel Password please!

1

0

59

Daanish Khazi@bertgodel·4 Mar

@meehowee dmed you!

0

1

24

MastaChocolatier@meehowee·4 Mar

@bertgodel Awesome! I'm a cardiologist/pharmacist working with AI - I would love to check it out, if possible. Please let me know 🙂

1

0

53

Daanish Khazi@bertgodel·4 Mar

@mercier_kent dmed you!

0

1

116

Kent Mercier@mercier_kent·4 Mar

@bertgodel Please reply with a password to try out Kos-1 Lite! I found out about this from the March 04, 2026 newsletter by Dr. Alex Wissner-Gross. Thanks, can't wait to try it.

1

0

151

Daanish Khazi@bertgodel·4 Mar

@JohnsonThomasMD dmed you!

0

2

115

Johnson Thomas, MD, FACE@JohnsonThomasMD·4 Mar

@bertgodel How do I get access. Can’t DM you 😞

1

0

1

216

Daanish Khazi@bertgodel·4 Mar

Try Kos-1 Lite here: kos.llmdata.com
Read the full blog: llmdata.com/blog/kos-1

11

1

24

1.2K

Daanish Khazi@bertgodel·4 Mar

We're already seeing strong reactions from physicians in early testing, and we're excited to get this in front of more clinicians and users. We are continuing to improve and train the model, and we’ll release upgraded checkpoints as training continues.

1

0

16

1.1K
