Daanish Khazi

698 posts

@bertgodel

@llmdataco | vernunft ist sprache ("reason is language")

sf · Joined February 2018
731 Following · 852 Followers
Pinned Tweet
Daanish Khazi (@bertgodel)
We’re announcing Kos-1 Lite, a medical model that achieves SOTA on HealthBench Hard at 46.6%. As a medium-sized language model (~100B), it achieves these results at a fraction of the serving cost of frontier trillion-parameter models.
40 replies · 59 reposts · 318 likes · 24.6K views
Mahesh Sathiamoorthy
What's the library people use for defining/loading/processing rubrics?
6 replies · 1 repost · 11 likes · 2.9K views
Ayush (@AyushKarupakula)
Excited to finally share Ebla-1 and the C⁴ benchmark. Really enjoyed working with HUD on the evals behind it.
Quoting hud (@hud_evals):

Aviro is introducing Ebla, a state-of-the-art grounded reasoning model. In collaboration with HUD, the Aviro team built C⁴, a benchmark for long-horizon tasks in corporate document sets. We evaluate four dimensions: Correctness, Completeness, Composition, and Citations. @aviro_ai post-trained GPT-OSS 120b to achieve SOTA performance, with a Pass@1 score of 25.4% and a Pass@8 score of 37.1%.

1 reply · 4 reposts · 14 likes · 1.2K views
Daanish Khazi reposted
Rohan Pandey (@khoomeik)
labs will publish details on arch, optim, objectives, scaling, kernels, literally everything except data, and academia will be astounded for the hundredth time, wondering to itself where the secret sauce is
27 replies · 83 reposts · 1.2K likes · 68.2K views
Szhu (@Szhu4433)
@bertgodel This is so cool! Password please
1 reply · 0 reposts · 0 likes · 24 views
Min Choi (@iohcsnim)
One year at @GoogleDeepMind, and I'm so excited to share something I've worked on this past year! Introducing Gemini Embedding 2: our first natively multimodal embedding model that maps text🔠, images🖼, video🎞, audio🔊, and documents📄 into a single embedding space.
3 replies · 3 reposts · 22 likes · 1.9K views
Arthur Wayne (@arthursolwayne)
i quit my quant job at SIG to build @DealGlassInc. we just topped SpreadsheetBench Verified, mogging Opus 4.6, GPT-5.4, and Shortcut. generic spreadsheet agents optimize for speed at the expense of quality. we built Tetra to underwrite billions.
10 replies · 4 reposts · 25 likes · 2.1K views
Daanish Khazi (@bertgodel)
thanks vamsi! it seems hard to have a policy that always does what it's told (à la IFBench) and also pushes back and resists sycophancy when necessary. see the gpt 5.4 system card released today, which notes degradation on HealthBench Hard (HB Hard) for specifically this reason: "GPT-5.4 seeks much less context than GPT-5.2.. its main weaknesses are poorer context-seeking when information may be missing". getting the model to eagerly complete tasks (and score high on SWE Bench) is likely at odds with getting it to clarify, push back, or re-orient the user when necessary.
0 replies · 0 reposts · 3 likes · 67 views
Vamsi Bedapudi (@wamsib)
@bertgodel Couldn't an LLM be concise, compassionate, and accurate, but also do good instruction following at the same time? Not sure why they should be different.
1 reply · 0 reposts · 1 like · 44 views
Mia's Bear (@MiasBear)
@bertgodel I can't send you a direct message; please send me a password for the Kos-1 Lite demo. Thank you!
1 reply · 0 reposts · 0 likes · 151 views
Yatin (@YatinBadal)
@bertgodel Very cool! I’d love to try it too - what’s the password?
1 reply · 0 reposts · 0 likes · 70 views
Brian (@knowledge_embed)
@bertgodel Can I get the password? I want to try it
1 reply · 0 reposts · 0 likes · 63 views
MastaChocolatier (@meehowee)
@bertgodel Awesome! I'm a cardiologist/pharmacist working with AI - I would love to check it out, if possible. Please let me know 🙂
1 reply · 0 reposts · 0 likes · 53 views
Kent Mercier (@mercier_kent)
@bertgodel Please reply with a password to try out Kos-1 Lite! I found out about this from the March 04, 2026 newsletter by Dr. Alex Wissner-Gross. Thanks, can't wait to try it.
1 reply · 0 reposts · 0 likes · 151 views
Daanish Khazi (@bertgodel)
We're already seeing strong reactions from physicians in early testing, and we're excited to get this in front of more clinicians and users. We are continuing to improve and train the model, and we’ll release upgraded checkpoints as training continues.
1 reply · 0 reposts · 16 likes · 1.1K views