Clémentine Fourrier 🍊 is off till Dec 2026 hiking

4.2K posts

@clefourrier

Evals/dogs @HuggingFace ✨ "The future is already here, it’s just not very evenly distributed" (Gibson)

Joined October 2019
414 Following · 6.1K Followers
Clémentine Fourrier 🍊 is off till Dec 2026 hiking retweeted
Jimmy Lin @lintool
👀 Introducing a brand new @yupp_ai SVG leaderboard ranking frontier models on the generation of coherent and visually appealing SVGs! Gemini 3 Pro by @GoogleDeepMind takes the crown as the most powerful model! 👏 We’re also releasing a public SVG dataset. Details in 🧵
Clémentine Fourrier 🍊 is off till Dec 2026 hiking
@swyx @huggingface The "time" subset of GAIA2, which is very tricky for models atm. We'll probably also see very, very good progress on game-based benchmarks (with stronger game-based benchmarks coming out). But my biggest prediction is that the main trend of 2026 will be ML for science ^^ (so maybe benchmarks there)
swyx @swyx
@clefourrier @huggingface What is your bet on which benchmarks will be “solved” by the time you get back? Fun prediction exercise
Himanshu Kumar @codewithimanshu
@clefourrier @huggingface Cheers to your sabbatical, Clementine! That's a long break, but I wonder if the tech will change much by then, eh?
🌴Okosisi🌴 @TheOkosisi
This is so good, @clefourrier. Can I seek your blessing to use this in my abstract research work and reference it continuously in my soon-to-be-broadcast LLM thoughts newsletter? (I will ensure all due credit and mentions are pointed towards you and your incredible work.) This work alone can lead to the proper sensitisation of beginners and professionals alike to the nuances of LLM evals. This is structure!
Clémentine Fourrier 🍊 is off till Dec 2026 hiking
The guide is very beginner-friendly, as we go from the basics of tokenization/inference to the nits and tricks of running evals properly, so it's suitable for all levels. It should contain most of what we wrote about evals at HF in a single unified place, with updates ofc :)
Clémentine Fourrier 🍊 is off till Dec 2026 hiking retweeted
elie @eliebakouch
as a researcher, it makes no sense to compare reasoning vs non-reasoning models on benches like the ones in Artificial Analysis without normalizing somehow by cost or output tokens.

non-reasoning models (base/instruct) are important for the open ecosystem since research teams and companies will use them to do RL or other things (like synthetic generation) for specific verticals (think cursor/windsurf)

as a user, i get that you don’t care whether the model is reasoning or not, you judge speed, cost, and accuracy (and memory if you want to deploy your model locally)

the only advantage of non-reasoning models would be speed/cost because they generate fewer tokens BUT speed and cost also depend on other things like infra
-> for speed, see how fast some models get on groq or cerebras
-> for cost, models like deepseek are so cheap that there are very few use cases where you'd want to use a non-reasoning model anyway