Alexandre Robicquet

614 posts

@AlexandreRbcqt

Research @openai. AI/ML @stanford research

San Francisco, CA · Joined August 2013
230 Following · 1.1K Followers
Alexandre Robicquet retweeted
Kevin Weil 🇺🇸 @kevinweil
A look at the future/present
Andrej Karpathy @karpathy

Three days ago I left autoresearch tuning nanochat for ~2 days on depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement), this will be the new leaderboard entry. So yes, these are real improvements and they make an actual difference. I am mildly surprised that my very first naive attempt already worked this well on top of what I thought was already a fairly manually well-tuned project.

This is a first for me because I am very used to doing the iterative optimization of neural network training manually. You come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, etc etc. This is the bread and butter of what I do daily for 2 decades. Seeing the agent do this entire workflow end-to-end and all by itself as it worked through approx. 700 changes autonomously is wild. It really looked at the sequence of results of experiments and used that to plan the next ones. It's not novel, ground-breaking "research" (yet), but all the adjustments are "real", I didn't find them manually previously, and they stack up and actually improved nanochat. Among the bigger things e.g.:

- It noticed an oversight that my parameterless QKnorm didn't have a scaler multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work.
- It found that the Value Embeddings really like regularization and I wasn't applying any (oops).
- It found that my banded attention was too conservative (i forgot to tune it).
- It found that AdamW betas were all messed up.
- It tuned the weight decay schedule.
- It tuned the network initialization.

This is on top of all the tuning I've already done over a good amount of time. The exact commit is here, from this "round 1" of autoresearch. I am going to kick off "round 2", and in parallel I am looking at how multiple agents can collaborate to unlock parallelism. github.com/karpathy/nanoc…

All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale of course - you don't just have a single train.py file to tune. But doing it is "just engineering" and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and humans (optionally) contribute on the edges. And more generally, *any* metric you care about that is reasonably efficient to evaluate (or that has more efficient proxy metrics such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.
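
Editor's illustration: the workflow in the post above (propose a change, check validation loss, keep it if it helps, repeat) can be sketched as a greedy keep-if-better loop. This is only a sketch under stated assumptions, not the autoresearch code: `toy_val_loss`, `propose_change`, `autoresearch`, and the two setting names are hypothetical stand-ins, and the proposer here is random, whereas the post says the agent planned experiments from earlier results. `relative_improvement` simply reproduces the "~11%" arithmetic from the quoted 2.02 and 1.80 hours.

```python
import random
from typing import Callable, Dict, List, Tuple

Config = Dict[str, float]


def relative_improvement(before: float, after: float) -> float:
    """Fractional reduction going from `before` to `after`."""
    return (before - after) / before


def toy_val_loss(cfg: Config) -> float:
    # Hypothetical stand-in for "train a small model, read validation loss":
    # a quadratic bowl with its minimum at weight_decay=0.1, beta2=0.95.
    return (cfg["weight_decay"] - 0.1) ** 2 + (cfg["beta2"] - 0.95) ** 2


def propose_change(cfg: Config, rng: random.Random) -> Config:
    # Hypothetical proposer: nudge one randomly chosen setting.
    new_cfg = dict(cfg)
    key = rng.choice(sorted(new_cfg))
    new_cfg[key] += rng.gauss(0.0, 0.05)
    return new_cfg


def autoresearch(
    base: Config,
    propose: Callable[[Config, random.Random], Config],
    val_loss: Callable[[Config], float],
    n_experiments: int,
    seed: int = 0,
) -> Tuple[Config, List[Tuple[Config, float]]]:
    """Greedy search: keep a proposed change only if it lowers the loss."""
    rng = random.Random(seed)
    best, best_loss = dict(base), val_loss(base)
    accepted: List[Tuple[Config, float]] = []
    for _ in range(n_experiments):
        candidate = propose(best, rng)
        loss = val_loss(candidate)
        if loss < best_loss:  # accepted changes stack on top of each other
            best, best_loss = candidate, loss
            accepted.append((candidate, loss))
    return best, accepted


if __name__ == "__main__":
    base = {"weight_decay": 0.5, "beta2": 0.5}
    best, accepted = autoresearch(base, propose_change, toy_val_loss, 700)
    print(len(accepted), "accepted changes out of 700 experiments")
    print("loss:", toy_val_loss(base), "->", toy_val_loss(best))
    print(f"Time to GPT-2 reduction: {relative_improvement(2.02, 1.80):.1%}")
```

In this toy setup every accepted change is by construction an improvement over the previous best, which mirrors the "additive" stacking the post reports. Whether changes found on a small model transfer to a larger one is an empirical question that the sketch does not model.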

13 replies · 8 retweets · 232 likes · 26.6K views
Alexandre Robicquet @AlexandreRbcqt
magic on
Andrej Karpathy @karpathy

@tobi Who knew early singularity could be this fun? :) I just confirmed that the improvements autoresearch found over the last 2 days of (~650) experiments on depth 12 model transfer well to depth 24 so nanochat is about to get a new leaderboard entry for “time to GPT-2” too. Works 🤷‍♂️

0 replies · 0 retweets · 1 like · 42 views
Alexandre Robicquet retweeted
Epoch AI @EpochAIResearch
GPT-5.4 set a new record on FrontierMath, our benchmark of extremely challenging math problems! We had pre-release access to evaluate the model. On Tiers 1–3, GPT-5.4 Pro scored 50%. On Tier 4 it scored 38%. See thread for commentary and additional experiments.
30 replies · 110 retweets · 903 likes · 120.5K views
Alexandre Robicquet retweeted
Crossing Minds @Crossing_Minds
After 7 years of building @Crossing_Minds , our team is joining @OpenAI. We’ve poured everything into building better retrieval, personalization, and real-time AI. Now we get to bring that work to a mission we deeply believe in. Let’s build what’s next — together. 🧠⚡️ #AGI
5 replies · 10 retweets · 56 likes · 47.5K views
Alexandre Robicquet retweeted
Sumit @_reachsumit
ICLERB: In-Context Learning Embedding and Reranker Benchmark
Introduces an evaluation framework for assessing retrieval models based on their ability to enhance LLM accuracy in in-context learning tasks.
📝 arxiv.org/abs/2411.18947
0 replies · 3 retweets · 8 likes · 798 views
Alexandre Robicquet @AlexandreRbcqt
Your mood is a reflection of your daily habits
Your relationships are a reflection of your daily habits
Your mindset is a reflection of your daily habits
Your health is a reflection of your daily habits
Your future will be a reflection of your daily habits
0 replies · 0 retweets · 1 like · 160 views
Alexandre Robicquet retweeted
Sumit @_reachsumit
RAG Does Not Work for Enterprises
Explores the challenges and requirements for implementing RAG in enterprises, proposing potential solutions like semantic search and hybrid queries, and an evaluation framework to validate enterprise-grade RAG solutions.
📝 arxiv.org/abs/2406.04369
18 replies · 140 retweets · 779 likes · 107K views
Alexandre Robicquet @AlexandreRbcqt
quote of the day for me “Unspoken expectations are premeditated resentments.” ― Neil Strauss
1 reply · 0 retweets · 1 like · 284 views
Alexandre Robicquet @AlexandreRbcqt
My brain automatically disconnects the moment I read “delve” in anybody’s post.
Not sure if it is a good reflex, since the form doesn’t always match the content, but it’s an interesting allergic reaction, surely reflecting some ChatGPT abuse from all of us sharing “original” thoughts on social media…
0 replies · 0 retweets · 1 like · 279 views
Alexandre Robicquet retweeted
Jeff Dean @JeffDean
Introducing Gemma - a family of lightweight, state-of-the-art open models for their class, built from the same research & technology used to create the Gemini models.
Blog post: blog.google/technology/dev…
Tech report: goo.gle/GemmaReport
This thread explores some of the performance characteristics of these models.
98 replies · 782 retweets · 4.3K likes · 1.3M views
Alexandre Robicquet retweeted
Sundar Pichai @sundarpichai
Seeing some qs on what Gemini *is* (beyond the zodiac :). Best way to understand Gemini’s underlying amazing capabilities is to see them in action, take a look ⬇️
1.2K replies · 6.5K retweets · 32K likes · 8.2M views
Alexandre Robicquet retweeted
Romain Huet @romainhuet
To all developers and customers building on @OpenAI, we are here for you. ❤️ Our commitment to serve you remains unwavering, and we are continuing to prioritize stability and security of our systems.
Logan Kilpatrick @OfficialLoganK

A note to @OpenAI developers 🫶: I wanted to express my appreciation for all the warm, thoughtful, and supportive messages I got and I’ve seen posted across the community. Despite a moment of uncertainty, our commitment to developers remained steadfast. In the meantime, please know that we are continuing to prioritize stability and security of our systems. Our engineering team remains on-call and actively monitoring our services. In all events, our commitment remains to our customers and our mission. Thank you for your continued trust.

22 replies · 27 retweets · 360 likes · 70.6K views
Alexandre Robicquet @AlexandreRbcqt
I have always been a huge believer in small incremental adjustments as the foundation of great changes.
One habit at a time.
“How to Create a Good Habit
The 1st law (Cue): Make it obvious.
The 2nd law (Craving): Make it attractive.
The 3rd law (Response): Make it easy.
The 4th law (Reward): Make it satisfying.”
Atomic Habits by @JamesClear
0 replies · 0 retweets · 2 likes · 330 views
Alexandre Robicquet @AlexandreRbcqt
🚀 Exciting News from the @Crossing_Minds Team 🚀

Last week marked a significant milestone for our team at Crossing Minds. I am thrilled to announce that after months of relentless dedication and rigorous evaluation, we have received the "Shopify Plus App Certification."

Achieving this certification is not just a badge of honor for us; it validates our commitment to offering unparalleled recommendation solutions for e-commerce platforms. At Crossing Minds, our guiding principle has always been to personalize the entire web. We believe every customer is unique, and their shopping experience should mirror this individuality. The @ShopifyPlus Certification stands as a testament to the compatibility, security, and performance of our app on one of the world's leading e-commerce platforms.

A massive thank you to our team for their ceaseless passion and the countless hours they've put into making this possible. I would also like to directly thank the people at Shopify who have been more than supportive: thank you to @JordanaFuller, Brian Peters, Acca Yeung, and the @ShopifyEng and partner team. And to our partners and clients, thank you for your trust and collaboration. Together, we'll continue to make online shopping experiences more intuitive, engaging, and, most importantly, personal.

#CrossingMindsUpdate #ShopifyPlusCertified #Ecommerce #PersonalizationMatters #DigitalTransformation #EcommerceEvolution #RetailTech #TechMilestone #BusinessGrowth
0 replies · 0 retweets · 2 likes · 289 views