Alexandre Robicquet

614 posts

@AlexandreRbcqt

Research @openai. AI/ML @stanford research

San Francisco, CA · Joined August 2013

230 Following · 1.1K Followers

Alexandre Robicquet retweeted

Kevin Weil 🇺🇸@kevinweil·10 Mar

A look at the future/present

Andrej Karpathy@karpathy

Three days ago I left autoresearch tuning nanochat for ~2 days on depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement), this will be the new leaderboard entry. So yes, these are real improvements and they make an actual difference. I am mildly surprised that my very first naive attempt already worked this well on top of what I thought was already a fairly manually well-tuned project.
This is a first for me because I am very used to doing the iterative optimization of neural network training manually. You come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, etc etc. This is the bread and butter of what I do daily for 2 decades. Seeing the agent do this entire workflow end-to-end and all by itself as it worked through approx. 700 changes autonomously is wild. It really looked at the sequence of results of experiments and used that to plan the next ones. It's not novel, ground-breaking "research" (yet), but all the adjustments are "real", I didn't find them manually previously, and they stack up and actually improved nanochat. Among the bigger things e.g.:
- It noticed an oversight that my parameterless QKnorm didn't have a scaler multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work.
- It found that the Value Embeddings really like regularization and I wasn't applying any (oops).
- It found that my banded attention was too conservative (i forgot to tune it).
- It found that AdamW betas were all messed up.
- It tuned the weight decay schedule.
- It tuned the network initialization.
This is on top of all the tuning I've already done over a good amount of time. The exact commit is here, from this "round 1" of autoresearch. I am going to kick off "round 2", and in parallel I am looking at how multiple agents can collaborate to unlock parallelism. github.com/karpathy/nanoc…
All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale of course - you don't just have a single train.py file to tune. But doing it is "just engineering" and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and humans (optionally) contribute on the edges. And more generally, *any* metric you care about that is reasonably efficient to evaluate (or that has more efficient proxy metrics such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.

Alexandre Robicquet@AlexandreRbcqt·11 Mar

find of the day: app.topology.vc/scientific-ai-…

Alexandre Robicquet@AlexandreRbcqt·9 Mar

magic on

Andrej Karpathy@karpathy

@tobi Who knew early singularity could be this fun? :) I just confirmed that the improvements autoresearch found over the last 2 days of (~650) experiments on depth 12 model transfer well to depth 24 so nanochat is about to get a new leaderboard entry for “time to GPT-2” too. Works 🤷‍♂️

Alexandre Robicquet retweeted

Epoch AI@EpochAIResearch·5 Mar

GPT-5.4 set a new record on FrontierMath, our benchmark of extremely challenging math problems! We had pre-release access to evaluate the model. On Tiers 1–3, GPT-5.4 Pro scored 50%. On Tier 4 it scored 38%. See thread for commentary and additional experiments.

Alexandre Robicquet retweeted

Crossing Minds@Crossing_Minds·26 Jun

After 7 years of building @Crossing_Minds , our team is joining @OpenAI. We’ve poured everything into building better retrieval, personalization, and real-time AI. Now we get to bring that work to a mission we deeply believe in. Let’s build what’s next — together. 🧠⚡️ #AGI

Alexandre Robicquet retweeted

Ching-Wei Chen@cweichen·2 Dec

Do you clerb? ICLERB! Introducing the In-Context Learning Embedding and Reranker Benchmark (#ICLERB)! It evaluates retrieval models for #LLM in-context learning, based on downstream task performance, not text similarity. 📖: arxiv.org/abs/2411.18947 🏆: huggingface.co/spaces/crossin…

Alexandre Robicquet retweeted

Sumit@_reachsumit·2 Dec

ICLERB: In-Context Learning Embedding and Reranker Benchmark Introduces an evaluation framework for assessing retrieval models based on their ability to enhance LLM accuracy in in-context learning tasks. 📝arxiv.org/abs/2411.18947

Alexandre Robicquet retweeted

Ching-Wei Chen@cweichen·21 Nov

🚀 Introducing Claude François🕺, an AI Code Reviewer in the style of @fchollet! Try it with any #Github PR: claude-francois.crossingminds.com It uses @Crossing_Minds #RAGSys in-context learning with @AnthropicAI's #Claude #LLM. Learn more here: linkedin.com/posts/cweichen… #ClaudeFrancois

Alexandre Robicquet@AlexandreRbcqt·20 Nov

A few thoughts on the growing Place of #DPO in Gen AI link.medium.com/VWcy7n7bFOb

Alexandre Robicquet@AlexandreRbcqt·9 Nov

Your mood is a reflection of your daily habits
Your relationships are a reflection of your daily habits
Your mindset is a reflection of your daily habits
Your health is a reflection of your daily habits
Your future will be a reflection of your daily habits

Alexandre Robicquet retweeted

Sumit@_reachsumit·10 Jun

RAG Does Not Work for Enterprises Explores the challenges and requirements for implementing RAG in enterprises proposing potential solutions like semantic search and hybrid queries, and an evaluation framework to validate enterprise-grade RAG solutions 📝arxiv.org/abs/2406.04369

Alexandre Robicquet@AlexandreRbcqt·18 May

quote of the day for me “Unspoken expectations are premeditated resentments.” ― Neil Strauss

Alexandre Robicquet@AlexandreRbcqt·18 May

My brain automatically disconnects the moment I read “delve” on anybody’s post. Not sure if it is a good reflex, since the form doesn’t always match the content, but it’s an interesting allergic reaction, surely reflecting some ChatGPT abuse from all of us sharing “original” thoughts on social media …

Alexandre Robicquet retweeted

Jeff Dean@JeffDean·21 Feb

Introducing Gemma - a family of lightweight, state-of-the-art open models for their class, built from the same research & technology used to create the Gemini models. Blog post: blog.google/technology/dev… Tech report: goo.gle/GemmaReport This thread explores some of the performance characteristics of these models.

Alexandre Robicquet retweeted

Ching-Wei Chen@cweichen·14 Feb

I'm very proud to announce that our paper on using pre-trained image-to-text models for eCommerce has been accepted to the ISIR-eCom Workshop at #WSDM2024! Congrats to @jasonjytang @marieiag @Crossing_Minds! Preprint: arxiv.org/abs/2402.08532 Workshop: isir-ecom.github.io

Alexandre Robicquet retweeted

Sundar Pichai@sundarpichai·6 Dec

Seeing some qs on what Gemini *is* (beyond the zodiac :). Best way to understand Gemini’s underlying amazing capabilities is to see them in action, take a look ⬇️

Alexandre Robicquet retweeted

Romain Huet@romainhuet·20 Nov

To all developers and customers building on @OpenAI, we are here for you. ❤️ Our commitment to serve you remains unwavering, and we are continuing to prioritize stability and security of our systems.

Logan Kilpatrick@OfficialLoganK

A note to @OpenAI developers 🫶: I wanted to express my appreciation for all the warm, thoughtful, and supportive messages I got and I’ve seen posted across the community. Despite a moment of uncertainty, our commitment to developers remained steadfast. In the meantime, please know that we are continuing to prioritize stability and security of our systems. Our engineering team remains on-call and actively monitoring our services. In all events, our commitment remains to our customers and our mission. Thank you for your continued trust.

Alexandre Robicquet@AlexandreRbcqt·13 Nov

Check out my latest article: AI & eCommerce: Part 2 - Why does it matter? linkedin.com/pulse/ai-ecomm… via @LinkedIn

Alexandre Robicquet@AlexandreRbcqt·29 Oct

Always have been a huge believer in small incremental adjustments to be the foundations of great changes
One habit at a time
“How to Create a Good Habit
The 1st law (Cue): Make it obvious.
The 2nd law (Craving): Make it attractive.
The 3rd law (Response): Make it easy.
The 4th law (Reward): Make it satisfying”.
Atomic Habits by @JamesClear

Alexandre Robicquet@AlexandreRbcqt·10 Oct

🚀 Exciting News from the @Crossing_Minds Team 🚀
Last week marked a significant milestone for our team at Crossing Minds. I am thrilled to announce that after months of relentless dedication and rigorous evaluation, we have received the "Shopify Plus App Certification."
Achieving this certification is not just a badge of honor for us; it signifies the validation of our commitment to offering unparalleled recommendation solutions for e-commerce platforms. At Crossing Minds, our guiding principle has always been to personalize the entire web. We believe every customer is unique, and their shopping experience should mirror this individuality.
The @ShopifyPlus Certification stands as a testament to the compatibility, security, and performance of our app on one of the world's leading e-commerce platforms.
A massive thank you to our team for their ceaseless passion and the countless hours they've put into making this possible. I would also like to directly thank the people at Shopify who have been more than supportive: thank you to @JordanaFuller, Brian Peters and Acca Yeung and the @ShopifyEng and partner team.
And to our partners and clients, thank you for your trust and collaboration. Together, we'll continue to make online shopping experiences more intuitive, engaging, and, most importantly, personal.
#CrossingMindsUpdate #ShopifyPlusCertified #Ecommerce #PersonalizationMatters #DigitalTransformation #EcommerceEvolution #RetailTech #TechMilestone #BusinessGrowth

