Sebastien Treguer

3K posts

@ST4Good

ML/AI research. Curious to better understand the world we live in, the various forms of intelligences, alive & artificial, then live & create from that.

France · Joined December 2010

1.1K Following · 524 Followers

Sebastien Treguer retweeted

Christine Yip@christinetyip·12 Mar

We were inspired by @karpathy's autoresearch and built: autoresearch@home

Any agent on the internet can join and collaborate on AI/ML research. What one agent can do alone is impressive. Now hundreds, or thousands, can explore the search space together.

Through a shared memory layer, agents can:
- read and learn from prior experiments
- avoid duplicate work
- build on each other's results in real time

Sebastien Treguer@ST4Good·10 Mar

@karpathy Does the agent observe its own exploration/exploitation process and question it in order to optimise it and converge on the best solution more quickly, with fewer testing steps? Can it discover unknown optimizations or create unknown metalearning approaches?

Sebastien Treguer retweeted

Andrej Karpathy@karpathy·10 Mar

Three days ago I left autoresearch tuning nanochat for ~2 days on depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement), this will be the new leaderboard entry. So yes, these are real improvements and they make an actual difference. I am mildly surprised that my very first naive attempt already worked this well on top of what I thought was already a fairly manually well-tuned project.

This is a first for me because I am very used to doing the iterative optimization of neural network training manually. You come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, etc etc. This is the bread and butter of what I do daily for 2 decades. Seeing the agent do this entire workflow end-to-end and all by itself as it worked through approx. 700 changes autonomously is wild. It really looked at the sequence of results of experiments and used that to plan the next ones. It's not novel, ground-breaking "research" (yet), but all the adjustments are "real", I didn't find them manually previously, and they stack up and actually improved nanochat. Among the bigger things e.g.:

- It noticed an oversight that my parameterless QKnorm didn't have a scaler multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work.
- It found that the Value Embeddings really like regularization and I wasn't applying any (oops).
- It found that my banded attention was too conservative (i forgot to tune it).
- It found that AdamW betas were all messed up.
- It tuned the weight decay schedule.
- It tuned the network initialization.

This is on top of all the tuning I've already done over a good amount of time. The exact commit is here, from this "round 1" of autoresearch. I am going to kick off "round 2", and in parallel I am looking at how multiple agents can collaborate to unlock parallelism. github.com/karpathy/nanoc…

All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale of course - you don't just have a single train.py file to tune. But doing it is "just engineering" and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and humans (optionally) contribute on the edges.

And more generally, *any* metric you care about that is reasonably efficient to evaluate (or that has more efficient proxy metrics such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.
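The workflow described in the post above (propose a change, check the validation loss, keep the change only if it helps) can be sketched as a greedy search loop. This is a minimal illustration, not the actual autoresearch code: `validation_loss`, `propose_change`, and the two hyperparameters are hypothetical stand-ins, and the toy loss replaces a real training run. The last lines recompute the quoted "Time to GPT-2" improvement from the 2.02 and 1.80 hour figures.

```python
import random


def validation_loss(config):
    """Toy stand-in for a training run (assumption): lower is better."""
    return (config["weight_decay"] - 0.1) ** 2 + (config["beta2"] - 0.95) ** 2


def propose_change(config, rng):
    """Hypothetical proposal step: nudge one hyperparameter at random."""
    key = rng.choice(sorted(config))
    candidate = dict(config)
    candidate[key] = config[key] + rng.uniform(-0.05, 0.05)
    return candidate


def autoresearch(config, steps, seed=0):
    """Try `steps` changes and keep only those that lower the loss."""
    rng = random.Random(seed)
    best_loss = validation_loss(config)
    accepted = 0
    for _ in range(steps):
        candidate = propose_change(config, rng)
        loss = validation_loss(candidate)
        if loss < best_loss:  # accept only real improvements
            config, best_loss = candidate, loss
            accepted += 1
    return config, best_loss, accepted


start = {"weight_decay": 0.0, "beta2": 0.999}
best_config, best_loss, accepted = autoresearch(start, steps=700)
print(f"accepted {accepted} of 700 changes, final loss {best_loss:.6f}")

# Relative improvement quoted in the post: 2.02 h -> 1.80 h.
improvement = (2.02 - 1.80) / 2.02
print(f"relative improvement: {improvement:.1%}")
```

A real agent would replace the random nudge with changes planned from earlier results, which is the part the post highlights; the accept-if-better skeleton stays the same.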

Sebastien Treguer@ST4Good·10 Mar

@EthanHe_42 @karpathy It looks more like automating the ML/AI research scientist's exploration and optimization process, using any possible approach?

Ethan He@EthanHe_42·10 Mar

@karpathy Reminds me of AutoML and neural architecture search. But with intelligence this time.

Sebastien Treguer retweeted

Finfox 🦇🔊@Finfox3·8 Aug

I asked #GPT5 to make a detailed analysis comparing GPT-5 vs Grok 4. Surprisingly he made a gross confusion between Grok and Claude, considering that Grok 4 was an Anthropic model🫣. As a result, the comparison is totally irrelevant. Embarrassing for a so-called PhD level @OpenAI

Sebastien Treguer@ST4Good·9 Jul

@karpathy In France we have groups of farmers/producers organized to sell directly to consumers. It's a bit less flexible than a grocery shop, since you have to order in advance and collect at a specific time, but it's guaranteed local, fresh, and mostly organic.

Andrej Karpathy@karpathy·8 Jul

This is what the ideal grocery store looks like. Minimally processed (NOVA Group 1) food only (no "edible food-like substances"), organic, local, fresh. Food should not be more complex than this, yet I don't believe this exists.

Sebastien Treguer retweeted

Finfox 🦇🔊@Finfox3·20 Jun

🧠 #MIT study: how AI chatbots impact our brain activity and change how we think? Dive into the findings based on 4 months of data and what it means for our minds! (Hint: challenge your brain to avoid getting dull) shorturl.at/6YjBX #AI #Neuroscience

Sebastien Treguer retweeted

Finfox 🦇🔊@Finfox3·19 Jun

MiniMax-M1 China's new open source (Apache2.0) LLM (456B params, 1M token context) outperforms DeepSeek R1 and rivals GPT-4o in reasoning, coding, and long-context tasks—at 200x lower training cost. More details: shorturl.at/W4m0e GitHub shorturl.at/wt5dB

Sebastien Treguer@ST4Good·5 May

@gosimfoundation @Finfox3

QAM

GOSIM Foundation@gosimfoundation·17 Apr

🎉 Feeling lucky? Win exclusive discounts for Europe’s premier open-source AI event 🇫🇷✨ GOSIM AI Paris 2025 is all about open-source passion—let’s share the love! Join our Lucky Draw for a chance to unlock MASSIVE DISCOUNTS on your conference ticket! 🎟✨ 🔸 How to enter: Simply Follow, Re-Tweet & tag your AI enthusiast friends and DM us 🎁 Winners receive: Huge savings & exclusive perks to Europe’s most vibrant AI gathering! 💌 Let's spread the AI magic—join the fun today! #OpenSource #Paris #GOSIM #TechEvents #DiscountTickets #LLM #DeepSeek #AICommunity #AIConference #AIModels #EmbodiedAI #QWen #AIinfra #RAG #agentic #PyTorch #HuggingFace #GOSIMAI2025 #LuckyDraw #AI #AGI #OpenSource #Paris #GOSIM #TechEvents #DiscountTickets #LLM #DeepSeek #AICommunity #AIConference #AI2025 #AIModels #EmbodiedAI #QWen #AIinfra #RAG #agentic #OpenManus #autogen #camelai #PyTorch #SGLang #zenoh #dora #HuggingFace #StationF #AIApps

Sebastien Treguer@ST4Good·14 Feb

The repo has already been ported to Python for non-JS folks: github.com/epuerta9/deep-…

Sebastien Treguer@ST4Good·14 Feb

Open Deep Research, an #opensource #AI assistant combining search engines, web scraping, and LLMs for comprehensive results:
- Iterative deep dives
- Smart query generation
- Customizable depth & breadth
- Detailed markdown reports
#ResearchTool github.com/dzhng/deep-res…
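The "depth & breadth" idea in the post above can be sketched as a small recursion: at each level, generate `breadth` queries, record a finding for each, then dive one level deeper on that query. This is a rough illustration under stated assumptions, not the project's code: `generate_queries` and `search` are hypothetical stubs standing in for the LLM and the search/scrape step, and halving the breadth at each level is an assumption made for the sketch.

```python
def generate_queries(topic, breadth):
    """Hypothetical stand-in for LLM-based query generation."""
    return [f"{topic} / angle {i + 1}" for i in range(breadth)]


def search(query):
    """Hypothetical stand-in for a search-and-scrape call."""
    return f"finding for: {query}"


def deep_research(topic, depth, breadth):
    """Collect findings over `depth` levels with `breadth` queries per level."""
    if depth == 0:
        return []
    findings = []
    for query in generate_queries(topic, breadth):
        findings.append(search(query))
        # Assumption: narrow the search as it goes deeper.
        findings.extend(deep_research(query, depth - 1, max(1, breadth // 2)))
    return findings


def to_markdown(topic, findings):
    """Render the findings as a short markdown report."""
    lines = [f"# Report: {topic}", ""]
    lines += [f"- {finding}" for finding in findings]
    return "\n".join(lines)


findings = deep_research("open-source LLMs", depth=2, breadth=2)
report = to_markdown("open-source LLMs", findings)
print(report)
```

Raising `depth` lengthens each chain of follow-up queries, while raising `breadth` widens the set of angles explored at the top level.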

Sebastien Treguer@ST4Good·14 Feb

@0xbasedalex It's a lighter implementation of similar concepts. Comparing them properly would require an extensive and complete benchmark.

basedalex@0xbasedalex·14 Feb

@ST4Good How does it compare with OpenAI's deep research?

Sebastien Treguer@ST4Good·14 Feb

@ashtom I can't wait to play with it.

Thomas Dohmke@ashtom·6 Feb

Today, we are infusing the power of agentic AI into the GitHub Copilot experience, elevating Copilot from pair to peer programmer 🤖 (1/4) github.blog/news-insights/…

Sebastien Treguer retweeted

Thomas Dohmke@ashtom·6 Feb

1️⃣New Agent Mode: With agent mode in VS Code, Copilot goes beyond your initial request, completing all necessary subtasks and even inferring unspecified tasks. Agent mode allows Copilot to iterate on its own code, propose and guide terminal commands, and analyze and resolve run-time errors. Available today for VS Code Insiders 💫 (2/4)

Sebastien Treguer@ST4Good·4 Feb

8/ Both models are groundbreaking in their own ways. The "best" choice depends on your needs—speed vs. scalability, simplicity vs. complexity, or cost vs. energy efficiency! 💡✨ Which one would you pick for your next project? Let me know below! 👇 #AI #MachineLearning

Sebastien Treguer@ST4Good·4 Feb

OpenAI o3-mini vs DeepSeek-R1. Two cutting-edge AI models, each excelling in different domains. Let's dive into how they compare across benchmarks, efficiency, and use cases. Ready? Let’s go! 🚀 Thread 🧵👇

Sebastien Treguer@ST4Good·4 Feb

7/ So, which one should you choose? 🤔
Go with o3-mini if you need speed, cost-efficiency, or large-context handling. 🚀💼
Choose DeepSeek R1 for energy-efficient operations or complex reasoning/coding tasks at scale. 🌿🧩

Sebastien Treguer@ST4Good·4 Feb

6/ Use Cases
o3-mini: Perfect for real-time decision-making, large-context tasks (200K tokens!), and simpler coding workflows. 🕒📜
DeepSeek R1: Excels in batch processing, advanced research queries, and energy-efficient large-scale tasks. 🌐💡

Sebastien Treguer@ST4Good·4 Feb

5/ Architectural Insights
o3-mini: Dense transformer = consistent performance across tasks.
DeepSeek R1: Mixture-of-Experts (MoE) = scalable, energy-efficient for large workloads.
Different architectures, different strengths! 🏗️🛠️

Sebastien Treguer@ST4Good·4 Feb

4/ Efficiency Metrics
DeepSeek wins on energy & throughput, while o3-mini has lower memory needs & faster response times! ⚡🔋

Sebastien Treguer@ST4Good·4 Feb

3/ Coding Benchmarks
o3-mini dominates competitive programming (Codeforces ELO: 2130).
DeepSeek R1 excels in complex outputs like 3D animations & intricate algorithms.
o3-mini = speed & simplicity. DeepSeek = complexity & creativity. 💻⚡
