Sebastien Treguer

3K posts

@ST4Good

ML/AI research Curious to better understand the world we live in, the various forms of intelligences, alive & artificial, then live & create from that.

France · Joined December 2010
1.1K Following · 524 Followers
Sebastien Treguer retweeted
Christine Yip
Christine Yip@christinetyip·
We were inspired by @karpathy's autoresearch and built: autoresearch@home
Any agent on the internet can join and collaborate on AI/ML research. What one agent can do alone is impressive. Now hundreds, or thousands, can explore the search space together.
Through a shared memory layer, agents can:
- read and learn from prior experiments
- avoid duplicate work
- build on each other's results in real time
English
122
258
2.4K
269.9K
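The shared memory idea in the post above can be illustrated with a minimal sketch. This is not the autoresearch@home API: `SharedMemory`, `run_experiment`, and the toy `evaluate` function are hypothetical names invented here, and the "memory" is just an in-process dictionary. It only shows how keying experiments by a canonical hash of their configuration lets a second agent reuse a prior result instead of repeating the run.

```python
import hashlib
import json


class SharedMemory:
    """Hypothetical in-process stand-in for a shared experiment log."""

    def __init__(self):
        self._results = {}

    @staticmethod
    def key(config):
        # Canonical JSON (sorted keys) so key order does not change the hash.
        canonical = json.dumps(config, sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def lookup(self, config):
        """Return the recorded loss for this config, or None if unseen."""
        return self._results.get(self.key(config))

    def record(self, config, loss):
        self._results[self.key(config)] = loss

    def best(self):
        """Lowest loss recorded so far, or None if nothing was recorded."""
        return min(self._results.values()) if self._results else None


def run_experiment(memory, config, evaluate):
    """Evaluate config unless another agent already did; return (loss, ran)."""
    cached = memory.lookup(config)
    if cached is not None:
        return cached, False  # duplicate work avoided
    loss = evaluate(config)
    memory.record(config, loss)
    return loss, True


def evaluate(config):
    # Toy objective standing in for a real training run (assumption).
    return (config["lr"] - 0.3) ** 2


memory = SharedMemory()
# Two "agents" propose the same experiment with keys in a different order.
loss_a, ran_a = run_experiment(memory, {"lr": 0.1, "depth": 12}, evaluate)
loss_b, ran_b = run_experiment(memory, {"depth": 12, "lr": 0.1}, evaluate)
print(ran_a, ran_b)
```

A real shared layer would need persistence, concurrency control, and a looser notion of "same experiment" than an exact configuration match.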
Sebastien Treguer
Sebastien Treguer@ST4Good·
@karpathy Does the agent observe its own exploration/exploitation process and question it, in order to optimise it and converge on the best solution more quickly, with fewer test steps? Can it discover previously unknown optimizations or create new meta-learning approaches?
English
0
0
0
24
Sebastien Treguer retweeted
Andrej Karpathy
Andrej Karpathy@karpathy·
Three days ago I left autoresearch tuning nanochat for ~2 days on depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement), this will be the new leaderboard entry. So yes, these are real improvements and they make an actual difference. I am mildly surprised that my very first naive attempt already worked this well on top of what I thought was already a fairly manually well-tuned project.
This is a first for me because I am very used to doing the iterative optimization of neural network training manually. You come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, etc etc. This is the bread and butter of what I do daily for 2 decades. Seeing the agent do this entire workflow end-to-end and all by itself as it worked through approx. 700 changes autonomously is wild. It really looked at the sequence of results of experiments and used that to plan the next ones. It's not novel, ground-breaking "research" (yet), but all the adjustments are "real", I didn't find them manually previously, and they stack up and actually improved nanochat. Among the bigger things e.g.:
- It noticed an oversight that my parameterless QKnorm didn't have a scalar multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work.
- It found that the Value Embeddings really like regularization and I wasn't applying any (oops).
- It found that my banded attention was too conservative (I forgot to tune it).
- It found that AdamW betas were all messed up.
- It tuned the weight decay schedule.
- It tuned the network initialization.
This is on top of all the tuning I've already done over a good amount of time. The exact commit is here, from this "round 1" of autoresearch. I am going to kick off "round 2", and in parallel I am looking at how multiple agents can collaborate to unlock parallelism. github.com/karpathy/nanoc…
All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale of course - you don't just have a single train.py file to tune. But doing it is "just engineering" and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and humans (optionally) contribute on the edges. And more generally, *any* metric you care about that is reasonably efficient to evaluate (or that has more efficient proxy metrics such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.
English
965
2.1K
19.5K
3.6M
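The workflow described in the post above (propose a change, check whether validation loss improves, keep it if so, repeat) can be sketched as a simple accept-if-better loop. This is only an illustration under stated assumptions, not the autoresearch implementation: `toy_validation_loss`, `propose_change`, and `autoresearch_loop` are hypothetical names, the "training run" is a toy quadratic, and the proposals are random perturbations rather than ideas planned by an agent from earlier results. The final lines reproduce the quoted speedup arithmetic from 2.02 hours to 1.80 hours.

```python
import random


def toy_validation_loss(config):
    """Hypothetical stand-in for a training run: lower is better."""
    return (config["lr"] - 0.3) ** 2 + (config["weight_decay"] - 0.1) ** 2


def propose_change(config, rng):
    """Perturb one randomly chosen setting (a stand-in for an agent's idea)."""
    candidate = dict(config)
    key = rng.choice(sorted(candidate))
    candidate[key] += rng.uniform(-0.05, 0.05)
    return candidate


def autoresearch_loop(config, steps, seed=0):
    """Try `steps` changes, keeping only those that lower the loss."""
    rng = random.Random(seed)
    best_loss = toy_validation_loss(config)
    accepted = 0
    for _ in range(steps):
        candidate = propose_change(config, rng)
        loss = toy_validation_loss(candidate)
        if loss < best_loss:  # keep only changes that improve validation loss
            config, best_loss, accepted = candidate, loss, accepted + 1
    return config, best_loss, accepted


start = {"lr": 0.5, "weight_decay": 0.0}
start_loss = toy_validation_loss(start)
final_config, final_loss, accepted = autoresearch_loop(start, steps=700)

# Relative improvement quoted in the post: 2.02 h -> 1.80 h.
improvement = (2.02 - 1.80) / 2.02
print(f"accepted {accepted} of 700 changes; loss {start_loss:.4f} -> {final_loss:.4f}")
print(f"Time to GPT-2 improvement: {improvement:.1%}")
```

The relative improvement works out to roughly 0.109, consistent with the "~11%" in the post.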
Sebastien Treguer
Sebastien Treguer@ST4Good·
@EthanHe_42 @karpathy It looks more like automating the ML/AI research scientist's exploration and optimization process, using any possible approach?
English
0
0
1
79
Ethan He
Ethan He@EthanHe_42·
@karpathy Reminds me of AutoML and neural architecture search. But with intelligence this time.
English
6
0
227
41.6K
Sebastien Treguer retweeted
Finfox 🦇🔊
Finfox 🦇🔊@Finfox3·
I asked #GPT5 for a detailed analysis comparing GPT-5 vs Grok 4. Surprisingly, it grossly confused Grok with Claude, treating Grok 4 as an Anthropic model🫣. As a result, the comparison is totally irrelevant. Embarrassing for a so-called PhD-level model, @OpenAI
English
1
1
2
183
Sebastien Treguer
Sebastien Treguer@ST4Good·
@karpathy In France we have groups of farmers/producers who organize to sell directly to consumers. It's a bit less flexible than a grocery shop, since you have to order in advance and collect at a specific time, but it's guaranteed local, fresh, and mostly organic.
English
0
0
0
26
Andrej Karpathy
Andrej Karpathy@karpathy·
This is what the ideal grocery store looks like. Minimally processed (NOVA Group 1) food only (no "edible food-like substances"), organic, local, fresh. Food should not be more complex than this, yet I don't believe this exists.
English
532
451
6.1K
596.4K
Sebastien Treguer retweeted
Finfox 🦇🔊
Finfox 🦇🔊@Finfox3·
🧠 #MIT study: how do AI chatbots affect our brain activity and change how we think? Dive into the findings, based on 4 months of data, and what they mean for our minds! (Hint: challenge your brain to avoid getting dull) shorturl.at/6YjBX #AI #Neuroscience
English
0
1
1
74
Sebastien Treguer retweeted
Finfox 🦇🔊
Finfox 🦇🔊@Finfox3·
MiniMax-M1, China's new open-source (Apache 2.0) LLM (456B params, 1M token context), outperforms DeepSeek R1 and rivals GPT-4o in reasoning, coding, and long-context tasks, at 200x lower training cost. More details: shorturl.at/W4m0e GitHub: shorturl.at/wt5dB
English
0
1
1
129
GOSIM Foundation
GOSIM Foundation@gosimfoundation·
🎉 Feeling lucky? Win exclusive discounts for Europe’s premier open-source AI event 🇫🇷✨ GOSIM AI Paris 2025 is all about open-source passion—let’s share the love! Join our Lucky Draw for a chance to unlock MASSIVE DISCOUNTS on your conference ticket! 🎟✨ 🔸 How to enter: Simply Follow, Re-Tweet & tag your AI enthusiast friends and DM us 🎁 Winners receive: Huge savings & exclusive perks to Europe’s most vibrant AI gathering! 💌 Let's spread the AI magic—join the fun today! #OpenSource #Paris #GOSIM #TechEvents #DiscountTickets #LLM #DeepSeek #AICommunity #AIConference #AIModels #EmbodiedAI #QWen #AIinfra #RAG #agentic #PyTorch #HuggingFace #GOSIMAI2025 #LuckyDraw #AI #AGI #AI2025 #OpenManus #autogen #camelai #SGLang #zenoh #dora #StationF #AIApps
English
11
26
25
4.6K
Sebastien Treguer
Sebastien Treguer@ST4Good·
@0xbasedalex It's a lighter implementation of similar concepts. Comparing them properly would require an extensive, complete benchmark.
English
0
0
1
14
basedalex
basedalex@0xbasedalex·
@ST4Good How does it compare with OpenAI's deep research?
English
1
0
0
18
Thomas Dohmke
Thomas Dohmke@ashtom·
Today, we are infusing the power of agentic AI into the GitHub Copilot experience, elevating Copilot from pair to peer programmer 🤖 (1/4) github.blog/news-insights/…
English
250
726
4.8K
1.4M
Sebastien Treguer retweeted
Thomas Dohmke
Thomas Dohmke@ashtom·
1️⃣New Agent Mode: With agent mode in VS Code, Copilot goes beyond your initial request, completing all necessary subtasks and even inferring unspecified tasks. Agent mode allows Copilot to iterate on its own code, propose and guide terminal commands, and analyze and resolve run-time errors. Available today for VS Code Insiders 💫 (2/4)
English
18
38
595
95.5K
Sebastien Treguer
Sebastien Treguer@ST4Good·
8/ Both models are groundbreaking in their own ways. The "best" choice depends on your needs—speed vs. scalability, simplicity vs. complexity, or cost vs. energy efficiency! 💡✨ Which one would you pick for your next project? Let me know below! 👇 #AI #MachineLearning
English
0
0
0
31
Sebastien Treguer
Sebastien Treguer@ST4Good·
OpenAI o3-mini vs DeepSeek-R1. Two cutting-edge AI models, each excelling in different domains. Let's dive into how they compare across benchmarks, efficiency, and use cases. Ready? Let’s go! 🚀 Thread 🧵👇
English
7
0
1
112
Sebastien Treguer
Sebastien Treguer@ST4Good·
7/ So, which one should you choose? 🤔 Go with o3-mini if you need speed, cost-efficiency, or large-context handling. 🚀💼 Choose DeepSeek R1 for energy-efficient operations or complex reasoning/coding tasks at scale. 🌿🧩
English
0
0
0
49
Sebastien Treguer
Sebastien Treguer@ST4Good·
6/ Use Cases o3-mini: Perfect for real-time decision-making, large-context tasks (200K tokens!), and simpler coding workflows. 🕒📜 DeepSeek R1: Excels in batch processing, advanced research queries, and energy-efficient large-scale tasks. 🌐💡
English
0
0
0
47
Sebastien Treguer
Sebastien Treguer@ST4Good·
5/ Architectural Insights o3-mini: Dense transformer = consistent performance across tasks. DeepSeek R1: Mixture-of-Experts (MoE) = scalable, energy-efficient for large workloads. Different architectures, different strengths! 🏗️🛠️
English
0
0
0
37
Sebastien Treguer
Sebastien Treguer@ST4Good·
4/ Efficiency Metrics DeepSeek wins on energy & throughput, while o3-mini has lower memory needs & faster response times! ⚡🔋
English
0
0
0
51
Sebastien Treguer
Sebastien Treguer@ST4Good·
3/ Coding Benchmarks o3-mini dominates competitive programming (Codeforces ELO: 2130). DeepSeek R1 excels in complex outputs like 3D animations & intricate algorithms. o3-mini = speed & simplicity. DeepSeek = complexity & creativity. 💻⚡
English
0
0
0
47