Thomas Wolf
@Thom_Wolf
Co-founder at @HuggingFace - moonshots - angel
5K posts · Joined February 2011
7.1K Following · 114.1K Followers

Pinned Tweet
Thomas Wolf @Thom_Wolf
Shifting structures in a software world dominated by AI. Some first-order reflections (TL;DR at the end):

Reducing software supply chains, the return of software monoliths – When rewriting code and understanding large foreign codebases becomes cheap, the incentive to rely on deep dependency trees collapses. Writing from scratch ¹ or extracting the relevant parts from another library is far easier when you can simply ask a code agent to handle it, rather than spending countless nights diving into an unfamiliar codebase. The reasons to reduce dependencies are compelling: a smaller attack surface for supply chain threats, smaller packaged software, improved performance, and faster boot times. By leveraging the tireless stamina of LLMs, the dream of coding an entire app from bare-metal considerations all the way up is becoming realistic.

End of the Lindy effect – The Lindy effect holds that things which have been around for a long time are there for good reason and will likely continue to persist. It's related to Chesterton's fence: before removing something, you should first understand why it exists, which means removal always carries a cost. But in a world where software can be developed from first principles and understood by a tireless agent, this logic weakens. Older codebases can be explored at will; long-standing software can be replaced with far less friction. A codebase can be fully rewritten in a new language. ² Legacy software can be carefully studied and updated in situations where humans would have given up long ago. The catch: unknown unknowns remain unknown. The true extent of AI's impact will hinge on whether complete coverage of testing, edge cases, and formal verification is achievable. In an AI-dominated world, formal verification isn't optional—it's essential.

The case for strongly typed languages – Historically, programming language adoption has been driven largely by human psychology and social dynamics. A language's success depended on a mix of factors: individual considerations like being easy to learn and simple to write correctly; community effects like how active and welcoming a community was, which in turn shaped how fast its ecosystem would grow; and fundamental properties like provable correctness, formal verification, and striking the right balance between dynamic and static checks—between the freedom to write anything and the discipline of guarding against edge cases and attacks. As the human factor diminishes, these dynamics will shift. Less dependence on human psychology will favor strongly typed, formally verifiable and/or high-performance languages.³ These are often harder for humans to learn, but they're far better suited to LLMs, which thrive on formal verification and reinforcement learning environments. Expect this to reshape which languages dominate.

Economic restructuring of open source – For decades, open-source communities have been built around humans finding connection through writing, learning, and using code together. In a world where most code is written—and perhaps more importantly, read—by machines, these incentives will start to break down.⁴ Communities of AIs building libraries and codebases together will likely emerge as a replacement, but such communities will lack the fundamentally human motivations that have driven open source until now. If the future of open-source development becomes largely devoid of humans, alignment of AI models won't just matter—it will be decisive.

The future of new languages – Will AI agents face the same tradeoffs we do when developing or adopting new programming languages? Expressiveness vs. simplicity, safety vs. control, performance vs. abstraction, compile time vs. runtime, explicitness vs. conciseness. It's unclear that they will. In the long term, the reasons to create a new programming language will likely diverge significantly from the human-driven motivations of the past. There may well be an optimal programming language for LLMs—and there's no reason to assume it will resemble the ones humans have converged on.

TL;DR:
- Monoliths return – cheap rewriting kills dependency trees; smaller attack surface, better performance, bare-metal becomes realistic
- Lindy effect weakens – legacy code loses its moat, but unknown unknowns persist; formal verification becomes essential
- Strongly typed languages rise – human psychology mattered for adoption; now formal verification and RL environments favor types over ergonomics
- Open source restructures – human connection drove the community; AI-written/read code breaks those incentives; alignment becomes decisive
- New languages diverge – AI may not share our tradeoffs; optimal LLM programming languages may look nothing like what humans converged on

¹ x.com/mntruell/statu…
² x.com/anthropicai/st…
³ wesmckinney.com/blog/agent-erg…
⁴ github.com/tailwindlabs/t…
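The typing argument is easy to make concrete. A minimal sketch (hypothetical unit types, plain Python type hints) of the kind of machine-checkable invariant that favors statically verifiable languages:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Meters:
    value: float


@dataclass(frozen=True)
class Seconds:
    value: float


def speed(distance: Meters, duration: Seconds) -> float:
    """Meters per second; the unit types make argument order a checkable fact."""
    return distance.value / duration.value


# A static checker (mypy, pyright) rejects speed(Seconds(10.0), Meters(100.0))
# before the code ever runs -- no human review required, which is exactly the
# kind of verification loop an LLM agent can close on its own.
v = speed(Meters(100.0), Seconds(10.0))
```

The same logic scales up to formally verified kernels or proof-carrying code; the point here is only that the checking is mechanical, so it suits a tireless agent better than human code review does.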
Thomas Wolf retweeted
Unitree @UnitreeRobotics
🚀 Unitree open-sources UnifoLM-WBT-Dataset — a high-quality real-world humanoid robot whole-body teleoperation (WBT) dataset for open environments. 🥳Publicly available since March 5, 2026, the dataset will continue to receive high-frequency rolling updates. It aims to establish the most comprehensive real-world humanoid robot dataset in terms of scenario coverage, task complexity, and manipulation diversity. 👉 Explore the dataset here: huggingface.co/collections/un…
Thomas Wolf @Thom_Wolf
Who would win when combining the best algo (model + optimization) and data of the year? h/t @lvwerra
Thomas Wolf retweeted
Chroma @trychroma
Introducing Chroma Context-1, a 20B parameter search agent.
> pushes the Pareto frontier of agentic search
> order of magnitude faster
> order of magnitude cheaper
> Apache 2.0, open-source
Sakana AI @SakanaAILabs
The AI Scientist: Towards Fully Automated AI Research, Now Published in Nature
Nature: nature.com/articles/s4158…
Blog: sakana.ai/ai-scientist-n…

When we first introduced The AI Scientist, we shared an ambitious vision of an agent powered by foundation models capable of executing the entire machine learning research lifecycle. From inventing ideas and writing code to executing experiments and drafting the manuscript, the system demonstrated that end-to-end automation of the scientific process is possible. Soon after, we shared a historic update: the improved AI Scientist-v2 produced the first fully AI-generated paper to pass a rigorous human peer-review process.

Today, we are happy to announce that "The AI Scientist: Towards Fully Automated AI Research," our paper describing all of this work, along with fresh new insights, has been published in @Nature! This Nature publication consolidates these milestones and details the underlying foundation model orchestration. It also introduces our Automated Reviewer, which matches human review judgments and actually exceeds standard inter-human agreement.

Crucially, by using this reviewer to grade papers generated by different foundation models, we discovered a clear scaling law of science: as the underlying foundation models improve, the quality of the generated scientific papers increases correspondingly. This implies that as compute costs decrease and model capabilities continue to increase exponentially, future versions of The AI Scientist will be substantially more capable.

Building upon our previous open-source releases (github.com/SakanaAI/AI-Sc…), this open-access Nature publication comprehensively details our system's architecture, outlines several new scaling results, and discusses the promise and challenges of AI-generated science.

This substantial milestone is the result of a close and fruitful collaboration between researchers at Sakana AI, the University of British Columbia (UBC) and the Vector Institute, and the University of Oxford. Congrats to the team! @_chris_lu_ @cong_ml @RobertTLange @_yutaroyamada @shengranhu @j_foerst @hardmaru @jeffclune
Thomas Wolf retweeted
Victor M @victormustar
Now available on Hugging Face: hf-mount 🧑‍🚀 The team really cooked; I'm still wrapping my head around everything possible, but you can do things like:
- mount a 5TB dataset as a local folder and query only the parts you need with DuckDB (✅ works)
- browse any model repo with ls/cat like it's a USB drive
- use a shared read-write bucket as a team drive for ML artifacts
- drop the init container that downloads models in your k8s pods
- point llama.cpp at a mounted GGUF and run inference (infinite storage??)
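The "query only the parts you need" pattern above can be sketched without a real mount. A stdlib-only stand-in (the directory layout and shard names are fabricated; with hf-mount the root would simply be the mount point, and the query engine would typically be DuckDB over Parquet rather than CSV):

```python
import csv
import io
import pathlib
import tempfile

# Fabricated stand-in for a mount point such as /tmp/fineweb.
root = pathlib.Path(tempfile.mkdtemp())
(root / "shard-000.csv").write_text("id,lang\n1,en\n2,fr\n")
(root / "shard-001.csv").write_text("id,lang\n3,en\n")

# Because the dataset looks like a plain folder, ordinary tools work on it,
# and only the shards you actually open get read.
english_rows = [
    row
    for shard in sorted(root.glob("shard-*.csv"))
    for row in csv.DictReader(io.StringIO(shard.read_text()))
    if row["lang"] == "en"
]
```

The design point is that mounting turns "download 5TB, then filter" into "list files, open the few you need", which is what makes laptop-scale work on web-scale datasets feasible.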
Thomas Wolf retweeted
Leandro von Werra @lvwerra
Auto-research for training ML models is all the rage now, but underrated is: auto-research for data! Sure, you can squeeze out a bit of model performance by optimizing hyperparameters, but code agents can now do effortlessly the data work that used to be very labour-intensive and required a lot of attention to detail:
> download data from many different data sources
> bring all the data sources into a uniform format
> do detailed EDA: find patterns and outliers
> look at 100s of samples and take detailed notes
> make beautiful infographics rather than mpl plots
> iterate on data filtering by looking at more samples
> make simple pipelines robust and scalable

It's now possible to write data pipelines for dozens of data sources in hours, work that would have taken weeks of reading docs, debugging APIs and data formats, and wrangling outliers and missing data. A few weeks ago we gave Claude access to the CPU partition of our cluster and it iteratively refined filters to retrieve a domain subset of FineWeb. This would have taken me 2-3 days to work through, while Claude finished in a few hours with almost no babysitting and a nice logbook.

Thus the long tail of small, niche data sources becomes more accessible and can be aggregated into even larger high-quality datasets for cool applications. Data has been fuelling LLM progress more than model architecture innovations, so I am very excited about this!
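The "bring all the data sources into a uniform format" step is exactly the mechanical work described above. A minimal sketch with made-up records and field mappings (none of these schemas come from the tweet):

```python
# Two fabricated sources with incompatible schemas.
source_a = [{"text": "hello world", "url": "a.example"}]
source_b = [{"content": "bonjour", "source_url": "b.example"}]


def normalize(record: dict, mapping: dict) -> dict:
    """Rename source-specific fields into one canonical schema."""
    return {canonical: record[original] for canonical, original in mapping.items()}


uniform = (
    [normalize(r, {"text": "text", "url": "url"}) for r in source_a]
    + [normalize(r, {"text": "content", "url": "source_url"}) for r in source_b]
)
# Every record now has the same keys, so downstream EDA and filtering
# can treat the aggregated corpus as a single dataset.
```

Writing one such mapping per source is trivial; writing dozens, with all the outlier handling each real source needs, is the labour-intensive part an agent can now grind through.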
Thomas Wolf @Thom_Wolf
What are the best current techniques to make auto-research behave better than (slightly improved) random search? By which I mean (in Sijun's example below): having the agent understand that, given some constraints, exploring int5 quantization is more exciting and will bear more downstream fruit than playing with the random seed. I'm talking about the beginnings of an agent pushing a real research program: the kind where you know the current technique will not give crazy results out of the box, but you push it anyway because you believe, and can demonstrate, that the general direction has potential. Like neural networks, which used to be a worse way to do AI performance-wise. But we still pushed them…
Sijun Tan @sijun_tan

We took @karpathy's autoresearch agent, scaled it into a collaborative swarm, and topped @OpenAI's Parameter Golf Challenge—twice. Here’s how we did it:

Thomas Wolf retweeted
Julien Chaumond @julien_c
hf-mount: attach any Storage Bucket, model or dataset from @huggingface as a local filesystem. This is a game changer, as it allows you to attach remote storage that is 100x bigger than your local machine's disk. This is also perfect for agentic storage!! Read-write for Storage Buckets, read-only for models and datasets.

Here's an example with FineWeb-edu (a 5TB slice of the Web):

1️⃣ > hf-mount start repo datasets/HuggingFaceFW/fineweb-edu /tmp/fineweb

It takes a few seconds to mount, and then:

2️⃣ > du -h -d1 /tmp/fineweb
4.1T ./data
1.2T ./sample
5.3T .

🤯😮 Two backends are available: NFS (recommended) and FUSE. Let's f**ing go 💪
Thomas Wolf retweeted
Daniel Hnyk @hnykda
LiteLLM HAS BEEN COMPROMISED, DO NOT UPDATE. We just discovered that LiteLLM PyPI release 1.82.8 has been compromised: it contains litellm_init.pth with base64-encoded instructions to send all the credentials it can find to a remote server and to self-replicate. Link below.
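Some context on the attack vector, plus a defensive sketch: `.pth` files in `site-packages` are processed at interpreter startup, and any line in them beginning with `import ` is executed, which is how a malicious `litellm_init.pth` can run before your own code does. A minimal stdlib audit (heuristic only, demonstrated on a fabricated directory; not a substitute for pinning and verifying releases):

```python
import pathlib
import tempfile


def suspicious_pth_lines(directory: str) -> list:
    """Return (filename, line) pairs for executable lines in .pth files."""
    hits = []
    for pth in sorted(pathlib.Path(directory).glob("*.pth")):
        for line in pth.read_text().splitlines():
            # Python's site module executes .pth lines that start with
            # "import " or "import\t"; everything else is treated as a path.
            if line.startswith(("import ", "import\t")):
                hits.append((pth.name, line))
    return hits


# Demo against a fabricated directory; in practice you would scan each
# entry of site.getsitepackages() in your environment.
demo = tempfile.mkdtemp()
pathlib.Path(demo, "evil_init.pth").write_text("import os; os.system('...')\n")
found = suspicious_pth_lines(demo)
```

Legitimate packages (e.g. editable installs) also ship executable `.pth` lines, so every hit needs a human look, but an unexpected one appearing after an upgrade is exactly the signal described in the tweet.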
Thomas Wolf retweeted
Lewis Tunstall @_lewtun
You can now pretrain LLMs entirely on the HF Hub 💥 Last week, @OpenAI launched a competition to see who can pretrain the best LLM in under 10 minutes. So over the weekend, I made a little demo to automate this end-to-end using the Hub as the infra layer:
- Jobs to scale compute
- Buckets to store all experiments
- Trackio to log all the metrics

The cool thing here is that everything is launched locally: no ssh shenanigans into a cluster or fighting with colleagues over storage and GPUs ⚔️ All that's left is coming up with new ideas, but luckily Codex can automate that part too 😁 Can I have a job now please @reach_vb 🙏?
Thomas Wolf retweeted
jack @jack
is the future value of "open source" even code anymore? i believe it's shifting to data, provenance, protocols, evals, and weights. in that order.
Thomas Wolf retweeted
Muratcan Koylan @koylanai
If you're building anything in AI, the most useful skill to be using right now is hugging-face-paper-pages. Whatever problem you're facing, someone has probably already published a paper about it. HF's Papers API gives a hybrid semantic search over AI papers. I wrote an internal skill, context-research, that orchestrates the HF Papers API into a research pipeline. It runs five parallel searches with keyword variants, triages by relevance and recency, fetches full paper content as markdown, then reads the actual methodology and results sections. The skill also chains into a deep research API that crawls the broader web to complement the academic findings. The gap between "a paper was published" and "a practitioner applies the insight" is shrinking, and I think this is a practical way to provide relevant context to coding agents. So you should write a skill on top of the HF Paper skill that teaches the model how to think about research, not just what to search for.
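The fan-out described above (several keyword variants searched in parallel, then triaged) is easy to sketch. The search function here is a stub standing in for an HTTP call to the Hugging Face Papers search endpoint, and the query variants are invented:

```python
from concurrent.futures import ThreadPoolExecutor


def search_papers(query: str) -> list:
    # Stub for a call to the HF Papers search API; stubbed so the
    # sketch is self-contained and runs offline.
    return [{"title": f"A paper about {query}", "query": query}]


variants = [
    "kv cache compression",
    "kv cache quantization",
    "attention memory reduction",
    "long context inference",
    "prompt caching",
]

# Five searches in flight at once, flattened into one candidate pool.
with ThreadPoolExecutor(max_workers=5) as pool:
    candidates = [hit for hits in pool.map(search_papers, variants) for hit in hits]

# Triage step: deduplicate by title; ranking by relevance/recency and
# fetching full paper content would follow here.
unique = {c["title"]: c for c in candidates}
```

The keyword-variant fan-out matters because semantic search is sensitive to phrasing; merging five differently-worded queries gives much better recall than any single one.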
Thomas Wolf @Thom_Wolf
This is really cool. It got me thinking more deeply about personalized RL: what's the real point of personalizing a model in a world where base models become obsolete so quickly? The reality in AI is that new models ship every few weeks, each better than the last, and the pace is only accelerating, as we see on the Hugging Face Hub. We are not far from better base models dropping daily.

There's a research gap in RL here that almost no one is working on. Most LLM personalization research assumes a fixed base model, but very few ask what happens to that personalization when you swap the base model. Think about going from Llama 3 to Llama 4. All the tuned preferences, reward signals, and LoRAs are suddenly tied to yesterday's model. As a user or a team, you don't want to reteach every new model your preferences. But you also don't want to be stuck on an older one just because it knows you.

We could call this "RL model transferability": how can an RL trace, a reward signal, or a preference representation trained on model N be distilled, stored, and automatically reapplied to model N+1 without too much user involvement? We solved this in SFT, where a training dataset can be stored and reused to train a future model. We also tackled a version of it in the RLHF phases, but it remains unclear how to do it more generally with RL deployed in the real world.

There are some related threads (RLTR for transferable reasoning traces, P-RLHF and PREMIUM for model-agnostic user representations, HCP for portable preference protocols), but the full loop seems under-studied to me. Some of these questions are about off-policy learning, but others are about capabilities versus personalization: which of the old customizations/fixes does the new model already handle out of the box, and which ones are genuinely user/team-specific and will never be solved by default? The latter are what you would store in a skill for now, but RL allows extending beyond the written-guidance level.
I have surely missed some work so please post any good work you’ve seen on this topic in the comments.
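One concrete shape the "store and reapply" idea could take, by analogy with SFT datasets: serialize preferences as model-agnostic records carrying enough metadata to re-evaluate them against model N+1. A hypothetical schema, all field names invented:

```python
from dataclasses import dataclass, asdict
import json


@dataclass
class PreferenceRecord:
    prompt: str
    chosen: str
    rejected: str
    source_model: str   # model N, where the preference was learned
    rationale: str      # why, so model N+1 can be probed for it


record = PreferenceRecord(
    prompt="Summarize this meeting",
    chosen="Terse bullet points, action items first",
    rejected="A long narrative paragraph",
    source_model="model-N",
    rationale="User prefers actionable summaries",
)

# Serialized records can be replayed against a new base model: if model N+1
# already produces the chosen behavior, the preference was a capability gap
# and can be dropped; if not, it is genuinely user-specific and worth
# re-applying (as a skill, or as RL training signal).
stored = json.dumps(asdict(record))
```

This is only the storage half; the open research question in the tweet is the replay half, i.e. deciding automatically which records transfer and which the new model makes obsolete.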
Ronak Malde @rronak_

This paper is almost too good, I didn't want to share it. Ignore the OpenClaw clickbait: OPD + RL on real agentic tasks with significant results is very exciting, and moves us away from needing verifiable rewards. Authors: @YinjieW2024 Xuyang Chen, Xialong Jin, @MengdiWang10 @LingYang_PU

Thomas Wolf @Thom_Wolf
@OpenAI Very cool (and love to see Fineweb here). Are people allowed to iterate on the training data?
kepano @kepano
I have been working on Obsidian Reader for over a year. I didn't want to share it until I felt it was good enough. It's finally there. Consistent formatting for any article. Outline, syntax highlighting, nice footnotes, adjustable typography. Runs locally. Just rules, no AI.
Workshop Labs @WorkshopLabs
Letting a provider see all your data is the price of admission for AI. We're changing that. Introducing Silo, the first private post-training and inference stack for frontier models, with hardware-level guarantees that we can’t see your data. Privacy without compromises. 🧵
Thomas Wolf @Thom_Wolf
@cjpedregal @soleio great to read that, I love granola. tbh MCP access felt already much more stable/reliable than the previous hacks indeed
Chris Pedregal @cjpedregal
There are some tweets out there saying that Granola is trying to lock down access to your data. TL;DR: we are actually trying to become more open, not closed. We're launching a public API next week to complement our MCP. Read on for context.

A couple months ago, we noticed that some folks had reverse engineered our local cache so they could access their meeting data. Our cache was not built for this (it can change at any point), so we launched our MCP to serve this need. The MCP gives full access to your notes and transcripts (all time for paid users, time-restricted for free users). MCP usage has exploded since launch, so we felt good about it.

A week ago, we updated how we store data in our cache and broke the workarounds. This is on us. Stupidly, we thought we had solved these use cases well enough with our MCP. We've now learned that while MCPs are great for connecting to tools like Claude or ChatGPT, they don't meet your needs for agents running locally or for data export / pipeline work.

So we're going to fix this for you ASAP. First, we'll launch a public API next week to make it easier for you to pull your data. Second, we'll figure out how to make Granola work better for agents running locally, whether that's expanding our MCP, launching a CLI, a local API, etc. The industry is moving quickly here, so we'd appreciate your suggestions. We want Granola data to be accessible and useful wherever you need it. Stay tuned.
Thomas Wolf retweeted
Elliot Arledge @elliotarledge
Karpathy asked. I delivered. Introducing OpenSquirrel! Written in pure Rust with GPUI (same as Zed), but with agents as the central unit rather than files. Supports Claude Code, Codex, Opencode, and Cursor (CLI). This really forced me to think through the UI/UX from first principles instead of relying on common Electron slop. github.com/Infatoshi/Open…
Andrej Karpathy @karpathy

Expectation: the age of the IDE is over Reality: we’re going to need a bigger IDE (imo). It just looks very different because humans now move upwards and program at a higher level - the basic unit of interest is not one file but one agent. It’s still programming.

Christos Tzamos @ChristosTzamos
1/4 LLMs solve research-grade math problems but struggle with basic calculations. We bridge this gap by turning them into computers. We built a computer INSIDE a transformer that can run programs for millions of steps in seconds, solving even the hardest Sudokus with 100% accuracy