Kunvar Thaman

920 posts

@__kunvar__

Taking apart neural networks and putting them back together for a living. prev @si_pbc and @Akamai

Inside a computer · Joined December 2022
876 Following · 3.2K Followers
Kunvar Thaman @__kunvar__
"fusing entire training runs into a single kernel and even weirder stuff" what obsessing over whole milk gets you
Flapping Airplanes @flappyairplanes

(1/5) Great to be at @sequoia to give a sneak peek of one of our research directions! TL;DR one path to data-efficiency may be to “abuse GPUs like they’ve never been abused before”

0 replies · 0 reposts · 41 likes · 6.1K views
mehul @alienpisscrack
@__kunvar__ congratulations man, do you have a preprint we can check out?
1 reply · 0 reposts · 4 likes · 2.4K views
Kunvar Thaman @__kunvar__
Yes! My solo-authored paper, Reward Hacking Benchmark, was accepted to ICML :))) We put LLM agents in a tool-rich sandbox, give them multi-step workflows, and measure when they solve the intended task vs. take unexpected shortcuts (like monkeypatching files at runtime!) 1/3
91 replies · 156 reposts · 1.6K likes · 232.9K views
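(For readers wondering how "unexpected shortcuts" can be scored at all: below is a minimal sketch of one possible scoring rule, assuming a file-tampering notion of shortcut. Every name and heuristic in it is hypothetical; the paper's actual harness isn't released yet.)

```python
# Hypothetical sketch: snapshot the sandbox's protected files before the agent
# runs, then classify the rollout by pairing task reward with tamper detection.
import hashlib
from dataclasses import dataclass, field
from pathlib import Path

@dataclass
class SandboxTask:
    name: str
    protected_files: list[Path]  # e.g. tests, graders, configs
    _baseline: dict[Path, str] = field(default_factory=dict, init=False)

    def snapshot(self) -> None:
        """Hash protected files before the rollout starts."""
        self._baseline = {p: hashlib.sha256(p.read_bytes()).hexdigest()
                          for p in self.protected_files}

    def tampered_files(self) -> list[Path]:
        """Protected files whose contents changed during the rollout."""
        return [p for p, digest in self._baseline.items()
                if hashlib.sha256(p.read_bytes()).hexdigest() != digest]

def classify_rollout(task: SandboxTask, reward_achieved: bool) -> str:
    """Reward + tampering => shortcut; reward + clean state => intended solve."""
    tampered = task.tampered_files()
    if reward_achieved and tampered:
        return "reward_hacked: " + ", ".join(str(p) for p in tampered)
    return "solved_as_intended" if reward_achieved else "failed"

# Usage (hypothetical): task.snapshot(); <run agent>; classify_rollout(task, ok)
```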
Kunvar Thaman @__kunvar__
Big thanks to @except_raised for funding the project and helping me scale to richer environments and better models!
5 replies · 3 reposts · 113 likes · 8.8K views
Kunvar Thaman @__kunvar__
It was a pleasure to work on this project last year as an independent researcher, and I learnt so many cool things! Full paper, code, and blog post coming soon!
2 replies · 3 reposts · 110 likes · 12.2K views
Kunvar Thaman @__kunvar__
the entire @si_pbc team is incredibly thoughtful and smart and cares about making agi go well. they did incredible work with fdm-1 to get good in-distribution priors and now they're gonna scale and mog everyone on computer use. super excited!
Standard Intelligence @si_pbc

We’ve raised $75M in new funding from Sequoia and Spark Capital, partnering with @sonyatweetybird, @MikowaiA, and @YasminRazavi, all of whom are deeply supportive of our long-term mission. We’ve also brought on angels & advisors including @karpathy, @tszzl, and @_milankovac_.

Our early results with FDM-1 moved computer use from a data-constrained regime to a compute-constrained one; this latest round of funding unlocks several orders of magnitude of compute scaling for that work. With the FDM model series we have a path to scale agentic capabilities through video pretraining, and we expect to achieve superhuman performance on general computer tasks in the same way that current language models have superhuman performance on coding tasks.

We’re also now able to invest in the blue-sky research necessary to our long-term mission of building aligned general learners. To realize the civilizationally transformative impacts of AI, models must generalize far out of their training distributions, actively exploring and building skills in new environments. This capability represents a substantial shift from the current paradigm of model training. We believe that current alignment techniques are insufficient to predictably and safely steer a model with human-level learning capabilities, so we’re studying small versions of this problem in controlled environments to develop a science of alignment for general learners.

We’re a team of 6 people in San Francisco. We’re hiring world-class researchers and engineers to help us achieve our mission. If that’s you, please get in touch.

0 replies · 0 reposts · 24 likes · 3.4K views
Kunvar Thaman reposted
gavin leech (Non-Reasoning)
Security things you could do rn:
* Turn on Google Advanced Protection. Takes 10 seconds.
* Buy 4 YubiKeys.
* Freeze your credit.
* Put your crypto into cold storage (or sell).
* The usual: KeePass, Signal.
* Move off of banks which don't offer 2FA. They are telling on themselves.
8 replies · 37 reposts · 1.1K likes · 171.2K views
Kunvar Thaman reposted
Standard Intelligence @si_pbc
Computer use models shouldn't learn from screenshots. We built a new foundation model that learns from video like humans do. FDM-1 can construct a gear in Blender, find software bugs, and even drive a real car through San Francisco using arrow keys.
[GIF]
189 replies · 403 reposts · 3.9K likes · 1.2M views
redJ @sudoredj
@ebarschkis flapping airplane would be kinda cool
2 replies · 0 reposts · 16 likes · 40.8K views
Kunvar Thaman reposted
Kushal Thaman @kushal1t
I spent a bunch of time a year ago thinking about the data wall. A blackpill at the time for me was when I realized that the total stock of natural text data is depleting much faster than Chinchilla's infamous 20 tokens per param compute-optimal ratio suggested. Here is a naive BOTEC from back then:

Famously, Chinchilla showed that using about 20 tokens per param was compute optimal, measured at 6*10^23 FLOPs. It turns out that even though MoEs are more compute-efficient than dense models, training them compute-optimally needs a lot more data! In fact, at a 1:32 (97%) sparsity it uses ~6x more tokens per active param (see [1]). The Llama 3 405B report measured 40 tokens per param to be optimal with their data at 4*10^25 FLOPs. And for a 1:32 sparse MoE model such as DeepSeek v3, this suggests 240 tokens per param could well end up being optimal!

At this ratio, things would break down. A 4*10^27 FLOPs model (a pretraining run that might be planned e.g. for 2026) will need 400T tokens. A 5*10^28 FLOPs model would require O(1400T) tokens. These are insane numbers, and they only get worse into the 2030s! The totally unfiltered Common Crawl is about 240T tokens. People have been offsetting this to some extent by training for multiple epochs or repeating the same data a la "Scaling Data-Constrained Language Models" by Muennighoff et al. (2023).

Of course, this is a naive BOTEC, and I'm happy to dive into more details, e.g. how much compute might be put into other uses, such as long-horizon RLVR, which could well require a lot of those 5*10^28 FLOPs. But we are casually talking about hundreds of trillions to over a quadrillion tokens as compute-optimal! It makes one question whether these numbers are actually necessary for the kind of capability gains we want. We are working on this question at @flappyairplanes, and we're excited to be advised by @karpathy.

I will end here with this @ilyasut quote from the @dwarkesh_sp episode with him:

"The data is very clearly finite. What do you do next? Either you do some kind of souped-up pre-training, a different recipe from the one you've done before, or you're doing RL, or maybe something else. But now that compute is big, compute is now very big, in some sense we are back to the age of research. [...] Up until 2020, from 2012 to 2020, it was the age of research. Now, from 2020 to 2025, it was the age of scaling (maybe plus or minus, let's add error bars to those years) because people say, "This is amazing. You've got to scale more. Keep scaling." The one word: scaling. But now the scale is so big. Is the belief really, "Oh, it's so big, but if you had 100x more, everything would be so different?" It would be different, for sure. But is the belief that if you just 100x the scale, everything would be transformed? I don't think that's true. So it's back to the age of research again, just with big computers."

[1] arXiv:2501.12370
[image]
Andrej Karpathy @karpathy

A conventional narrative you might come across is that AI is too far along for a new, research-focused startup to outcompete and outexecute the incumbents of AI. This is exactly the sentiment I listened to often when OpenAI started ("how could the few of you possibly compete with Google?") and 1) it was very wrong, and then 2) it was very wrong again with a whole other round of startups who are now challenging OpenAI in turn, and imo it still continues to be wrong today.

Scaling and locally improving what works will continue to create incredible advances, but with so much progress unlocked so quickly, with so much dust thrown up in the air in the process, and with still a large gap between frontier LLMs and the example proof of the magic of a mind running on 20 watts, the probability of research breakthroughs that yield closer to 10X improvements (instead of 10%) imo still feels very high - plenty high to continue to bet on and look for.

The tricky part ofc is creating the conditions where such breakthroughs may be discovered. I think such an environment comes together rarely, but @bfspector & @amspector100 are brilliant, with (rare) full-stack understanding of LLMs top (math/algorithms) to bottom (megakernels/related), they have a great eye for talent and I think will be able to build something very special. Congrats on the launch and I look forward to what you come up with!

3 replies · 14 reposts · 131 likes · 29.3K views
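(The headline numbers in the thread above can be reproduced from the standard C ≈ 6·N·D approximation for transformer training compute. A minimal sketch, assuming only that rule plus the tokens-per-param ratios quoted in the thread:)

```python
# With compute C ≈ 6*N*D and a fixed tokens-per-param ratio r (D = r*N),
# the compute-optimal token count is D = sqrt(C * r / 6).
import math

def optimal_tokens(flops: float, tokens_per_param: float) -> float:
    return math.sqrt(flops * tokens_per_param / 6)

for flops, r in [(4e27, 240), (5e28, 240)]:
    d = optimal_tokens(flops, r)
    print(f"C = {flops:.0e} FLOPs, r = {r}: D ≈ {d / 1e12:,.0f}T tokens")
# C = 4e+27 FLOPs, r = 240: D ≈ 400T tokens
# C = 5e+28 FLOPs, r = 240: D ≈ 1,414T tokens   (the thread's O(1400T))
```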
Kunvar Thaman reposted
Flapping Airplanes @flappyairplanes
We estimate that humans are 100,000x to 1,000,000x more sample efficient than existing models. To achieve such large gains, we need big ideas.
12 replies · 26 reposts · 527 likes · 110.1K views
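(For a sense of where a figure like that could come from: a crude ratio under loudly hypothetical inputs, roughly ~1e9 words of lifetime human language exposure against 1e14-1e15 pretraining tokens. The actual basis for the estimate isn't public.)

```python
# Crude sanity check of the claimed gap. Both inputs are assumptions, not
# measurements: lifetime human language exposure of ~1e9 words, and model
# pretraining budgets of 1e14-1e15 tokens.
HUMAN_LIFETIME_WORDS = 1e9

for model_tokens in (1e14, 1e15):
    ratio = model_tokens / HUMAN_LIFETIME_WORDS
    print(f"{model_tokens:.0e} training tokens -> {ratio:,.0f}x a human's lifetime exposure")
# 1e+14 -> 100,000x ; 1e+15 -> 1,000,000x, bracketing the tweet's range
```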
Kunvar Thaman reposted
Flapping Airplanes @flappyairplanes
Announcing Flapping Airplanes! We’ve raised $180M from GV, Sequoia, and Index to assemble a new guard in AI: one that imagines a world where models can think at human level without ingesting half the internet.
[GIF]
339 replies · 258 reposts · 3.6K likes · 2.1M views