

Vinay Hiremath
@vhmth
previously: co-founder @loom, mechatronics intern @specter

everyone should read codependent no more

my first take, and a good lesson in research epistemics here: what can we infer from ~82% on SWE-Bench? It's possible that (1) they trained a new model from scratch that is unlike a regular transformer, but I've never heard of this company before, and checking their funding round they've only raised ~$30M, so it's unlikely they could afford to train an Opus/GPT-5/Kimi 2.6-level coding model from scratch right now. That tells us (2) they need to bootstrap off of an existing pretrained model, likely with RL too, to get that performance. This suggests they've taken a vanilla Transformer and modified the attention mechanism, likely finetuning/midtraining in a subquadratic attention method. It's quite possible it doesn't really work and there's some degeneracy to the method, or it's just plain fake. But if it's not, then given how long it takes to do weight surgery on big models (bigger changes to a pretrained model == longer midtraining to recover performance), you could expect it's a lightweight change. I'd lean towards something mostly leveraging the existing attention key/value projections, like a fancy version of DeepSeek's sparse attention paper, but it could also be some unique test-time KV compression, which would come with its own downsides.
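to make the "lightweight change on top of the existing projections" idea concrete, here's a rough sketch of top-k sparse attention in the spirit of DeepSeek's sparse attention paper. purely illustrative: the function name, shapes, and fixed top-k cutoff are my own assumptions, not anything known about their actual method.

import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, k_top=64):
    # q: (batch, heads, q_len, head_dim); k, v: (batch, heads, kv_len, head_dim)
    # Score every query against every key, exactly as in vanilla attention.
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
    k_top = min(k_top, scores.size(-1))
    # Keep only the k_top highest-scoring keys per query; mask out the rest.
    top = scores.topk(k_top, dim=-1)
    sparse = torch.full_like(scores, float("-inf"))
    sparse.scatter_(-1, top.indices, top.values)
    weights = F.softmax(sparse, dim=-1)  # attention mass spread over k_top keys only
    return weights @ v

# e.g. q = k = v = torch.randn(1, 8, 4096, 64); topk_sparse_attention(q, k, v) -> (1, 8, 4096, 64)

note this toy version still materializes the full score matrix, so it's only sparse in which keys each query attends to; a real subquadratic method would pick keys with a cheap indexer (or compress the KV cache at test time) instead of scoring everything first. the point is just that the pretrained Q/K/V projections stay put and only the selection step changes, which is the kind of surgery you could plausibly recover from with a short midtrain.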


.@devanshpandey and Galen are exceptional. Proud to be a small angel on this journey and watch them blow up. @mcannonbrookes @scottfarkas there are more than a few interesting partnership angles here.

We've raised $75M in new funding from Sequoia and Spark Capital, partnering with @sonyatweetybird, @MikowaiA, and @YasminRazavi, all of whom are deeply supportive of our long-term mission. We've also brought on angels & advisors including @karpathy, @tszzl, and @_milankovac_.

Our early results with FDM-1 moved computer use from a data-constrained regime to a compute-constrained one; this latest round of funding unlocks several orders of magnitude of compute scaling for that work. With the FDM model series we have a path to scale agentic capabilities through video pretraining, and we expect to achieve superhuman performance on general computer tasks in the same way that current language models have superhuman performance on coding tasks.

We're also now able to invest in the blue-sky research necessary for our long-term mission of building aligned general learners. To realize the civilizationally transformative impacts of AI, models must generalize far out of their training distributions, actively exploring and building skills in new environments. This capability represents a substantial shift from the current paradigm of model training. We believe that current alignment techniques are insufficient to predictably and safely steer a model with human-level learning capabilities, so we're studying small versions of this problem in controlled environments to develop a science of alignment for general learners.

We're a team of 6 people in San Francisco. We're hiring world-class researchers and engineers to help us achieve our mission. If that's you, please get in touch.
