

Prasith Govin
@prasithg
CTO @ Jobleap AI. 2X Entrepreneur. Life Long Engineer. Helping 1mm people improve their lives with AI.

my first take, and a good lesson in research epistemics here: what can we infer from ~82% SWE-Bench?

it's possible that (1) they trained a new model from scratch that is unlike a regular transformer. but i've never heard of this company before, and checking their funding rounds they've only raised ~$30M, so it's unlikely they could afford to train an Opus/GPT-5/Kimi 2.6 level coding model from scratch right now.

so this tells us that (2) they need to bootstrap off of an existing pretrained model, likely with RL too, to get that performance! which tells us they've taken a vanilla transformer and modified the attention mechanism, likely finetuning/midtraining in a subquadratic attention method.

it's quite possible it doesn't really work and that there's some degeneracy to the method, or it's just plain fake. but if it's not, you could expect that, given how long it takes to do weight surgery on big models (bigger changes to a pretrained model == longer midtraining to recover performance), it's a lightweight change.

i'd lean towards something mostly leveraging the existing attention key/value projections, like a fancy version of DeepSeek's sparse attention paper, but it could also be some unique test-time KV compression, which would come with its own downsides.
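
to make the "leverage the existing key/value projections" idea concrete, here's a minimal sketch (pure speculation on my part, plain pytorch, every name and number is made up, not their method): each query still uses the model's existing q/k/v projections, but only softmaxes over its top-k highest-scoring keys, so the only new thing is a selection rule bolted on top.

```python
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, top_k=64):
    # q, k, v: (batch, seq, dim), produced by the model's existing projections.
    # hypothetical sketch: restrict each query's softmax to its top_k keys.
    scale = q.shape[-1] ** -0.5
    scores = torch.einsum("bqd,bkd->bqk", q, k) * scale   # full score matrix, kept for clarity
    top_k = min(top_k, k.shape[1])
    top_scores, top_idx = scores.topk(top_k, dim=-1)      # best top_k keys per query
    probs = F.softmax(top_scores, dim=-1)                 # softmax over the selected keys only
    v_sel = torch.gather(
        v.unsqueeze(1).expand(-1, q.shape[1], -1, -1),    # (batch, q, k, dim)
        2,
        top_idx.unsqueeze(-1).expand(-1, -1, -1, v.shape[-1]),
    )
    return torch.einsum("bqk,bqkd->bqd", probs, v_sel)

# tiny smoke test
b, n, d = 2, 128, 64
q, k, v = (torch.randn(b, n, d) for _ in range(3))
print(topk_sparse_attention(q, k, v, top_k=16).shape)  # torch.Size([2, 128, 64])
```

the catch (and why i hedge): this toy version still materializes the full score matrix, so the real win only shows up if the selection step itself is cheap (a small indexer, blockwise scoring, etc.), which is exactly the part that's hard to retrofit without long midtraining.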

A new model is now available on chat.z.ai.
