Brea Browne
@bb_turing_ai
Making the impossible - POSSIBLE

4/ For Cogito v2.1, we fork off the open-licensed DeepSeek base model from November 2024. This is an obvious choice for a pretrained base, since the DeepSeek architecture already has an ecosystem of cheap inference built around it. We were able to build a frontier training stack as an early-stage startup because we stand on the shoulders of open-source champions like @huggingface, @togethercompute, @runpod and @nebiusai, as well as stellar open-source contributions from @Microsoft, @Meta, @nvidia and many others.

Over the last months, we have iterated and refined our post-training strategy of self-play + RL (called Iterated Distillation and Amplification, or IDA) across Cogito v1 and v2. Cogito v2.1 produces high-quality responses that look a bit different from those of usual models: we increase the model's intelligence prior and teach it how to think via process supervision, so reasoning chains come out significantly shorter. We also use less markdown and less verbosity. In short, we want the model to be great for API usage: faster, with fewer tokens, at very high quality.
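To make the amplify-then-distill idea behind IDA concrete, here is a toy numeric sketch. This is my own illustration, not DeepCogito's actual training code: the "policy" is just a noisy linear predictor, "amplification" is best-of-n sampling under a toy verifier, and "distillation" is a gradient nudge toward the amplified answer. The point it demonstrates is that extra inference compute can be converted into a permanently better base policy.

```python
import random

random.seed(0)
TRUE_W = 2.0  # the skill the policy should learn: answer y = 2x


def policy(w, x):
    # Noisy single-shot answer from the current policy.
    return w * x + random.gauss(0, 1.0)


def amplify(w, x, budget=16):
    # Amplification: spend extra inference compute (best-of-n search)
    # to get a better answer than one sample would give. The verifier
    # here is a toy stand-in that knows the ground truth.
    samples = [policy(w, x) for _ in range(budget)]
    return min(samples, key=lambda y: abs(y - TRUE_W * x))


def distill(w, x, target, lr=0.1):
    # Distillation: train the policy so its direct answer matches the
    # amplified one, baking the improvement in without extra compute.
    return w + lr * (target - w * x) * x


w = 0.0
for _ in range(200):
    x = random.uniform(1, 2)
    w = distill(w, x, amplify(w, x))

print(round(w, 1))  # converges toward TRUE_W = 2.0
```

Each round, the amplified answer is better than the policy's raw answer, and distillation pulls the policy toward it; iterating the loop is what makes the self-improvement "iterated".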

It is intuitively easy to understand why self-play *can* work for LLMs if we can provide a value function at intermediate steps (although the guarantees are not as clear as in two-player zero-sum games). In chess / go / poker, we can associate a value with every next move, but as Noam points out, natural language is messy: it is hard to define a value function at intermediate steps like tokens. As a result, in usual reinforcement learning (like RLVR), LLMs get a reward only at the end, and they learn to 'meander' more on hard problems. In effect, we reward brute-forcing with more tokens until the model lands on the right answer. At @DeepCogito, by contrast, we provide a signal for the thinking process itself. Conceptually, you can imagine this as post-hoc assigning a reward to better search trajectories, which teaches the model a stronger intuition for 'how to search' while reasoning. In practice, the model ends up with significantly shorter reasoning chains on harder problems in reasoning mode. Somewhat surprisingly, it also gets better in non-thinking mode: since the model knows how to search better, it 'picks' the most likely trajectory better even without explicit thinking.
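The contrast between outcome-only reward and a process-level signal can be sketched in a few lines. This is an illustrative toy of my own, not DeepCogito's method: the `step_cost` penalty is just one simple way to make a trajectory-level signal prefer direct search paths over meandering ones.

```python
def outcome_reward(steps, correct):
    # RLVR-style: signal only at the end. A long, meandering chain that
    # lands on the right answer scores exactly like a direct one.
    return 1.0 if correct else 0.0


def process_reward(steps, correct, step_cost=0.05):
    # Post-hoc credit for the trajectory itself: correct answers still
    # score, but every extra search step costs a little, so shorter,
    # more direct reasoning chains come out ahead.
    return (1.0 if correct else 0.0) - step_cost * len(steps)


direct = ["plan", "compute", "answer"]            # 3 steps
meander = ["plan"] + ["detour"] * 8 + ["answer"]  # 10 steps

# Outcome-only reward cannot tell the two correct trajectories apart:
print(outcome_reward(direct, True) == outcome_reward(meander, True))  # True

# A process-level signal prefers the direct trajectory:
print(process_reward(direct, True) > process_reward(meander, True))   # True
```

A real process-supervision signal would score each step with a learned verifier rather than a fixed per-step cost, but the credit-assignment intuition (reward the search trajectory, not just the destination) is the same.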

1/ Introducing Isaac 0.1 — our first perceptive-language model. 2B params, open weights. Matches or beats significantly larger models on core perception. We are pushing the efficient frontier for physical AI. perceptron.inc/blog/introduci…

Today, we are releasing 4 hybrid reasoning models (70B, 109B MoE, 405B, and 671B MoE) under an open license. These are some of the strongest LLMs in the world and serve as a proof of concept for a novel AI paradigm: iterative self-improvement (AI systems improving themselves). The largest, the 671B MoE, is among the strongest open models in the world: it matches or exceeds the performance of both the latest DeepSeek v3 and DeepSeek R1 models, and approaches closed frontier models like o3 and Claude 4 Opus.
