

τao sτacker

@taostacker
AI architect ex-Microsoft | early invesτor in $btc | sτacking and sτaking $tao


y’all not ready for this we casually solved edge deployment to run CV pipelines on-site on CPUs


With all the noise around $TAO lately, it's probably a good time for people to watch the latest #bittensor documentary, The Incentive Layer 👉 youtu.be/71rvASmXUN8?si… It's under an hour of your time, and you'll come away with a much clearer picture of what's actually being built. $TAO






For ~14 months, $TAO / $BTC has sat in Bitcoin's gravitational pull. Every attempt to unhook has failed. Back at the level again now. Does it finally hit escape velocity, or get sucked back in?


$TAO — looks like they are rage-shorting TAO, with funding deep negative and OI increasing massively. Not sure if that's the case, but that's how I read this.

Pretty wild to see our work on PULSE show up in a real 1T-scale post-training run done by @cursor_ai. Cursor built Composer 2 in collaboration with Fireworks and trained it across multiple datacenters, getting huge savings by syncing only the weights that actually changed between RL checkpoints. Fireworks reports that more than 98% of BF16 weights can stay bit-identical from one checkpoint to the next, and they cited our paper on this, too.

That is basically the exact sparsity pattern we showed in our paper, where we introduced PULSE, a lossless method for 100x more efficient weight-sync communication for RL training. Their system is very close to this idea in practice: exploiting the fact that only a tiny fraction of weights actually change between RL steps.

The deeper reason for this is not that RL gradients are sparse. They are not. The gradients are still dense. What becomes sparse is the realized weight update. In RL, learning rates are tiny, and with Adam, the update size stays bounded around the learning rate. Then BF16 adds a hard threshold: if the update is too small relative to the weight, it just rounds away, and the stored weight does not change at all. So from one checkpoint to the next, most of the model literally stays identical.

That is why this is such a useful systems idea. Lower precision, like using BF16, does not just save compute. It can also save communication, because more tiny updates get absorbed and fewer weights need to be shipped. At that point, compute efficiency and comms efficiency stop being a tradeoff. They start reinforcing each other.

If you want the deeper story on why RL updates get this sparse, the theory behind it, and how to push weight-sync bandwidth down by 100x+, take a look at our paper: arxiv.org/pdf/2602.03839

The Fireworks blog on Composer 2 that cited our work: fireworks.ai/blog/frontier-…

The animation is taken from Fireworks!
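To make the rounding argument concrete, here is a tiny stdlib-only Python sketch (not the PULSE implementation; all names and the example values are illustrative). It rounds floats to bfloat16 bit patterns by hand, shows that an Adam-scale update (~learning-rate sized) gets absorbed by BF16 rounding so the stored weight is bit-identical, and then shows the resulting delta-sync idea: ship only the weight slots whose stored bits actually changed between checkpoints.

```python
import struct

def to_bf16_bits(x: float) -> int:
    """Round a float to bfloat16, returned as its raw 16-bit pattern.
    bfloat16 is the top 16 bits of the float32 encoding; we apply
    round-to-nearest-even when truncating the low 16 mantissa bits."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return (bits + 0x7FFF + ((bits >> 16) & 1)) >> 16

# 1) A tiny RL-scale update is absorbed by BF16 rounding.
w = 0.4321            # illustrative weight value
update = 1e-6         # |Adam update| stays bounded near the (tiny) learning rate
print(to_bf16_bits(w) == to_bf16_bits(w + update))  # True: update rounds away

# 2) So checkpoint-to-checkpoint deltas are sparse: ship only changed slots.
def sparse_delta(old: list[int], new: list[int]) -> list[tuple[int, int]]:
    """(index, new_bits) pairs for weights whose stored BF16 bits changed."""
    return [(i, n) for i, (o, n) in enumerate(zip(old, new)) if o != n]

old_ckpt = [to_bf16_bits(v) for v in (0.4321, -1.25, 0.007)]
new_vals = (0.4321 + 1e-6, -1.25, 0.007 + 0.01)  # only the last moves enough
new_ckpt = [to_bf16_bits(v) for v in new_vals]
print(len(sparse_delta(old_ckpt, new_ckpt)))     # 1: one weight to communicate
```

The key point the snippet isolates: an update smaller than half a BF16 ulp of the weight leaves the stored bits untouched, so a bitwise diff of consecutive checkpoints is naturally sparse even though the gradients that produced it were dense.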







Subnets starting to trend on CoinGecko. Wait until Bittensor ships its first billion-dollar subnet — that's when a lot of capital will enter the subnet arena.