The Bittensor Netrunner - TAO -

74 posts

@TheTNetHunter

Finding/trading the gems within the Bittensor ecosystem. Opinions only - never financial advice - don't buy because of my tweets.

Joined February 2023
280 Following · 2.3K Followers
The Bittensor Netrunner - TAO - reposted
Satoshi Flipper (@SatoshiFlipper)
$TAO/USDT 8-hour: $TAO loading a $380 moon-shot 🎯
[image]
5 replies · 19 reposts · 194 likes · 8.2K views
The Bittensor Netrunner - TAO - reposted
Gyles | TAO.com (@Gylez)
If people knew the talent density and competitiveness of mining in Bittensor, we would teleport. Bittensor’s greatest asset is the anon demons in the metagraph who will do everything but what you want them to do. Taming them is the single greatest feat of a subnet’s success, and once you do, you have control of a legion.
5 replies · 21 reposts · 135 likes · 7.8K views
Punisher ττ (@CryptoZPunisher)
Bittensor $dTAO. In your opinion, which subnets are truly legitimate? Natural selection is entering an acceleration phase.
[image]
8 replies · 4 reposts · 26 likes · 2K views
The Bittensor Netrunner - TAO - reposted
Jesus Martinez (@JesusMartinez)
Imagine missing the best trade since Ethereum & Bitcoin because you think you're right-curving. TAO isn't going anywhere, and no, it's not "too late". 95%+ of CT has quit since October.
10 replies · 9 reposts · 80 likes · 3.7K views
The Bittensor Netrunner - TAO - reposted
TAOminator (@TAOminated)
Insanity. Well done @TroyQuasar and team.
[image]
Quoting Quasar (@QuasarModels):

This is Quasar Attention, the mechanism behind the upcoming Quasar models, designed to support context lengths of up to 5 million tokens.

Attention has long been a bottleneck for processing extended context. Standard attention mechanisms struggle to scale beyond ~200k tokens in training, creating a ceiling on how much information models can reliably use.

One approach to solving this has been linear attention methods, such as gated delta attention (used in Qwen 3.5) or Kimi delta attention. These improve efficiency and allow longer sequences, but introduce trade-offs: instability at extreme lengths, quality degradation, and in practice, they are not strictly linear.

Quasar Attention takes a different approach. It uses a continuous-time formulation, implemented as a fully matrix-based system rather than relying on vector-state approximations. In practice, this improves stability, reduces cost, and maintains performance as sequence length increases.

In internal stress tests at 50 million tokens, KDA-based approaches begin to lose stability, while Quasar Attention remains stable. This allows performance to hold as sequence length increases, rather than degrading beyond a fixed threshold.

On BABILong, a Quasar-based model pretrained on 20B tokens and fine-tuned on 16k sequences was evaluated on contexts ranging from 1 million to 10 million tokens, maintaining consistent performance across that range. By contrast, models using gated delta attention show significant degradation at longer lengths, in some cases dropping to ~10% performance at 10 million tokens. (Note: results are indicative; setups are not directly comparable.)

On RULER benchmarks, a Quasar-10B model (built on Qwen 3.5 with frozen base weights and Quasar Attention added), pretrained on 200B tokens, achieved 87% at 1 million tokens, outperforming significantly larger baselines, including Qwen3 80B, under the same evaluation conditions.

Taken together, this points to a shift in where long-context performance is won or lost: not in model size alone, but in the attention mechanism itself. Quasar Attention represents a step change in long-context modelling, setting a new standard for stability and performance at scale.

We thank @TargonCompute for the compute and for being our compute provider and long-term partner in training the upcoming Quasar models.

Here is the link to our paper 👇
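
For a rough sense of why standard attention becomes a bottleneck at the lengths mentioned in the post, here is a minimal sketch of how the number of query-key scores grows with sequence length. It is not the Quasar method and is not taken from the paper. The function names and the 2-bytes-per-score (fp16) figure are assumptions made for illustration, and the memory number applies only to a naive implementation that stores the full score matrix for one head in one layer. Optimized kernels avoid storing that matrix, but the number of scores to compute still grows quadratically.

```python
def attention_pairs(n_tokens: int) -> int:
    """Query-key score entries in full self-attention over n_tokens (one head, one layer)."""
    return n_tokens * n_tokens


def score_matrix_gib(n_tokens: int, bytes_per_score: int = 2) -> float:
    """Memory in GiB if the full score matrix were stored (assumes fp16, i.e. 2 bytes per score)."""
    return attention_pairs(n_tokens) * bytes_per_score / 2**30


if __name__ == "__main__":
    # Sequence lengths taken from the post: ~200k, 1M, and 5M tokens.
    for n in (200_000, 1_000_000, 5_000_000):
        print(f"{n:>9,} tokens: {attention_pairs(n):.2e} scores, "
              f"{score_matrix_gib(n):,.1f} GiB if stored naively")
```

Going from 200k to 1M tokens multiplies the sequence length by 5 but the number of scores by 25, which is the scaling pressure that linear-attention methods try to avoid.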

0 replies · 8 reposts · 64 likes · 7.8K views
The Bittensor Netrunner - TAO - reposted
Tao Outsider (@TaoOutsider)
$TAO Bittensor - Kraken recently faced delays completing TAO transfers. Rumors pointed to excess demand. The issue has already been resolved and transfers are back to normal. Something similar happened recently when Binance listed TAO in Japan. It’s starting to look like even major exchanges weren’t fully prepared for the level of demand coming into the Bittensor ecosystem. Prepare accordingly.
[image]
10 replies · 27 reposts · 193 likes · 13.8K views
The Bittensor Netrunner - TAO - reposted
pixel (@spacepixel)
TAO was $540 less than 6 months ago. And some of y'all are calling the top at $340 lmeow
20 replies · 30 reposts · 404 likes · 22.5K views
The Bittensor Netrunner - TAO - reposted
pixel (@spacepixel)
With both @chamath and @Jason shilling $TAO, there is a near-100% chance their bestie @DavidSacks has a bag too. He's literally the US government's crypto czar. He would have told them whether it's a yes or no to buy it.
15 replies · 10 reposts · 165 likes · 12.3K views
The Bittensor Netrunner - TAO - (@TheTNetHunter)
There are people who can recognize when they are wrong or when the landscape has changed. They leave their holdings and pivot to winning with $TAO. Then there are others who will shout as loudly as possible, hoping to change reality; these are called $TAO fudders. PIVOT
[image]
2 replies · 3 reposts · 18 likes · 436 views
The Bittensor Netrunner - TAO - (@TheTNetHunter)
Quasar is also just getting started, according to their Discord and @TroyQuasar. $TAO #SN24
[image]
Quoting Quasar (@QuasarModels):

[Quasar Attention announcement; text identical to the Quasar (@QuasarModels) post quoted in full earlier in this feed]

1 reply · 12 reposts · 73 likes · 4.7K views
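The Bittensor Netrunner - TAO - reposted
一只羊 ττ (@Yizhiyangfund)
Quasar is making Bittensor proud in the coming 5M-token era. Hats off to the Quasar team. This is about to bring new color and surprises to the world. They deserve every bit of respect and a following 🫡 #SN24 $TAO #dTAO
Quoting Jesus Martinez (@JesusMartinez):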

Subnet 24 on Bittensor just dropped a paper that should have every AI researcher's attention.

10B parameters. Outperformed an 80B model. On long-context benchmarks.

The mechanism is called Quasar Attention.

The problem it solves: You know how AI chatbots start forgetting what you said earlier in a long conversation? Or how they get dumber the more context you give them? That's the attention mechanism breaking down.

Current models can only hold about 200K tokens before they start losing it. Think of it like short-term memory. Past a certain point, the model just stops retaining information reliably.

Some teams tried to fix this with "linear attention." Qwen 3.5 and Kimi use it. It helps, but it still degrades at longer lengths. The memory still fades.

Quasar took a different path.
• Continuous-time, matrix-based attention
• Stable out to 50 MILLION tokens
• 87% on RULER at 1M tokens, beating Qwen3 80B
• Held performance from 1M to 10M where competitors dropped to ~10%

To put that in perspective. 50 million tokens is roughly 75 million words. That's the entire Harry Potter series about 75 times over. And the model didn't forget anything.

A 10B parameter model beating one that's 80B. 8x smaller. Won anyway.

Built by Silx AI. Trained on Targon compute. Subnet 24.

No model weights yet. Benchmarks not directly comparable by their own admission. Paper on HuggingFace, not peer reviewed.

But the signal is loud. Decentralized compute just produced research that says the bottleneck in AI was never model size. It was the attention mechanism. And a Bittensor subnet cracked it before Google did.
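
The token-to-word comparison in the quoted post can be checked with a few lines of arithmetic. This is a rough sketch under two assumptions that are not from the post: a commonly cited rule of thumb of about 0.75 English words per token, and a Harry Potter series length of about 1.08 million words. Both constants and the function names are illustrative. The real ratio depends on the tokenizer and the text, and under these particular assumptions the estimate comes out lower than the figure given in the post.

```python
WORDS_PER_TOKEN = 0.75        # assumption: rule-of-thumb ratio for English text
SERIES_WORDS = 1_080_000      # assumption: approximate word count of the seven-book series


def tokens_to_words(tokens: int, words_per_token: float = WORDS_PER_TOKEN) -> float:
    """Estimate how many English words a given number of tokens represents."""
    return tokens * words_per_token


def series_multiples(tokens: int, series_words: int = SERIES_WORDS) -> float:
    """Estimate how many copies of the series fit into the given token count."""
    return tokens_to_words(tokens) / series_words


if __name__ == "__main__":
    tokens = 50_000_000  # context length cited in the post
    print(f"{tokens:,} tokens is roughly {tokens_to_words(tokens):,.0f} words")
    print(f"which is roughly {series_multiples(tokens):.1f} copies of the series")
```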

4 replies · 5 reposts · 21 likes · 3K views
The Bittensor Netrunner - TAO - reposted
bitstarter (@bitstarterAI)
P R O U D 🚀 From takeoff to taking the world by storm: three months since we introduced Quasar, their very own mechanism for extending LLM context, via livestream. @TroyQuasar @Farahatyoussef0, take a bow 👏
Quoting Quasar (@QuasarModels):

[Quasar Attention announcement; text identical to the Quasar (@QuasarModels) post quoted in full earlier in this feed]

1 reply · 17 reposts · 67 likes · 5.2K views
Grey BTC (@greybtc)
What's your $TAO price prediction for this year?
[image]
85 replies · 4 reposts · 198 likes · 43.2K views