QuantFather (@QuantFather63) - Profil Twitter | Zamantika Mersobahis Locabet

Tweet Disematkan

QuantFather@QuantFather63·21 Şub

#Bittensor - With everyone focused on AI subnets hardly anyone is looking at 𝐒𝐮𝐛𝐧𝐞𝐭 𝟔𝟑. P2 launches in a matter of weeks with all major milestones almost ticked off, partnerships in the works, challenges ready. @qBitTensorLabs. It’s almost GO TIME! 🔥 $TAO

English

2

4

23

7.4K

QuantFather me-retweet

QuantFather@QuantFather63·21 Şub

#Bittensor - With everyone focused on AI subnets hardly anyone is looking at 𝐒𝐮𝐛𝐧𝐞𝐭 𝟔𝟑. P2 launches in a matter of weeks with all major milestones almost ticked off, partnerships in the works, challenges ready. @qBitTensorLabs. It’s almost GO TIME! 🔥 $TAO

English

2

4

23

7.4K

QuantFather me-retweet

Jesus Martinez@JesusMartinez·7h

Subnet 24 on Bittensor just dropped a paper that should have every AI researcher's attention. 10B parameters. Outperformed an 80B model. On long-context benchmarks. The mechanism is called Quasar Attention. The problem it solves: You know how AI chatbots start forgetting what you said earlier in a long conversation? Or how they get dumber the more context you give them? That's the attention mechanism breaking down. Current models can only hold about 200K tokens before they start losing it. Think of it like short-term memory. Past a certain point, the model just stops retaining information reliably. Some teams tried to fix this with "linear attention." Qwen 3.5 and Kimi use it. It helps, but it still degrades at longer lengths. The memory still fades. Quasar took a different path. • Continuous-time, matrix-based attention • Stable out to 50 MILLION tokens • 87% on RULER at 1M tokens, beating Qwen3 80B • Held performance from 1M to 10M where competitors dropped to ~10% To put that in perspective. 50 million tokens is roughly 75 million words. That's the entire Harry Potter series about 75 times over. And the model didn't forget anything. A 10B parameter model beating one that's 80B. 8x smaller. Won anyway. Built by Silx AI. Trained on Targon compute. Subnet 24. No model weights yet. Benchmarks not directly comparable by their own admission. Paper on HuggingFace, not peer reviewed. But the signal is loud. Decentralized compute just produced research that says the bottleneck in AI was never model size. It was the attention mechanism. And a Bittensor subnet cracked it before Google did.

Quasar@QuasarModels

This is Quasar Attention, the mechanism behind the upcoming Quasar models, designed to support context lengths of up to 5 million tokens. Attention has long been a bottleneck for processing extended context. Standard attention mechanisms struggle to scale beyond ~200k tokens in training, creating a ceiling on how much information models can reliably use. One approach to solving this has been linear attention methods, such as gated delta attention (used in Qwen 3.5) or Kimi delta attention. These improve efficiency and allow longer sequences, but introduce trade-offs: instability at extreme lengths, quality degradation, and in practice, they are not strictly linear. Quasar Attention takes a different approach. It uses a continuous-time formulation, implemented as a fully matrix-based system rather than relying on vector-state approximations. In practice, this improves stability, reduces cost, and maintains performance as sequence length increases. In internal stress tests at 50 million tokens, KDA-based approaches begin to lose stability, while Quasar Attention remains stable. This allows performance to hold as sequence length increases, rather than degrading beyond a fixed threshold. On BABILong, a Quasar-based model pretrained on 20B tokens and fine-tuned on 16k sequences was evaluated on contexts ranging from 1 million to 10 million tokens, maintaining consistent performance across that range. By contrast, models using gated delta attention show significant degradation at longer lengths, in some cases dropping to ~10% performance at 10 million tokens. (Note: results are indicative; setups are not directly comparable) On RULER benchmarks, a Quasar-10B model (built on Qwen 3.5 with frozen base weights and Quasar Attention added), pretrained on 200B tokens, achieved 87% at 1 million tokens, outperforming significantly larger baselines, including Qwen3 80B, under the same evaluation conditions. Taken together, this points to a shift in where long-context performance is won or lost: not in model size alone, but in the attention mechanism itself. Quasar Attention represents a step change in long-context modelling, setting a new standard for stability and performance at scale. We thank @TargonCompute for the compute and for being our compute provider and long-term partner in training the upcoming Quasar models Here is the link to our paper 👇

English

4

32

124

10K

QuantFather me-retweet

Targon@TargonCompute·7h

It's been a pleasure working with the @QuasarModels team and providing secure compute for their frontier long-context AI model training ☁️ Excited to watch their continued innovation and growth ⚡️

Quasar@QuasarModels

This is Quasar Attention, the mechanism behind the upcoming Quasar models, designed to support context lengths of up to 5 million tokens. Attention has long been a bottleneck for processing extended context. Standard attention mechanisms struggle to scale beyond ~200k tokens in training, creating a ceiling on how much information models can reliably use. One approach to solving this has been linear attention methods, such as gated delta attention (used in Qwen 3.5) or Kimi delta attention. These improve efficiency and allow longer sequences, but introduce trade-offs: instability at extreme lengths, quality degradation, and in practice, they are not strictly linear. Quasar Attention takes a different approach. It uses a continuous-time formulation, implemented as a fully matrix-based system rather than relying on vector-state approximations. In practice, this improves stability, reduces cost, and maintains performance as sequence length increases. In internal stress tests at 50 million tokens, KDA-based approaches begin to lose stability, while Quasar Attention remains stable. This allows performance to hold as sequence length increases, rather than degrading beyond a fixed threshold. On BABILong, a Quasar-based model pretrained on 20B tokens and fine-tuned on 16k sequences was evaluated on contexts ranging from 1 million to 10 million tokens, maintaining consistent performance across that range. By contrast, models using gated delta attention show significant degradation at longer lengths, in some cases dropping to ~10% performance at 10 million tokens. (Note: results are indicative; setups are not directly comparable) On RULER benchmarks, a Quasar-10B model (built on Qwen 3.5 with frozen base weights and Quasar Attention added), pretrained on 200B tokens, achieved 87% at 1 million tokens, outperforming significantly larger baselines, including Qwen3 80B, under the same evaluation conditions. Taken together, this points to a shift in where long-context performance is won or lost: not in model size alone, but in the attention mechanism itself. Quasar Attention represents a step change in long-context modelling, setting a new standard for stability and performance at scale. We thank @TargonCompute for the compute and for being our compute provider and long-term partner in training the upcoming Quasar models Here is the link to our paper 👇

English

0

28

152

5.2K

QuantFather me-retweet

Quasar@QuasarModels·7h

This is Quasar Attention, the mechanism behind the upcoming Quasar models, designed to support context lengths of up to 5 million tokens. Attention has long been a bottleneck for processing extended context. Standard attention mechanisms struggle to scale beyond ~200k tokens in training, creating a ceiling on how much information models can reliably use. One approach to solving this has been linear attention methods, such as gated delta attention (used in Qwen 3.5) or Kimi delta attention. These improve efficiency and allow longer sequences, but introduce trade-offs: instability at extreme lengths, quality degradation, and in practice, they are not strictly linear. Quasar Attention takes a different approach. It uses a continuous-time formulation, implemented as a fully matrix-based system rather than relying on vector-state approximations. In practice, this improves stability, reduces cost, and maintains performance as sequence length increases. In internal stress tests at 50 million tokens, KDA-based approaches begin to lose stability, while Quasar Attention remains stable. This allows performance to hold as sequence length increases, rather than degrading beyond a fixed threshold. On BABILong, a Quasar-based model pretrained on 20B tokens and fine-tuned on 16k sequences was evaluated on contexts ranging from 1 million to 10 million tokens, maintaining consistent performance across that range. By contrast, models using gated delta attention show significant degradation at longer lengths, in some cases dropping to ~10% performance at 10 million tokens. (Note: results are indicative; setups are not directly comparable) On RULER benchmarks, a Quasar-10B model (built on Qwen 3.5 with frozen base weights and Quasar Attention added), pretrained on 200B tokens, achieved 87% at 1 million tokens, outperforming significantly larger baselines, including Qwen3 80B, under the same evaluation conditions. Taken together, this points to a shift in where long-context performance is won or lost: not in model size alone, but in the attention mechanism itself. Quasar Attention represents a step change in long-context modelling, setting a new standard for stability and performance at scale. We thank @TargonCompute for the compute and for being our compute provider and long-term partner in training the upcoming Quasar models Here is the link to our paper 👇

English

19

62

179

45K

QuantFather me-retweet

Keith Singery@KeithSingery·10h

The time has come for the world to be introduced to Bittensor $TAO

const@const_reborn

I feel like i need to get on a podcast or something

English

4

5

89

3.8K

QuantFather me-retweet

τao τemplar@TAOTemplar·2h

tao-pilled

const@const_reborn

Just going live on this right now if people want to listen about Tao

Filipino

0

1

23

1K

QuantFather me-retweet

Algod@AlgodTrading·48m

If you’re in bittensor, use claude code or setup an openclaw instance, try to find holes or outcompete miners on subnets The better the miner output, the faster bittensor gets full blown adoption Everyone can mine now, just be creative

English

5

9

95

4.4K

QuantFather me-retweet

NiFτy@niftyinvest·57m

Let this sink in... Only 28% of $TAO is currently staked to Bittensor subnets

English

2

18

1.1K

QuantFather me-retweet

CounterParty TV@counterpartytv·4h

TAO co-founder Const explains the difference between mining BTC and mining TAO “People try to mine Bitcoin, and if they can do it faster, they get paid. Bittensor takes that same idea and turns it into a system where people compete to solve problems, get evaluated, and the best performers earn more, over and over again. We took the Bitcoin model, abstracted it, and applied it to everything”

English

5

13

109

7.8K

QuantFather me-retweet

Jolly Green Investor 🍀@jollygreenmoney·1h

This AI startup is pre-revenue and even pre-product (nothing launched yet) and has a $4.5B valuation Bittensor $TAO has 128 startups, most of which have launched products, several with revenue, and the entire ecosystem has a market cap of $3.5B Ok 🤓

English

1

3

26

1.7K

QuantFather@QuantFather63·1h

@Jason @Jackkk @Jason check out Subnet 63 Bob Wold CEO of open quantum running the show! $TAO

English

0

2

81

QuantFather me-retweet

@jason@Jason·3h

I’m not the guy for $tao — the people building the subnets and the customers the attack are “the guys” I bet on things that are extremely speculative and I lose my money on 80%+ of startups we invest in Also. I “bet to learn” often… which means after some cursory investigations into a technology I will make a series of bets that drag me deep into the rabbit hole $tao is exactly that kind of bet—to learn more 1. Not financial advice 2. Do your own research 3. It could got zero as easily as $500b market cap in my mind 4. No crying in the casino please

English

25

6

189

8.9K

Jack@Jackkk·4h

Threadguy thinks Jason Calacanis is to TAO what Naval was for Zcash “You never want to be the guy that’s responsible for the thing going up, you just don’t. But somebody always is and Jason has decided he’s going to be the guy for TAO. It’s good enough for me” “The thing about Crypto is that the narrative is driven by a couple hundred people, you don't have earnings reports and all this stuff to fall back on so the whole thing becomes a narrative game”

English

13

7

98

13.9K

QuantFather me-retweet

Michael Parker@bittensormax·1h

$904 million in vol today on kraken $TAO 1/3 of the entire MCAP shifted to winners hands. Thank you paper hands.

English

1

29

752

QuantFather me-retweet

Keith Singery@KeithSingery·4h

Bittensor subnets are $TAO lock-up mechanisms

English

3

8

132

6.6K

QuantFather me-retweet

Cade O'Neill@CadeONeill·10h

$TAO is the AI Bitcoin • Permission-less networks (Both) • Decentralized (Both) • Miners ($BTC) vs Validators ($TAO) • Proof of work ($BTC) vs Proof of intelligence ($TAO) • Fixed supply (Both) • Compute ($BTC) vs Rewards ($TAO) $TAO Price: $305 Market Cap: 3.4B (FDV: 6.4B) ATH: $767 Listed on: Binance, Coinbase, Upbit, Kucoin,... Excited to see the growth of $TAO and with the subnets such as SN3 Templar @tplr_ai getting the attention they deserve, the network is looking very promising. #crypto #bittensor #nvidea #tao #altcoins

English

7

19

152

6.2K

QuantFather me-retweet

Lukiτa@LukitaTao·4h

TAO is the future.

English

0

2

24

374

QuantFather me-retweet

Barbie True Blue@Pop_Collapse·6h

I mean, it's not surprising. And if you're surprised, then you haven't been paying attention or you haven’t taken the time to understand what’s really going on under the hood of Bittensor. $TAO

English

1

3

30

900

QuantFather me-retweet

Keith Singery@KeithSingery·4h

Yo @const_reborn killing it live rn $TAO

Threadguy Updates@Threadupdates_

Threadguy is now LIVE with @const_reborn twitch.tv/threadguy

English

1

2

37

1.3K

QuantFather me-retweet

The Bittensor Netrunner - TAO -@TheTNetHunter·5h

Holy F*ck Another huge Bittensor subnet win! $TAO #SN24

Quasar@QuasarModels

This is Quasar Attention, the mechanism behind the upcoming Quasar models, designed to support context lengths of up to 5 million tokens. Attention has long been a bottleneck for processing extended context. Standard attention mechanisms struggle to scale beyond ~200k tokens in training, creating a ceiling on how much information models can reliably use. One approach to solving this has been linear attention methods, such as gated delta attention (used in Qwen 3.5) or Kimi delta attention. These improve efficiency and allow longer sequences, but introduce trade-offs: instability at extreme lengths, quality degradation, and in practice, they are not strictly linear. Quasar Attention takes a different approach. It uses a continuous-time formulation, implemented as a fully matrix-based system rather than relying on vector-state approximations. In practice, this improves stability, reduces cost, and maintains performance as sequence length increases. In internal stress tests at 50 million tokens, KDA-based approaches begin to lose stability, while Quasar Attention remains stable. This allows performance to hold as sequence length increases, rather than degrading beyond a fixed threshold. On BABILong, a Quasar-based model pretrained on 20B tokens and fine-tuned on 16k sequences was evaluated on contexts ranging from 1 million to 10 million tokens, maintaining consistent performance across that range. By contrast, models using gated delta attention show significant degradation at longer lengths, in some cases dropping to ~10% performance at 10 million tokens. (Note: results are indicative; setups are not directly comparable) On RULER benchmarks, a Quasar-10B model (built on Qwen 3.5 with frozen base weights and Quasar Attention added), pretrained on 200B tokens, achieved 87% at 1 million tokens, outperforming significantly larger baselines, including Qwen3 80B, under the same evaluation conditions. Taken together, this points to a shift in where long-context performance is won or lost: not in model size alone, but in the attention mechanism itself. Quasar Attention represents a step change in long-context modelling, setting a new standard for stability and performance at scale. We thank @TargonCompute for the compute and for being our compute provider and long-term partner in training the upcoming Quasar models Here is the link to our paper 👇

English

1

9

52

3.2K

QuantFather me-retweet

Finance Freeman 🇺🇸@FinanceFreeman·7h

🚨 BITTENSOR IS THE NASDAQ OF AI? dTAO turns Bittensor into something closer to a capital market for AI subnets: -> Each subnet has its own token (alpha) -> These tokens are priced via supply/demand (AMM-style markets) -> Higher demand → more TAO emissions → more rewards No committees. No insiders. Just markets. Credit to the builders pushing $dTAO forward 🫡

English

1

6

41

1.4K

QuantFather

Jelajahi