QuantFather

28K posts

@QuantFather63

SN63 🤌 $TAO 🤖

Melbourne, Victoria · Joined December 2024
1.1K Following · 468 Followers
Pinned Tweet
QuantFather@QuantFather63·
#Bittensor - With everyone focused on AI subnets, hardly anyone is looking at Subnet 63. P2 launches in a matter of weeks with all major milestones almost ticked off, partnerships in the works, challenges ready. @qBitTensorLabs. It’s almost GO TIME! 🔥 $TAO
[3 images attached]
2 replies · 4 retweets · 23 likes · 7.4K views
QuantFather retweeted
Jesus Martinez@JesusMartinez·
Subnet 24 on Bittensor just dropped a paper that should have every AI researcher's attention. 10B parameters. Outperformed an 80B model. On long-context benchmarks. The mechanism is called Quasar Attention.

The problem it solves: you know how AI chatbots start forgetting what you said earlier in a long conversation? Or how they get dumber the more context you give them? That's the attention mechanism breaking down. Current models can only hold about 200K tokens before they start losing it. Think of it like short-term memory: past a certain point, the model just stops retaining information reliably.

Some teams tried to fix this with "linear attention." Qwen 3.5 and Kimi use it. It helps, but it still degrades at longer lengths. The memory still fades.

Quasar took a different path.
• Continuous-time, matrix-based attention
• Stable out to 50 MILLION tokens
• 87% on RULER at 1M tokens, beating Qwen3 80B
• Held performance from 1M to 10M tokens, where competitors dropped to ~10%

To put that in perspective: 50 million tokens is roughly 37 million words. That's the entire Harry Potter series about 35 times over. And the model didn't forget anything.

A 10B parameter model beating one that's 80B. 8x smaller. Won anyway. Built by Silx AI. Trained on Targon compute. Subnet 24.

No model weights yet. Benchmarks not directly comparable, by their own admission. Paper on HuggingFace, not peer reviewed. But the signal is loud: decentralized compute just produced research that says the bottleneck in AI was never model size. It was the attention mechanism. And a Bittensor subnet cracked it before Google did.
Quasar@QuasarModels

[Quoted post: the Quasar Attention announcement; the full text appears in Quasar's own post further down the feed.]

4 replies · 32 retweets · 124 likes · 10K views
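Neither post includes code or equations, and the Quasar weights are not released, so the following is only a toy sketch of the memory argument the thread is making: standard attention materialises an n×n score matrix (the thing that breaks down as context grows), while a linear-attention-style recurrence compresses the whole history into a fixed d×d matrix state, so its memory does not grow with sequence length. Every function name, shape, and the feature map below are illustrative assumptions, not the Quasar mechanism.

```python
# Toy comparison of memory scaling, not an implementation of Quasar Attention.
import numpy as np

def softmax_attention(Q, K, V):
    """Standard attention: materialises an (n x n) score matrix, so memory
    and compute grow quadratically with sequence length n."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])                  # (n, n)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                                       # (n, d)

def matrix_state_attention(Q, K, V):
    """Linear-attention-style recurrence: the whole history is compressed
    into a single (d x d) matrix state, so memory is constant in n."""
    n, d = Q.shape
    phi = lambda x: np.maximum(x, 0.0) + 1e-6                # simple positive feature map (assumption)
    S = np.zeros((d, d))                                     # matrix-valued state
    z = np.zeros(d)                                          # running normaliser
    out = np.empty_like(V)
    for t in range(n):                                       # one cheap update per token
        S += np.outer(phi(K[t]), V[t])
        z += phi(K[t])
        out[t] = (phi(Q[t]) @ S) / (phi(Q[t]) @ z + 1e-6)
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 512, 64
    Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
    softmax_attention(Q, K, V)
    matrix_state_attention(Q, K, V)
    print("softmax score-matrix entries:", n * n)            # grows as n^2
    print("matrix-state entries:", d * d)                    # fixed, independent of n
```

The only point of the toy is the scaling: at 5 million tokens the n×n score matrix would have roughly 25 trillion entries, while the d×d state stays the same size no matter how long the context gets.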
QuantFather retweeted
Targon@TargonCompute·
It's been a pleasure working with the @QuasarModels team and providing secure compute for their frontier long-context AI model training ☁️ Excited to watch their continued innovation and growth ⚡️
Quasar@QuasarModels

[Quoted post: the Quasar Attention announcement; the full text appears in Quasar's own post below.]

0 replies · 28 retweets · 152 likes · 5.2K views
QuantFather retweeted
Quasar@QuasarModels·
This is Quasar Attention, the mechanism behind the upcoming Quasar models, designed to support context lengths of up to 5 million tokens.

Attention has long been a bottleneck for processing extended context. Standard attention mechanisms struggle to scale beyond ~200k tokens in training, creating a ceiling on how much information models can reliably use.

One approach to solving this has been linear attention methods, such as gated delta attention (used in Qwen 3.5) or Kimi delta attention. These improve efficiency and allow longer sequences, but introduce trade-offs: instability at extreme lengths, quality degradation, and, in practice, they are not strictly linear.

Quasar Attention takes a different approach. It uses a continuous-time formulation, implemented as a fully matrix-based system rather than relying on vector-state approximations. In practice, this improves stability, reduces cost, and maintains performance as sequence length increases. In internal stress tests at 50 million tokens, KDA-based approaches begin to lose stability, while Quasar Attention remains stable. This allows performance to hold as sequence length increases, rather than degrading beyond a fixed threshold.

On BABILong, a Quasar-based model pretrained on 20B tokens and fine-tuned on 16k sequences was evaluated on contexts ranging from 1 million to 10 million tokens, maintaining consistent performance across that range. By contrast, models using gated delta attention show significant degradation at longer lengths, in some cases dropping to ~10% performance at 10 million tokens. (Note: results are indicative; setups are not directly comparable.)

On RULER benchmarks, a Quasar-10B model (built on Qwen 3.5 with frozen base weights and Quasar Attention added), pretrained on 200B tokens, achieved 87% at 1 million tokens, outperforming significantly larger baselines, including Qwen3 80B, under the same evaluation conditions.

Taken together, this points to a shift in where long-context performance is won or lost: not in model size alone, but in the attention mechanism itself. Quasar Attention represents a step change in long-context modelling, setting a new standard for stability and performance at scale.

We thank @TargonCompute for being our compute provider and long-term partner in training the upcoming Quasar models. Here is the link to our paper 👇
[image attached]
19 replies · 62 retweets · 179 likes · 45K views
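For readers wondering what "continuous-time, matrix-based" means in practice: the post gives no equations and the paper is not peer reviewed, so the display below is just a generic way such mechanisms are commonly written, not Quasar's actual formulation. The decay rate λ and step size Δt are illustrative symbols introduced here.

```latex
% Generic continuous-time, matrix-valued attention state (illustrative only,
% not Quasar's equations): the history is compressed into a matrix state S(t)
% driven by key/value outer products, and read out with the query.
\[
  \frac{dS(t)}{dt} = -\lambda\, S(t) + k(t)\, v(t)^{\top},
  \qquad
  o(t) = S(t)^{\top} q(t)
\]
% A simple Euler step of size \Delta t recovers a gated, delta-rule-style
% discrete update of the kind linear-attention models use:
\[
  S_t = \left(1 - \lambda\, \Delta t\right) S_{t-1} + \Delta t\, k_t\, v_t^{\top}
\]
```

The claimed advantage is that evolving a full matrix-valued state this way stays numerically stable at lengths where vector-state approximations drift; the post supports that only with internal stress tests at 50 million tokens, so treat it as a claim rather than an established result.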
QuantFather retweeted
Algod@AlgodTrading·
If you’re in Bittensor, use Claude Code or set up an openclaw instance and try to find holes or outcompete miners on subnets. The better the miner output, the faster Bittensor gets full-blown adoption. Everyone can mine now, just be creative.
5 replies · 9 retweets · 95 likes · 4.4K views
QuantFather retweeted
NiFτy@niftyinvest·
Let this sink in... Only 28% of $TAO is currently staked to Bittensor subnets
[image attached]
2 replies · 2 retweets · 18 likes · 1.1K views
QuantFather retweeted
CounterParty TV@counterpartytv·
TAO co-founder Const explains the difference between mining BTC and mining TAO: “People try to mine Bitcoin, and if they can do it faster, they get paid. Bittensor takes that same idea and turns it into a system where people compete to solve problems, get evaluated, and the best performers earn more, over and over again. We took the Bitcoin model, abstracted it, and applied it to everything”
5 replies · 13 retweets · 109 likes · 7.8K views
QuantFather retweeted
Jolly Green Investor 🍀@jollygreenmoney·
This AI startup is pre-revenue and even pre-product (nothing launched yet) and has a $4.5B valuation. Bittensor $TAO has 128 startups, most of which have launched products, several with revenue, and the entire ecosystem has a market cap of $3.5B. Ok 🤓
[image attached]
1 reply · 3 retweets · 26 likes · 1.7K views
QuantFather retweeted
@jason@Jason·
I’m not the guy for $tao. The people building the subnets and the customers they attract are “the guys”. I bet on things that are extremely speculative and I lose my money on 80%+ of startups we invest in. Also, I “bet to learn” often… which means after some cursory investigation into a technology I will make a series of bets that drag me deep into the rabbit hole. $tao is exactly that kind of bet, a bet to learn more.
1. Not financial advice
2. Do your own research
3. It could go to zero as easily as $500b market cap in my mind
4. No crying in the casino please
25 replies · 6 retweets · 189 likes · 8.9K views
Jack@Jackkk·
Threadguy thinks Jason Calacanis is to TAO what Naval was for Zcash: “You never want to be the guy that’s responsible for the thing going up, you just don’t. But somebody always is and Jason has decided he’s going to be the guy for TAO. It’s good enough for me” “The thing about Crypto is that the narrative is driven by a couple hundred people, you don't have earnings reports and all this stuff to fall back on so the whole thing becomes a narrative game”
13 replies · 7 retweets · 98 likes · 13.9K views
QuantFather retweeted
Michael Parker@bittensormax·
$904 million in vol today on Kraken for $TAO. 1/3 of the entire MCAP shifted to winners' hands. Thank you, paper hands.
1 reply · 1 retweet · 29 likes · 752 views
QuantFather retweeted
Keith Singery@KeithSingery·
Bittensor subnets are $TAO lock-up mechanisms
3 replies · 8 retweets · 132 likes · 6.6K views
QuantFather retweeted
Cade O'Neill@CadeONeill·
$TAO is the AI Bitcoin
• Permission-less networks (Both)
• Decentralized (Both)
• Miners ($BTC) vs Validators ($TAO)
• Proof of work ($BTC) vs Proof of intelligence ($TAO)
• Fixed supply (Both)
• Compute ($BTC) vs Rewards ($TAO)

$TAO Price: $305
Market Cap: 3.4B (FDV: 6.4B)
ATH: $767
Listed on: Binance, Coinbase, Upbit, Kucoin,...

Excited to see the growth of $TAO and with the subnets such as SN3 Templar @tplr_ai getting the attention they deserve, the network is looking very promising. #crypto #bittensor #nvidea #tao #altcoins
7 replies · 19 retweets · 152 likes · 6.2K views
QuantFather retweeted
Lukiτa@LukitaTao·
TAO is the future.
0 replies · 2 retweets · 24 likes · 374 views
QuantFather retweeted
Barbie True Blue@Pop_Collapse·
I mean, it's not surprising. And if you're surprised, then you haven't been paying attention or you haven’t taken the time to understand what’s really going on under the hood of Bittensor. $TAO
[image attached]
1 reply · 3 retweets · 30 likes · 900 views
QuantFather retweeted
The Bittensor Netrunner - TAO -
Holy F*ck. Another huge Bittensor subnet win! $TAO #SN24
Quasar@QuasarModels

[Quoted post: the Quasar Attention announcement; the full text appears in Quasar's own post above.]

1 reply · 9 retweets · 52 likes · 3.2K views
QuantFather retweeted
Finance Freeman 🇺🇸@FinanceFreeman·
🚨 BITTENSOR IS THE NASDAQ OF AI? dTAO turns Bittensor into something closer to a capital market for AI subnets:
-> Each subnet has its own token (alpha)
-> These tokens are priced via supply/demand (AMM-style markets)
-> Higher demand → more TAO emissions → more rewards
No committees. No insiders. Just markets. Credit to the builders pushing $dTAO forward 🫡
1 reply · 6 retweets · 41 likes · 1.4K views
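The three arrows in the post above compress the dTAO flow quite a bit. A minimal toy sketch of that flow, under stated assumptions (one constant-product pool per subnet and an emission share proportional to each alpha's spot price; neither detail is taken from the actual Bittensor implementation), looks like this:

```python
# Toy sketch of the mechanism described in the post above: each subnet's alpha
# token is priced against TAO in an AMM-style pool, and emissions follow demand.
# This is NOT the real dTAO code; the constant-product rule and the
# price-proportional emission split are illustrative assumptions.

def alpha_price(tao_reserve: float, alpha_reserve: float) -> float:
    """Spot price of one alpha, in TAO, for a constant-product (x*y=k) pool."""
    return tao_reserve / alpha_reserve

def buy_alpha(tao_reserve: float, alpha_reserve: float, tao_in: float):
    """Swap TAO into the pool; returns the new reserves and the alpha received."""
    k = tao_reserve * alpha_reserve
    new_tao = tao_reserve + tao_in
    new_alpha = k / new_tao
    return new_tao, new_alpha, alpha_reserve - new_alpha

def emission_split(pools: dict) -> dict:
    """Assumed rule: each subnet's share of TAO emissions is proportional to
    its alpha spot price, used here as a stand-in for demand."""
    prices = {name: alpha_price(*reserves) for name, reserves in pools.items()}
    total = sum(prices.values())
    return {name: price / total for name, price in prices.items()}

if __name__ == "__main__":
    # Two hypothetical subnets with identical pools to start.
    pools = {"SN24": (1_000.0, 10_000.0), "SN63": (1_000.0, 10_000.0)}
    print("before:", emission_split(pools))
    # Demand hits SN24: a 500 TAO buy raises its alpha price, and with it
    # its share of emissions, which is the loop the post describes.
    new_tao, new_alpha, _received = buy_alpha(*pools["SN24"], 500.0)
    pools["SN24"] = (new_tao, new_alpha)
    print("after: ", emission_split(pools))
```

Running it shows the feedback loop in miniature: the 500 TAO buy more than doubles SN24's alpha price in the toy pool, and its assumed emission share rises from 50% to roughly 69%.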