crux

489 posts

@macrocrux

CTO & Co-Founder @ Macrocosmos

Joined November 2023
173 Following · 2.6K Followers
crux reposted
Macrocosmos
Macrocosmos@MacrocosmosAI·
While modern AI capabilities continue to grow, their thoughts remain opaque to us. There’s a growing body of evidence showing that LLMs conceal their thoughts, and there are many alarming examples of deception towards humans. A core part of our mission at Macrocosmos is to accelerate the development of safe AI, which is why we're launching a new competition aimed at probing the minds of modern LLMs. To do this, we’re collaborating with Bittensor’s resident AI alignment team @AureliusAligned to launch a competition on @Apex_SN1. Miners will compete by training small neural networks called sparse autoencoders to steer LLMs’ thoughts towards target concepts. When injected into the larger reference models, these autoencoders modify the internal activations during model inference and teach us how knowledge and behaviour are encoded. One of the competition’s aims is to see if we’re able to reliably manipulate behavioural features such as deception or evaluation-awareness (alignment faking). If successful, we can train natural language autoencoders using these steering modules to explain when, and to what degree, models are misaligned. @macrocrux and @Austin_Aligned will be walking through this challenge live on our Inventive Mechanisms podcast.
📍 Location: X livestream (on the @MacrocosmosAI X account)
📅 Date: Thursday 28th May
🕒 Time: 3pm UK time
5
10
22
1.9K
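The post above describes steering a model by modifying its internal activations with a sparse autoencoder (SAE). The competition's actual method and interfaces are not given in the post, so what follows is only a minimal sketch of one common pattern: add a scaled SAE decoder direction to an activation vector. The weights are random and untrained, and every name here (`W_enc`, `W_dec`, `steer`, `feature_idx`, `strength`) is a hypothetical chosen for illustration.

```python
import numpy as np

# Assumption: toy sizes and random, untrained weights, for illustration only.
rng = np.random.default_rng(0)
d_model, d_sae = 16, 64

W_enc = rng.normal(size=(d_model, d_sae)) / np.sqrt(d_model)
b_enc = np.zeros(d_sae)
W_dec = rng.normal(size=(d_sae, d_model)) / np.sqrt(d_sae)
b_dec = np.zeros(d_model)


def encode(x):
    """Map a model activation to non-negative SAE feature activations (ReLU)."""
    return np.maximum(x @ W_enc + b_enc, 0.0)


def decode(f):
    """Reconstruct a model activation from SAE feature activations."""
    return f @ W_dec + b_dec


def steer(x, feature_idx, strength):
    """Add a unit-norm decoder direction for one feature, scaled by strength."""
    direction = W_dec[feature_idx]
    direction = direction / np.linalg.norm(direction)
    return x + strength * direction


# Stand-in for a residual-stream activation taken from a reference model.
x = rng.normal(size=d_model)
steered = steer(x, feature_idx=3, strength=4.0)
print(np.linalg.norm(steered - x))  # size of the applied shift
```

In a real setup the steered vector would be written back into the model's forward pass at the layer the SAE was trained on; that hook is omitted here because it depends on the model and framework.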
crux
crux@macrocrux·
Something very important is being brought into existence right now. Bricks have been laid over the last 18 months and now the tech is coming together in a way that makes commercialization possible. If this shit works, it will completely disrupt the economics of training large models and the floodgates will burst open. @Pluralis and @MacrocosmosAI are the only teams who I think can clearly see the shape of this opportunity right now. Agora is a strong first step towards this future. After spending a bit of time on their platform there's a form factor to it which feels "natural", almost inevitable in hindsight. This subfield of training is really starting to take shape. Our IOTA team has been very, very busy for the last few months. Can't wait to share more soon.
Pluralis Research@Pluralis

Today we're releasing Agora: the first ever pretraining stack that allows non-collocated consumer GPUs to be competitive with centralized clusters Agora is 15x faster than Megatron-LM in this setting and is only 1.5x less efficient in terms of tokens per unit compute than TorchTitan on H100s, despite running on devices that have no NVLink or InfiniBand support.

3
15
59
4.9K
crux
crux@macrocrux·
@GoodfireAI Is there any evidence that suggests that physical laws are embedded in a way that preserves invariant quantities and symmetries?
0
0
1
19
Goodfire
Goodfire@GoodfireAI·
The most popular way to interpret AI is missing the bigger picture. Models think in curved shapes. But sparse autoencoders (SAEs) work with straight lines. Can they still capture models’ curved neural geometry? Yes, but not how you might think! (1/7)
Goodfire@GoodfireAI

Neural networks might speak English, but they think in shapes. Understanding their rich *neural geometry* is key to understanding how they work – and to debugging and controlling them with precision. Starting today, we’re releasing a series of posts on this research agenda. 🧵

22
150
1K
157.5K
crux
crux@macrocrux·
We’re speeding up distributed training on IOTA by leveraging Apex competitions. We thought miners had almost maxed out the throughput, but then came a second wave where top submissions started forecasting the future state of the network in order to stay ahead of network congestion. Now it’s going off!
Apex・SN1@Apex_SN1

The @IOTA_SN9 Simulator competition: 22k+ submissions, ~29% faster epoch times, R&D insights for distributed AI training.
Two months ago we launched a competition to minimise epoch completion time inside a digital twin of the IOTA network by optimising activation routing and balancing. The goal: to improve speed and efficiency within our distributed training network. The result: 22,967 submissions across 57 rounds. Epoch times are now ~29% faster on average compared to the start of the competition, and up to 39% faster on some network configurations.
Our subnets are an ecosystem, and the IOTA Simulator is the clearest example: insights from miners feed directly into how IOTA engineers iterate, both for current participants and for future clients once we productise.
Several R&D insights have arisen. Let's isolate one in particular. The competition highlighted a specific architectural challenge: downstream congestion dominates throughput more than raw processing speed does. Routing too many activations to the fastest miner doesn't solve the problem, as doing so fills its queues and slows it down overall. Top submissions converged on the same fix: track how full each miner's downstream queues are getting, and skip the ones building up a backlog, even if they are nominally the fastest pipeline target.
In other words, routing decisions must consider downstream capacity, not just downstream speed. Seeking the fastest miner in IOTA only makes sense if the algorithm factors in the speed at which its queues fill and empty; otherwise it accentuates the bottleneck.
As a result, subnet 1 draws in techniques from frontier labs. This setup best fits the structure of Capacity-Aware Load Balancing, which is applied in many settings: Mixture of Experts models like DeepSeek and Mixtral use it to route tokens to different neural networks during inference tasks, Google uses it to prevent congestion on its cloud services, and even Amazon uses the same principles to optimise its physical supply chain. In trying to optimise IOTA, miners are learning second-order corrections to peer-to-peer networks.

3
6
43
5.3K
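The routing insight in the quoted post (skip targets whose queues are backing up, even if they are nominally the fastest) can be sketched in a few lines. This is not the IOTA simulator's code or any miner's submission. The `Miner` fields, the `max_fill` threshold, and the wait-time estimate are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Miner:
    name: str
    speed: float    # activations processed per tick (hypothetical unit)
    queue_len: int  # activations currently waiting
    queue_cap: int  # maximum queue size


def pick_fastest(miners):
    """Naive policy: always route to the miner with the highest raw speed."""
    return max(miners, key=lambda m: m.speed)


def pick_capacity_aware(miners, max_fill=0.8):
    """Skip miners whose queue is above max_fill, then pick the shortest wait."""
    open_miners = [m for m in miners if m.queue_len / m.queue_cap < max_fill]
    candidates = open_miners or miners  # fall back if every queue is congested
    # Estimated ticks until a newly routed activation would be processed.
    return min(candidates, key=lambda m: (m.queue_len + 1) / m.speed)


miners = [
    Miner("fast", speed=10.0, queue_len=95, queue_cap=100),
    Miner("medium", speed=6.0, queue_len=12, queue_cap=100),
    Miner("slow", speed=2.0, queue_len=10, queue_cap=100),
]

print(pick_fastest(miners).name)         # fast
print(pick_capacity_aware(miners).name)  # medium
```

The naive policy keeps feeding the congested "fast" miner, while the capacity-aware policy routes to "medium", whose shorter queue gives a lower estimated wait despite its lower raw speed.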
crux reposted
TAO Times ⚡️
TAO Times ⚡️@taotimesdotai·
As co-founder of @MacrocosmosAI, @macrocrux has built subnets spanning inference, pretraining, and fine-tuning of models. His advice? ‘Prove that successful execution will give them an edge in the market.’
1
3
14
480
crux
crux@macrocrux·
@NousResearch damn you guys are cooking this week
0
0
2
249
Nous Research
Nous Research@NousResearch·
Today we release Lighthouse Attention, a selection-based hierarchical attention for long-context pre-training that delivers a 1.4-1.7× wall-clock speedup at 98K context. It runs the same forward+backward pass ~17× faster than standard attention at 512K context on a single B200, without a custom sparse attention kernel, a straight-through estimator, or an auxiliary loss. During training, queries, keys, and values are pooled symmetrically into a multi-resolution pyramid. We then score every pyramid heads, and a top-k cascade selects a small hierarchical dense sub-sequence, and after a sorting pass that enforces causality, we use standard attention for token mixing. A brief full attention resume at the end converts the checkpoint back into a competent dense-attention model. Validated this using 530M parameter Llama-3 models across 50B tokens, with up to 1M-token benchmarks across 32 B200s under context parallelism. The work on Lighthouse Attention was led by @bloc97_, @SubhoGhosh02, and @theemozilla.
52
230
2K
157.4K
crux
crux@macrocrux·
@GoodfireAI Love your research guys. Keep it up!
1
0
3
204
crux
crux@macrocrux·
Very cool to see a Bittensor hackathon in India building with our @Data_SN13 API
HackQuest@HackQuest_

HackQuest x Bittensor Co-Learning Camp | India Recap 🇮🇳 150+ registrations, 80+ attendees, 45+ graduates, 30 winners. Over 5 days at Galgotias University, builders came together to: ⚡ Learn the fundamentals of @opentensor 🛠 Complete HackQuest learning tracks ⛏️ Become miners on Data Universe (SN13) @Data_SN13 & Sportstensor (SN41) @sportstensor 🤝 Build alongside mentors, developers, and future founders But don’t just take it from us — check out some firsthand reflections from builders who experienced the camp themselves 👇

0
5
26
1.3K
crux
crux@macrocrux·
Novel idea put forth by @ConnitoAI. To me it remains to be seen if domain specialist experts can be truly trained in isolation in the way the team propose, but if this does actually work it maps very nicely onto decentralized networks and has an interesting scaling trajectory.
Connito AI@ConnitoAI

We’re excited to share the Connito whitepaper V1: a framework for decentralized, composable MoE adaptation. We train sparse expert subsets, validate updates through Proof-of-Loss, and turn open-model improvement into a distributed expert-level market. Read the whitepaper: connito.ai/whitepaper

0
5
13
623
crux reposted
Connito AI
Connito AI@ConnitoAI·
We’re excited to share the Connito whitepaper V1: a framework for decentralized, composable MoE adaptation. We train sparse expert subsets, validate updates through Proof-of-Loss, and turn open-model improvement into a distributed expert-level market. Read the whitepaper: connito.ai/whitepaper
9
27
106
26.3K
crux reposted
Moonlit
Moonlit@moonlit_ds·
SAY IT LOUDER FOR THE PEOPLE IN THE BACK!!! bittensor:native earnyourmac.com
0
5
26
937
crux reposted
Targon
Targon@TargonCompute·
Today we are excited to announce Targon Supply Portal. Targon Supply Portal allows compute suppliers to easily onboard and start monetizing their idle compute capacity and track earnings and status of their nodes on Targon. There are 2 ways to start earning: → Permissionless: onboard directly to the Bittensor blockchain and start earning SN4 alpha, no conversations necessary. → Targon Managed: let us handle the blockchain, get paid out weekly in fiat or a currency of your choice. Head to supply.targon.com now to start earning and achieve 100% utilization on your hardware.
15
71
349
36.9K
crux
crux@macrocrux·
What you see are two neural networks playing against each other. Both models were trained by @Apex_SN1 miners; this was the replay of the first round of our newest competition. Cool to see the models using an effective strategy called “Hamiltonian filling”. The winning model is open sourced, which means this is now the minimum performance to beat. Many rounds to go. Let’s see if we reach SOTA performance!
Apex・SN1@Apex_SN1

RL Tron’s first round has ended. Let's take a peek at the winning miner. In this game, a close face-off in the middle of the grid led to a war of attrition between these two players. All duels are recorded and accessible on our site.

0
3
5
500