crux

489 posts

@macrocrux

CTO & Co-Founder @ Macrocosmos

Joined November 2023

173 Following · 2.6K Followers

crux retweeted

Macrocosmos@MacrocosmosAI·3d

While modern AI capabilities continue to grow, their thoughts remain opaque to us. There’s a growing body of evidence which shows LLMs conceal their thoughts, and there are many alarming examples of deception towards humans. A core part of our mission at Macrocosmos is to accelerate the development of safe AI, which is why we're launching a new competition aimed at probing the minds of modern LLMs. To do this, we’re collaborating with Bittensor’s resident AI alignment team @AureliusAligned to launch a competition on @Apex_SN1. Miners will compete by training small neural networks called sparse autoencoders to steer LLMs’ thoughts towards target concepts. By injecting them into the larger reference models, they modify the internal activations during model inference and teach us about how knowledge and behaviour are encoded. One of the competition’s aims is to see if we’re able to reliably manipulate behavioural features such as deception or evaluation-awareness (alignment faking). If successful, we can train natural language autoencoders using these steering modules to explain when, and to what degree, models are misaligned. @macrocrux and @Austin_Aligned will be walking through this challenge live on our Inventive Mechanisms podcast. 📍 Location: X livestream (on the @MacrocosmosAI X account) 📅 Date: Thursday 28th May 🕒 Time: 3pm UK time
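For readers new to the idea, here is a minimal sketch of what steering with a sparse autoencoder can look like. It is not the competition's implementation: the function names, the toy weights, and the choice to steer by adding a scaled decoder direction to an activation vector are all assumptions made for illustration.

```python
def relu(v):
    return [max(0.0, x) for x in v]


def encode(x, w_enc, b_enc):
    """Feature activations: ReLU(W_enc @ x + b_enc)."""
    return relu([
        sum(w * xi for w, xi in zip(row, x)) + b
        for row, b in zip(w_enc, b_enc)
    ])


def decode(f, w_dec):
    """Reconstruction: decoder directions weighted by feature activations."""
    dim = len(w_dec[0])
    return [sum(f[j] * w_dec[j][d] for j in range(len(f))) for d in range(dim)]


def steer(x, w_dec, feature, strength):
    """Push an activation vector along one feature's decoder direction (assumed steering rule)."""
    return [xi + strength * di for xi, di in zip(x, w_dec[feature])]


# Toy 2-feature autoencoder with identity weights (hypothetical values).
w_enc = [[1.0, 0.0], [0.0, 1.0]]
b_enc = [0.0, 0.0]
w_dec = [[1.0, 0.0], [0.0, 1.0]]

activation = [0.5, -0.25]
print(encode(activation, w_enc, b_enc))        # feature 1 is inactive
steered = steer(activation, w_dec, feature=1, strength=1.0)
print(encode(steered, w_enc, b_enc))           # feature 1 is now active
```

In a real model the activation vector would come from a chosen layer during inference, and the steered vector would be written back before the forward pass continues.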

1.9K

crux@macrocrux·3d

Something very important is being brought into existence right now. Bricks have been laid over the last 18 months and now the tech is coming together in a way that makes commercialization possible. If this shit works, it will completely disrupt the economics of training large models and the floodgates will burst open. @Pluralis and @MacrocosmosAI are the only teams who I think can clearly see the shape of this opportunity right now. Agora is a strong first step towards this future. After spending a bit of time on their platform there's a form factor to it which feels "natural", almost inevitable in hindsight. This subfield of training is really starting to take shape. Our IOTA team has been very, very busy for the last few months. Can't wait to share more soon.

Pluralis Research@Pluralis

Today we're releasing Agora: the first ever pretraining stack that allows non-collocated consumer GPUs to be competitive with centralized clusters. Agora is 15x faster than Megatron-LM in this setting and is only 1.5x less efficient in terms of tokens per unit compute than TorchTitan on H100s, despite running on devices that have no NVLink or InfiniBand support.

4.9K

crux@macrocrux·4d

@GoodfireAI Is there any evidence that suggests that physical laws are embedded in a way that preserves invariant quantities and symmetries?

Goodfire@GoodfireAI·4d

@macrocrux more neural geometry posts coming soon!

Goodfire@GoodfireAI·4d

The most popular way to interpret AI is missing the bigger picture. Models think in curved shapes. But sparse autoencoders (SAEs) work with straight lines. Can they still capture models’ curved neural geometry? Yes, but not how you might think! (1/7)

Goodfire@GoodfireAI

Neural networks might speak English, but they think in shapes. Understanding their rich *neural geometry* is key to understanding how they work – and to debugging and controlling them with precision. Starting today, we’re releasing a series of posts on this research agenda. 🧵

150

157.5K

crux@macrocrux·4d

We’re speeding up distributed training on IOTA by leveraging Apex competitions. We thought miners had almost maxxed out the throughput.. but then came a second wave where top submissions started forecasting the future state of the network in order to stay ahead of network congestion. Now it’s going off!

Apex・SN1@Apex_SN1

The @IOTA_SN9 Simulator competition: 22k+ submissions, ~29% faster epoch times, R&D insights for distributed AI training. Two months ago we launched a competition to minimise epoch completion time inside a digital twin of the IOTA network by optimising activation routing and balancing. The goal: to improve speed and efficiency within our distributed training network. The result: 22,967 submissions across 57 rounds. Epoch times are now ~29% faster on average compared to the start of the competition, and up to 39% faster on some network configurations. Our subnets are an ecosystem - the IOTA Simulator is the clearest example: insights from miners feed directly into how IOTA engineers iterate, both for current participants and for future clients once we productise. Several R&D insights have arisen. Let's isolate one in particular. The competition highlighted a specific architectural challenge: downstream congestion dominates throughput more than raw processing speed does. Routing too many activations to the fastest miner doesn't solve the problem, as doing so fills its queues and slows it down overall. Top submissions converged on the same fix: track how full each miner's downstream queues were getting, and skip the ones building up backlog, even if they were nominally the fastest pipeline target. In other words, routing decisions must consider downstream capacity, not just downstream speed. Seeking the fastest miner in IOTA only makes sense if the algorithm factors in the speed their queues fill and empty, otherwise it accentuates the bottleneck. As a result, subnet 1 draws in techniques from frontier labs. 
This setup best fits the structure of Capacity-Aware Load Balancing, applied in many settings, with Mixture of Experts models like DeepSeek and Mixtral using it to route tokens to different neural networks during inference tasks, Google using it to prevent congestion on its cloud services, and even Amazon using the same principles for optimising its physical supply chain. In trying to optimise IOTA, miners are learning second order corrections to peer to peer networks.
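The routing insight above can be sketched in a few lines. This is a toy illustration, not a submission from the competition: the `Miner` fields, the 0.8 fill threshold, and the expected-wait formula are assumptions chosen to show the difference between speed-only and capacity-aware selection.

```python
from dataclasses import dataclass


@dataclass
class Miner:
    name: str
    rate: float      # activations processed per second (hypothetical unit)
    queue_len: int   # activations currently waiting downstream
    queue_cap: int   # queue size at which the miner is considered saturated

    def fill(self) -> float:
        return self.queue_len / self.queue_cap

    def expected_wait(self) -> float:
        # Time for a new activation to clear: everything ahead of it plus itself.
        return (self.queue_len + 1) / self.rate


def pick_fastest(miners):
    """Speed-only baseline: always route to the highest processing rate."""
    return max(miners, key=lambda m: m.rate)


def pick_capacity_aware(miners, max_fill=0.8):
    """Skip miners that are building backlog, then minimise expected wait."""
    candidates = [m for m in miners if m.fill() < max_fill]
    if not candidates:  # everyone is congested: fall back to the full set
        candidates = miners
    return min(candidates, key=lambda m: m.expected_wait())


miners = [
    Miner("fast", rate=10.0, queue_len=18, queue_cap=20),
    Miner("medium", rate=6.0, queue_len=2, queue_cap=20),
    Miner("slow", rate=3.0, queue_len=1, queue_cap=20),
]
print(pick_fastest(miners).name)         # fast
print(pick_capacity_aware(miners).name)  # medium
```

The nominally fastest miner loses here because its queue is 90% full, which matches the point that downstream capacity matters more than raw speed.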

5.3K

crux retweeted

TAO Times ⚡️@taotimesdotai·15 May

As co-founder of @MacrocosmosAI, @macrocrux has built subnets spanning inference, pretraining, and fine-tuning of models. His advice? ‘Prove that successful execution will give them an edge in the market.’

480

crux@macrocrux·15 May

@NousResearch damn you guys are cooking this week

249

Nous Research@NousResearch·15 May

Paper: arxiv.org/abs/2605.06554 Code: github.com/ighoshsubho/li… HF: huggingface.co/papers/2605.06… Blog: nousresearch.com/lighthouse-att…

11.4K

Nous Research@NousResearch·15 May

Today we release Lighthouse Attention, a selection-based hierarchical attention for long-context pre-training that delivers a 1.4-1.7× wall-clock speedup at 98K context. It runs the same forward+backward pass ~17× faster than standard attention at 512K context on a single B200, without a custom sparse attention kernel, a straight-through estimator, or an auxiliary loss. During training, queries, keys, and values are pooled symmetrically into a multi-resolution pyramid. We then score every pyramid heads, and a top-k cascade selects a small hierarchical dense sub-sequence, and after a sorting pass that enforces causality, we use standard attention for token mixing. A brief full attention resume at the end converts the checkpoint back into a competent dense-attention model. Validated this using 530M parameter Llama-3 models across 50B tokens, with up to 1M-token benchmarks across 32 B200s under context parallelism. The work on Lighthouse Attention was led by @bloc97_, @SubhoGhosh02, and @theemozilla.
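A rough sketch of the two ingredients named above, pooling into a multi-resolution pyramid and top-k selection followed by a sort that restores sequence order. This is a simplified illustration and not the Lighthouse Attention implementation: mean pooling with a factor of 2, dot-product scoring, and every function name here are assumptions, and the cascade across pyramid levels is not reproduced.

```python
def mean_pool(vectors, factor=2):
    """Average consecutive groups of `factor` vectors (drops any remainder)."""
    n = len(vectors) - len(vectors) % factor
    return [
        [sum(col) / factor for col in zip(*vectors[i:i + factor])]
        for i in range(0, n, factor)
    ]


def build_pyramid(vectors, levels, factor=2):
    """Level 0 is the input; each further level pools the previous one."""
    pyramid = [vectors]
    for _ in range(levels):
        pyramid.append(mean_pool(pyramid[-1], factor))
    return pyramid


def top_k_positions(query, keys, k):
    """Pick the k best-scoring key positions, then sort them back into sequence order."""
    scores = [sum(q * x for q, x in zip(query, key)) for key in keys]
    best = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(best)


keys = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [0.0, 2.0],
        [3.0, 0.0], [0.0, 3.0], [4.0, 0.0], [0.0, 4.0]]
pyramid = build_pyramid(keys, levels=2)
print([len(level) for level in pyramid])
print(top_k_positions([1.0, 0.0], keys, k=2))
```

The selected positions would then feed a standard dense attention call over the much shorter sub-sequence.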

230

157.4K

crux@macrocrux·14 May

@GoodfireAI Love your research guys. Keep it up!

204

Goodfire@GoodfireAI·14 May

Neural networks do math by rotating shapes. We found a shape-rotating calculator hidden inside an LLM – and it’s used for more than just math! (1/6)

Goodfire@GoodfireAI

122

556

4.3K

926.7K

crux@macrocrux·14 May

After hundreds of ablations on 1.5B models we now have what we need to begin scaling IOTA. In the last two weeks we've quietly scaled up by 10x, and we're nowhere near the limits of this approach. Today, a 15B model, sliced up into 32 layers, training over the internet.

Felix Quinque@Felix_Quinque

Currently running the biggest DPP model so far as a part of scaling test series for our actual model launch on @IOTA_SN9 . Gonna be fun.

8.6K

crux@macrocrux·14 May

What I hear is 16x sequence compression for r% of training. This is extremely cool and I can't wait to try it on IOTA.

Nous Research@NousResearch

Today we release Token Superposition Training (TST), a modification to the standard LLM pretraining loop that produces a 2-3× wall-clock speedup at matched FLOPs without changing the model architecture, optimizer, tokenizer, or training data. During the first third of training, the model reads and predicts contiguous bags of tokens, averaging their embeddings on the input side and predicting the next bag with a modified cross-entropy on the output side. For the remainder of the run, it trains normally on next-token prediction. The inference-time model is identical to one produced by conventional pretraining. Validated at 270M, 600M, and 3B dense scales, and at 10B-A1B MoE. The work on TST was led by @bloc97_, @gigant_theo, and @theemozilla.
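A minimal sketch of the bagging step described above, assuming non-overlapping bags of fixed size. The post does not spell out the modified cross-entropy, so the loss below (mean negative log-likelihood over the tokens in the next bag) is a guess for illustration only, as are the function names and toy values.

```python
import math


def make_bags(token_ids, bag_size):
    """Split a sequence into contiguous, non-overlapping bags (drops any remainder)."""
    n = len(token_ids) - len(token_ids) % bag_size
    return [token_ids[i:i + bag_size] for i in range(0, n, bag_size)]


def bag_embedding(bag, embedding_table):
    """Input side: average the embeddings of the tokens in one bag."""
    dim = len(embedding_table[0])
    return [sum(embedding_table[t][d] for t in bag) / len(bag) for d in range(dim)]


def bag_cross_entropy(logits, target_bag):
    """Output side (assumed form): mean negative log-likelihood over the next bag's tokens."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return sum(log_z - logits[t] for t in target_bag) / len(target_bag)


# Toy vocabulary of 4 tokens with 2-dimensional embeddings (hypothetical values).
embedding_table = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
bags = make_bags([0, 1, 2, 3, 0], bag_size=2)
print(bags)
print(bag_embedding(bags[0], embedding_table))
print(bag_cross_entropy([0.0, 0.0, 0.0, 0.0], bags[1]))
```

With a bag size of s, the model sees a sequence s times shorter during this phase, which is where a wall-clock saving would come from.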

887

crux@macrocrux·14 May

Play head to head against the decentralized agent swarm

Apex・SN1@Apex_SN1

Think you can beat the machine? Play against the winning RL Tron model from each round. As these are reinforcement learning AI models, the winning submission rides autonomously on the playing field, meaning you can’t simply memorise its tactics. You can only rely on skill. So far, the round 2 winner has won against 55% of human rivals. Can you outsmart it?

412

crux@macrocrux·14 May

Very cool to see a Bittensor hackathon in India building with our @Data_SN13 API

HackQuest@HackQuest_

HackQuest x Bittensor Co-Learning Camp | India Recap 🇮🇳 150+ registrations, 80+ attendees, 45+ graduates, 30 winners. Over 5 days at Galgotias University, builders came together to: ⚡ Learn the fundamentals of @opentensor 🛠 Complete HackQuest learning tracks ⛏️ Become miners on Data Universe (SN13) @Data_SN13 & Sportstensor (SN41) @sportstensor 🤝 Build alongside mentors, developers, and future founders But don’t just take it from us — check out some firsthand reflections from builders who experienced the camp themselves 👇

1.3K

crux@macrocrux·13 May

Novel idea put forth by @ConnitoAI. To me it remains to be seen if domain specialist experts can be truly trained in isolation in the way the team propose, but if this does actually work it maps very nicely onto decentralized networks and has an interesting scaling trajectory.

Connito AI@ConnitoAI

We’re excited to share the Connito whitepaper V1: a framework for decentralized, composable MoE adaptation. We train sparse expert subsets, validate updates through Proof-of-Loss, and turn open-model improvement into a distributed expert-level market. Read the whitepaper: connito.ai/whitepaper

623

crux retweeted

Connito AI@ConnitoAI·13 May

106

26.3K

crux retweeted

Moonlit@moonlit_ds·11 May

SAY IT LOUDER FOR THE PEOPLE IN THE BACK!!! bittensor:native earnyourmac.com

937

crux@macrocrux·9 May

Interesting analysis of how Bittensor weighs up against similar protocols in terms of brand awareness, and how selected subnets compare to web2 counterparts. Nice job @subnetai, this is important framing.

subnet.ai@subnetai

Our Bittensor Brand Performance Report is now live. We analyzed 12 months of media coverage, social presence, and search visibility across the Bittensor ecosystem, benchmarking it against both crypto and Web2 competitors. Here’s what we found: The seven subnets we analyzed have collectively received over $90M in owner emissions and are shipping real products that compete with well-funded Web2 companies, yet they remain largely invisible outside the ecosystem. The good news is that the few subnets that have invested in their brands are already seeing results. The opportunity is there for the rest to follow. Featuring @chutesai_ (SN64), @TargonCompute (SN4), @webuildscore (SN44), @BitMindAI (SN34), @ridges_ai (SN62), @VantaTrading (SN8), and @lium_io (SN51). Read the full report here subnet.ai/reports/bitten…

512

crux retweeted

Targon@TargonCompute·6 May

Today we are excited to announce Targon Supply Portal. Targon Supply Portal allows compute suppliers to easily onboard and start monetizing their idle compute capacity and track earnings and status of their nodes on Targon. There are 2 ways to start earning: → Permissionless: onboard directly to the Bittensor blockchain and start earning SN4 alpha, no conversations necessary. → Targon Managed: let us handle the blockchain, get paid out weekly in fiat or a currency of your choice. Head to supply.targon.com now to start earning and achieve 100% utilization on your hardware.

349

36.9K

crux@macrocrux·9 May

As above, so below

Goodfire@GoodfireAI

Read the first 2 posts in the series: goodfire.ai/research/the-w… Forthcoming posts will go into more detail on: - an example mechanism that operates on manifolds - unsupervised discovery of manifolds + the connection to SAE features - in-context geometry

488

crux@macrocrux·9 May

@vincentweisser @PrimeIntellect an insanely good landing page

109

Vincent Weisser@vincentweisser·8 May

😍 primeintellect.ai

164

9.9K

crux@macrocrux·9 May

What you see are two neural networks playing against each other. Both models were trained by @Apex_SN1 miners; this is the replay of the first round of our newest competition. Cool to see the models using an effective strategy called “Hamiltonian filling”. The winning model is open sourced, which means this is now the minimum performance to beat. Many rounds to go. Let’s see if we reach SOTA performance!

Apex・SN1@Apex_SN1

RL Tron’s first round has ended. Let's take a peek at the winning miner. In this game, a close face-off in the middle of the grid led to a war of attrition between these two players. All duels are recorded and accessible on our site.

500
