Isak Westerlund

5.2K posts

@westis96

Exploring Amortized Inference, Language and Speech.

🇪🇺 Joined March 2014
5K Following · 934 Followers
Isak Westerlund retweeted
Lucas Nestler @Clashluke
HeavyBall 3.0.0 is finally out. Key features:
* FSDP
* DDP
* End-to-End Compilation (2.5x speedup)
* Higher-precision PSGDKron (grey, vs. HB2's blue)
* Faster Muon and SOAP
* PSGD-PRO (yellow)
* LATHER, a SOAP-like optimizer
* HyperBall
* explicit `consume_grad`
* simplified API
5
13
109
8K
Erik Schluntz @ErikSchluntz
One of my weekend projects has been training Chess Transformers on @modal (h/t to @charles_irl for introducing it to me!) Claude already knows how to use it, you just tell it "run a hyper param sweep of X on modal" and it happens
2
1
18
1.9K
Raphael Pisoni @ml_4rtemi5
I'm open-sourcing all my code for scaled RBF-Attention. If you want to roast my Triton knowledge or check how far you have to scale things to make it break, feel free to have a look! 😅 github.com/4rtemi5/rbf_at…
2
0
9
290
Raphael Pisoni @ml_4rtemi5
For some reason I decided to swap out standard dot-product attention for a scaled-rbf kernel. Pretty much expected it to fail to converge or be impossibly slow but the scaled-rbf-attention is getting unexpectedly good results?? 👇
1
2
19
11.5K
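A rough sketch of what swapping dot-product scores for an RBF kernel can look like. This is not the code from the linked repository: the function names, the `gamma` scale parameter, and the choice to normalize with a softmax over negative squared distances are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rbf_attention(q, k, v, gamma=1.0):
    """Attention with RBF scores: weights_ij = softmax_j(-gamma * ||q_i - k_j||^2).

    `gamma` is a hypothetical scale parameter; how the original post
    scales the kernel is not stated there.
    """
    out = []
    for q_i in q:
        w = softmax([-gamma * sq_dist(q_i, k_j) for k_j in k])
        out.append(
            [sum(w_j * v_j[c] for w_j, v_j in zip(w, v)) for c in range(len(v[0]))]
        )
    return out


# A query sitting exactly on the first key attends almost entirely to it.
rbf_out = rbf_attention(
    q=[[0.0, 0.0]],
    k=[[0.0, 0.0], [10.0, 10.0]],
    v=[[1.0, 0.0], [0.0, 1.0]],
)
```

Since -‖q - k‖² = 2q·k - ‖q‖² - ‖k‖² and the ‖q‖² term cancels inside the softmax, this form differs from dot-product attention by a per-key penalty on ‖k‖² (plus the scale).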
Isak Westerlund retweeted
Adina Yakup @AdinaYakup
Matrix-Game 3.0 🔥 real-time interactive world models from @Skywork_ai huggingface.co/Skywork/Matrix…
✨ MIT license
✨ 720p @ 40FPS with a 5B model
✨ Minute-long memory consistency
✨ Unreal + AAA + real-world data
✨ Scales up to 28B MoE
10
104
627
42.4K
ellen livia ᯅ 🇺🇸🇮🇩
Starting an AI Researcher group chat. The space is growing fast! Comment “literature review” to join.
871
28
755
57.5K
@levelsio
Okay let's see who can reply to this
2.5K
16
2.2K
1M
Isak Westerlund retweeted
Wildminder @wildmindai
Covo-Audio (7B): full-duplex LALM from Tencent.
- Qwen2.5-7B + Whisper
- Listens and speaks simultaneously (barge-in support).
- No separate ASR or TTS pipelines.
- Decoupled intelligence/speaker for voice cloning.
- 8M hours of audio training.
huggingface.co/tencent/Covo-A…
0
19
113
5.8K
Radiance Fields @RadianceFields
I'm giving away an NVIDIA RTX PRO 6000, but you only have three days left to enter. Also, my capture app, @SplatK1ng, is now available in the EU! Thank you to NVIDIA for providing the GPU and hosting me at GTC.
80
6
79
7K
Isak Westerlund retweeted
Conor Heins @conorheins
pymdp 1.0.0 is here: batched, autodifferentiable, JIT-compiled active inference in JAX: github.com/infer-actively…
This release brings:
- GPU/TPU-ready active inference
- autodiff through inference, planning and learning
- easy parallelization and batching with vmap()
2
24
98
8.4K
Isak Westerlund retweeted
Sophia Tang @_sophia_tang_
New tutorial paper on the “Foundations of Schrödinger Bridges for Generative Modeling” is out on arXiv! 🧩
📖 arXiv: arxiv.org/abs/2603.18992
🔮 Project Website: sophtang.github.io/foundations-of…

With 220 pages and 24 figures, this guide builds the theoretical foundations of Schrödinger bridges from the ground up, unifying the broad field of generative modeling with a single guiding principle: construct an optimal stochastic bridge between distributions while minimizing deviation from a reference process.

The rapid progress in generative modeling has made the field increasingly difficult to navigate from a foundational perspective, which motivated me to develop a resource that builds the core concepts needed to understand and contribute to new advances.

This guide contains intuitive explanations and step-by-step proofs covering:
🧩 The dynamic Schrödinger bridge formulation, lifting optimal transport to continuous-time stochastic processes between distributions, with direct connections to diffusion models, score-based methods, and flow matching.
🧩 A comprehensive toolkit for constructing Schrödinger bridges from first principles, describing stochastic optimal control, forward–backward SDEs, Doob’s h-transform, and Markov and reciprocal projections.
🧩 Extensions to complex and real-world problem settings, including the multi-marginal, unbalanced, discrete SB problems, highlighting the flexibility of the Schrödinger bridge framework in describing complex dynamical systems.
🧩 Practical, scalable algorithms for training and inference of dynamic Schrödinger bridges across modern generative modeling tasks.

More details in the thread 👇🏻
7
145
887
43.6K
Isak Westerlund retweeted
Albert Gu @_albertgu
The newest model in the Mamba series is finally here 🐍 Hybrid models have become increasingly popular, raising the importance of designing the next generation of linear models. We've introduced several SSM-centric ideas to significantly increase Mamba-2's modeling capabilities without compromising on speed. The resulting Mamba-3 model has noticeable performance gains over the most popular previous linear models (such as Mamba-2 and Gated DeltaNet) at all sizes. This is the first Mamba that was student led: all credit to @aakash_lahoti @kevinyli_ @_berlinchen @caitWW9, and of course @tri_dao!
38
313
1.6K
427.7K
Isak Westerlund retweeted
Karsten Kreis @karsten_kreis
📢📢 Proteina-Complexa 📢📢 Atomistic Binder Design with Generative Pretraining and Test-Time Compute + Experimental Validation at Scale ⭐️ Project page (research.nvidia.com/labs/genair/pr…) for: 📜 Method paper (ICLR 2026 Oral) 🧬 Wet lab paper 🛠️ Code & models 📁 Data 🧵 Thread (1/n)
3
33
121
12.5K
Isak Westerlund retweeted
Mohammed AlQuraishi @MoAlQuraishi
New OpenFold3 preview out! (OF3p2) It closes the gap to AlphaFold3 for most modalities. Most critically, we're releasing everything, including training sets & configs, making OF3p2 the only current AF3-based model that is functionally trainable & reproducible from scratch🧵1/9
8
185
675
53.4K
Isak Westerlund retweeted
Shuangfei Zhai @zhaisf
Say hi to Exclusive Self Attention (XSA), a (nearly) free improvement to Transformers for LM.
Observation: for y = attn(q, k, v), yᵢ and vᵢ tend to have a very high cosine similarity.
Fix: exclude vᵢ from yᵢ via zᵢ = yᵢ - (yᵢᵀvᵢ)vᵢ/‖vᵢ‖²
Result: better training/val loss across model sizes; increasing gains as sequence length grows.
See more: arxiv.org/abs/2603.09078
32
81
944
215K
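The XSA fix quoted above, zᵢ = yᵢ - (yᵢᵀvᵢ)vᵢ/‖vᵢ‖², removes from each attention output its component along that position's own value vector. A minimal pure-Python sketch of that one step follows. It is not the paper's implementation: the single-head causal attention, the function names, and the small `eps` guard against a zero-norm value vector are assumptions added for illustration.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention(q, k, v):
    """Plain single-head causal dot-product attention (assumed baseline)."""
    d = len(q[0])
    out = []
    for i in range(len(q)):
        # Position i attends to positions 0..i only.
        w = softmax([dot(q[i], k[j]) / math.sqrt(d) for j in range(i + 1)])
        out.append(
            [sum(w[j] * v[j][c] for j in range(i + 1)) for c in range(len(v[0]))]
        )
    return out


def exclude_self_value(y, v, eps=1e-12):
    """z_i = y_i - (y_i . v_i) v_i / ||v_i||^2 for every position i.

    `eps` is an added safeguard, not part of the quoted formula.
    """
    z = []
    for y_i, v_i in zip(y, v):
        coef = dot(y_i, v_i) / (dot(v_i, v_i) + eps)
        z.append([a - coef * b for a, b in zip(y_i, v_i)])
    return z


q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
v = [[1.0, 2.0], [3.0, 1.0], [0.5, -1.0]]

y = causal_attention(q, k, v)
z = exclude_self_value(y, v)
```

By construction each zᵢ is orthogonal to vᵢ (up to the `eps` guard). In this causal sketch the first position attends only to itself, so y₀ = v₀ and z₀ is essentially zero.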
Tech Pro Dude @OneTweetAwayMan
@AuroraIntel A tomahawk impact has a way bigger explosion... You don't even see any fire here...
7
0
103
7K
Aurora Intel @AuroraIntel
It’s a tomahawk, end of.
Matt Tardio @angertab

The evidence is clear: this is not a Tomahawk.

Iran alleged that an American Tomahawk cruise missile hit a school (buried in an IRGC compound) in southern Iran, killing 165 people. Analysis of a newly released video tells a different story.

ANALYSIS: AI analysis confirms the wings of the munition in question sit about 40%-45% down the body of the munition. On a Tomahawk, the wings sit roughly 49%-50% down the body. The wing-to-body ratio of the munition in question matches an Iranian Kh-55–derived land-attack cruise missile.

Further, the video shows the munition in a steep dive for the final attack phase. This places the attack angle at approximately 70 degrees, which is the max attack angle for a Tomahawk. The attack angle does not match the Kh-55, whose angle maxes out at about 55 degrees. So what would have caused this?

CONCLUSION: The wing positioning alone makes it impossible for the munition to be a Tomahawk. The attack angle is at the max of the Tomahawk's capabilities. The typical attack angle for a Tomahawk is much lower than 70 degrees, between 20 and 45 degrees. This is due to the flight pattern of Tomahawks: they fly very low, often only 50-100 meters AGL, to avoid detection and interception. To achieve that attack angle, the missile would have had to gain altitude several kilometers away, which would leave it vulnerable to interception. This is highly unlikely on the first day of US attacks.

So what could have caused this? Simply put, GPS jamming of an Iranian Kh-55. The USA and Israel were, and continue to be, actively jamming Iranian airspace. If the Kh-55's signal was jammed, this could result in an uncontrollable dive. Think of GPS jamming more like disorienting the missile.

On 03/07 President Trump stated: “No, in my opinion, based on what I’ve seen, that was done by Iran.” Today, I concur with the President.

283
177
3.7K
237.9K