🅳🅾︎🅼🅴

3.4K posts

@dome_cs

From Automata theory, Buddhist doctrines, to Thai cuisine. Stories told by a math enthusiast.

Joined July 2013

433 Following, 91 Followers

🅳🅾︎🅼🅴 retweeted

Sakana AI@SakanaAILabs·2d

Introducing DiffusionBlocks: Block-wise Neural Network Training via Diffusion Interpretation
pub.sakana.ai/diffusionblocks

What if we didn’t have to hold an entire neural network in memory to train it?

Standard neural net training optimizes all parameters jointly. As a result, the memory required during training grows linearly with the depth of the network. In our #ICLR2026 paper, we propose DiffusionBlocks, a principled framework to train networks one block at a time, drastically reducing memory requirements while matching end-to-end performance.

With DiffusionBlocks, we split the network into blocks and train them one at a time, so you only need memory for a single block. How? We explicitly assign each block a role: to move the representation a little closer to the target than the block before it did. That role turns out to be precisely what a diffusion model does, step by step. Each block only needs to optimize its own objective and can be trained independently.

We validated this across five different architectures:
• ViT
• DiT
• Masked diffusion
• Autoregressive transformers
• Recurrent-depth transformers

In each case, performance is competitive with end-to-end training while using a fraction of the memory.

This perspective also extends naturally to recurrent-depth (Looped) transformers, which apply the same network iteratively and normally require expensive backpropagation through time (BPTT). Viewed through DiffusionBlocks, we can replace those multiple iterations with a single forward pass during training.

Read our paper and code, to learn more.
Paper: arxiv.org/abs/2506.14202
GitHub: github.com/SakanaAI/Diffu… 🐟

🅳🅾︎🅼🅴 retweeted

Roan@RohOnChain·3 May

Anthropic pays $750,000+ a year for engineers who can build LLM architectures from scratch. Stanford taught the entire thing in a 1-hour lecture & released it for free. Bookmark & watch this today before someone takes it down.

🅳🅾︎🅼🅴 retweeted

Ralph Sueppel@macro_synergy·30 Apr

"Getting the Target Right in Return Prediction": "Transforming the target from raw to standardized or rank-based returns nearly triples predictive accuracy and doubles portfolio returns [based on machine learning]." papers.ssrn.com/sol3/papers.cf…
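To make the quoted idea concrete, here is a minimal sketch of what "standardized" and "rank-based" targets could look like for one cross-section of returns. It is an illustration only, not the paper's procedure: the function names `standardize` and `rank_transform`, the choice of population standard deviation, and the [-0.5, 0.5] rank scaling are all assumptions made for this example.

```python
from statistics import mean, pstdev


def standardize(returns):
    """Cross-sectional z-score: subtract the mean, divide by the std deviation."""
    mu = mean(returns)
    sigma = pstdev(returns)  # population std; an assumption for this sketch
    if sigma == 0:
        return [0.0 for _ in returns]
    return [(r - mu) / sigma for r in returns]


def rank_transform(returns):
    """Map returns to ranks scaled into [-0.5, 0.5]; ties share their average rank."""
    n = len(returns)
    if n == 1:
        return [0.0]
    order = sorted(range(n), key=lambda i: returns[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values starting at sorted position i.
        j = i
        while j + 1 < n and returns[order[j + 1]] == returns[order[i]]:
            j += 1
        avg = (i + j) / 2.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return [r / (n - 1) - 0.5 for r in ranks]


# Hypothetical one-period returns for five assets, including one large outlier.
raw = [0.02, -0.01, 0.15, 0.00, -0.30]
z = standardize(raw)
ranked = rank_transform(raw)
```

The rank version keeps only the ordering of the assets, so the -0.30 outlier has no more influence on the target than any other last-place return would.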

🅳🅾︎🅼🅴 retweeted

Symplectic.Research@QuantSymplectic·27 Apr

Black-Scholes is wrong almost everywhere. And yet, it’s still the language of options markets. The reason: It’s the flat limit of a curved geometric pricing space. The volatility smile? That’s the curvature. Below we see where markets actually live in that space. Preprint: papers.ssrn.com/sol3/papers.cf…

🅳🅾︎🅼🅴 retweeted

Peter - Cracking Markets@SystematicPeter·21 Apr

A systematic portfolio does not have to be complicated. If I were starting from scratch, I would not begin with 25 exotic strategies and endless optimization. I would start with a few simple, different return drivers:
- stock momentum
- slow long mean reversion on stocks
- faster long mean reversion on stocks
- simple intraday system on indices

The goal is not to find one perfect system. The goal is to combine simple systems that behave differently, make money in different market conditions, and reduce dependence on any single edge.

This chart shows the main strategies in my own portfolio applied to a smaller account, with slippage and commissions included.

Simple ideas. Different behavior. One systematic portfolio.

Deep dive with detailed statistics, updated daily: crackingmarkets.com/portfolio-cons…

🅳🅾︎🅼🅴 retweeted

Symplectic.Research@QuantSymplectic·21 Apr

As a grad student working on Hamiltonian systems in General Relativity, I often wondered what the phase-plane approach from dynamical systems theory could tell us about markets. Today I submitted the third paper in that answer: Information Geometry of Market Dynamics: A Pareto Frontier from Contact Geometry. Preprint: papers.ssrn.com/sol3/papers.cf… and code: Zenodo: doi.org/10.5281/zenodo…

🅳🅾︎🅼🅴 retweeted

Valeriy M., PhD, MBA, CQF@predict_addict·20 Apr

Solid mathematical ideas almost always outperform contrived engineering tricks.

For years deep learning has been dominated by increasingly complex architectural hacks: CNN blocks, attention layers, channel mixers, residual pathways, normalization stacks. Every few years a new architecture is announced as if it were a revolution.

One of the most famous examples was Kaiming He and Residual Networks (ResNet). At the time he was paraded around the AI world like a celebrity because residual connections supposedly “solved” deep learning. But these were largely engineering patches.

Now something much more interesting appeared. A new architecture called CliffordNet returns to mathematics, specifically Clifford Algebra, developed in the 19th century by William Kingdon Clifford. Instead of stacking arbitrary modules, the model is built around the geometric product

uv = u·v + u∧v

A single algebraic operation that simultaneously captures inner product structure and geometric interactions. In other words: the math already contains the interaction mechanism. No attention blocks. No mixer layers. No architectural spaghetti.

The result:
• 77.82% accuracy on CIFAR-100 with only 1.4M parameters
• roughly 8× fewer parameters than ResNet-18

And with strict O(N) complexity. The paper even suggests that once geometric interactions are modeled correctly, feed-forward networks become largely redundant.

A good reminder for the AI community. Engineering tricks can dominate for years. But eventually mathematics shows up and deletes half the architecture.

Paper: arxiv.org/pdf/2601.06793…

19th century geometry just walked into computer vision.
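For readers unfamiliar with the geometric product mentioned in the post, a small worked example in two dimensions may help. This is standard Clifford algebra with an orthonormal basis $e_1, e_2$ and says nothing about how CliffordNet itself uses the product; the example vectors are chosen arbitrarily for illustration.

```latex
% Basis rules: e_1^2 = e_2^2 = 1 and e_1 e_2 = -e_2 e_1.
\begin{align*}
u &= a_1 e_1 + a_2 e_2, \qquad v = b_1 e_1 + b_2 e_2 \\
uv &= a_1 b_1\, e_1^2 + a_2 b_2\, e_2^2 + a_1 b_2\, e_1 e_2 + a_2 b_1\, e_2 e_1 \\
   &= \underbrace{(a_1 b_1 + a_2 b_2)}_{u \cdot v}
    + \underbrace{(a_1 b_2 - a_2 b_1)\, e_1 e_2}_{u \wedge v} \\[4pt]
\text{e.g. } u &= e_1 + 2 e_2,\; v = 3 e_1 + 4 e_2
   \;\Rightarrow\; uv = 11 - 2\, e_1 e_2
\end{align*}
```

The scalar part is the usual dot product, and the $e_1 e_2$ coefficient is the signed area of the parallelogram spanned by $u$ and $v$.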

🅳🅾︎🅼🅴 retweeted

Quantocracy@Quantocracy·19 Apr

The AutoTune filter [Financial Hacker] dlvr.it/TS6BnG

🅳🅾︎🅼🅴 retweeted

Lianghui Zhu@lianghui_zhu·19 Apr

For a decade, we've made models wider and deeper—but we've barely changed how layers *talk* to each other. Since ResNet's `x + F(x)` in 2015, the depth residual has been the only highway for inter-layer communication. It's time to upgrade the staircase. 🧵

🅳🅾︎🅼🅴 retweeted

XO Labs@xolabs_·17 Apr

x.com/i/article/2045…

🅳🅾︎🅼🅴 retweeted

Gappy (Giuseppe Paleologo)@__paleologo·16 Apr

Just a reminder that you can do unintuitive things with ordinary differential equations. Oldie but goldie paper.

🅳🅾︎🅼🅴 retweeted

Liquidity Goblin@liquiditygoblin·15 Apr

when fitting curves on short dated options, do you fit to the bid / ask respectively and then take the mid of the vols? or do you take the mid of the prices and then take the vol of that? what if neither will give you the full picture? a thread on fitting 0dte curves🧵 1/11

🅳🅾︎🅼🅴 retweeted

机器之心 JIQIZHIXIN@jiqizhixin·14 Apr

Huge! Recurrent neural networks could match Transformer memory without the quadratic burden! Ali Behrouz from Google and colleagues have cracked it! They present Memory Caching (MC), a simple yet powerful method that lets RNNs store "memory checkpoints" of their internal states. This allows their effective memory to grow with context, offering a flexible trade-off between speed and recall. MC dramatically enhances recurrent models in language modeling and long-context understanding. It significantly closes the performance gap with Transformers on recall tasks and outperforms existing state-of-the-art recurrent models.

🅳🅾︎🅼🅴 retweeted

annie@_annieversary·13 Apr

what the Fuck arxiv.org/html/2603.2185…

🅳🅾︎🅼🅴 retweeted

Roan@RohOnChain·9 Apr

This is the EXACT 12-step methodology Institutional quant desks use to win every single trade. Bookmark & run it through your stack or just pass it directly to your AI coding agent. Most people never reach this layer in their entire lifetime. Full breakdown in article below.

Roan@RohOnChain

x.com/i/article/2037…

🅳🅾︎🅼🅴@dome_cs·4 Apr

@plus_vision_div I would be very excited to see a product like this someday. Thank you for your great work!

🅳🅾︎🅼🅴@dome_cs·4 Apr

@plus_vision_div I noticed you also make whiteboards, so I wanted to ask—do you have any plans to bring this magnetic writing technology to whiteboards in the future? I think it could solve common issues like dried-out markers and stains from unwiped ink, while also being more eco-friendly.

プラス VISION事業部【公式】@plus_vision_div·5 Mar

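#Kaiteで描いてみた Good morning 🌸 Hinamatsuri has already passed, but... we asked a staff member who is good at drawing to sketch the three court ladies as pretty girls ✨ It draws like a pencil, so it is perfect for illustrations ✨ #企業公式が朝の挨拶を言い合う #Kaiteパッド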

プラス VISION事業部【公式】@plus_vision_div

Today is Hinamatsuri 🎎 A quick little doodle of the hina dolls 🌸 What are you eating today? #Kaiteパッド

🅳🅾︎🅼🅴 retweeted

Google Research@GoogleResearch·24 Mar

Introducing TurboQuant: Our new compression algorithm that reduces LLM key-value cache memory by at least 6x and delivers up to 8x speedup, all with zero accuracy loss, redefining AI efficiency. Read the blog to learn how it achieves these results: goo.gle/4bsq2qI

🅳🅾︎🅼🅴 retweeted

verax@journoverax·19 Mar

x.com/i/article/2033…
