🅳🅾︎🅼🅴

3.4K posts

@dome_cs

From Automata theory, Buddhist doctrines, to Thai cuisine. Stories told by a math enthusiast.

Joined July 2013
433 Following · 91 Followers
🅳🅾︎🅼🅴 reposted
Sakana AI (@SakanaAILabs)
Introducing DiffusionBlocks: Block-wise Neural Network Training via Diffusion Interpretation
pub.sakana.ai/diffusionblocks

What if we didn’t have to hold an entire neural network in memory to train it?

Standard neural net training optimizes all parameters jointly. As a result, the memory required during training grows linearly with the depth of the network. In our #ICLR2026 paper, we propose DiffusionBlocks, a principled framework to train networks one block at a time, drastically reducing memory requirements while matching end-to-end performance.

With DiffusionBlocks, we split the network into blocks and train them one at a time, so you only need memory for a single block. How? We explicitly assign each block a role: to move the representation a little closer to the target than the block before it did. That role turns out to be precisely what a diffusion model does, step by step. Each block only needs to optimize its own objective and can be trained independently.

We validated this across five different architectures:
• ViT
• DiT
• Masked diffusion
• Autoregressive transformers
• Recurrent-depth transformers

In each case, performance is competitive with end-to-end training while using a fraction of the memory.

This perspective also extends naturally to recurrent-depth (Looped) transformers, which apply the same network iteratively and normally require expensive backpropagation through time (BPTT). Viewed through DiffusionBlocks, we can replace those multiple iterations with a single forward pass during training.

Read our paper and code, to learn more.
Paper: arxiv.org/abs/2506.14202
GitHub: github.com/SakanaAI/Diffu… 🐟
[GIF]
49 replies · 313 reposts · 2K likes · 758.1K views
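The memory claim in the post above can be illustrated with simple arithmetic. This is a minimal sketch, not the DiffusionBlocks implementation: it assumes every layer stores the same amount of activation memory for backpropagation, and the function names and the figures (24 layers, 0.5 GB per layer, 4 blocks) are hypothetical.

```python
def end_to_end_memory_gb(num_layers: int, gb_per_layer: float) -> float:
    """Activation memory when every layer's activations are kept for backprop."""
    return num_layers * gb_per_layer


def blockwise_memory_gb(num_layers: int, gb_per_layer: float, num_blocks: int) -> float:
    """Peak activation memory when only one block is trained at a time."""
    # Blocks may be uneven, so the largest block sets the peak.
    layers_in_largest_block = -(-num_layers // num_blocks)  # ceiling division
    return layers_in_largest_block * gb_per_layer


# Hypothetical figures chosen only for illustration.
end_to_end = end_to_end_memory_gb(num_layers=24, gb_per_layer=0.5)
blockwise = blockwise_memory_gb(num_layers=24, gb_per_layer=0.5, num_blocks=4)
print(f"end-to-end: {end_to_end} GB, block-wise peak: {blockwise} GB")
```

Under these assumptions the end-to-end cost scales with depth, while the block-wise peak scales only with the size of the largest block.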
🅳🅾︎🅼🅴 reposted
Roan (@RohOnChain)
Anthropic pays $750,000+ a year for engineers who can build LLM architectures from scratch. Stanford taught the entire thing in a 1-hour lecture and released it for free. Bookmark and watch this today before someone takes it down.
117 replies · 1.6K reposts · 10.5K likes · 2.5M views
🅳🅾︎🅼🅴 reposted
Ralph Sueppel (@macro_synergy)
"Getting the Target Right in Return Prediction": "Transforming the target from raw to standardized or rank-based returns nearly triples predictive accuracy and doubles portfolio returns [based on machine learning]." papers.ssrn.com/sol3/papers.cf…
[image]
1 reply · 10 reposts · 74 likes · 6.6K views
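The quoted abstract contrasts raw returns with standardized and rank-based targets. A minimal sketch of what such transforms could look like for one cross-section of returns follows; the exact definitions used in the paper are not given in the post, so `standardize`, `rank_transform`, and the sample data are assumptions for illustration.

```python
import numpy as np


def standardize(returns: np.ndarray) -> np.ndarray:
    """Cross-sectional z-score: subtract the mean, divide by the standard deviation."""
    return (returns - returns.mean()) / returns.std()


def rank_transform(returns: np.ndarray) -> np.ndarray:
    """Map returns to evenly spaced ranks in [-0.5, 0.5]."""
    ranks = returns.argsort().argsort()  # 0 = lowest return
    return ranks / (len(returns) - 1) - 0.5


# Hypothetical one-period returns for five assets, including one outlier.
raw = np.array([0.02, -0.01, 0.15, 0.00, -0.30])
z = standardize(raw)
r = rank_transform(raw)
print(z)
print(r)
```

The rank target keeps only the ordering, so the -0.30 outlier has no more influence than any other lowest-ranked asset.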
🅳🅾︎🅼🅴 reposted
Symplectic.Research (@QuantSymplectic)
Black-Scholes is wrong almost everywhere. And yet, it’s still the language of options markets.

The reason: It’s the flat limit of a curved geometric pricing space. The volatility smile? That’s the curvature. Below we see where markets actually live in that space.

Preprint: papers.ssrn.com/sol3/papers.cf…
[image]
29 replies · 87 reposts · 809 likes · 59K views
🅳🅾︎🅼🅴 reposted
Peter - Cracking Markets (@SystematicPeter)
A systematic portfolio does not have to be complicated.

If I were starting from scratch, I would not begin with 25 exotic strategies and endless optimization. I would start with a few simple, different return drivers:
- stock momentum
- slow long mean reversion on stocks
- faster long mean reversion on stocks
- simple intraday system on indices

The goal is not to find one perfect system. The goal is to combine simple systems that behave differently, make money in different market conditions, and reduce dependence on any single edge.

This chart shows the main strategies in my own portfolio applied to a smaller account, with slippage and commissions included.

Simple ideas. Different behavior. One systematic portfolio.

Deep dive with detailed statistics, updated daily: crackingmarkets.com/portfolio-cons…
[image]
5 replies · 13 reposts · 111 likes · 8.3K views
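The argument for combining systems that behave differently can be made concrete with the standard portfolio variance formula. This is a sketch under simplifying assumptions (equal weights, one shared pairwise correlation); the function name and the 10% volatility figures are hypothetical and are not taken from the author's portfolio.

```python
import math


def combined_volatility(vols: list[float], correlation: float) -> float:
    """Volatility of an equal-weight mix, assuming one shared pairwise correlation."""
    n = len(vols)
    w = 1.0 / n
    # Own-variance terms.
    variance = sum((w * v) ** 2 for v in vols)
    # Covariance terms for every distinct pair of strategies.
    for i in range(n):
        for j in range(i + 1, n):
            variance += 2 * w * w * vols[i] * vols[j] * correlation
    return math.sqrt(variance)


# Four hypothetical strategies, each with 10% volatility.
strategy_vols = [0.10, 0.10, 0.10, 0.10]
uncorrelated = combined_volatility(strategy_vols, correlation=0.0)
identical = combined_volatility(strategy_vols, correlation=1.0)
print(uncorrelated, identical)
```

With zero correlation the combined volatility falls by a factor of the square root of the number of strategies; with perfect correlation there is no reduction at all.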
🅳🅾︎🅼🅴 reposted
Symplectic.Research (@QuantSymplectic)
As a grad student working on Hamiltonian systems in General Relativity, I often wondered what the phase-plane approach from dynamical systems theory could tell us about markets.

Today I submitted the third paper in that answer: Information Geometry of Market Dynamics: A Pareto Frontier from Contact Geometry.

Preprint: papers.ssrn.com/sol3/papers.cf…
Code (Zenodo): doi.org/10.5281/zenodo…
[image]
8 replies · 11 reposts · 41 likes · 10.1K views
🅳🅾︎🅼🅴 reposted
Valeriy M., PhD, MBA, CQF (@predict_addict)
Solid mathematical ideas almost always outperform contrived engineering tricks.

For years deep learning has been dominated by increasingly complex architectural hacks: CNN blocks, attention layers, channel mixers, residual pathways, normalization stacks. Every few years a new architecture is announced as if it were a revolution.

One of the most famous examples was Kaiming He and Residual Networks (ResNet). At the time he was paraded around the AI world like a celebrity because residual connections supposedly “solved” deep learning. But these were largely engineering patches.

Now something much more interesting appeared. A new architecture called CliffordNet returns to mathematics, specifically Clifford Algebra, developed in the 19th century by William Kingdon Clifford. Instead of stacking arbitrary modules, the model is built around the geometric product

uv = u·v + u∧v

A single algebraic operation that simultaneously captures inner product structure and geometric interactions. In other words: the math already contains the interaction mechanism.

No attention blocks. No mixer layers. No architectural spaghetti.

The result:
• 77.82% accuracy on CIFAR-100 with only 1.4M parameters
• roughly 8× fewer parameters than ResNet-18

And with strict O(N) complexity. The paper even suggests that once geometric interactions are modeled correctly, feed-forward networks become largely redundant.

A good reminder for the AI community. Engineering tricks can dominate for years. But eventually mathematics shows up and deletes half the architecture.

Paper: arxiv.org/pdf/2601.06793…

19th century geometry just walked into computer vision.
[image]
25 replies · 127 reposts · 912 likes · 83.8K views
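The geometric product quoted in the post, uv = u·v + u∧v, can be computed directly for two ordinary vectors. The sketch below is only an illustration of that identity, not a CliffordNet layer; the function name and the dictionary representation of the bivector part are assumptions.

```python
def geometric_product(u, v):
    """Return the scalar part (u·v) and bivector part (u∧v) of the product uv."""
    n = len(u)
    # Symmetric part: the ordinary inner product.
    scalar = sum(a * b for a, b in zip(u, v))
    # Antisymmetric part: one coefficient per basis plane e_i ∧ e_j with i < j.
    bivector = {
        (i, j): u[i] * v[j] - u[j] * v[i]
        for i in range(n)
        for j in range(i + 1, n)
    }
    return scalar, bivector


scalar, bivector = geometric_product((1, 2, 3), (4, 5, 6))
print(scalar)    # inner-product part
print(bivector)  # wedge-product coefficients keyed by basis plane
```

For parallel vectors the bivector part vanishes and only the inner product remains; for orthogonal vectors the scalar part vanishes and only the wedge remains.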
🅳🅾︎🅼🅴 reposted
Quantocracy (@Quantocracy)
The AutoTune filter [Financial Hacker] dlvr.it/TS6BnG
0 replies · 2 reposts · 3 likes · 974 views
🅳🅾︎🅼🅴 reposted
Lianghui Zhu (@lianghui_zhu)
For a decade, we've made models wider and deeper—but we've barely changed how layers *talk* to each other. Since ResNet's `x + F(x)` in 2015, the depth residual has been the only highway for inter-layer communication. It's time to upgrade the staircase. 🧵
[image]
18 replies · 240 reposts · 1.9K likes · 187.9K views
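For readers unfamiliar with the `x + F(x)` pattern the post refers to, here is a minimal NumPy sketch of a stack of residual layers. The transform `F` used here (a linear map followed by ReLU) and all names are illustrative assumptions; the post does not describe the proposed replacement, so none is sketched.

```python
import numpy as np


def layer(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """A toy transform F(x): linear map followed by ReLU."""
    return np.maximum(0.0, x @ w)


def residual_forward(x: np.ndarray, weights: list) -> np.ndarray:
    """Apply x <- x + F(x) once per layer; the skip path carries x unchanged."""
    for w in weights:
        x = x + layer(x, w)
    return x


rng = np.random.default_rng(0)
x = rng.normal(size=(2, 4))

# With all-zero weights every F(x) is zero, so the stack reduces to the identity.
zero_weights = [np.zeros((4, 4)) for _ in range(3)]
identity_out = residual_forward(x, zero_weights)

random_weights = [rng.normal(size=(4, 4)) for _ in range(3)]
out = residual_forward(x, random_weights)
print(out.shape)
```

Each layer sees only the running sum of earlier outputs, which is the single "highway" for inter-layer communication that the post describes.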
🅳🅾︎🅼🅴 reposted
Gappy (Giuseppe Paleologo) (@__paleologo)
Just a reminder that you can do unintuitive things with ordinary differential equations. Oldie but goldie paper.
[image]
11 replies · 96 reposts · 1K likes · 41.8K views
🅳🅾︎🅼🅴 reposted
Liquidity Goblin (@liquiditygoblin)
when fitting curves on short dated options, do you fit to the bid / ask respectively and then take the mid of the vols? or do you take the mid of the prices and then take the vol of that? what if neither will give you the full picture? a thread on fitting 0dte curves🧵 1/11
[image]
14 replies · 13 reposts · 259 likes · 27.5K views
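The two options in the question above give different numbers because implied volatility is a nonlinear function of price. The sketch below compares them for a single call under plain Black-Scholes with zero rates; the quote (spot 100, strike 101, one day to expiry, bid 0.10, ask 0.20) is hypothetical, and the thread's own fitting method is not reproduced here.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(spot: float, strike: float, t: float, vol: float, rate: float = 0.0) -> float:
    """Black-Scholes price of a European call."""
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol * vol) * t) / (vol * math.sqrt(t))
    d2 = d1 - vol * math.sqrt(t)
    return spot * norm_cdf(d1) - strike * math.exp(-rate * t) * norm_cdf(d2)


def implied_vol(price: float, spot: float, strike: float, t: float, rate: float = 0.0) -> float:
    """Invert bs_call by bisection; the call price is increasing in vol."""
    lo, hi = 1e-4, 5.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if bs_call(spot, strike, t, mid, rate) < price:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical short-dated quote, chosen only for illustration.
spot, strike, t = 100.0, 101.0, 1.0 / 365.0
bid, ask = 0.10, 0.20

vol_of_mid = implied_vol(0.5 * (bid + ask), spot, strike, t)
mid_of_vols = 0.5 * (implied_vol(bid, spot, strike, t) + implied_vol(ask, spot, strike, t))
print(vol_of_mid, mid_of_vols)
```

For this out-of-the-money quote the price is convex in volatility, so the vol of the mid price comes out above the mid of the bid and ask vols; the gap widens as the bid-ask spread grows relative to the option price.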
🅳🅾︎🅼🅴 reposted
机器之心 JIQIZHIXIN (@jiqizhixin)
Huge! Recurrent neural networks could match Transformer memory without the quadratic burden!

Ali Behrouz from Google and colleagues have cracked it! They present Memory Caching (MC), a simple yet powerful method that lets RNNs store "memory checkpoints" of their internal states. This allows their effective memory to grow with context, offering a flexible trade-off between speed and recall.

MC dramatically enhances recurrent models in language modeling and long-context understanding. It significantly closes the performance gap with Transformers on recall tasks and outperforms existing state-of-the-art recurrent models.
[image]
14 replies · 87 reposts · 465 likes · 37.2K views
🅳🅾︎🅼🅴 reposted
Roan (@RohOnChain)
This is the EXACT 12-step methodology Institutional quant desks use to win every single trade. Bookmark & run it through your stack or just pass it directly to your AI coding agent. Most people never reach this layer in their entire lifetime. Full breakdown in article below.
[image]
Roan (@RohOnChain): x.com/i/article/2037…
21 replies · 137 reposts · 1.1K likes · 165.5K views
🅳🅾︎🅼🅴 (@dome_cs)
@plus_vision_div I noticed you also make whiteboards, so I wanted to ask—do you have any plans to bring this magnetic writing technology to whiteboards in the future? I think it could solve common issues like dried-out markers and stains from unwiped ink, while also being more eco-friendly.
1 reply · 0 reposts · 0 likes · 20 views
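プラス VISION事業部【公式】
#Kaiteで描いてみた Good morning 🌸 Hinamatsuri has already passed, but... we asked an employee who is good at drawing to sketch the three court ladies as beautiful girls ✨ It draws like a pencil, so it is perfect for illustrations ✨ #企業公式が朝の挨拶を言い合う #Kaiteパッド
[image]
プラス VISION事業部【公式】 (@plus_vision_div): Today is Hinamatsuri 🎎 A little doodle of the hina dolls 🌸 What are you eating today? #Kaiteパッド
3 replies · 0 reposts · 7 likes · 219 views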
🅳🅾︎🅼🅴 reposted
Google Research (@GoogleResearch)
Introducing TurboQuant: Our new compression algorithm that reduces LLM key-value cache memory by at least 6x and delivers up to 8x speedup, all with zero accuracy loss, redefining AI efficiency. Read the blog to learn how it achieves these results: goo.gle/4bsq2qI
[GIF]
1K replies · 5.8K reposts · 39K likes · 19.4M views