tcml @t_cmtl
134 posts · Joined September 2019 · 1.3K Following · 50 Followers

Garry Tan @garrytan
The biggest alpha leak of 2026 is that by tokenmaxing at $10k/mo with OpenClaw/Hermes + GBrain you can get, today, the AI that everyone will have for $100/mo in 2028. That is the biggest single unlock you can have vs your competition

tcml @t_cmtl
@willccbb my timeline was full of muon variations... now we know who it was! Nice

Tony S.F. @tonysilveti
Wow! More theoretical analysis linking the spectral norm and row-norm. They make a nice argument using "row-block diagonal dominance" of the layer-wise Hessian to say that spectral LMO and row-norm LMO should give equivalent asymptotic dynamics (as width grows).
Shenyang Deng ✈️ ICML2026 @DengShenyang24

1/n Please stop by👋. This is not just another ICML 2026 optimizer paper. We have rich intuition to share on why simple preconditioners like orthogonalization and row-normalization specifically benefit NN optimization. Quick overview below 🧵
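As a concrete picture of the two update rules Tony contrasts above, here is a minimal sketch assuming PyTorch: the spectral LMO orthogonalizes the gradient, while the row-norm LMO normalizes its rows. An exact SVD is used for clarity (Muon-style implementations approximate it with Newton-Schulz iterations), and the function names are mine, not the paper's.

```python
import torch

def spectral_lmo(g: torch.Tensor) -> torch.Tensor:
    # LMO over the spectral-norm ball: set all singular values of the
    # gradient to 1 (orthogonalization). Practical optimizers like Muon
    # approximate this with Newton-Schulz iterations instead of exact SVD.
    u, _, vh = torch.linalg.svd(g, full_matrices=False)
    return u @ vh

def row_norm_lmo(g: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    # LMO over a max-row-norm ball: normalize each row of the gradient,
    # i.e. a per-row (diagonal) rescaling.
    return g / (g.norm(dim=1, keepdim=True) + eps)

g = torch.randn(256, 512)
print(torch.linalg.matrix_norm(spectral_lmo(g), ord=2))  # ~1.0
print(row_norm_lmo(g).norm(dim=1).max())                 # ~1.0
```

The claimed equivalence is asymptotic: under row-block diagonal dominance of the layer-wise Hessian, the two directions are argued to induce matching dynamics as width grows.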


rohan anil @_arohan_
@tonysilveti Isn’t this one-sided diagonal Shampoo (appendix of the original Shampoo paper), and the inspiration for our SM3 work? Highly doubt this is equivalent to Shampoo.
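For reference, a rough sketch of what "one-sided diagonal" preconditioning looks like, assuming PyTorch: keep one second-moment accumulator per row rather than a full Kronecker factor. The function and exact accumulation rule are illustrative, not the Shampoo appendix or SM3 verbatim (SM3 additionally shares accumulators across rows and columns via a min, which this omits).

```python
import torch

def one_sided_diag_step(g: torch.Tensor, row_acc: torch.Tensor, eps: float = 1e-8):
    # Accumulate squared gradients per ROW (O(m) state for an m x n matrix,
    # versus O(m^2) for Shampoo's full left Kronecker factor), then rescale
    # each row by the inverse square root of its accumulator.
    row_acc += (g * g).sum(dim=1)
    return g / (row_acc.sqrt().unsqueeze(1) + eps)

g = torch.randn(128, 512)
row_acc = torch.zeros(128)
update = one_sided_diag_step(g, row_acc)
```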

tcml @t_cmtl
@grok @bindureddy compare Google's Sputnik paper with @SakanaAILabs's "sparser, faster" work with NVIDIA. Finish with an overview and speculation on what Google may be doing internally

Grok @grok
Google's sparse kernels (Sputnik, 2020 paper) are low-level GPU ops optimized for SpMM/SDDMM in DL. They exploit moderate sparsity patterns from pruning for 2-3.5x speedups and memory savings on Transformers/MobileNets vs cuSPARSE.
- TEAL (2024): training-free magnitude activation sparsity (40-50% model-wide) on LLMs like Llama/Mistral; improved sparse kernels give 1.5-1.8x decode speedups.
- CATS (2024): context-aware thresholding for 50% activation sparsity in MLP/gated blocks; custom kernels yield ~15% end-to-end latency gains.
Sputnik is foundational kernel tech; TEAL/CATS are recent activation-sparsity methods that leverage and enhance it for inference efficiency.
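As a concrete illustration of the TEAL-style idea, here is a minimal magnitude-thresholding sketch in PyTorch. It is a per-call approximation; TEAL itself calibrates per-tensor thresholds offline, and the speedup only materializes when sparse kernels exploit the zeros downstream.

```python
import torch

def magnitude_sparsify(x: torch.Tensor, sparsity: float = 0.5) -> torch.Tensor:
    # Zero the smallest-magnitude activations so roughly `sparsity` of the
    # entries are exactly 0; a sparse kernel can then skip them in the
    # following matmul. Training-free: no weights are changed.
    k = int(x.numel() * sparsity)
    if k == 0:
        return x
    threshold = x.abs().flatten().kthvalue(k).values
    return torch.where(x.abs() > threshold, x, torch.zeros_like(x))

h = torch.randn(4, 4096)          # hidden activations for 4 tokens
h_sparse = magnitude_sparsify(h)  # ~50% zeros
```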

Bindu Reddy @bindureddy
Gemini 3.2 Flash - Capitalizing on DeepMind's clever distillation techniques... Rumors are that benchmarks show it's hitting 92% of GPT 5.5's performance on coding and reasoning tasks while being 15-20x cheaper on inference costs. The latency improvements are insane - sub-200ms for most queries. Google's distillation + sparsity techniques are paying off massively. They've essentially compressed a frontier model into a flash variant without the usual quality cliff.

tcml @t_cmtl
@grok @bindureddy provide a summary and compare Google's sparse kernels with TEAL and CATS

Grok @grok
Google has pioneered sparsity in neural nets for efficiency since early DNN research (e.g., their 2019 "State of Sparsity" paper evaluating magnitude pruning on Transformers/ResNets). Key techniques:
- Weight pruning (remove low-magnitude params; unstructured, or 2:4 structured for hardware).
- Sparse kernels (like Sputnik for GPU SpMM speedups).
- Dynamic sparsity via Mixture-of-Experts (MoE), as in Switch Transformer, which activates only a subset of params per token.
In LLMs like Gemini Flash, this pairs with distillation for massive inference savings (lower latency/cost) without big quality drops. Recent extensions include activation sparsity and sparse fine-tuning for 50-75% sparsity on models like T5/MPT.
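A quick sketch of the 2:4 structured pattern mentioned above, assuming PyTorch: in every group of four consecutive weights, keep the two largest by magnitude. This is one-shot masking only; real pipelines prune and then fine-tune to recover accuracy.

```python
import torch

def prune_2_4(w: torch.Tensor) -> torch.Tensor:
    # 2:4 structured sparsity: per group of 4 consecutive weights, zero the
    # 2 smallest-magnitude entries. This fixed pattern is what sparse
    # tensor cores can accelerate, unlike unstructured pruning.
    groups = w.reshape(-1, 4)
    _, idx = groups.abs().topk(2, dim=1, largest=False)  # 2 smallest per group
    mask = torch.ones_like(groups)
    mask.scatter_(1, idx, 0.0)
    return (groups * mask).reshape(w.shape)

w = torch.randn(256, 512)   # element count divisible by 4
w_sparse = prune_2_4(w)     # exactly 50% sparse, in the 2:4 pattern
```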

tcml @t_cmtl
so if you're heavy on html then your harness becomes navigable?

tcml @t_cmtl
so I'm using html artifacts as experiment outputs / visualizations and codex just started cross-referencing experiments. amazing

Kun Chen @kunchenguid
time to reveal my HTML workflow with agents.

HTML is the new markdown. Lavish is the new editor for your HTML artifacts.

just tell your agent: discuss the technical plan with me using `npx lavish-axi`

100% open source and runs locally. details in thread below 👇
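A minimal Python sketch of the artifact side of this workflow; the file layout, function name, and fields are all hypothetical (this is not Lavish's API), but the cross-links between runs are what make a harness navigable in the way tcml describes above.

```python
from pathlib import Path

def write_artifact(run_id: str, metrics: dict, related: list[str]) -> Path:
    # Render one experiment's results as a standalone HTML artifact.
    # The <a> links to sibling runs are what let an agent cross-reference
    # experiments instead of grepping loose log files.
    rows = "".join(f"<tr><td>{k}</td><td>{v}</td></tr>" for k, v in metrics.items())
    links = "".join(f'<li><a href="{r}.html">{r}</a></li>' for r in related)
    html = (f"<html><body><h1>Run {run_id}</h1>"
            f"<table>{rows}</table>"
            f"<h2>Related runs</h2><ul>{links}</ul></body></html>")
    out = Path(f"{run_id}.html")
    out.write_text(html)
    return out

write_artifact("exp_042", {"val_loss": 2.31, "steps": 3250}, ["exp_041", "exp_040"])
```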

tcml @t_cmtl
@ihtesham2005 I think there is a consensus among researchers that agents can do this pretty well

Ihtesham Ali @ihtesham2005
If you still think AI agents can't do real research, this paper will end that argument.

Researchers from Google and Meta built a framework where Claude Code proposes its own algorithms for making LLMs reason better, then tests them, then refines them based on what failed. No human in the loop after the environment is set up.

In 5 rounds the agent discovered a controller with 4 coordinated mechanisms working together:
- EMA momentum stopping
- Coupled width-depth control
- Alignment-aware depth allocation
- Conservative branch abandonment

The paper says it directly: "a level of coordinated complexity that would be difficult to arrive at through manual intuition alone." That's a polite way of saying the agent built something a human probably wouldn't have.

The entire discovery cost $39.90. One researcher's coffee budget just outperformed years of hand-tuned work.

Read it here: arxiv.org/abs/2605.08083
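In outline, the loop the tweet describes looks something like the sketch below; `llm_propose` and `run_benchmark` are stand-ins for the paper's actual harness (Claude Code writing and evaluating candidate controllers), stubbed here so the skeleton runs.

```python
import random

def llm_propose(env: str, feedback: str) -> str:
    # Stand-in for the agent call; in the paper, Claude Code writes a
    # candidate reasoning-control algorithm given the environment and
    # feedback from the previous round.
    return f"candidate conditioned on: {feedback}"

def run_benchmark(algo: str) -> tuple[float, str]:
    # Stand-in for evaluation; the paper runs candidates on real tasks.
    return random.random(), "none"

def discover(env: str, rounds: int = 5):
    # Propose -> test -> refine, with no human in the loop after setup.
    best, best_score, feedback = None, float("-inf"), "no prior attempts"
    for _ in range(rounds):
        algo = llm_propose(env, feedback)
        score, failures = run_benchmark(algo)
        if score > best_score:
            best, best_score = algo, score
        feedback = f"score={score:.3f}; failures: {failures}"
    return best, best_score

print(discover("make LLMs reason better"))
```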

tcml @t_cmtl
@willccbb Also, it's impressive how fast you burn usage with /goal

will brown @willccbb
using claude code for the first time in a while. sooooo good until you look at the code and it's just completely unreasonable. the new agent mode is fun tho

Peter Yang @petergyang
Ok what kind of things should I try /goal for? Building 0-1? Refactoring? What context does it need to work well?

tcml @t_cmtl
@_arohan_ A useful dimension-invariant property, with empirical results matching the asymptotic theory almost perfectly

rohan anil @_arohan_
What research did you get done this week? Was it directionally correct?

tcml @t_cmtl
@kellerjordan0 @nilinabra I keep wondering whether progress will forever look like small incremental gains or whether there exists a fundamentally different approach that could smash the SOTA

Keller Jordan @kellerjordan0
Modded-NanoGPT optimization result #11: @nilinabra has achieved a new record of 3225 steps (-25) via a novel technique dubbed Contra-Muon, in which top SVD components are somewhat suppressed. This result builds on #9.
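Reading between the lines, "suppressing top SVD components" might look something like the sketch below in PyTorch; the cutoff `k` and damping factor are invented for illustration, since the actual Contra-Muon recipe isn't spelled out in the tweet.

```python
import torch

def contra_muon_update(g: torch.Tensor, k: int = 8, damp: float = 0.5):
    # Plain Muon orthogonalizes g, i.e. maps every singular value to 1.
    # Here the k LARGEST singular directions are additionally scaled by
    # `damp` < 1, "somewhat suppressing" the top SVD components.
    u, s, vh = torch.linalg.svd(g, full_matrices=False)  # s is descending
    scale = torch.ones_like(s)
    scale[:k] = damp
    return (u * scale) @ vh

g = torch.randn(768, 768)
update = contra_muon_update(g)
```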

@jason @Jason
We started an AI founder twitter group... reply with "I'm in" if you're a founder and want to be added

Jack Zumwalt @jackzumwalt
The legacy trading software complex is still teaching people shortcut keys and charging portfolio managers for "custom" displays. They call themselves "terminals" or "portfolio management systems" and charge thousands, or tens of thousands, of dollars per month for an interface that YOU have to provide the data for. We hated that, so we built one that we can modify ourselves at any time with natural language, fully equipped with market and portfolio data integration.

tcml @t_cmtl
@femisapien_z People are rediscovering what overfitting means, nice

Femisapien @femisapien_z
"You are using AI to help you make financial decisions. To check legal documents. To solve problems at work. To help your children with homework. And Apple just proved that the AI is not thinking about any of it. It is pattern matching. And the moment something unexpected shows up in your question, it breaks. It does not tell you it broke. It just quietly gives you the wrong answer with full confidence."
Nav Toor @heynavtoor

🚨SHOCKING: Apple just proved that AI models cannot do math. Not advanced math. Grade school math. The kind a 10-year-old solves. And the way they proved it is devastating.

Apple researchers took the most popular math benchmark in AI — GSM8K, a set of grade-school math problems — and made one change. They swapped the numbers. Same problem. Same logic. Same steps. Different numbers. Every model's performance dropped. Every single one. 25 state-of-the-art models tested.

But that wasn't the real experiment. The real experiment broke everything. They added one sentence to a math problem. One sentence that is completely irrelevant to the answer. It has nothing to do with the math. A human would read it and ignore it instantly.

Here's the actual example from the paper: "Oliver picks 44 kiwis on Friday. Then he picks 58 kiwis on Saturday. On Sunday, he picks double the number of kiwis he did on Friday, but five of them were a bit smaller than average. How many kiwis does Oliver have?"

The correct answer is 190. The size of the kiwis has nothing to do with the count. A 10-year-old would ignore "five of them were a bit smaller" because it's obviously irrelevant. It doesn't change how many kiwis there are.

But o1-mini, OpenAI's reasoning model, subtracted 5. It got 185. Llama did the same thing. Subtracted 5. Got 185. They didn't reason through the problem. They saw the number 5, saw a sentence that sounded like it mattered, and blindly turned it into a subtraction.

The models do not understand what subtraction means. They see a pattern that looks like subtraction and apply it. That is all. Apple tested this across all models. They call the dataset "GSM-NoOp" — as in, the added clause is a no-operation. It does nothing. It changes nothing.

The results are catastrophic. Phi-3-mini dropped over 65%. More than half of its "math ability" vanished from one irrelevant sentence. GPT-4o dropped from 94.9% to 63.1%. o1-mini dropped from 94.5% to 66.0%. o1-preview, OpenAI's most advanced reasoning model at the time, dropped from 92.7% to 77.4%.

Even giving the models 8 examples of the exact same question beforehand, with the correct solution shown each time, barely helped. The models still fell for the irrelevant clause. This means it's not a prompting problem. It's not a context problem. It's structural.

The Apple researchers also found that models convert words into math operations without understanding what those words mean. They see the word "discount" and multiply. They see a number near the word "smaller" and subtract. Regardless of whether it makes any sense.

The paper's exact words: "current LLMs are not capable of genuine logical reasoning; instead, they attempt to replicate the reasoning steps observed in their training data." And: "LLMs likely perform a form of probabilistic pattern-matching and searching to find closest seen data during training without proper understanding of concepts."

They also tested what happens when you increase the number of steps in a problem. Performance didn't just decrease. The rate of decrease accelerated. Adding two extra clauses to a problem dropped Gemma2-9b from 84.4% to 41.8%. Phi-3.5-mini from 87.6% to 44.8%.

The more thinking required, the more the models collapse. A real reasoner would slow down and work through it. These models don't slow down. They pattern-match. And when the pattern becomes complex enough, they crash.

This paper was published at ICLR 2025, one of the most prestigious AI conferences in the world.
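The paper's two manipulations are easy to reproduce mechanically. Below is a toy generator modeled on the kiwi example above: it swaps the numbers (GSM-Symbolic style) and optionally appends the irrelevant clause (GSM-NoOp style). The template and helper names are mine, for illustration only.

```python
import random

TEMPLATE = ("Oliver picks {a} kiwis on Friday. Then he picks {b} kiwis on "
            "Saturday. On Sunday, he picks double the number of kiwis he did "
            "on Friday{noop}. How many kiwis does Oliver have?")
NOOP = ", but {c} of them were a bit smaller than average"

def make_variant(with_noop: bool) -> tuple[str, int]:
    # Swap the numbers (same problem, same logic, different values) and
    # optionally add the no-op clause, which never affects the count.
    a, b = random.randint(10, 99), random.randint(10, 99)
    noop = NOOP.format(c=random.randint(2, 9)) if with_noop else ""
    return TEMPLATE.format(a=a, b=b, noop=noop), a + b + 2 * a

q, ans = make_variant(with_noop=True)
print(q, "->", ans)  # a model that subtracts c has pattern-matched, not reasoned
```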