tcml


if you weren’t aware, it’s prime intellect season
Prime Intellect@PrimeIntellect
Automating AI research is the next major step in AI. We let Claude Code (Opus 4.7) and Codex (GPT 5.5) run autonomously on the nanoGPT speedrun optimizer track using our idle compute. ~10k runs, ~14k H200 hours. Opus now holds the record at 2930 steps vs the 2990 human baseline.

Wow! More theoretical analysis linking the spectral norm and row-norm. They make a nice argument using "row-block diagonal dominance" of the layer-wise Hessian to say that spectral LMO and row-norm LMO should give equivalent asymptotic dynamics (as width grows).
Shenyang Deng ✈️ ICML2026@DengShenyang24
1/n Please stop by👋. This is not just another ICML 2026 optimizer paper. We have rich intuition to share on why simple preconditioners like orthogonalization and row-normalization specifically benefit NN optimization. Quick overview below 🧵
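To make the comparison concrete, here is a minimal sketch of the two update rules under my reading of "spectral LMO" (orthogonalize the gradient, Muon-style) and "row-norm LMO" (normalize each gradient row). Illustrative only, not the paper's code; the function names are mine.

```python
import torch

def spectral_lmo(g: torch.Tensor) -> torch.Tensor:
    # LMO over the spectral-norm ball: set all singular values of G to 1,
    # i.e. return U @ V^T (the update is then applied with a negative step).
    u, _, vh = torch.linalg.svd(g, full_matrices=False)
    return u @ vh

def row_norm_lmo(g: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    # LMO over a max-row-l2-norm ball: rescale each row of G to unit length.
    return g / (g.norm(dim=1, keepdim=True) + eps)

g = torch.randn(256, 512)      # toy layer gradient
u_spec = spectral_lmo(g)       # orthogonalized (Muon-style) direction
u_row = row_norm_lmo(g)        # row-normalized direction
```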

@tonysilveti Isn’t this one sided diagonal Shampoo (appendix of original Shampoo paper) and inspiration for our SM3 work
Highly doubt this is equivalent to Shampoo.



@grok @bindureddy @grok compare google's sputnik paper with @SakanaAILabs's "sparser, faster" with nvidia. Finish with an overview and speculation on what google may be doing internally

Google's sparse kernels (Sputnik, 2020 paper) are low-level GPU ops optimized for SpMM/SDDMM in DL. They exploit moderate sparsity patterns from pruning for 2-3.5x speedups and memory savings on Transformers/MobileNets vs cuSPARSE.
TEAL (2024): Training-free magnitude activation sparsity (40-50% model-wide) on LLMs like Llama/Mistral; improves sparse kernels for 1.5-1.8x decode speedups.
CATS (2024): Context-aware thresholding for 50% activation sparsity in MLP/Gated blocks; custom kernels yield ~15% end-to-end latency gains.
Sputnik is foundational kernel tech; TEAL/CATS are recent activation methods that leverage/enhance it for inference efficiency.
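As a rough illustration of the activation-sparsity idea behind TEAL/CATS (a generic magnitude threshold, not either paper's calibration procedure or kernels):

```python
import torch

def threshold_activations(x: torch.Tensor, sparsity: float = 0.5) -> torch.Tensor:
    # Zero the smallest-|x| entries so roughly `sparsity` of them are dropped.
    k = int(x.numel() * sparsity)
    if k == 0:
        return x
    cutoff = x.abs().flatten().kthvalue(k).values
    return torch.where(x.abs() > cutoff, x, torch.zeros_like(x))

x = torch.randn(4, 4096)            # toy hidden activations
w = torch.randn(4096, 11008)        # toy MLP weight
y = threshold_activations(x) @ w    # a sparse kernel (e.g. Sputnik-style SpMM) would skip the zeros
```

The dense matmul above is just for checking correctness; the speedups reported come from kernels that actually exploit the zeros.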

Gemini 3.2 Flash - Capitalizing on DeepMind's clever distillation techniques...
Rumors are that benchmarks show it's hitting 92% of GPT 5.5's performance on coding and reasoning tasks while being 15-20x cheaper on inference costs. The latency improvements are insane - sub-200ms for most queries.
Google's distillation + sparsity techniques are paying off massively. They've essentially compressed a frontier model into a flash variant without the usual quality cliff.

Google has pioneered sparsity in neural nets for efficiency since early DNN research (e.g., their 2019 "State of Sparsity" paper evaluating magnitude pruning on Transformers/ResNets).
Key techniques:
- Weight pruning (remove low-magnitude params, unstructured or 2:4 structured for hardware).
- Sparse kernels (like Sputnik for GPU SpMM speedups).
- Dynamic sparsity via Mixture-of-Experts (MoE), as in the Switch Transformer, which activates only a subset of params per token (sketched below).
In LLMs like Gemini Flash, this pairs with distillation for massive inference savings (lower latency/cost) without big quality drops. Recent extensions include activation sparsity and sparse fine-tuning for 50-75% sparsity on models like T5/MPT.
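A toy sketch of the MoE bullet above (Switch Transformer-style top-1 routing); real implementations add expert capacity limits and a load-balancing loss, so treat this as illustration only:

```python
import torch
import torch.nn as nn

class SwitchFFN(nn.Module):
    """Top-1 routed MoE layer: each token activates exactly one expert FFN."""
    def __init__(self, d_model: int = 64, d_ff: int = 256, n_experts: int = 4):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, d_model)
        probs = self.router(x).softmax(dim=-1)
        gate, idx = probs.max(dim=-1)                     # pick one expert per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            sel = idx == e
            if sel.any():                                 # only that expert's params are used
                out[sel] = gate[sel].unsqueeze(-1) * expert(x[sel])
        return out

y = SwitchFFN()(torch.randn(8, 64))                       # 8 toy tokens
```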

@ihtesham2005 I think there is a consensus among researchers that agents can do it pretty well

If you still think AI agents can't do real research, this paper will end that argument.
Researchers from Google and Meta built a framework where Claude Code proposes its own algorithms for making LLMs reason better, then tests them, then refines them based on what failed. No human in the loop after the environment is set up.
In 5 rounds the agent discovered a controller with 4 coordinated mechanisms working together. EMA momentum stopping. Coupled width-depth control. Alignment-aware depth allocation. Conservative branch abandonment.
The paper says directly: "a level of coordinated complexity that would be difficult to arrive at through manual intuition alone."
That's a polite way of saying the agent built something a human probably wouldn't have.
The cost of the entire discovery was $39.90.
Less than one researcher's coffee budget just outperformed years of hand-tuned work.
Paper is from Google and Meta.
Read it here: arxiv.org/abs/2605.08083
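The post only names the mechanisms, but as a purely hypothetical illustration of what an "EMA momentum stopping" rule could look like (not the paper's controller; the decay rate and threshold are invented):

```python
def ema_stop(improvements, beta: float = 0.5, threshold: float = 0.01) -> int:
    """Stop once the exponential moving average of per-step gains falls below `threshold`."""
    ema = improvements[0]
    for t, delta in enumerate(improvements[1:], start=1):
        ema = beta * ema + (1 - beta) * delta
        if ema < threshold:
            return t              # abandon this branch here
    return len(improvements)      # never triggered

print(ema_stop([0.05, 0.03, 0.01, 0.002, 0.0005, 0.0001]))  # -> 4: gains have flattened
```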


@kellerjordan0 @nilinabra I keep wondering whether progress will forever look like small incremental gains or whether there exists a fundamentally different approach that could smash the SOTA

Modded-NanoGPT optimization result #11: @nilinabra has achieved a new record of 3225 steps (-25) via a novel technique dubbed Contra-Muon, in which top SVD components are somewhat suppressed. This result builds on #9.
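A hedged sketch of how I read "top SVD components are somewhat suppressed" (the record-setting code presumably lives in the modded-nanogpt repo; the top-k/damping rule below is invented for illustration):

```python
import torch

def contra_muon_update(g: torch.Tensor, top_k: int = 8, damp: float = 0.5) -> torch.Tensor:
    # Shrink the largest-k singular components of the update matrix by `damp`,
    # instead of fully orthogonalizing it as vanilla Muon would.
    u, s, vh = torch.linalg.svd(g, full_matrices=False)
    s = s.clone()
    s[:top_k] *= damp
    return u @ torch.diag(s) @ vh

g = torch.randn(768, 768)          # toy gradient/momentum matrix
update = contra_muon_update(g)
```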



The legacy trading software complex is still teaching people shortcut keys and charging portfolio managers for "custom" displays.
They call themselves "terminals" or "portfolio management systems" and charge managers thousands, or tens of thousands, of dollars per month to provide you with an interface that YOU have to provide data for.
We hated that, so we built one that we could modify ourselves at anytime with natural language, fully equipped with market and portfolio data integration.


so @mabb0tt and I are once again volunteering to teach cs153.stanford.edu
there are so many new frontiers to be pioneered
thank you to our speakers like @karpathy @bhorowitz @brendaniribe @DavidBaszucki @LiamFedus @ekindogus @sama for investing in the next generation


@AnjneyMidha @AryenderSingh2 @mabb0tt @karpathy @bhorowitz @brendaniribe @DavidBaszucki @LiamFedus @ekindogus @sama Posting here to request the discord link

@AryenderSingh2 @mabb0tt @karpathy @bhorowitz @brendaniribe @DavidBaszucki @LiamFedus @ekindogus @sama You can join the discord and follow along on YouTube





