Alex Speicher (@AlxSp_)
25 posts · Joined October 2017 · 65 Following · 17 Followers

Alex Speicher @AlxSp_:
@evilmathkid You can now try bi-directional attention in the input since the model doesn’t have to predict it. That should help.
0 replies · 0 reposts · 1 like · 118 views

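A minimal sketch of the idea in the reply above: a prefix-LM-style attention mask that is bi-directional over the input (which the model never has to predict) and causal over the output. The function name and shapes are illustrative assumptions, not anything from the thread.

```python
import torch

def prefix_lm_mask(input_len: int, output_len: int) -> torch.Tensor:
    """Boolean attention mask: input tokens attend bi-directionally to the
    whole input, output tokens attend causally to everything before them.
    True = this (query, key) pair is allowed to attend."""
    total = input_len + output_len
    mask = torch.ones(total, total).tril().bool()  # standard causal base
    mask[:, :input_len] = True  # make the input prefix fully visible (bi-directional)
    return mask

# Example: 3 input tokens followed by 2 output tokens.
print(prefix_lm_mask(3, 2).int())
```
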
Mithil Vakde @evilmathkid:
A big change -> no training on inputs. Interestingly, this makes the test loss much worse, and yet it scores better! Clearly a compression framework / val loss alone isn't a perfect metric for sample efficiency. (I am bullish input training will make a comeback though.)
3 replies · 1 repost · 40 likes · 13.3K views

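A minimal sketch of what "no training on inputs" usually means in practice: next-token cross-entropy with the loss zeroed out on input positions, so the model is only supervised on the output. The names and shapes here are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def output_only_loss(logits: torch.Tensor, targets: torch.Tensor,
                     is_input: torch.Tensor) -> torch.Tensor:
    """Cross-entropy averaged over output positions only.

    logits:   (batch, seq, vocab) next-token predictions
    targets:  (batch, seq) target token ids
    is_input: (batch, seq) bool, True where the target token belongs to the
              input and should not contribute to the loss
    """
    per_token = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
        reduction="none",
    ).reshape(targets.shape)
    keep = ~is_input                                   # supervise outputs only
    return (per_token * keep).sum() / keep.sum().clamp(min=1)
```
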
Mithil Vakde @evilmathkid:
44% on ARC-AGI-1 for 67 cents! Trained from scratch in 2 hrs on a 5090. Matches TRM, beats HRM, and is way faster & cheaper. No recursion, just a transformer. Also, 7% on ARC-2 🧵
30 replies · 73 reposts · 679 likes · 54.6K views

Alex Speicher @AlxSp_:
@Teknium That's a great insight that I haven't seen explicitly stated before, thanks
0 replies · 0 reposts · 1 like · 69 views

Teknium (e/λ) @Teknium:
Reasoning in LLMs has actually broken at least one intuition about data that I thought I was confident in. Prior to reasoning models, there was a lot I could predict based on the data that went in, such as average output lengths and limits on how many tokens they'd generate. It used to be that if you trained on outputs of at most 4k tokens, you'd have a near-0% chance of generating 10k+ tokens.

But with reasoning models, they actually learn a function through this data that can generate way, way beyond the output lengths you trained on. I think this justifies calling it "reasoning", because the model actually learned a function similar to reasoning: it generates tokens that look like thinking to improve accuracy until it is confident it has found the correct answer, and even if you train on at most 10k CoT tokens, models will still think, potentially through the entire 128k+ context length they have.

Something else interesting about "reasoning": when scaling Hermes 4 from 14B, to 70B, to 405B, we observed that thinking lengths went down and down on the same set of problems as the model got bigger. This also implies that the reasoning process is very much tied to innate intelligence, because the same problem is, relative to each model, a different difficulty, and the model literally *thinks longer* if it is less intelligent!

Just some fun facts for you on this Sunday :)
46 replies · 41 reposts · 660 likes · 44.6K views

Alex Speicher @AlxSp_:
@yacinelearning @suchenzang Yeah, I would view the test-time training in those methods the same way large-scale AR models are expected to learn new patterns and logic through in-context learning.
1 reply · 0 reposts · 2 likes · 106 views

Yacine Mahdid @yacinelearning:
@suchenzang I thought it was weird too at first, but after diving into the methods of HRM + TRM + CompressARC it actually makes sense.
2 replies · 0 reposts · 5 likes · 953 views

Susan Zhang @suchenzang:
we need more AI people to join community notes... kind of crazy how many amplified a plot that went into negative cost territory, with a thread about training on test
Quoting Mithil Vakde @evilmathkid:
"Announcing a new Pareto frontier on ARC-AGI: 27.5% for just $2, 333x cheaper than TRM! Beats every non-thinking LLM in existence. Cost so low it's literally off the chart. Vanilla transformer. No special architectures. Tiny. Trained in 2 hrs. Open source. Thread:"
35 replies · 11 reposts · 206 likes · 96.3K views

Alex Speicher @AlxSp_:
@_arohan_ From a quick skim of the blog, I would assume it’s the task embedding (similar to HRM, TRM) that causes this performance.
0 replies · 0 reposts · 2 likes · 301 views

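For context, a minimal sketch of what a learned per-task embedding (in the spirit of HRM/TRM, as referenced above) can look like: each task id indexes a trainable vector that is prepended to the grid-token embeddings. The module and shapes are illustrative assumptions, not taken from any of the cited code.

```python
import torch
import torch.nn as nn

class TaskConditionedEmbedding(nn.Module):
    """Toy example: one learned embedding per ARC task, prepended to the
    token embeddings so the network can specialize per task."""

    def __init__(self, num_tasks: int, vocab_size: int, d_model: int = 256):
        super().__init__()
        self.task_embed = nn.Embedding(num_tasks, d_model)    # one vector per task
        self.token_embed = nn.Embedding(vocab_size, d_model)  # grid-cell tokens

    def forward(self, task_id: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
        # task_id: (batch,), tokens: (batch, seq)
        tok = self.token_embed(tokens)                # (batch, seq, d_model)
        task = self.task_embed(task_id).unsqueeze(1)  # (batch, 1, d_model)
        return torch.cat([task, tok], dim=1)          # prepend the task vector
```
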
rohan anil @_arohan_:
With so many years of work on arc-agi-1 plus hill climbing, and with many corps using it to showcase model capability, a simple transformer with the most obvious data representation has completely shaken up a cottage industry?
Quoting Mithil Vakde @evilmathkid:
"Announcing a new Pareto frontier on ARC-AGI: 27.5% for just $2, 333x cheaper than TRM! Beats every non-thinking LLM in existence. Cost so low it's literally off the chart. Vanilla transformer. No special architectures. Tiny. Trained in 2 hrs. Open source. Thread:"
6 replies · 2 reposts · 60 likes · 12.9K views

Teknium (e/λ) @Teknium:
I wonder if openai has an entire team of people just working on improving arc-agi tasks
4 replies · 0 reposts · 79 likes · 7K views

Alex Speicher @AlxSp_:
@francoisfleuret You could split x_t into two tokens {x_t_in, x_t+1_out}, where x_t_in's KV is kept for the following tokens, and x_t+1_out is discarded after predicting the next token (not visible to any other tokens). This obviously isn't very performant during training, as it doubles the context.
0 replies · 0 reposts · 0 likes · 340 views

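A minimal sketch of the attention mask that the two-token split above implies, assuming an interleaved [in_1, out_1, in_2, out_2, ...] layout: "in" tokens attend causally to earlier "in" tokens, "out" tokens see the input up to their position plus themselves, and nothing ever attends to an "out" token. Names and layout are illustrative.

```python
import torch

def split_token_mask(seq_len: int) -> torch.Tensor:
    """Mask for the interleaved [in_1, out_1, in_2, out_2, ...] layout.
    True = attention allowed from the query (row) to the key (column)."""
    total = 2 * seq_len
    is_in = torch.arange(total) % 2 == 0                # even slots are 'in' tokens
    causal = torch.ones(total, total).tril().bool()     # no looking ahead
    # Only 'in' tokens (or the token itself) may serve as keys,
    # so 'out' tokens are invisible to every other position.
    visible_keys = is_in.unsqueeze(0) | torch.eye(total, dtype=torch.bool)
    return causal & visible_keys

print(split_token_mask(3).int())
```
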
François Fleuret @francoisfleuret:
I really don't like that in the first layers X_t should be the representation of token t and gradually becomes that of token t+1 in the last layer. It makes absolutely no sense, it is objectively repugnant.
30 replies · 7 reposts · 157 likes · 17.5K views

Alex Speicher @AlxSp_:
@francoisfleuret @0xHenriksson Isn't that basically life in general? Life maintains its low entropy in a small spot (its body) while emitting energy, which increases entropy overall.
0 replies · 0 reposts · 2 likes · 81 views

François Fleuret @francoisfleuret:
@0xHenriksson I think I get you, but I find it weird that you can dump radiation into a vacuum. You just send off photons that will travel forever, and you've reduced your own entropy?
9 replies · 0 reposts · 3 likes · 882 views

François Fleuret @francoisfleuret:
Something I don't understand in physics: since you can emit radiation to cool down (e.g. the Earth), isn't that like reducing entropy *in a vacuum*? How is that consistent with the second law of thermodynamics?
32 replies · 3 reposts · 37 likes · 15.3K views

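The standard textbook answer (not stated in the thread, but it resolves the question): the radiation itself carries entropy away. For black-body emission of energy Q from a body at temperature T,

```latex
\Delta S_{\text{tot}}
  = \underbrace{-\frac{Q}{T}}_{\text{body cools}}
  \;+\; \underbrace{\frac{4}{3}\,\frac{Q}{T}}_{\text{entropy of the emitted radiation}}
  \;=\; \frac{Q}{3T} \;\ge\; 0 ,
```

so the emitter's entropy does drop, but the photons travelling off into the vacuum carry more than enough entropy to keep the total non-decreasing.
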
kalomaze @kalomaze:
what the fuck is a qkqkv_proj
2 replies · 0 reposts · 15 likes · 606 views

Alex Speicher @AlxSp_:
@willccbb I don’t think flow charts are the best option. They look pretty at first but turn into a mess
0 replies · 0 reposts · 0 likes · 134 views

will brown @willccbb:
honestly i can see this being a smash hit
101 replies · 13 reposts · 756 likes · 92.5K views

Tanishq Mathew Abraham, Ph.D. @iScienceLuvr:
I am looking to get new noise-cancelling Bluetooth headphones with high-quality sound... does anyone have suggestions? since I don't have Apple products, anything apart from AirPods Max
38 replies · 0 reposts · 39 likes · 23.3K views

Alex Speicher @AlxSp_:
@hi_tysam @_arohan_ How much do you think the hierarchical part of the architecture actually improves it? Seems like injecting the input embed into each recurrent block stabilizes the training a lot already
1 reply · 0 reposts · 1 like · 46 views

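A minimal sketch of the "inject the input embedding into each recurrent block" idea being discussed: the same block is applied repeatedly to a latent state, with the (fixed) input embedding re-added at every step so the signal can't wash out over iterations. This is an illustrative toy, not HRM/TRM code.

```python
import torch
import torch.nn as nn

class InjectedRecurrentCore(nn.Module):
    """Toy recurrent refinement loop with the input embedding re-injected
    at every iteration."""

    def __init__(self, d_model: int = 256, n_steps: int = 8):
        super().__init__()
        self.n_steps = n_steps
        self.block = nn.Sequential(
            nn.LayerNorm(d_model),
            nn.Linear(d_model, d_model),
            nn.GELU(),
            nn.Linear(d_model, d_model),
        )

    def forward(self, input_embed: torch.Tensor) -> torch.Tensor:
        state = torch.zeros_like(input_embed)
        for _ in range(self.n_steps):
            # Re-inject the input embedding on every recurrent step.
            state = state + self.block(state + input_embed)
        return state
```
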
Fern @hi_tysam:
@_arohan_ definitely having something that is latent-space structure invariant is the way to go IMO, by a long shot
1 reply · 0 reposts · 2 likes · 145 views

Alex Speicher @AlxSp_:
@cloneofsimo @lineardiff Could be that just injecting the input embed into each recurrent block is what makes the arch work, and the hierarchical part is just being "fancy".
0 replies · 0 reposts · 1 like · 34 views

Alex Speicher @AlxSp_:
@cloneofsimo @lineardiff Yeah, they do use the entire "train" part of both the train & eval sets in ARC, plus data augmentation. To me, the main interesting question in their paper is whether this hierarchical structure scales better than simpler architectures like arxiv.org/pdf/2502.05171.
1 reply · 0 reposts · 1 like · 57 views

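For reference, a minimal sketch of the kind of data augmentation commonly used on ARC grids (an illustrative recipe, not the exact one from any cited paper): the same random dihedral transform and color relabelling applied to both grids of a demonstration pair.

```python
import random
import numpy as np

def augment_arc_pair(inp: np.ndarray, out: np.ndarray, n_colors: int = 10):
    """Apply one random dihedral transform plus a color permutation,
    consistently to both the input and output grid of an ARC pair.
    Grids are small integer arrays of color ids in [0, n_colors)."""
    k = random.randrange(4)            # rotate by k * 90 degrees
    flip = random.random() < 0.5       # optionally mirror horizontally
    perm = np.random.permutation(n_colors)

    def transform(grid: np.ndarray) -> np.ndarray:
        g = np.rot90(grid, k)
        if flip:
            g = np.fliplr(g)
        return perm[g]                 # relabel colors consistently

    return transform(inp), transform(out)
```
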
Alex Speicher reposted
Xuandong Zhao @xuandongzhao:
🚀 Excited to share the most inspiring work I’ve been part of this year: "Learning to Reason without External Rewards" TL;DR: We show that LLMs can learn complex reasoning without access to ground-truth answers, simply by optimizing their own internal sense of confidence. 1/n
86 replies · 501 reposts · 3.5K likes · 572.8K views

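One way such an internal confidence signal is often formalized (an illustrative sketch only, not necessarily the paper's exact objective): score a generated answer by the model's own average log-probability on the tokens it sampled, and feed that scalar into an RL loop in place of an external reward.

```python
import torch
import torch.nn.functional as F

def self_confidence_reward(logits: torch.Tensor, chosen: torch.Tensor) -> torch.Tensor:
    """Mean log-probability the model assigned to its own sampled tokens.
    Higher = the model was more 'confident' in its answer.

    logits: (seq, vocab) logits at each generation step
    chosen: (seq,) the token ids that were actually sampled
    """
    log_probs = F.log_softmax(logits, dim=-1)
    chosen_lp = log_probs.gather(-1, chosen.unsqueeze(-1)).squeeze(-1)
    return chosen_lp.mean()
```
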
Alex Speicher @AlxSp_:
@iScienceLuvr Thank you for the inspiring talk! I had a great time at the hackathon and hopefully we’ll cross paths again at some other hackathon/meetup in the future
0 replies · 0 reposts · 1 like · 49 views

Tanishq Mathew Abraham, Ph.D. @iScienceLuvr:
I'm glad that my talk inspired a team to build a medical RL environment that won 2nd place!
Quoting Nous Research @NousResearch (the hackathon recap thread, reposted in full below)
6 replies · 6 reposts · 96 likes · 20.1K views

Alex Speicher reposted
Nous Research @NousResearch:
Nous Research's RL Environments Hackathon recap thread! Starting with the stars of the show, the winners!

Top 3 for the subjective track:
1st - Pokemon Trainer - by @iyajainfinity & @AlexReibman
2nd - VR-CLImax by @JakeABoggs
3rd - DynastAI by David van Vliet and @SRacoon23

Top 3 for the objective track:
1st - CyberMaxxing by @1999_karthik
2nd - HelpfulDoctors by @tsadpbb, Nilesh Shah, Max Phelps, and Alexander Speicher
3rd - Physical RL by @nullref0 and @venkatacrc

Another special shout out to our partners, @xai, @MistralAI, @nvidia, @tensorstax, @akashnet, @nebiusai, @runpod, @daytonaio, @morph_labs, @LambdaAPI and @Tesla.

As well as our many judges from @arcee_ai, @axolotl_ai, @cursor_ai, @latentspacepod, @MIT, @togethercompute, @haizelabs, @SophontAI, @EdgeAGI, @Google, specifically: @AlpayAriyak, @winglian, Samuel Barry, @tmm1, @keirp1, @swyx, @teknium, @karan4d, Meghana Puvvadi, @arattml, @brianlechthaler, Josh May, Alex Gu, @gordic_aleksa, @AlpayAriyak, @eraqian, @LukePiette, Rohan Rao, @chargoddard, @LoganGrasby, @xennygrimmato_, @zhangir_azerbay, @rogershijin, @max_paperclips, @theemozilla, and Abhinav Balasubramanian.
23 replies · 42 reposts · 396 likes · 64.8K views

Alex Speicher reposted
Isaac Liao @LiaoIsaac91893:
Introducing *ARC‑AGI Without Pretraining* – ❌ No pretraining. ❌ No datasets. Just pure inference-time gradient descent on the target ARC-AGI puzzle itself, solving 20% of the evaluation set. 🧵 1/4
36 replies · 181 reposts · 1.3K likes · 225.9K views

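A minimal sketch of the inference-time-only setup described above: no pretraining and no external dataset, just gradient descent on the demonstration pairs of a single ARC task before predicting its test output. The model interface, loss, and hyperparameters here are illustrative assumptions, not the method's actual implementation.

```python
import torch
import torch.nn as nn

def fit_single_puzzle(model: nn.Module,
                      train_pairs: list[tuple[torch.Tensor, torch.Tensor]],
                      test_input: torch.Tensor,
                      steps: int = 2000, lr: float = 1e-3) -> torch.Tensor:
    """Fit the model's weights on one task's demonstration pairs, then
    predict the held-out test output.

    model(x) is assumed to return per-cell color logits of shape
    (batch, n_colors, H, W); targets y have shape (batch, H, W).
    """
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        opt.zero_grad()
        loss = sum(loss_fn(model(x), y) for x, y in train_pairs)
        loss.backward()
        opt.step()
    with torch.no_grad():
        return model(test_input).argmax(dim=1)   # predicted grid of color ids
```
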