nullptr

1.1K posts

nullptr

@resetptr

Joined January 2012

382 Following · 771 Followers

nullptr@resetptr·2d

@Madbonze16 @prajdabre @SarvamAI penalizing long reasoning traces must be common tho irrespective of compute constraints?


Shourya Jain@Madbonze16·2d

@resetptr @prajdabre @SarvamAI This might be down to maximum reasoning tokens they specified during training, which might be because they didn't have a lot of compute to allow it a lot of reasoning tokens


nullptr@resetptr·3d

ran some quick weekend experiments on @SarvamAI's 105B model on a subset of the IndicMMLU-Pro dataset. Sarvam's model is really good at reasoning efficiency: uses ~2.5x less tokens to reach ~same accuracy


nullptr@resetptr·2d

yeah that was my first guess after reading their blog post, but the reasoning is less verbose, which efficient tokenization doesn't really explain. for eg, Sarvam's vs GLM's reasoning excerpt (which continues on beyond this screenshot) for a question on cramer's rule. (correction - i should've said chars not tokens)


Raj Dabre@prajdabre·2d

@resetptr @SarvamAI Fewer* It's because of the tokenizer.


nullptr@resetptr·3d

@garybasin @zack_overflow technically you could tho, right? gumbel softmax etc, although maybe not as well or efficiently
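
For context on the Gumbel-softmax mention, here is a minimal sketch of the relaxation in plain Python. The helper name `gumbel_softmax_sample` and the example logits are hypothetical and do not come from the thread or the paper being discussed. It perturbs the logits with Gumbel noise and applies a temperature-scaled softmax, which gives a smooth stand-in for a discrete sample.

```python
import math
import random


def gumbel_softmax_sample(logits, tau=1.0, rng=random):
    """Return a relaxed one-hot sample from unnormalized log-probabilities.

    tau is the temperature: small values push the output toward one-hot,
    large values push it toward uniform.
    """
    perturbed = []
    for logit in logits:
        # Gumbel(0, 1) noise via inverse transform; clamp u away from 0.
        u = max(rng.random(), 1e-12)
        gumbel = -math.log(-math.log(u))
        perturbed.append((logit + gumbel) / tau)
    # Numerically stable softmax over the perturbed, scaled logits.
    peak = max(perturbed)
    exps = [math.exp(v - peak) for v in perturbed]
    total = sum(exps)
    return [e / total for e in exps]


# Illustrative logits only; a seeded generator keeps the run repeatable.
rng = random.Random(0)
sample = gumbel_softmax_sample([2.0, 0.5, -1.0], tau=0.5, rng=rng)
```

In an autodiff framework the same expression is differentiable with respect to the logits, which is what would let gradients pass through an otherwise discrete choice.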


Gary Basin@garybasin·3d

@zack_overflow Just can’t train it with gradients


zack@zack_overflow·3d

Feels like the coolest part of this research is not "we put computer in LLM" but making attention scale logarithmically?

Christos Tzamos@ChristosTzamos

1/4 LLMs solve research grade math problems but struggle with basic calculations. We bridge this gap by turning them to computers. We built a computer INSIDE a transformer that can run programs for millions of steps in seconds solving even the hardest Sudokus with 100% accuracy


nullptr@resetptr·3d

@zack_overflow i wonder what the intermediate steps of the hard coded transformers look like. will certain paths "look wrong" in the gradients that you could then learn and then pause in between?


nullptr@resetptr·3d

@ChristosTzamos super cool! any plans to share the weights?


Christos Tzamos@ChristosTzamos·12 Mar


nullptr@resetptr·3d

@silicognition include the papers you're reading in a subfolder. use md files to review and discuss ideas, track references, etc


nullptr@resetptr·3d

@silicognition claude code has been really helpful for this. keep a folder with your venv, claude.md etc set up and ask cc to create new jupyter notebooks based on ideas you want to explore


silicognition@silicognition·3d

people who are doing research, how do you go from reading papers & ideation to getting down to something concrete which can be actually done? i have ideas, read a lot of papers but from a fuzzy cloud of insights & inspirations, i would like to get to the finish line help pls!


nullptr@resetptr·3d

@tenobrus corollary is even if you already have set up strict mode / structured outputs, it wouldn't drop performance / diversity that much. i find tool definitions allow me to separate / organize descriptions, examples, etc better


Tenobrus@tenobrus·3d

fyi that neither codex nor claude code enable strict mode / structured outputs in their agent loops. u can check the source or intercept requests. they just rely on the models to make valid tool calls without grammar enforcement. so if they're not doing it, why are you?


nullptr@resetptr·3d

sidenote: sarvam's APIs are kinda flaky, repeated 504 gateway errors which required multiple retries. i'm sure this'll get better with time tho. great job!
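
For the repeated 504s, a minimal retry-with-backoff sketch. `GatewayTimeout`, `call_with_retries`, and `flaky_request` are hypothetical names for illustration, not part of Sarvam's SDK or any real client; the assumption is that your own request wrapper raises `GatewayTimeout` when it sees an HTTP 504.

```python
import random
import time


class GatewayTimeout(Exception):
    """Hypothetical error a client wrapper raises on an HTTP 504."""


def call_with_retries(request, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call request() and retry on GatewayTimeout with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return request()
        except GatewayTimeout:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Delay doubles each attempt; jitter spreads out concurrent retries.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))


# Simulated endpoint that fails twice with a 504, then succeeds.
calls = {"count": 0}


def flaky_request():
    calls["count"] += 1
    if calls["count"] < 3:
        raise GatewayTimeout("504")
    return "ok"


# sleep is stubbed out so the demo finishes instantly.
result = call_with_retries(flaky_request, sleep=lambda seconds: None)
```

Passing `sleep` as a parameter keeps the helper testable; in a real eval loop you would leave the default `time.sleep`.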


nullptr@resetptr·3d

all 4 are within ~2% accuracy of each other. reasoning is in English tho, even when prompted in Indic languages, which was interesting. will spend some more time exploring why sarvam’s so much more token-efficient (it's prolly, ((most definitely)) data)


nullptr@resetptr·5d

@yule_gan @yacinelearning thanks for sharing! Shao et al immediately sprung up in my mind when i read your post


Yulu Gan@yule_gan·5d

@resetptr @yacinelearning Actually our findings help explain the results in the paper you mentioned.


Yacine Mahdid@yacinelearning·6d

I’m sooooo going to cover this guys this is borderline blasphemous

Yulu Gan@yule_gan

Simply adding Gaussian noise to LLMs (one step—no iterations, no learning rate, no gradients) and ensembling them can achieve performance comparable to or even better than standard GRPO/PPO on math reasoning, coding, writing, and chemistry tasks. We call this algorithm RandOpt. To verify that this is not limited to specific models, we tested it on Qwen, Llama, OLMo3, and VLMs. What's behind this? We find that in the Gaussian search neighborhood around pretrained LLMs, diverse task experts are densely distributed — a regime we term Neural Thickets. Paper: arxiv.org/pdf/2603.12228 Code: github.com/sunrainyg/Rand… Website: thickets.mit.edu
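
To make the "add Gaussian noise once, then ensemble" idea concrete, here is a toy sketch on a two-weight linear classifier. This is not the RandOpt implementation from the paper. The function names, the toy data, and the choice to keep the `top_k` candidates by accuracy on a small labeled set are all assumptions made for illustration.

```python
import random


def predict(weights, x):
    """Toy binary classifier: 1 if the dot product is positive, else 0."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) > 0 else 0


def accuracy(weights, data):
    """Fraction of (x, y) pairs the weights classify correctly."""
    return sum(predict(weights, x) == y for x, y in data) / len(data)


def perturb_and_ensemble(base, data, n_samples=50, sigma=0.5, top_k=5, seed=0):
    """One-shot Gaussian perturbations of the base weights, keeping the top_k.

    No gradients or iterative updates: each candidate is base + noise.
    Selecting the top_k by accuracy on `data` is an assumption of this sketch.
    """
    rng = random.Random(seed)
    candidates = [
        [w + rng.gauss(0.0, sigma) for w in base] for _ in range(n_samples)
    ]
    candidates.sort(key=lambda c: accuracy(c, data), reverse=True)
    return candidates[:top_k]


def ensemble_predict(members, x):
    """Majority vote across ensemble members."""
    votes = sum(predict(m, x) for m in members)
    return 1 if votes * 2 > len(members) else 0


# Hypothetical toy data: the label is 1 when the first feature is positive.
data = [((1.0, 0.2), 1), ((0.8, -0.1), 1), ((-1.0, 0.3), 0), ((-0.7, -0.2), 0)]
members = perturb_and_ensemble(base=[0.1, 0.0], data=data)
predictions = [ensemble_predict(members, x) for x, _ in data]
```

The only "training" here is sampling and selection, which mirrors the no-gradient, single-step framing in the post at toy scale.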


nullptr@resetptr·5d

@Mihonarium @NeelNanda5 @OwainEvans_UK it is, but having it formalized and properly defined helps develop a lexicon to discuss/research it


Mikhail Samin@Mihonarium·6d

@NeelNanda5 @OwainEvans_UK I’m so confused about what is the development here. Isn’t that just obviously what LLMs do? Like, the default way they think when without chain of thought?


Neel Nanda@NeelNanda5·6d

Out of context reasoning is one of the most fascinating developments in the science of how LLMs work. This primer by @OwainEvans_UK, one of the main discoverers of the phenomenon, is a great introduction


nullptr@resetptr·6d

@infoxiao @kchonyc @karpathy i once had o3 produce consistently bad results until i realized it was because i was using the plural of a word instead of singular


Xiao Ma@infoxiao·6d

@kchonyc @karpathy i once measured if using 'i' or 'you' made a difference for gemini. it did not. i was disappointed.


Kyunghyun Cho@kchonyc·13 Mar

thanks to @karpathy , now i have cracked the mystery why my agent doesn't follow my instruction closely enough.


nullptr@resetptr·6d

@adocomplete is this also supported on claude code?


Ado@adocomplete·25 Feb

28 Days of Claude API - Day 24 - PDF Support Send a PDF to the API. Get answers about the content. Every page is processed as both text & image so nothing gets missed. Three ways to use: URL, base64, or Files API file_id. No preprocessing. No custom parsers. Just send & ask.


nullptr@resetptr·6d

@yacinelearning had to dig deep for this one x.com/StellaLisy/sta…

Stella Li@StellaLisy

🤯 We cracked RLVR with... Random Rewards?! Training Qwen2.5-Math-7B with our Spurious Rewards improved MATH-500 by:
- Random rewards: +21%
- Incorrect rewards: +25%
- (FYI) Ground-truth rewards: +28.8%
How could this even work⁉️ Here's why: 🧵 Blogpost: tinyurl.com/spurious-rewar…


nullptr@resetptr·6d

@yacinelearning reminds me of arxiv.org/pdf/2506.10947


nullptr@resetptr·6d

platform.claude.com/docs/en/build-…


nullptr@resetptr·6d

TIL anthropic has specific suggestions for long-running tasks. starting new instead of compacting convos is a suggestion! need to try it out some more


nullptr@resetptr·12 Mar

welp


nullptr@resetptr·9 Mar

whatever happened to moltbook?
