nullptr
@resetptr
1.1K posts
Joined January 2012
382 Following · 771 Followers
Shourya Jain @Madbonze16
@resetptr @prajdabre @SarvamAI This might be down to the maximum reasoning tokens they specified during training, which might be because they didn't have a lot of compute to allow it many reasoning tokens

nullptr @resetptr
ran some quick weekend experiments with @SarvamAI's 105B model on a subset of the IndicMMLU-Pro dataset. Sarvam's model is really good at reasoning efficiency: uses ~2.5x fewer tokens to reach ~the same accuracy
[image attached]
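A minimal sketch of the comparison described above: given per-question (reasoning_tokens, correct) records for two models, report accuracy and mean reasoning tokens, then the token ratio. All numbers below are made up for illustration, not the actual experiment's data.

```python
def summarize(records):
    """records: list of (reasoning_tokens, is_correct) tuples.
    Returns (accuracy, mean reasoning tokens per question)."""
    n = len(records)
    accuracy = sum(c for _, c in records) / n
    mean_tokens = sum(t for t, _ in records) / n
    return accuracy, mean_tokens

# Hypothetical per-question results for two models on the same subset.
sarvam = [(350, 1), (420, 1), (390, 0), (310, 1)]
other  = [(900, 1), (1100, 1), (980, 0), (870, 1)]

acc_a, tok_a = summarize(sarvam)
acc_b, tok_b = summarize(other)
ratio = tok_b / tok_a  # "uses ~Nx fewer tokens at ~same accuracy"
```

With real data you'd also want per-language breakdowns, since the thread notes the reasoning language differs from the prompt language.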

nullptr @resetptr
yeah, that was my first guess after reading their blog post. but the reasoning is less verbose, which efficient tokenization doesn't really explain. e.g., Sarvam's vs GLM's reasoning excerpt (which continues beyond this screenshot) for a question on Cramer's rule. (correction: i should've said chars, not tokens)
[two images attached]

nullptr @resetptr
@garybasin @zack_overflow technically you could tho, right? gumbel softmax etc, although maybe not as well or efficiently
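For context, a pure-Python sketch of the Gumbel-softmax trick mentioned here, in its standard formulation (in practice you'd use torch/jax tensors, often with a straight-through estimator):

```python
import math
import random

def gumbel_softmax(logits, tau=1.0, seed=0):
    """Relaxed categorical sampling: add Gumbel(0,1) noise to each logit,
    then take a temperature-controlled softmax. As tau -> 0 the output
    approaches a one-hot sample; at higher tau it stays soft, which is
    what lets gradients flow through the 'sampling' step."""
    rng = random.Random(seed)
    # Gumbel(0,1) noise via inverse transform sampling: g = -log(-log(U))
    noisy = [(l - math.log(-math.log(rng.random() or 1e-12))) / tau
             for l in logits]
    # numerically stable softmax
    m = max(noisy)
    exps = [math.exp(v - m) for v in noisy]
    s = sum(exps)
    return [e / s for e in exps]

probs = gumbel_softmax([2.0, 1.0, 0.1], tau=0.5)
```

The "maybe not as well or efficiently" caveat in the tweet tracks with the known bias/variance tradeoff of the relaxation at different temperatures.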

nullptr @resetptr
@zack_overflow i wonder what the intermediate steps of the hard-coded transformers look like. will certain paths "look wrong" in the gradients, which you could then learn and then pause in between?

Christos Tzamos @ChristosTzamos
1/4 LLMs solve research-grade math problems but struggle with basic calculations. We bridge this gap by turning them into computers. We built a computer INSIDE a transformer that can run programs for millions of steps in seconds, solving even the hardest Sudokus with 100% accuracy

nullptr @resetptr
@silicognition include the papers you're reading in a subfolder. use md files to review and discuss ideas, track references, etc.

nullptr @resetptr
@silicognition claude code has been really helpful for this. keep a folder with your venv, claude.md etc set up, and ask cc to create new jupyter notebooks based on ideas you want to explore

silicognition @silicognition
people who are doing research: how do you go from reading papers & ideation to something concrete that can actually be done? i have ideas and read a lot of papers, but from a fuzzy cloud of insights & inspirations i would like to get to the finish line. help pls!

nullptr @resetptr
@tenobrus corollary: even if you've already set up strict mode / structured outputs, it wouldn't drop performance / diversity that much. i find tool definitions let me separate / organize descriptions, examples, etc. better
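A hedged sketch of the pattern being described: the tool name, fields, and examples below are invented for illustration, but show how a JSON-schema tool definition gives you separate slots for descriptions, examples, and constraints instead of one long prompt.

```python
# Illustrative tool definition in the JSON-schema style most tool-use
# APIs accept. Per-field guidance lives next to the field it describes.
search_tool = {
    "name": "search_papers",
    "description": (
        "Search an index of ML papers. Use for factual questions about "
        "published work; prefer one focused query over several broad ones."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "Keyword query, e.g. 'gumbel softmax discrete sampling'",
            },
            "max_results": {
                "type": "integer",
                "minimum": 1,
                "maximum": 20,
                "description": "How many hits to return (default 5)",
            },
        },
        "required": ["query"],
    },
}
```

Even without grammar enforcement, this structure is what the model reads, so organizing it well is doing real prompt work.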

Tenobrus @tenobrus
fyi: neither codex nor claude code enables strict mode / structured outputs in their agent loops. u can check the source or intercept requests. they just rely on the models to make valid tool calls without grammar enforcement. so if they're not doing it, why are you?

nullptr @resetptr
sidenote: sarvam's APIs are kinda flaky: repeated 504 gateway errors that required multiple retries. i'm sure this'll get better with time tho. great job!
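A minimal retry-with-exponential-backoff sketch for handling transient 504s like these; the request function here is a stand-in, not Sarvam's actual client.

```python
import time

# Gateway-style errors that are usually safe to retry.
RETRYABLE = {502, 503, 504}

def backoff_delay(attempt, base=1.0, cap=30.0):
    """Exponential backoff: base * 2^attempt seconds, capped."""
    return min(base * (2 ** attempt), cap)

def call_with_retries(do_request, max_retries=5, base=1.0):
    """do_request() returns (status_code, body). Retries on gateway
    errors, sleeping between attempts; returns the last response."""
    for attempt in range(max_retries):
        status, body = do_request()
        if status not in RETRYABLE:
            return status, body
        if attempt < max_retries - 1:
            time.sleep(backoff_delay(attempt, base=base))
    return status, body
```

Adding jitter to the delay is also common so many clients don't retry in lockstep.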

nullptr @resetptr
all 4 are within ~2% accuracy of each other. reasoning is in English tho, even when prompted in Indic languages, which was interesting. will spend some more time exploring why sarvam's so much more token-efficient (it's prolly ((most definitely)) the data)
[image attached]

Mikhail Samin @Mihonarium
@NeelNanda5 @OwainEvans_UK I'm so confused about what the development is here. Isn't that just obviously what LLMs do? Like, the default way they think without chain of thought?

Neel Nanda @NeelNanda5
Out-of-context reasoning is one of the most fascinating developments in the science of how LLMs work. This primer by @OwainEvans_UK, one of the main discoverers of the phenomenon, is a great introduction
[image attached]

nullptr @resetptr
@infoxiao @kchonyc @karpathy i once had o3 produce consistently bad results until i realized it was because i was using the plural of a word instead of singular

Xiao Ma @infoxiao
@kchonyc @karpathy i once measured if using 'i' or 'you' made a difference for gemini. it did not. i was disappointed.

Kyunghyun Cho @kchonyc
thanks to @karpathy, i have now cracked the mystery of why my agent doesn't follow my instructions closely enough.
[image attached]

Ado @adocomplete
28 Days of Claude API, Day 24: PDF Support. Send a PDF to the API and get answers about the content. Every page is processed as both text & image so nothing gets missed. Three ways to use it: URL, base64, or Files API file_id. No preprocessing, no custom parsers. Just send & ask.
[image attached]
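A sketch of the three attachment styles the tweet lists, built as Messages API content blocks. Field names follow Anthropic's documented document-block shape as I understand it; verify against the current docs before relying on them.

```python
import base64

def pdf_block_from_url(url):
    """Attach a PDF the API fetches itself."""
    return {"type": "document", "source": {"type": "url", "url": url}}

def pdf_block_from_bytes(raw):
    """Attach local PDF bytes, base64-encoded inline."""
    return {"type": "document",
            "source": {"type": "base64",
                       "media_type": "application/pdf",
                       "data": base64.b64encode(raw).decode()}}

def pdf_block_from_file_id(file_id):
    """Reference a PDF previously uploaded via the Files API."""
    return {"type": "document", "source": {"type": "file", "file_id": file_id}}

def ask_about_pdf(pdf_block, question):
    """Build the messages payload to pass to client.messages.create(...)."""
    return [{"role": "user",
             "content": [pdf_block, {"type": "text", "text": question}]}]
```

Usage would be something like `client.messages.create(model=..., max_tokens=..., messages=ask_about_pdf(pdf_block_from_url("https://example.com/x.pdf"), "Summarize this"))`, with the model name and URL as placeholders.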

nullptr @resetptr
platform.claude.com/docs/en/build-…

nullptr @resetptr
TIL anthropic has specific suggestions for long-running tasks. starting new convos instead of compacting is one of them! need to try it out some more
[image attached]

nullptr @resetptr
welp
[image attached]

nullptr @resetptr
whatever happened to moltbook?