Stefano

1.6K posts

@maeste

Code climber - no problem with strong opinions - https://t.co/mU1It6LAKI - https://t.co/wslalil0Wo -

Cremona, Lombardia · Joined October 2008
354 Following · 365 Followers
Pinned Tweet
Stefano
Stefano@maeste·
My entire coding setup: one terminal. No IDE. No browser. No Electron eating 16GB of RAM. LINCE is a terminal-native multi-agent workstation. Sandbox, dashboard, voice input, session persistence. Minimal dependencies. lince.sh
[image attached]
0 replies · 1 repost · 1 like · 163 views
Stefano
Stefano@maeste·
@antirez I double checked and you are right. Flash is not a distillation of Pro: they are two different models with distinct pre- and post-training. Distillation is, as you said, intra-model, used only to align routing to the experts in the MoE architecture
0 replies · 0 reposts · 0 likes · 16 views
antirez
antirez@antirez·
@maeste I think the distillation is intra-model. They train several purpose-built variants for math, programming, ... and so forth and use those to build the final model.
2 replies · 0 reposts · 0 likes · 63 views
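The intra-model distillation being discussed is, at its core, the standard soft-target objective: the student is trained to match the teacher's temperature-softened output distribution. A minimal generic sketch with toy logits (not DeepSeek's actual training code):

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-softened softmax over the last axis.
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distill_kl(teacher_logits, student_logits, T=2.0):
    # KL(teacher || student) on softened distributions: the classic
    # distillation loss, scaled by T^2 to keep gradient magnitudes stable.
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float((p * (np.log(p) - np.log(q))).sum(axis=-1).mean() * T * T)

rng = np.random.default_rng(0)
teacher = rng.normal(size=(4, 32))                     # logits for 4 tokens
close = teacher + rng.normal(scale=0.1, size=(4, 32))  # student near teacher
far = rng.normal(size=(4, 32))                         # unrelated student
print(distill_kl(teacher, close) < distill_kl(teacher, far))  # True
```

A student already close to the teacher yields a small loss; an unrelated one yields a large loss, which is what drives the alignment.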
antirez
antirez@antirez·
Exactly, for the first time *ever* I found myself talking to a model that can run on my computer about the random things you could ask Claude. Like history and other stuff. I also ran a benchmark on Italian historical facts with Qwen 27B vs DeepSeek v4 Flash 2-bit quants (continue)
Armin Ronacher ⇌@mitsuhiko

A nice thing about DeepSeek V4 Flash locally is that it’s a big enough model that you can have it explain shit to you and it won’t completely lie to you. Tried to walk through some choices in ds4.c and I felt pretty good about the experience.

14 replies · 9 reposts · 272 likes · 28.9K views
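"2-bit quants" means each weight is stored as a 2-bit index into a few levels, plus a per-block scale. A toy blockwise sketch of the idea (illustrative only; the levels, block size, and scheme here are invented, not any real inference engine's format):

```python
import numpy as np

def quantize_2bit(w, block=16):
    # Toy blockwise 2-bit quantization: each block keeps one fp scale and
    # a 2-bit index per weight into 4 fixed levels.
    w = w.reshape(-1, block)
    scale = np.abs(w).max(axis=1, keepdims=True) + 1e-12
    levels = np.array([-1.0, -1 / 3, 1 / 3, 1.0])   # 4 levels = 2 bits
    idx = np.argmin(np.abs(w[:, :, None] / scale[:, :, None] - levels), axis=2)
    return idx.astype(np.uint8), scale, levels

def dequantize(idx, scale, levels):
    # Reconstruct approximate weights from indices and per-block scales.
    return (levels[idx] * scale).reshape(-1)

rng = np.random.default_rng(1)
w = rng.normal(size=256)
idx, scale, levels = quantize_2bit(w)
w_hat = dequantize(idx, scale, levels)
print(np.abs(w - w_hat).mean())  # lossy, but tracks the original weights
```

The storage cost is 2 bits per weight plus one scale per block, which is why a model this size fits on a local machine at all.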
Stefano
Stefano@maeste·
@antirez A different version of the 1.6T though? Or are you saying Flash is trained from scratch, with different training than the 1.6T? Sounds strange, because it's uneconomical. I need to read the paper again
0 replies · 0 reposts · 0 likes · 16 views
Stefano
Stefano@maeste·
@antirez Well, in theory they could be very, very lucky in the distillation and get >50%, but with 1.6T that's like winning at the casino
0 replies · 0 reposts · 0 likes · 9 views
Stefano
Stefano@maeste·
@antirez Possibly, even if it sounds strange if what I remember is correct: Flash is a policy distillation of Pro (just off the top of my head, need to check). And if it is, the pure RL should happen only on the big one and be distilled into Flash. So in theory it's 50% as well, or even a bit less
2 replies · 0 reposts · 0 likes · 55 views
Stefano
Stefano@maeste·
@antirez Sure, because with the smaller one you need to decide what to give it in the pretraining
0 replies · 0 reposts · 0 likes · 9 views
antirez
antirez@antirez·
@maeste Knowledge depends a lot on size. But knowledge also has vast effects on capabilities, because the model can sample from many ideas / patterns / algorithms. DeepSeek v4 Flash is strong on both sides.
2 replies · 0 reposts · 0 likes · 130 views
Stefano
Stefano@maeste·
@antirez That said, ds4 is still a preview, which means the RL phase is not finished yet. If I remember correctly, in their announcement paper they say around 50%, which is very promising in terms of capabilities of the final release
1 reply · 0 reposts · 0 likes · 52 views
Stefano
Stefano@maeste·
@antirez Well, you are testing knowledge compression, while most benchmarks test capabilities. While knowledge is important, much research shows that capabilities depend more on the RL phase, while knowledge is more in the realm of pretraining
2 replies · 0 reposts · 0 likes · 44 views
Stefano
Stefano@maeste·
@simonw This is another great use case for lince.sh: spawn your agent(s) with "n" -> close them as soon as they finish the task with "x" -> spawn a new one with "n" -> repeat. Keep focus: all in milliseconds, in the same terminal. Save memory, work with any agent, sandboxed
0 replies · 0 reposts · 0 likes · 29 views
Simon Willison
Simon Willison@simonw·
My Mac had less available memory than I expected; turned out the "claude" Claude Code processes on this machine (running in various terminal windows) were consuming ~30GB on their own! The largest one was using 4.9GB
83 replies · 8 reposts · 568 likes · 70.9K views
Stefano reposted
Thinking Machines
Thinking Machines@thinkymachines·
With the model's simultaneous speech capability, Horace has gotten a lot easier to work with recently.
41 replies · 59 reposts · 1.1K likes · 245.6K views
Stefano
Stefano@maeste·
@karpathy @trq212 HTML is good for output; I think md is better at the moment when you communicate with an LLM and both the LLM and the human need to write the doc. To consume output HTML is better, or, depending on the device, a screenshot of the rendered HTML is better. Hermes decided to send a screenshot on the phone yesterday
0 replies · 0 reposts · 1 like · 34 views
Andrej Karpathy
Andrej Karpathy@karpathy·
This works really well btw: at the end of your query, ask your LLM to "structure your response as HTML", then view the generated file in your browser. I've also had some success asking the LLM to present its output as slideshows, etc.

More generally, imo audio is the human-preferred input to AIs, but vision (images/animations/video) is the preferred output from them. Around a third of our brains are a massively parallel processor dedicated to vision; it is the 10-lane superhighway of information into the brain. As AI improves, I think we'll see a progression that takes advantage:

1) raw text (hard/effortful to read)
2) markdown (bold, italic, headings, tables, a bit easier on the eyes) <-- current default
3) HTML (still procedural with underlying code, but a lot more flexibility on graphics, layout, even interactivity) <-- early but forming a new good default
...4, 5, 6, ...
n) interactive neural videos/simulations

Imo the extrapolation (though the technology doesn't exist just yet) ends in some kind of interactive video generated directly by a diffusion neural net. Many open questions as to how exact/procedural "Software 1.0" artifacts (e.g. interactive simulations) may be woven together with neural artifacts (diffusion grids), but generally something in the direction of the recently viral x.com/zan2434/status…

There are also improvements necessary and pending at the input. Audio, text, and video alone are not enough; e.g. I feel a need to point/gesture at things on the screen, similar to all the things you would do with a person physically next to you and your computer screen.

TLDR: the input/output mind meld between humans and AIs is ongoing, and there is a lot of work to do and significant progress to be made, way before jumping all the way into neuralink-esque BCIs and all that. For what it's worth exploring at the current stage, hot tip: try asking for HTML.
Thariq@trq212

x.com/i/article/2052…

804 replies · 1.7K reposts · 16.6K likes · 2.2M views
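The tip above is one line of prompt plus a file write. A minimal sketch, with a hard-coded string standing in for the model call (any LLM client that returns text would slot in here):

```python
import pathlib
import tempfile
import webbrowser

# Hypothetical model output after appending "structure your response as
# HTML" to the prompt; in practice this string comes from your LLM client.
llm_response = """<!doctype html>
<html><body>
  <h1>Binary search, step by step</h1>
  <p>Each step <b>halves</b> the search interval.</p>
</body></html>"""

out = pathlib.Path(tempfile.mkdtemp()) / "answer.html"
out.write_text(llm_response, encoding="utf-8")
# webbrowser.open(out.as_uri())  # uncomment to render it in your browser
print(out.name)
```

Writing to a temp file and opening it via a `file://` URI is all the "viewer" the workflow needs; no server required.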
Stefano
Stefano@maeste·
@antirez I suppose it's CUDA in general, or is anything specific to the DGX Spark?
1 reply · 0 reposts · 0 likes · 928 views
antirez
antirez@antirez·
Soon in DS4: 1. CUDA support (14 t/s, 350 t/s prefill on DGX Spark), 2. Single direction steering support. 3. Huge refactoring to support Metal / CUDA / CPU in a more sensible way.
22 replies · 23 reposts · 357 likes · 37.9K views
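"Single direction steering" generally means adding one fixed vector to the hidden states at inference time to push the model along a chosen behavioral direction. A generic activation-steering sketch (toy tensors; not ds4's actual implementation):

```python
import numpy as np

def steer(hidden, direction, alpha=4.0):
    # Add a fixed unit vector to every token's hidden state.
    d = direction / np.linalg.norm(direction)
    return hidden + alpha * d

rng = np.random.default_rng(0)
hidden = rng.normal(size=(8, 64))   # 8 tokens, 64-dim residual stream
direction = rng.normal(size=64)     # e.g. a precomputed behavior direction
steered = steer(hidden, direction)

d_unit = direction / np.linalg.norm(direction)
shift = (steered - hidden) @ d_unit  # projection change along the direction
print(np.allclose(shift, 4.0))       # True: every token moved by alpha
```

The appeal for a local inference engine is that this is a single vector add per layer, essentially free compared to the matmuls around it.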
Stefano
Stefano@maeste·
@simonw Indeed it is, for final output, but if you use documents for human/LLM collaboration I prefer an information-dense format such as md. HTML is good to read, but bad to write on the human side. In my second-brain wiki I use HTML only for what I call views, and md for the collaborative work with the LLM
0 replies · 0 reposts · 0 likes · 17 views
Stefano
Stefano@maeste·
@trq212 @antirez @badlogicgames Well, as said in another comment, it may be useful for pure output, but if the LLM and a human need to collaborate on a document, md is better. Hand-editing HTML is still a pain
0 replies · 0 reposts · 0 likes · 36 views
Thariq
Thariq@trq212·
@antirez @badlogicgames Like most things LLM-related this is empirical and subjective; people can have different preferences, but imo it's worth trying. I was also skeptical initially tbh, tried things like MDX instead, but kept coming back to HTML
19 replies · 1 repost · 127 likes · 17K views
antirez
antirez@antirez·
Markdown vs HTML. Every time we go from a semantically dense to a semantically sparse format, we lose. Even more so today, when fewer tokens for the same content is way better. I can understand that we need a better markdown. I can't understand why we should replace it with HTML.
176 replies · 89 reposts · 1.5K likes · 158.7K views
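The density argument is easy to make concrete: encode the same table both ways and compare sizes (character count is only a crude proxy for LLM tokens, and the table values here are made up for illustration):

```python
# The same 2-row table, once as markdown and once as HTML.
md = (
    "| backend | t/s |\n"
    "|---------|-----|\n"
    "| metal   | 25  |\n"
    "| cuda    | 14  |\n"
)
html = (
    "<table>\n"
    "  <tr><th>backend</th><th>t/s</th></tr>\n"
    "  <tr><td>metal</td><td>25</td></tr>\n"
    "  <tr><td>cuda</td><td>14</td></tr>\n"
    "</table>\n"
)
print(len(md), len(html))  # the HTML encoding of the same content is longer
```

The content is identical; only the markup overhead differs, which is exactly the token cost being argued about.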
Stefano
Stefano@maeste·
@antirez Yup, I agree in a general sense. Any artifact used to work and exchange info with the LLM should be information-dense. A format that is less information-dense in favor of human consumability is for output only. That's why I added the concept of views to my LLM second-brain wiki
0 replies · 0 reposts · 0 likes · 471 views
Stefano
Stefano@maeste·
API key in env + open network = wrapper, not sandbox. LINCE paranoid: Linux netns isolation (--unshare-net) + credential proxy holds the key over a unix socket. lince.sh/changelog
[image attached]
0 replies · 2 reposts · 0 likes · 48 views
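The pattern in the tweet, the key held by a proxy process outside the sandbox and reachable only over a unix socket, can be sketched generically. The socket path, key, and one-line protocol below are invented for illustration; this is not LINCE's actual code:

```python
import os
import socket
import threading

SOCK = "/tmp/cred-proxy.sock"   # hypothetical path, not LINCE's layout
API_KEY = "sk-demo-key"         # would live only in the proxy's environment

ready = threading.Event()

def proxy():
    # Holds the key; the sandboxed agent sees only this unix socket.
    if os.path.exists(SOCK):
        os.unlink(SOCK)
    srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    srv.bind(SOCK)
    srv.listen(1)
    ready.set()
    conn, _ = srv.accept()
    req = conn.recv(1024).decode()
    # A real proxy would attach the key and forward the request upstream,
    # returning only the response; here we just acknowledge, redacted.
    conn.sendall(f"forwarded {req} with key ****{API_KEY[-4:]}".encode())
    conn.close()
    srv.close()

t = threading.Thread(target=proxy)
t.start()
ready.wait()

# The "agent" side: with the network namespace unshared, this socket is
# its only egress, and the key never enters its environment.
agent = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
agent.connect(SOCK)
agent.sendall(b"GET /v1/models")
reply = agent.recv(1024).decode()
agent.close()
t.join()
print(reply)  # forwarded GET /v1/models with key ****-key
```

The point of the design is that even a fully compromised agent process can only ask the proxy to make requests; it can never exfiltrate the key itself.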
Stefano reposted
Backlog.md
Backlog.md@Backlog_md·
Backlog v1.45.0 contains multiple new features and QoL improvements across the Web UI, TUI, CLI and MCP. For the full release notes check: github.com/MrLesk/Backlog…
[image attached]
0 replies · 1 repost · 4 likes · 493 views