Stefano

1.6K posts

@maeste

Code climber - no problem with strong opinions - https://t.co/mU1It6LAKI - https://t.co/wslalil0Wo -

Cremona, Lombardia · Joined October 2008
354 Following · 365 Followers
Pinned Tweet
Stefano
Stefano@maeste·
My entire coding setup: one terminal. No IDE. No browser. No Electron eating 16GB of RAM. LINCE is a terminal-native multi-agent workstation. Sandbox, dashboard, voice input, session persistence. Minimal dependencies. lince.sh
[image attached]
0 replies · 1 repost · 1 like · 163 views
Stefano
Stefano@maeste·
@antirez I double checked and you are right. Flash is not a distillation of Pro: they are two different models with distinct pre- and post-training. Distillation is, as you said, intra-model, used only to align routing to the experts in the MoE architecture
0 replies · 0 reposts · 0 likes · 16 views
antirez
antirez@antirez·
@maeste I think the distillation is intra-model. They train several purpose-built variants for math, programming, ... and so forth and use those to build the final model.
2 replies · 0 reposts · 0 likes · 63 views
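The intra-model distillation being discussed is, at its core, the standard soft-target objective: the student is trained to match the teacher's temperature-softened output distribution. A minimal generic sketch with toy logits (not DeepSeek's actual training code):

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-softened softmax over the last axis.
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distill_kl(teacher_logits, student_logits, T=2.0):
    # KL(teacher || student) on softened distributions: the classic
    # distillation loss, scaled by T^2 to keep gradient magnitudes stable.
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float((p * (np.log(p) - np.log(q))).sum(axis=-1).mean() * T * T)

rng = np.random.default_rng(0)
teacher = rng.normal(size=(4, 32))                     # logits for 4 tokens
close = teacher + rng.normal(scale=0.1, size=(4, 32))  # student near teacher
far = rng.normal(size=(4, 32))                         # unrelated student
print(distill_kl(teacher, close) < distill_kl(teacher, far))  # True
```

A student already close to the teacher yields a small loss; an unrelated one yields a large loss, which is what drives the alignment.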
antirez
antirez@antirez·
Exactly, for the first time *ever* I found myself talking to a model that can run on my computer about the random things you could ask Claude. Like history and other stuff. I also ran a benchmark on Italian historical facts with Qwen 27B vs DeepSeek v4 Flash 2-bit quants (continue)
Armin Ronacher ⇌@mitsuhiko

A nice thing about DeepSeek V4 Flash locally is that it’s a big enough model that you can have it explain shit to you and it won’t completely lie to you. Tried to walk through some choices in ds4.c and I felt pretty good about the experience.

14 replies · 9 reposts · 272 likes · 28.9K views
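"2-bit quants" means each weight is stored as a 2-bit index into a few levels, plus a per-block scale. A toy blockwise sketch of the idea (illustrative only; the levels, block size, and scheme here are invented, not any real inference engine's format):

```python
import numpy as np

def quantize_2bit(w, block=16):
    # Toy blockwise 2-bit quantization: each block keeps one fp scale and
    # a 2-bit index per weight into 4 fixed levels.
    w = w.reshape(-1, block)
    scale = np.abs(w).max(axis=1, keepdims=True) + 1e-12
    levels = np.array([-1.0, -1 / 3, 1 / 3, 1.0])   # 4 levels = 2 bits
    idx = np.argmin(np.abs(w[:, :, None] / scale[:, :, None] - levels), axis=2)
    return idx.astype(np.uint8), scale, levels

def dequantize(idx, scale, levels):
    # Reconstruct approximate weights from indices and per-block scales.
    return (levels[idx] * scale).reshape(-1)

rng = np.random.default_rng(1)
w = rng.normal(size=256)
idx, scale, levels = quantize_2bit(w)
w_hat = dequantize(idx, scale, levels)
print(np.abs(w - w_hat).mean())  # lossy, but tracks the original weights
```

The storage cost is 2 bits per weight plus one scale per block, which is why a model this size fits on a local machine at all.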
Stefano
Stefano@maeste·
@antirez A different version of the 1.6T though? Or are you saying Flash is trained from scratch, with different training than the 1.6T? Sounds strange, because it's uneconomical. I need to read the paper again
0 replies · 0 reposts · 0 likes · 16 views
Stefano
Stefano@maeste·
@antirez Well, in theory they could be very, very lucky in the distillation and get >50%, but with 1.6T that's like winning at the casino
0 replies · 0 reposts · 0 likes · 9 views
Stefano
Stefano@maeste·
@antirez Possibly, even if it sounds strange if what I remember is correct: Flash is a policy distillation of Pro (just off the top of my head, need to check). And if it is, the pure RL should happen only on the big one and be distilled into Flash. So in theory it's 50% as well, or even a bit less
2 replies · 0 reposts · 0 likes · 55 views
Stefano
Stefano@maeste·
@antirez Sure, because with the smaller one you need to decide what to give it in the pretraining
0 replies · 0 reposts · 0 likes · 9 views
antirez
antirez@antirez·
@maeste Knowledge depends a lot on size. But knowledge also has vast effects on capabilities, because the model can sample from many ideas / patterns / algorithms. DeepSeek v4 Flash is strong on both sides.
2 replies · 0 reposts · 0 likes · 130 views
Stefano
Stefano@maeste·
@antirez That said, ds4 is still a preview, which means the RL phase is not finished yet. If I remember correctly, in their announcement paper they say around 50%, which is very promising in terms of capabilities of the final release
1 reply · 0 reposts · 0 likes · 52 views
Stefano
Stefano@maeste·
@antirez Well, you are testing knowledge compression, while most benchmarks test capabilities. While knowledge is important, much research shows that capabilities depend more on the RL phase, while knowledge is more in the realm of pretraining
2 replies · 0 reposts · 0 likes · 44 views
Stefano
Stefano@maeste·
@simonw This is another great use case for lince.sh: spawn your agent(s) with "n" -> close them as soon as they finish the task with "x" -> spawn a new one with "n" -> repeat. Keep focus: all in milliseconds, in the same terminal. Save memory, work with any agent, sandboxed
0 replies · 0 reposts · 0 likes · 29 views
Simon Willison
Simon Willison@simonw·
My Mac had less available memory than I expected; turned out the "claude" Claude Code processes on this machine (running in various terminal windows) were consuming ~30GB on their own! The largest one was using 4.9GB
83 replies · 8 reposts · 568 likes · 70.9K views
Stefano reposted
Thinking Machines
Thinking Machines@thinkymachines·
With the model's simultaneous speech capability, Horace has gotten a lot easier to work with recently.
41 replies · 59 reposts · 1.1K likes · 245.6K views
Stefano
Stefano@maeste·
@karpathy @trq212 HTML is good for output; I think md is better at the moment when you communicate with an LLM and both the LLM and the human need to write the doc. To consume output HTML is better, or, depending on the device, a screenshot of the rendered HTML is better. Hermes decided to send a screenshot on the phone yesterday
0 replies · 0 reposts · 1 like · 34 views
Andrej Karpathy
Andrej Karpathy@karpathy·
This works really well btw: at the end of your query, ask your LLM to "structure your response as HTML", then view the generated file in your browser. I've also had some success asking the LLM to present its output as slideshows, etc.

More generally, imo audio is the human-preferred input to AIs, but vision (images/animations/video) is the preferred output from them. Around a third of our brains are a massively parallel processor dedicated to vision; it is the 10-lane superhighway of information into the brain. As AI improves, I think we'll see a progression that takes advantage:

1) raw text (hard/effortful to read)
2) markdown (bold, italic, headings, tables, a bit easier on the eyes) <-- current default
3) HTML (still procedural with underlying code, but a lot more flexibility on graphics, layout, even interactivity) <-- early but forming a new good default
...4, 5, 6, ...
n) interactive neural videos/simulations

Imo the extrapolation (though the technology doesn't exist just yet) ends in some kind of interactive video generated directly by a diffusion neural net. Many open questions as to how exact/procedural "Software 1.0" artifacts (e.g. interactive simulations) may be woven together with neural artifacts (diffusion grids), but generally something in the direction of the recently viral x.com/zan2434/status…

There are also improvements necessary and pending at the input. Audio, text, and video alone are not enough; e.g. I feel a need to point/gesture at things on the screen, similar to all the things you would do with a person physically next to you and your computer screen.

TLDR: the input/output mind meld between humans and AIs is ongoing, and there is a lot of work to do and significant progress to be made, way before jumping all the way into neuralink-esque BCIs and all that. For what it's worth exploring at the current stage, hot tip: try asking for HTML.
Thariq@trq212

x.com/i/article/2052…

804 replies · 1.7K reposts · 16.6K likes · 2.2M views
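The tip above is one line of prompt plus a file write. A minimal sketch, with a hard-coded string standing in for the model call (any LLM client that returns text would slot in here):

```python
import pathlib
import tempfile
import webbrowser

# Hypothetical model output after appending "structure your response as
# HTML" to the prompt; in practice this string comes from your LLM client.
llm_response = """<!doctype html>
<html><body>
  <h1>Binary search, step by step</h1>
  <p>Each step <b>halves</b> the search interval.</p>
</body></html>"""

out = pathlib.Path(tempfile.mkdtemp()) / "answer.html"
out.write_text(llm_response, encoding="utf-8")
# webbrowser.open(out.as_uri())  # uncomment to render it in your browser
print(out.name)
```

Writing to a temp file and opening it via a `file://` URI is all the "viewer" the workflow needs; no server required.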
Stefano
Stefano@maeste·
@antirez I suppose it's CUDA in general, or is anything specific to the DGX Spark?
1 reply · 0 reposts · 0 likes · 928 views
antirez
antirez@antirez·
Soon in DS4: 1. CUDA support (14 t/s, 350 t/s prefill on DGX Spark), 2. Single direction steering support. 3. Huge refactoring to support Metal / CUDA / CPU in a more sensible way.
22 replies · 23 reposts · 357 likes · 37.9K views
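"Single direction steering" generally means adding one fixed vector to the hidden states at inference time to push the model along a chosen behavioral direction. A generic activation-steering sketch (toy tensors; not ds4's actual implementation):

```python
import numpy as np

def steer(hidden, direction, alpha=4.0):
    # Add a fixed unit vector to every token's hidden state.
    d = direction / np.linalg.norm(direction)
    return hidden + alpha * d

rng = np.random.default_rng(0)
hidden = rng.normal(size=(8, 64))   # 8 tokens, 64-dim residual stream
direction = rng.normal(size=64)     # e.g. a precomputed behavior direction
steered = steer(hidden, direction)

d_unit = direction / np.linalg.norm(direction)
shift = (steered - hidden) @ d_unit  # projection change along the direction
print(np.allclose(shift, 4.0))       # True: every token moved by alpha
```

The appeal for a local inference engine is that this is a single vector add per layer, essentially free compared to the matmuls around it.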
Stefano
Stefano@maeste·
@simonw Indeed it is, for final output, but if you use documents for human/LLM collaboration I prefer an information-dense format such as md. HTML is good to read, but bad to write on the human side. In my second-brain wiki I use HTML only for what I call views, and md for the collaborative work with the LLM
0 replies · 0 reposts · 0 likes · 17 views
Stefano
Stefano@maeste·
@trq212 @antirez @badlogicgames Well, as said in another comment, it may be useful for pure output, but if the LLM and a human need to collaborate on a document, md is better. Hand-editing HTML is still a pain
0 replies · 0 reposts · 0 likes · 36 views
Thariq
Thariq@trq212·
@antirez @badlogicgames Like most things LLM-related this is empirical and subjective; people can have different preferences, but imo it's worth trying. I was also skeptical initially tbh, tried things like MDX instead, but kept coming back to HTML
19 replies · 1 repost · 127 likes · 17K views
antirez
antirez@antirez·
Markdown vs HTML. Every time we go from a semantically dense to a semantically sparse format, we lose. Even more so today, when fewer tokens for the same content is way better. I can understand that we need a better markdown. I can't understand why we should replace it with HTML.
176 replies · 89 reposts · 1.5K likes · 158.7K views
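The density argument is easy to make concrete: encode the same table both ways and compare sizes (character count is only a crude proxy for LLM tokens, and the table values here are made up for illustration):

```python
# The same 2-row table, once as markdown and once as HTML.
md = (
    "| backend | t/s |\n"
    "|---------|-----|\n"
    "| metal   | 25  |\n"
    "| cuda    | 14  |\n"
)
html = (
    "<table>\n"
    "  <tr><th>backend</th><th>t/s</th></tr>\n"
    "  <tr><td>metal</td><td>25</td></tr>\n"
    "  <tr><td>cuda</td><td>14</td></tr>\n"
    "</table>\n"
)
print(len(md), len(html))  # the HTML encoding of the same content is longer
```

The content is identical; only the markup overhead differs, which is exactly the token cost being argued about.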
Stefano
Stefano@maeste·
@antirez Yup, I agree in a general sense. Any artifact used to work and exchange info with the LLM should be information-dense. A format that is less information-dense in favor of human consumability is for output only. That's why I added the concept of views to my LLM second-brain wiki
0 replies · 0 reposts · 0 likes · 471 views
Stefano
Stefano@maeste·
API key in env + open network = wrapper, not sandbox. LINCE paranoid: Linux netns isolation (--unshare-net) + credential proxy holds the key over a unix socket. lince.sh/changelog
[image attached]
0 replies · 2 reposts · 0 likes · 48 views
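The pattern in the tweet, the key held by a proxy process outside the sandbox and reachable only over a unix socket, can be sketched generically. The socket path, key, and one-line protocol below are invented for illustration; this is not LINCE's actual code:

```python
import os
import socket
import threading

SOCK = "/tmp/cred-proxy.sock"   # hypothetical path, not LINCE's layout
API_KEY = "sk-demo-key"         # would live only in the proxy's environment

ready = threading.Event()

def proxy():
    # Holds the key; the sandboxed agent sees only this unix socket.
    if os.path.exists(SOCK):
        os.unlink(SOCK)
    srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    srv.bind(SOCK)
    srv.listen(1)
    ready.set()
    conn, _ = srv.accept()
    req = conn.recv(1024).decode()
    # A real proxy would attach the key and forward the request upstream,
    # returning only the response; here we just acknowledge, redacted.
    conn.sendall(f"forwarded {req} with key ****{API_KEY[-4:]}".encode())
    conn.close()
    srv.close()

t = threading.Thread(target=proxy)
t.start()
ready.wait()

# The "agent" side: with the network namespace unshared, this socket is
# its only egress, and the key never enters its environment.
agent = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
agent.connect(SOCK)
agent.sendall(b"GET /v1/models")
reply = agent.recv(1024).decode()
agent.close()
t.join()
print(reply)  # forwarded GET /v1/models with key ****-key
```

The point of the design is that even a fully compromised agent process can only ask the proxy to make requests; it can never exfiltrate the key itself.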
Stefano reposted
Backlog.md
Backlog.md@Backlog_md·
Backlog v1.45.0 contains multiple new features and QoL improvements across the Web UI, TUI, CLI and MCP. For the full release notes check: github.com/MrLesk/Backlog…
[image attached]
0 replies · 1 repost · 4 likes · 493 views