antirez

42.1K posts

antirez
@antirez

Reproducible bugs are candies. I like programming too much not to like automatic programming.

Sicily, Italy · Joined May 2007
770 Following · 55.4K Followers

Pinned Tweet
antirez @antirez
My second short story release in English is ready: Tales of Illustrious Computer Scientists: Iola Varga, nun and computer scientist. invece.org/iola.html
antirez retweeted
lcamtuf @lcamtuf
The coreutils Rust rewrite story is pretty funny. Coreutils are tools like rm, mv, mkdir, etc. Unlike binutils, this isn't a fertile ground for memory safety bugs. But, the rewrite was completed, and in the spirit of progress, Canonical decided to switch. 🡇
antirez @antirez
@p_mbanugo GPT started to be good with the 5.2 series but 5.3 for me was a jump.
Peter Mbanugo @p_mbanugo
@antirez Does it matter if it's GPT 5.3 codex or GPT 5.4 or later? I'm curious if there's a minimum version that you think worked very well or was good enough.
antirez retweeted
Alexandru Ică @vg_head
@antirez I am working on a knowledge base full of legislation, and IIUC this is _precisely_ what I would want. Markdown files, where the agent can grep through everything trivially. Thank you for this. I was searching for a solution for a long time.
antirez @antirez
There are projects that I develop without looking at the code, but by looking at and owning the concepts, algorithms, ideas, and product. But not Redis, not yet at least. When, in the future, this becomes possible, server software as it is developed today will be over; there will still be projects, I believe, but developed in a very different way. Programmers will mainly do what Linus has done so far for the kernel.
Vittorio Romeo @supahvee1234
@antirez Closely matches my own experiences with current SOTA AI. Extremely useful collaborator, far from being a replacement for human intelligence and creativity.
antirez @antirez
ARGREP was the *last* command I added to the specification. I realized only very late during development that the Array type was perfect for storing text files :) But I believe it is going to be my main use case in the short run.
antirez @antirez
I believe that the fact that Redis is so well understood by LLMs and people, and that it is remote, together with this new support, will allow the creation of knowledge bases for agents that are not centralized, do not need to live in the filesystem, and are trivial to update / access.
antirez @antirez
One thing to understand about the new Array type of Redis, and the support of ARGREP, is that you can store, in Redis keys, different markdown documents (skills) that are collectively used and updated by a multitude of remote agents.
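The idea above — markdown "skill" documents stored under Redis keys as arrays of lines, grep-able by many remote agents — can be sketched in a few lines of plain Python. ARGREP's actual syntax has not been published, so `argrep` below (and the key name `skill:deploy`) is only an illustration of the concept, not the real command.

```python
import re

# Toy in-memory stand-in for the new Redis Array type: each key holds an
# array of lines (here: one markdown "skill" document per key).
store = {}

def rpush_doc(key, text):
    """Store a markdown document as an array of lines under `key`."""
    store.setdefault(key, []).extend(text.splitlines())

def argrep(key, pattern):
    """Server-side-grep sketch: (index, line) pairs matching `pattern`."""
    rx = re.compile(pattern)
    return [(i, line) for i, line in enumerate(store.get(key, []))
            if rx.search(line)]

rpush_doc("skill:deploy",
          "# Deploy\nRun `make deploy` to ship.\nRollback with `make rollback`.")
print(argrep("skill:deploy", r"[Rr]ollback"))
# → [(2, 'Rollback with `make rollback`.')]
```

The point is that the grep happens where the data lives, so any agent with a Redis connection can search and update the shared skills without a filesystem.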
Gian Giovani @g_giovanii
@antirez Is this the new data type that was teased many times before?
antirez @antirez
@pupposandro Basically DeepSeek v4 Lightning Indexer but as a component of existing models (even if with limitations compared to the DS4 architecture of course). Interesting idea.
Sandro @pupposandro
We just released something new: Luce PFlash.

Long-context prefill is a silent killer for throughput. llama.cpp takes ~257 seconds to prefill 128K tokens of Qwen3.6-27B on a single RTX 3090. So we tried to solve the problem.

A small Qwen3-0.6B drafter loads in-process, scores token importance across the whole prompt, and the heavy 27B target only prefills the spans that matter. 128K prompt in 24.8 seconds, ~10.4x faster TTFT, NIAH retrieval preserved at every measured context.

It is a clean C++/CUDA port of FlashPrefill wired through Block-Sparse Attention, with a custom Qwen3-0.6B BF16 forward so drafter and target share one ggml allocator. The whole thing is a single daemon command (compress) in front of the existing dflash spec-decode stack.

More details here: github.com/Luce-Org/luceb…
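The selective-prefill idea above boils down to: score every prompt token with a cheap drafter, cut the prompt into blocks, and have the expensive target prefill only the blocks that score well. A minimal sketch follows; the function name, block size, and keep-ratio rule are invented for illustration (the real system uses a Qwen3-0.6B drafter and block-sparse attention kernels, not this toy scoring).

```python
# Toy block selection for selective prefill: keep the top fraction of
# fixed-size blocks ranked by mean per-token importance score.

def select_blocks(scores, block=4, keep_ratio=0.5):
    """Return sorted indices of the top `keep_ratio` blocks by mean score."""
    blocks = [scores[i:i + block] for i in range(0, len(scores), block)]
    means = [sum(b) / len(b) for b in blocks]
    k = max(1, int(len(blocks) * keep_ratio))
    ranked = sorted(range(len(blocks)), key=lambda i: means[i], reverse=True)
    return sorted(ranked[:k])

# 16 tokens; the high-importance "needle" sits in tokens 8-11 (block 2).
scores = [0.1] * 8 + [0.9] * 4 + [0.1] * 4
print(select_blocks(scores))  # → [0, 2]
```

Block 2 survives because it holds the needle; block 0 wins the tie for the second slot since the sort is stable. The target model would then attend/prefill only those spans, which is where the ~10x TTFT win comes from.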
antirez @antirez
What I was able to achieve today was a *much* better slope in the prefill rate, which stays at 200 t/s even with very long contexts. This already makes the game a fairer one, since it does not degrade noticeably as you continue to work. On the M3 Ultra it is much faster, btw; I tested there too: 2x speed in prefill.
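A quick back-of-the-envelope shows why a flat prefill slope matters: at a constant 200 t/s, time grows linearly with context, while a rate that decays with context length blows up. The decay model below is invented purely for illustration, not measured on any machine.

```python
# Prefill time under a constant vs. a context-dependent rate.

def prefill_time(tokens, rate_fn, step=1024):
    """Total seconds to prefill `tokens`, with rate sampled per block."""
    t, done = 0.0, 0
    while done < tokens:
        n = min(step, tokens - done)
        t += n / rate_fn(done)  # rate depends on how much context exists
        done += n
    return t

flat = lambda ctx: 200.0                         # constant 200 t/s
decaying = lambda ctx: 200.0 / (1 + ctx / 32768)  # hypothetical slowdown

print(round(prefill_time(131072, flat)))      # 131072 / 200 ≈ 655 s
print(round(prefill_time(131072, decaying)))  # roughly 3x worse
```

With the flat rate, a 128K prompt is a fixed, predictable cost; with the decaying rate, every additional chunk of context makes the next prefill slower, which is exactly the "degrades as you continue to work" problem.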
Mario Zechner @badlogicgames
@antirez Wonder what magic could be done to improve the prefill rate. I think that's mostly what's holding things back at the moment.
antirez @antirez
DeepSeek v4 small KV cache + MacBook fast SSD disks = the idea that the disk is not a good target for KV cache is, in this context, totally obsolete. It works *great*. The session you see is opencode using my inference engine for DS4, saving and loading sessions from disk.
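The disk-as-KV-cache idea above can be sketched with a simple session file: dump each layer's flattened key/value buffer to disk, and read it back on resume so a long context never has to be re-prefilled. Everything here is illustrative (file layout, names); a real engine would write quantized blocks and likely mmap them rather than copy.

```python
# Minimal KV-cache session persistence: one length-prefixed float32
# buffer per layer, written with the stdlib only.
import os
import struct
import tempfile
from array import array

def save_kv(path, layers):
    """layers: list of float lists (one flattened K/V buffer per layer)."""
    with open(path, "wb") as f:
        f.write(struct.pack("<I", len(layers)))
        for buf in layers:
            f.write(struct.pack("<I", len(buf)))
            array("f", buf).tofile(f)  # raw float32, no re-prefill needed

def load_kv(path):
    """Read the session back into per-layer float lists."""
    with open(path, "rb") as f:
        (n,) = struct.unpack("<I", f.read(4))
        layers = []
        for _ in range(n):
            (m,) = struct.unpack("<I", f.read(4))
            buf = array("f")
            buf.fromfile(f, m)
            layers.append(list(buf))
        return layers

path = os.path.join(tempfile.mkdtemp(), "session.kv")
save_kv(path, [[0.0, 1.5], [2.5]])
print(load_kv(path))  # → [[0.0, 1.5], [2.5]]
```

The economics in the tweet follow from this: a small KV cache (DeepSeek-style) keeps the file modest, and a fast NVMe makes the load cost trivial compared to redoing the prefill.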
antirez @antirez
@lucastech 128 GB, with space for generous contexts. 2-bit asymmetric quantization, where shared experts, routing, and projections are kept at full quality.
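The quantization scheme mentioned above — 2-bit asymmetric for routed experts, full precision for sensitive tensors — can be illustrated with a per-block quantizer: each weight block gets its own scale and zero point, and values snap to one of 4 levels. The block size and rounding choices below are illustrative, not the actual format used by the engine.

```python
# Toy 2-bit asymmetric quantization: 4 levels per block, with a per-block
# scale and minimum ("zero point"). Sensitive tensors (shared experts,
# routing, projections) would simply skip this path and stay full precision.

def quantize_block(w):
    lo, hi = min(w), max(w)
    scale = (hi - lo) / 3 or 1.0  # 4 levels -> 3 steps; avoid div-by-zero
    q = [min(3, max(0, round((x - lo) / scale))) for x in w]
    return q, scale, lo           # 2-bit codes + per-block parameters

def dequantize_block(q, scale, lo):
    return [lo + c * scale for c in q]

w = [-0.9, -0.3, 0.1, 0.8]
q, s, z = quantize_block(w)
print(q)  # → [0, 1, 2, 3]: each weight mapped to a 2-bit code
print([round(x, 3) for x in dequantize_block(q, s, z)])
```

"Asymmetric" means the grid is anchored at the block minimum rather than centered at zero, so a block of all-positive weights does not waste levels on values it never uses; keeping routing and projections at full quality limits the accuracy damage from the aggressive 2-bit experts.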
Lucas Tech @lucastech
@antirez Apple NVMe drives are dramatically faster than most SSDs, which makes disk access much more tenable than it would be on most other systems. How much RAM do you need for that to run locally, though?
antirez @antirez
@thought_sync Yep I'll make all of it MIT licensed. It will take some time as I believe in the AI space we see too many rushed things, so I want to make sure it works well before releasing it.
Vyacheslav @thought_sync
@antirez Is it possible to try out your inference engine!?
antirez @antirez
@danveloper Yep consider that this is an M3 Max not an M3 Ultra, in the Ultra I get 2x prefill speed, and the same speed with 4 bit quants instead of 2 bit (only for routed experts, all the other weights are as released by @deepseek_ai).
Dan Woods @danveloper
@antirez There are so many caches and hardware accelerators in the Apple Fabric, I'm sure you could make that even faster if you wanted to cut the ~6s off prefill. But, completely usable anyway!