Lucebox

12 posts

@luceboxai

Local inference for consumer hardware.

San Francisco · Joined January 2026
6 Following · 100 Followers
Lucebox reposted
poolside (@poolsideai)
OK, this is sick: @pupposandro, @davideciffa, and @luceboxai got Laguna XS.2 running on a single RTX 3090 with ~111 tok/s decode and 5.4x faster 128K prefill vs llama.cpp, and made it the first MoE target for PFlash. Open weights doing open weights things.
4 replies · 10 reposts · 77 likes · 3.6K views
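Decode figures like the ~111 tok/s quoted above are wall-clock measurements: tokens generated divided by elapsed time. A minimal sketch of such a harness, assuming nothing about Lucebox itself (`step_fn` is an illustrative stand-in for one decode step of whatever engine you run):

```python
import time

def measure_decode_tps(step_fn, n_tokens: int) -> float:
    """Wall-clock decode throughput in tokens per second.

    step_fn stands in for one decode step of whatever engine you run;
    it is an illustrative hook, not a Lucebox API.
    """
    start = time.perf_counter()
    for _ in range(n_tokens):
        step_fn()
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed

# Dummy 1 ms step: the measured rate lands below 1000 tok/s
# because of sleep and loop overhead.
print(f"{measure_decode_tps(lambda: time.sleep(0.001), 100):.1f} tok/s")
```

Prefill speedups like the 5.4x figure are computed the same way, from the time to ingest the prompt rather than to generate tokens.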
Lucebox reposted
Joel - coffee/acc (@JoelDeTeves)
Update on @luceboxai OOMing with Hermes Agent on the RTX 3090: @davideciffa gave me a great suggestion this morning to try with Lucebox, and I'm happy to report that it works! Here are the settings that make it work with Hermes Agent on an RTX 3090:

DFLASH27B_KV_TQ3=1 DFLASH27B_PREFILL_UBATCH=128 python3 scripts/server.py --tokenizer Qwen/Qwen3.6-27B --port 8000 --max-ctx 65536 --fa-window 1024 --prefix-cache-slots 1 --budget 8 --daemon

This *also* works with the @DJLougen Ornstein model! Really looking forward to testing this out. Thank you, David! This is one of the most exciting projects in local AI right now.
7 replies · 5 reposts · 37 likes · 3.5K views
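Once a server like the one in the command above is listening on port 8000, agent frontends talk to it over an OpenAI-style chat API. A stdlib-only sketch of building such a request; the `/v1/chat/completions` path and the model name follow the OpenAI convention and are assumptions about this particular server, not documented Lucebox behavior:

```python
import json
import urllib.request

def build_chat_request(prompt: str,
                       model: str = "Qwen/Qwen3.6-27B",
                       base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build an OpenAI-style chat completion request for a local server.

    Path and payload shape follow the OpenAI chat API convention;
    whether this server expects exactly these is an assumption.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("Say hello in five words.")
# Actually sending it requires the server from the command above
# to be running:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Pointing an agent like Hermes at the server amounts to setting its base URL to the same `http://localhost:8000/v1` endpoint.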
Lucebox reposted
mrciffa (@davideciffa)
You can now benchmark Lucebox Speculative Inference on mixed CUDA/HIP backends, thanks to @maxweicj! Full AMD HIP server support coming soon 🏎️
[image attachment]
3 replies · 2 reposts · 36 likes · 3K views
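The core idea behind the speculative inference being benchmarked here: a cheap draft model proposes several tokens, and the target model verifies them, keeping the longest agreeing prefix so that agreed tokens cost almost nothing. A toy greedy-verification sketch, not Lucebox's actual algorithm:

```python
def speculative_step(draft_tokens, target_next_fn, context):
    """Greedy speculative verification: accept draft tokens while they
    match the target model's own greedy choice; at the first mismatch,
    emit the target's token instead and stop.

    target_next_fn(context) -> next token under the target model.
    Toy sketch: real engines verify all drafts in one batched pass.
    """
    accepted = []
    ctx = list(context)
    for tok in draft_tokens:
        target_tok = target_next_fn(ctx)
        if tok == target_tok:
            accepted.append(tok)          # draft agreed: nearly free token
            ctx.append(tok)
        else:
            accepted.append(target_tok)   # mismatch: keep the target's token
            ctx.append(target_tok)
            break
    return accepted

# Toy target model: always predicts previous token + 1.
target = lambda ctx: ctx[-1] + 1
print(speculative_step([2, 3, 9], target, [1]))  # → [2, 3, 4]
```

The "100% speculative acceptance" mentioned elsewhere in this feed is the best case: every drafted token matches, so no verification pass is wasted.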
Lucebox reposted
Joel - coffee/acc (@JoelDeTeves)
Testing @luceboxai ddtree + dflash on the RTX 3090 (Lenovo P920 beast machine): 83 tokens/sec on a single card with Qwen3.6-27B 🤯🤯🤯 This is wild!
[image attachment]
13 replies · 10 reposts · 106 likes · 13.4K views
Lucebox reposted
mrciffa (@davideciffa)
Big day for Lucebox! Codex, Hermes and OpenClaw now run locally on our speculative inference engine with Qwen3.6-27B. Full OpenAI tool-call compatibility. Thanks @csujun and @jkyamog for the great contribution. 🏎️
[GIF attachment]
10 replies · 11 reposts · 100 likes · 9.6K views
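"Full OpenAI tool-call compatibility" means the server emits `tool_calls` entries in the standard OpenAI chat shape, so agents like Codex, Hermes, and OpenClaw can dispatch them unchanged. A sketch of that shape and a minimal client-side dispatcher; the weather tool is purely illustrative:

```python
import json

# An OpenAI-style function tool definition, as an agent would register it.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

def dispatch_tool_calls(message, handlers):
    """Run each tool_call in an assistant message through a handler table.

    Mirrors the OpenAI chat format, where arguments arrive as a JSON string.
    """
    results = []
    for call in message.get("tool_calls", []):
        fn = call["function"]
        args = json.loads(fn["arguments"])
        results.append(handlers[fn["name"]](**args))
    return results

# Example assistant message in the OpenAI tool-call shape.
msg = {
    "role": "assistant",
    "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_weather",
                     "arguments": '{"city": "San Francisco"}'},
    }],
}
print(dispatch_tool_calls(msg, {"get_weather": lambda city: f"sunny in {city}"}))
# → ['sunny in San Francisco']
```

Because the wire format matches, a frontend written against OpenAI's API only needs its base URL swapped to point at the local engine.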
Lucebox reposted
Ivan Fioravanti ᯅ (@ivanfioravanti)
In reality I have a 3090, not the Ti version 😢, and the real issue is the CPU (i9-10900KF) and the motherboard, which are so old that they slow everything down! I'll build a new computer or ask the @luceboxai guys to help me here 😎
6 replies · 1 repost · 26 likes · 4.4K views
Lucebox reposted
Sandro (@pupposandro)
89.7 tok/s with Qwen3.6-27B at 60K context on a single RTX 3090. 3.64x faster than full attention, 100% speculative acceptance.

Just merged sliding window flash attention + two-phase cache into Luce DFlash. FA now attends to the last 2048 KV positions instead of the full 60K; decode jumps from 25 to 91 tok/s. Two-phase cache skips ~1.4 GB of rollback tensors during prefill and migrates them after. That freed enough VRAM to bump prefill ubatch from 192 to 384.

Huge thanks to @dusterbloom for the PR and @davideciffa for the review. Repo in the first comment ⬇️
[image attachment]
43 replies · 30 reposts · 443 likes · 26.7K views
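The sliding-window change above is what bounds decode cost: instead of scoring the query against all 60K cached keys, only the most recent 2048 KV positions are scored, so per-token attention work stops growing with context length. A minimal single-head, single-query sketch in plain Python; this illustrates the windowing math only, not the fused flash-attention kernel:

```python
import math

def sliding_window_attention(q, keys, values, window=2048):
    """Single-query attention restricted to the last `window` KV positions.

    q: list[float]; keys/values: list[list[float]].
    Per-token cost is O(window * d) instead of O(context * d).
    """
    keys, values = keys[-window:], values[-window:]
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)                      # stable softmax
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    d = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(d)]

# With window=2, only the last two KV entries influence the output;
# the [9.0, 9.0] key and its 100.0 value fall outside the window.
out = sliding_window_attention([1.0, 0.0],
                               keys=[[9.0, 9.0], [1.0, 0.0], [0.0, 1.0]],
                               values=[[100.0], [1.0], [2.0]],
                               window=2)
print(out)  # a single value strictly between 1.0 and 2.0
```

The quoted 25 → 91 tok/s jump is consistent with this shape of saving: at 60K context the window covers only ~3% of the cache, so attention stops being the decode bottleneck.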
Lucebox reposted
Sandro (@pupposandro)
Qwen3.6-27B at 35 tok/s on a GB10 DGX. Almost 3× faster than vLLM+DFlash, 9× vs vLLM bf16.

Luce DFlash is now available on Blackwell consumer GPUs. 5090 and GB10 owners, you've been asking. OpenAI-compatible tool calling works out of the box, so it drops straight into OpenCode, Hermes, Cline, whatever you run.

Huge thanks to the incredible @superoo7 for shipping this to the community. Repo in the first comment.
[image attachment]
30 replies · 20 reposts · 231 likes · 38.4K views