Lucebox

12 posts

@luceboxai

Local inference for consumer hardware.

San Francisco · Joined January 2026
6 Following · 100 Followers
Lucebox reposted
poolside (@poolsideai)
OK, this is sick: @pupposandro, @davideciffa, and @luceboxai got Laguna XS.2 running on a single RTX 3090 with ~111 tok/s decode and 5.4x faster 128K prefill vs llama.cpp, and made it the first MoE target for PFlash. Open weights doing open weights things.
4 replies · 10 reposts · 77 likes · 3.6K views
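Decode figures like the ~111 tok/s quoted above are wall-clock measurements: tokens generated divided by elapsed time. A minimal sketch of such a harness, assuming nothing about Lucebox itself (`step_fn` is an illustrative stand-in for one decode step of whatever engine you run):

```python
import time

def measure_decode_tps(step_fn, n_tokens: int) -> float:
    """Wall-clock decode throughput in tokens per second.

    step_fn stands in for one decode step of whatever engine you run;
    it is an illustrative hook, not a Lucebox API.
    """
    start = time.perf_counter()
    for _ in range(n_tokens):
        step_fn()
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed

# Dummy 1 ms step: the measured rate lands below 1000 tok/s
# because of sleep and loop overhead.
print(f"{measure_decode_tps(lambda: time.sleep(0.001), 100):.1f} tok/s")
```

Prefill speedups like the 5.4x figure are computed the same way, from the time to ingest the prompt rather than to generate tokens.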
Lucebox reposted
Joel - coffee/acc (@JoelDeTeves)
Update on @luceboxai OOMing with Hermes Agent on the RTX 3090: @davideciffa gave me a great suggestion this morning to try with Lucebox, and I'm happy to report that it works! Here are the settings that make it work with Hermes Agent on an RTX 3090:

DFLASH27B_KV_TQ3=1 DFLASH27B_PREFILL_UBATCH=128 python3 scripts/server.py --tokenizer Qwen/Qwen3.6-27B --port 8000 --max-ctx 65536 --fa-window 1024 --prefix-cache-slots 1 --budget 8 --daemon

This *also* works with the @DJLougen Ornstein model! Really looking forward to testing this out. Thank you, David! This is one of the most exciting projects in local AI right now.
7 replies · 5 reposts · 37 likes · 3.5K views
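Once a server like the one in the command above is listening on port 8000, agent frontends talk to it over an OpenAI-style chat API. A stdlib-only sketch of building such a request; the `/v1/chat/completions` path and the model name follow the OpenAI convention and are assumptions about this particular server, not documented Lucebox behavior:

```python
import json
import urllib.request

def build_chat_request(prompt: str,
                       model: str = "Qwen/Qwen3.6-27B",
                       base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build an OpenAI-style chat completion request for a local server.

    Path and payload shape follow the OpenAI chat API convention;
    whether this server expects exactly these is an assumption.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("Say hello in five words.")
# Actually sending it requires the server from the command above
# to be running:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Pointing an agent like Hermes at the server amounts to setting its base URL to the same `http://localhost:8000/v1` endpoint.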
Lucebox reposted
mrciffa (@davideciffa)
You can now benchmark Lucebox Speculative Inference on mixed CUDA/HIP backends, thanks to @maxweicj! Full AMD HIP server support coming soon 🏎️
[image attachment]
3 replies · 2 reposts · 36 likes · 3K views
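The core idea behind the speculative inference being benchmarked here: a cheap draft model proposes several tokens, and the target model verifies them, keeping the longest agreeing prefix so that agreed tokens cost almost nothing. A toy greedy-verification sketch, not Lucebox's actual algorithm:

```python
def speculative_step(draft_tokens, target_next_fn, context):
    """Greedy speculative verification: accept draft tokens while they
    match the target model's own greedy choice; at the first mismatch,
    emit the target's token instead and stop.

    target_next_fn(context) -> next token under the target model.
    Toy sketch: real engines verify all drafts in one batched pass.
    """
    accepted = []
    ctx = list(context)
    for tok in draft_tokens:
        target_tok = target_next_fn(ctx)
        if tok == target_tok:
            accepted.append(tok)          # draft agreed: nearly free token
            ctx.append(tok)
        else:
            accepted.append(target_tok)   # mismatch: keep the target's token
            ctx.append(target_tok)
            break
    return accepted

# Toy target model: always predicts previous token + 1.
target = lambda ctx: ctx[-1] + 1
print(speculative_step([2, 3, 9], target, [1]))  # → [2, 3, 4]
```

The "100% speculative acceptance" mentioned elsewhere in this feed is the best case: every drafted token matches, so no verification pass is wasted.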
Lucebox reposted
Joel - coffee/acc (@JoelDeTeves)
Testing @luceboxai ddtree + dflash on the RTX 3090 (Lenovo P920 beast machine): 83 tokens/sec on a single card with Qwen3.6-27B 🤯🤯🤯 This is wild!
[image attachment]
13 replies · 10 reposts · 106 likes · 13.4K views
Lucebox reposted
mrciffa (@davideciffa)
Big day for Lucebox! Codex, Hermes and OpenClaw now run locally on our speculative inference engine with Qwen3.6-27B. Full OpenAI tool-call compatibility. Thanks @csujun and @jkyamog for the great contribution. 🏎️
[GIF attachment]
10 replies · 11 reposts · 100 likes · 9.6K views
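"Full OpenAI tool-call compatibility" means the server emits `tool_calls` entries in the standard OpenAI chat shape, so agents like Codex, Hermes, and OpenClaw can dispatch them unchanged. A sketch of that shape and a minimal client-side dispatcher; the weather tool is purely illustrative:

```python
import json

# An OpenAI-style function tool definition, as an agent would register it.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

def dispatch_tool_calls(message, handlers):
    """Run each tool_call in an assistant message through a handler table.

    Mirrors the OpenAI chat format, where arguments arrive as a JSON string.
    """
    results = []
    for call in message.get("tool_calls", []):
        fn = call["function"]
        args = json.loads(fn["arguments"])
        results.append(handlers[fn["name"]](**args))
    return results

# Example assistant message in the OpenAI tool-call shape.
msg = {
    "role": "assistant",
    "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_weather",
                     "arguments": '{"city": "San Francisco"}'},
    }],
}
print(dispatch_tool_calls(msg, {"get_weather": lambda city: f"sunny in {city}"}))
# → ['sunny in San Francisco']
```

Because the wire format matches, a frontend written against OpenAI's API only needs its base URL swapped to point at the local engine.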
Lucebox reposted
Ivan Fioravanti ᯅ (@ivanfioravanti)
In reality I have a 3090, not the Ti version 😢, and the real issue is the CPU (i9-10900KF) and the motherboard, which are so old that they slow everything down! I'll build a new computer or ask the @luceboxai guys to help me here 😎
6 replies · 1 repost · 26 likes · 4.4K views
Lucebox reposted
Sandro (@pupposandro)
89.7 tok/s with Qwen3.6-27B at 60K context on a single RTX 3090. 3.64x faster than full attention, 100% speculative acceptance.

Just merged sliding window flash attention + two-phase cache into Luce DFlash. FA now attends to the last 2048 KV positions instead of the full 60K; decode jumps from 25 to 91 tok/s. Two-phase cache skips ~1.4 GB of rollback tensors during prefill and migrates them after. That freed enough VRAM to bump prefill ubatch from 192 to 384.

Huge thanks to @dusterbloom for the PR and @davideciffa for the review. Repo in the first comment ⬇️
[image attachment]
43 replies · 30 reposts · 443 likes · 26.7K views
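The sliding-window change above is what bounds decode cost: instead of scoring the query against all 60K cached keys, only the most recent 2048 KV positions are scored, so per-token attention work stops growing with context length. A minimal single-head, single-query sketch in plain Python; this illustrates the windowing math only, not the fused flash-attention kernel:

```python
import math

def sliding_window_attention(q, keys, values, window=2048):
    """Single-query attention restricted to the last `window` KV positions.

    q: list[float]; keys/values: list[list[float]].
    Per-token cost is O(window * d) instead of O(context * d).
    """
    keys, values = keys[-window:], values[-window:]
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)                      # stable softmax
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    d = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(d)]

# With window=2, only the last two KV entries influence the output;
# the [9.0, 9.0] key and its 100.0 value fall outside the window.
out = sliding_window_attention([1.0, 0.0],
                               keys=[[9.0, 9.0], [1.0, 0.0], [0.0, 1.0]],
                               values=[[100.0], [1.0], [2.0]],
                               window=2)
print(out)  # a single value strictly between 1.0 and 2.0
```

The quoted 25 → 91 tok/s jump is consistent with this shape of saving: at 60K context the window covers only ~3% of the cache, so attention stops being the decode bottleneck.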
Lucebox reposted
Sandro (@pupposandro)
Qwen3.6-27B at 35 tok/s on a GB10 DGX. Almost 3× faster than vLLM+DFlash, 9× vs vLLM bf16.

Luce DFlash is now available on Blackwell consumer GPUs. 5090 and GB10 owners, you've been asking. OpenAI-compatible tool calling works out of the box, so it drops straight into OpenCode, Hermes, Cline, whatever you run.

Huge thanks to the incredible @superoo7 for shipping this to the community. Repo in the first comment.
[image attachment]
30 replies · 20 reposts · 231 likes · 38.4K views