Lucebox

26 posts

Lucebox banner
Lucebox

Lucebox

@luceboxai

The computer for local agents.

San Francisco 가입일 Ocak 2026
9 팔로잉411 팔로워
고정된 트윗
Lucebox
Lucebox@luceboxai·
Lucebox had 35+ contributors in 6 weeks from launch. Huge thank you to everyone helping test, benchmark, debug, and improve local inference. Just the start.
Lucebox tweet media
English
4
3
21
3K
Sandro
Sandro@pupposandro·
Open heart RTX 3090 surgery on @ivanfioravanti's Zotac card. The card was very old and was easily hitting 90 C under load. Original pads were baked, and paste turned to dust. We're switching the thermal interface and will send him full pre and post benchmarks after the operation. For this we're using @Thermal_Grizzly phase-change pads on the GPU core, non-conductive and rated to hold forever. Fresh pads on the memories. Doing this work on every single @luceboxai machine we produce.
Sandro tweet media
English
23
8
166
12K
Lucebox 리트윗함
mrciffa
mrciffa@davideciffa·
Thanks to @csujun now our Lucebox engine enable users with only 16GB of memory to run Qwen 3.6 35B. Maintaining output quality and speed by offloading cold experts on the CPU. Many more news about hybrid decoding in the following days. 🏎️
mrciffa tweet media
English
7
5
54
6K
Lucebox 리트윗함
Sandro
Sandro@pupposandro·
Scrapped 500+ issues and PRs to ship a massive @luceboxai repo redesign and fixes. Very proud of the team. github.com/Luce-Org/luceb… The fastest inference server isn't going to come from a datacenter, it's going to run on the GPU already in your house.
Sandro tweet media
English
10
11
120
10.8K
Lucebox
Lucebox@luceboxai·
Great work from @dusterbloom 🔥
mrciffa@davideciffa

Thanks to @dusterbloom now Luce PFlash is self-adaptive and can auto-tune to your favourite harness context (OpenClaw, Hermes etc..) to give you up to x10 faster prefill time compared to standard inference engine. 🏎️

English
4
2
11
4.8K
Lucebox 리트윗함
Sandro
Sandro@pupposandro·
@luceboxai is not affiliated with any cryptocurrency or coin, and we’ll never be.
English
3
3
26
2K
Lucebox 리트윗함
mrciffa
mrciffa@davideciffa·
If you have an Nvidia RTX 4090 --ddtree-budget 36 is the best configuration that buys you 2.5x speed up during decoding for Qwen3.6_27B. Thanks for the benchmark github.com/1TommyCheung 🙌
mrciffa tweet media
English
2
11
103
7.6K
Lucebox 리트윗함
mrciffa
mrciffa@davideciffa·
Thanks to @csujun now @luceboxai server supports Gemma4! Pretty good speed up with quantized DFlash drafters to make everything fit in 24GB of VRAM. At the same time tool calling gets a 1.5-1.7x boost in every supported harness 🏎️🏎️
mrciffa tweet mediamrciffa tweet media
English
4
2
26
6.4K
Lucebox 리트윗함
Sandro
Sandro@pupposandro·
Very important work from @huggingface. Mapping what the community is running matters for us at Lucebox too: it shows where our help is most needed. I was surprised to see so many RTX 3060s. Also cool to see @julien_c is a 3090 fan as well!
Sandro tweet media
Julien Chaumond@julien_c

What hardware actually powers open-source AI? Not benchmarks. Not vendor marketing. Real-world community usage. We’re launching @huggingface Hardware: → trending GPUs & CPUs → VRAM distribution → inference hardware trends → what the OSS AI ecosystem really runs on

English
6
8
108
28.8K
Lucebox
Lucebox@luceboxai·
🚀 Big performance win! Luce PFlash now runs up to 12× faster on 128K context with AMD Strix Halo. Huge thanks to our contributors for making this possible! 🙌
Lucebox tweet media
English
1
1
6
718
Lucebox 리트윗함
mrciffa
mrciffa@davideciffa·
Testing and UX should be first-class priorities for our inference engine. We just added Lucebox harness launchers so users can run Lucebox directly from tools like Hermes, Codex, Pi, OpenClaw etc. Each harness includes RTX 3090-safe starting settings to avoid OOM. We’ll keep improving them with community benchmarks and contributor feedback. 🏎️ github.com/Luce-Org/luceb…
mrciffa tweet media
English
2
6
33
2.3K
Lucebox 리트윗함
Lucebox 리트윗함
Poolside
Poolside@poolsideai·
ok this is sick @pupposandro @davideciffa and @luceboxai got Laguna XS.2 running on a single RTX 3090 with ~111 tok/s decode, 5.4x faster 128K prefill vs llama.cpp, and made it the first MoE target for PFlash open weights doing open weights things
English
4
10
79
4.4K
Lucebox 리트윗함
Joel - coffee/acc
Joel - coffee/acc@JoelDeTeves·
Update on @luceboxai OOMing with Hermes Agent on RTX 3090: @davideciffa gave me a great suggestion this morning to try with Lucebox and I am happy to report that it works! Here are the settings to make it work with Hermes Agent on RTX 3090: DFLASH27B_KV_TQ3=1 DFLASH27B_PREFILL_UBATCH=128 python3 scripts/server.py --tokenizer Qwen/Qwen3.6-27B --port 8000 --max-ctx 65536 --fa-window 1024 --prefix-cache-slots 1 --budget 8 --daemon This *also* works with @DJLougen Ornstein model! Really looking forward to testing this out! Thank you David! This is one of the most exciting projects in local AI right now!
English
7
5
36
3.9K
Lucebox 리트윗함
mrciffa
mrciffa@davideciffa·
You can now benchmark Lucebox Speculative Inference on CUDA/HIP mixed backends, thanks to @maxweicj ! Full AMD HIP server support coming soon 🏎️
mrciffa tweet media
English
3
2
36
3.2K
Lucebox 리트윗함
Joel - coffee/acc
Joel - coffee/acc@JoelDeTeves·
Testing @luceboxai ddtree + dflash on the RTX 3090 (Lenovo P920 beast machine) 83 tokens/sec on a single card with Qwen3.6-27B 🤯🤯🤯 This is wild!
Joel - coffee/acc tweet media
English
13
10
106
13.7K
Lucebox 리트윗함
mrciffa
mrciffa@davideciffa·
Big day for Lucebox! Codex, Hermes and OpenClaw now run locally on our speculative inference engine with Qwen3.6-27B. Full OpenAI tool-call compatibility. Thanks @csujun and @jkyamog for the great contribution. 🏎️
GIF
English
10
11
98
9.8K