Lucebox

26 posts

Lucebox

@luceboxai

The computer for local agents.

San Francisco 가입일 Ocak 2026

9 팔로잉411 팔로워

고정된 트윗

Lucebox@luceboxai·22 May

Lucebox had 35+ contributors in 6 weeks from launch. Huge thank you to everyone helping test, benchmark, debug, and improve local inference. Just the start.

English

Lucebox@luceboxai·3d

@pupposandro @ivanfioravanti 👀

QME

148

Sandro@pupposandro·3d

Open heart RTX 3090 surgery on @ivanfioravanti's Zotac card. The card was very old and was easily hitting 90 C under load. Original pads were baked, and paste turned to dust. We're switching the thermal interface and will send him full pre and post benchmarks after the operation. For this we're using @Thermal_Grizzly phase-change pads on the GPU core, non-conductive and rated to hold forever. Fresh pads on the memories. Doing this work on every single @luceboxai machine we produce.

English

166

12K

Lucebox 리트윗함

mrciffa@davideciffa·4d

Thanks to @csujun now our Lucebox engine enable users with only 16GB of memory to run Qwen 3.6 35B. Maintaining output quality and speed by offloading cold experts on the CPU. Many more news about hybrid decoding in the following days. 🏎️

English

Lucebox@luceboxai·28 May

Great work @easel , @davideciffa 🏎️

Sandro@pupposandro

x.com/i/article/2060…

English

1.4K

Lucebox 리트윗함

Sandro@pupposandro·28 May

Scrapped 500+ issues and PRs to ship a massive @luceboxai repo redesign and fixes. Very proud of the team. github.com/Luce-Org/luceb… The fastest inference server isn't going to come from a datacenter, it's going to run on the GPU already in your house.

English

120

10.8K

Lucebox@luceboxai·27 May

Great work from @dusterbloom 🔥

mrciffa@davideciffa

Thanks to @dusterbloom now Luce PFlash is self-adaptive and can auto-tune to your favourite harness context (OpenClaw, Hermes etc..) to give you up to x10 faster prefill time compared to standard inference engine. 🏎️

English

4.8K

Lucebox@luceboxai·26 May

⚙️ building in progress ⚙️

Ivan Fioravanti ᯅ@ivanfioravanti

RTX 3090 ready for a new life! Bringing it to @luceboxai team to make some experiments together 😎

English

2.5K

Lucebox 리트윗함

Sandro@pupposandro·24 May

@luceboxai is not affiliated with any cryptocurrency or coin, and we’ll never be.

English

Lucebox 리트윗함

mrciffa@davideciffa·24 May

If you have an Nvidia RTX 4090 --ddtree-budget 36 is the best configuration that buys you 2.5x speed up during decoding for Qwen3.6_27B. Thanks for the benchmark github.com/1TommyCheung 🙌

English

103

7.6K

Lucebox 리트윗함

mrciffa@davideciffa·22 May

Thanks to @csujun now @luceboxai server supports Gemma4! Pretty good speed up with quantized DFlash drafters to make everything fit in 24GB of VRAM. At the same time tool calling gets a 1.5-1.7x boost in every supported harness 🏎️🏎️

English

6.4K

Lucebox 리트윗함

Sandro@pupposandro·20 May

Very important work from @huggingface. Mapping what the community is running matters for us at Lucebox too: it shows where our help is most needed. I was surprised to see so many RTX 3060s. Also cool to see @julien_c is a 3090 fan as well!

Julien Chaumond@julien_c

What hardware actually powers open-source AI? Not benchmarks. Not vendor marketing. Real-world community usage. We’re launching @huggingface Hardware: → trending GPUs & CPUs → VRAM distribution → inference hardware trends → what the OSS AI ecosystem really runs on

English

108

28.8K

Lucebox@luceboxai·21 May

🚀 Big performance win! Luce PFlash now runs up to 12× faster on 128K context with AMD Strix Halo. Huge thanks to our contributors for making this possible! 🙌

English

718

Lucebox 리트윗함

mrciffa@davideciffa·17 May

Testing and UX should be first-class priorities for our inference engine. We just added Lucebox harness launchers so users can run Lucebox directly from tools like Hermes, Codex, Pi, OpenClaw etc. Each harness includes RTX 3090-safe starting settings to avoid OOM. We’ll keep improving them with community benchmarks and contributor feedback. 🏎️ github.com/Luce-Org/luceb…

English

2.3K

Lucebox 리트윗함

mrciffa@davideciffa·16 May

We didn't know that our megakernel could be 25x faster than Pytorch on a RTX A6000 wow! 🏎️🏎️

Fahd Mirza@fahdmirza

⚡ Luce Megakernel just proved the NVIDIA efficiency gap is a software problem not a hardware one 🔬 a 2020 RTX 3090 at 220W now matches Apple M5 Max efficiency and delivers 1.8x the throughput 🔹 413 tok/s decode vs 267 tok/s on llama.cpp — same GPU, different software 🔹 1.87 tok/J — matching Apple M5 Max at less than a third of the system cost 🔹 All 24 layers of Qwen3.5-0.8B fused into a single CUDA kernel — zero CPU round trips 🔹 25x faster than PyTorch HuggingFace on the same hardware 🔹 Hybrid DeltaNet and Attention architecture — the first megakernel ever built for this pattern 🔥 Full breakdown and live benchmark below 👇 youtu.be/e6jY4goVIu0

English

15.4K

Lucebox 리트윗함

Poolside@poolsideai·14 May

ok this is sick @pupposandro @davideciffa and @luceboxai got Laguna XS.2 running on a single RTX 3090 with ~111 tok/s decode, 5.4x faster 128K prefill vs llama.cpp, and made it the first MoE target for PFlash open weights doing open weights things

English

4.4K

Lucebox 리트윗함

Joel - coffee/acc@JoelDeTeves·13 May

Update on @luceboxai OOMing with Hermes Agent on RTX 3090: @davideciffa gave me a great suggestion this morning to try with Lucebox and I am happy to report that it works! Here are the settings to make it work with Hermes Agent on RTX 3090: DFLASH27B_KV_TQ3=1 DFLASH27B_PREFILL_UBATCH=128 python3 scripts/server.py --tokenizer Qwen/Qwen3.6-27B --port 8000 --max-ctx 65536 --fa-window 1024 --prefix-cache-slots 1 --budget 8 --daemon This *also* works with @DJLougen Ornstein model! Really looking forward to testing this out! Thank you David! This is one of the most exciting projects in local AI right now!

English

3.9K

Lucebox 리트윗함

Sandro@pupposandro·12 May

x.com/i/article/2054…

ZXX

110

39.1K

Lucebox 리트윗함

mrciffa@davideciffa·11 May

You can now benchmark Lucebox Speculative Inference on CUDA/HIP mixed backends, thanks to @maxweicj ! Full AMD HIP server support coming soon 🏎️

English

3.2K

Lucebox 리트윗함

Joel - coffee/acc@JoelDeTeves·11 May

Testing @luceboxai ddtree + dflash on the RTX 3090 (Lenovo P920 beast machine) 83 tokens/sec on a single card with Qwen3.6-27B 🤯🤯🤯 This is wild!

English

106

13.7K

Lucebox 리트윗함

mrciffa@davideciffa·10 May

Big day for Lucebox! Codex, Hermes and OpenClaw now run locally on our speculative inference engine with Qwen3.6-27B. Full OpenAI tool-call compatibility. Thanks @csujun and @jkyamog for the great contribution. 🏎️

GIF

English

9.8K

탐색

@pupposandro @ivanfioravanti @Thermal_Grizzly @csujun @easel @davideciffa @dusterbloom @huggingface