devnulling

3.2K posts

devnulling
@devnulling

Joined March 2015
910 Following · 1.2K Followers

Pinned Tweet
devnulling
devnulling@devnulling·
What they are really hiding in area 51... #gnuradio #sdr
[media]
1 · 8 · 66 · 0
devnulling
devnulling@devnulling·
@ibelings look up the Corsair Xeneon Edge, could make it all digital!
0 · 0 · 3 · 196
Marcelo Samsoniuk
Marcelo Samsoniuk@samsoniuk·
@DutraCGI @ico_TC @devnulling wow! Is it possible to attach an ADC and send realtime data? Is the processing multithreaded or single-threaded? A bandwidth count like “samples/s” would help! ☺️
2 · 0 · 1 · 38
Paulo Dutra - PU4THZ
Paulo Dutra - PU4THZ@DutraCGI·
Baudline didn't have a zoom feature, so I made my own plotter "thing" with #ImGui and #Vulkan to debug the C++ modem... input is via stdin... "Digital Phosphor" technique... It also has an "XY" mode...
[media ×3]
2 · 13 · 99 · 6.9K
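Since the plotter takes its input via stdin, the quickest way to exercise it is to pipe in a synthetic signal. A minimal sketch, assuming the tool reads interleaved float32 I/Q samples from stdin (the actual sample format isn't stated above, so adjust the dtype and interleaving to whatever it really expects):

```python
# Hypothetical test-signal generator for a stdin-fed plotter.
# Assumption: the plotter reads interleaved float32 I/Q (i.e. complex64) on stdin.
import sys
import numpy as np

fs = 48_000      # sample rate in Hz (arbitrary for a test)
f0 = 1_000       # test tone in Hz
block = 4096     # samples per write

n = 0
while True:      # stream until the reader closes the pipe (Ctrl-C / BrokenPipeError)
    t = (np.arange(block) + n) / fs
    iq = np.exp(2j * np.pi * f0 * t).astype(np.complex64)
    sys.stdout.buffer.write(iq.tobytes())  # complex64 = interleaved float32 I/Q
    n += block
```

Usage would be something like `python gen_tone.py | ./plotter` (binary name hypothetical).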
Vlastimil Slinták
Vlastimil Slinták@vascocz·
Here are a few more notes about what I’ve been working on lately around New Packet Radio. uart.cz/2916/experimen… New hardware, RP2350 firmware rewrite, and some thoughts about authentication in ham packet networks.
[media]
4 · 38 · 246 · 10.7K
devnulling
devnulling@devnulling·
@sudoingX An observation: with Qwopus, I'm still needing to use the jinja template or it will fail sometimes
0 · 0 · 0 · 97
Sudo su
Sudo su@sudoingX·
i'm going to sleep. the thread stays open. drop your GPU, your flags, your tok/s. help the next person skip a week of debugging. or just watch. that works too.
1 · 0 · 27 · 2.6K
Sudo su
Sudo su@sudoingX·
3am and still going. the acceleration is real. either you're in it or you're watching it.
4 · 0 · 31 · 3.2K
devnulling
devnulling@devnulling·
@sudoingX Fwiw seeing 34 tok/s with Qwopus, thinking on, on an RTX 3090 over here
0 · 0 · 1 · 419
Sudo su
Sudo su@sudoingX·
downloading Qwen3.5-27B Claude 4.6 Opus Reasoning Distilled (Qwopus) right now. Q4_K_M quant on a single RTX 3090. same hardware i've been testing every model on this month.

someone took the base model i've been daily driving and distilled Claude Opus 4.6 reasoning chains into it. same 27B parameters, same architecture, but fine tuned on how Claude thinks through problems. the base model already built 1,827 lines of working code in 13 minutes with zero steering. curious what distilled reasoning adds.

switching harness too. the base ran on OpenCode. this one runs through Claude Code. claude distilled model through claude's own coding agent. want to see if the reasoning patterns carry differently when the harness matches the distillation source.

will post speed sweep first to get the numbers. then checking if the jinja template bug that silently kills thinking mode carries over from the base model. then octopus invaders. same prompt that base qwen passed in 13 minutes and hermes 4.3 failed on 2x the hardware.

4 models. 1 GPU. 1 prompt. results soon.
[media]
Sudo su@sudoingX

been daily driving qwen 3.5 27B dense. haven't even finished testing it properly and now claude opus reasoning gets distilled into the same base. things are dropping faster than i can benchmark. might pull this and test with claude code and opencode. first thing to check: does the jinja template bug carry over? the one that silently kills thinking mode when you use agent tools. if your server logs show thinking = 0, your model isn't reasoning and the server won't tell you. claude level reasoning on a single 3090. locally. we'll see.

38 · 40 · 798 · 128.8K
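One way to catch the "thinking = 0" failure mode described above is to hit the local server directly and check whether any reasoning came back. A minimal sketch, assuming llama-server is running on localhost:8080 with its OpenAI-compatible /v1/chat/completions endpoint, launched with --jinja so the GGUF's chat template is applied, and that reasoning is surfaced in a reasoning_content field (that field name depends on the server's reasoning-format setting and llama.cpp version, so treat it as an assumption to verify):

```python
# Minimal probe: is the local model actually emitting reasoning tokens?
# Assumptions: llama-server on localhost:8080, OpenAI-compatible endpoint,
# reasoning returned as "reasoning_content" (it may instead appear inline in
# "content" as <think>...</think> depending on server flags).
import json
import urllib.request

req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps({
        "model": "local",
        "messages": [{"role": "user", "content": "What is 17 * 23? Think it through."}],
        "max_tokens": 512,
    }).encode(),
    headers={"Content-Type": "application/json"},
)
resp = json.load(urllib.request.urlopen(req))
msg = resp["choices"][0]["message"]
reasoning = msg.get("reasoning_content") or ""
print("reasoning tokens present:", bool(reasoning))
print("answer:", (msg.get("content") or "")[:200])
```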
devnulling
devnulling@devnulling·
@sudoingX The soundtrack is off the chain. Like a mix of 8bit retro synthwave.
0 · 0 · 2 · 520
Sudo su
Sudo su@sudoingX·
the tiebreaker is done. qwen 3.5 27B dense. single RTX 3090. one prompt. zero steering. zero human edits. 1,827 lines across 10 files. 13 minutes. full thinking mode. runs on first load.

hermes 4.3 got the same prompt with 2x 3090s and 5x the context it needed. wrote 1,249 lines, left empty files, needed 3 interventions, game was broken on load. same architecture class. same quant. hermes got double the hardware. completely different result. dense wasn't the problem. hermes was.

but here's what got me. this model thinks at 27 tok/s. every single token carries 27 billion parameters of reasoning. MoE hit 112 tok/s but only 3B active per token. the dense model is slower and it doesn't matter. watch 13 minutes of autonomous coding on a consumer GPU with zero intervention and tell me speed is what matters.

a year ago this wasn't possible. now it runs on hardware you can buy used for $900. no API. no subscription. no cloud. just a 3090 doing what data centers did 18 months ago.

full unedited session in the video. every token, every file, every thinking chain. 16 minutes. hit play.
Sudo su@sudoingX

first impressions of qwen 3.5 27B dense on a single RTX 3090. 35 tok/s. from 4K all the way to 300K+ context. no speed drop. hermes 4.3 started at 35 and degraded to 15 as context filled. qwen dense holds. MoE held 112 flat. 3x faster but only 3B of 35B active per token. architecture tradeoff.

Q4_K_M on 16.7GB. native context 262K. pushed past training limit to 376K before VRAM ceiling on 24GB. tried q8 KV cache at 262K, speed collapsed to 11 tok/s. q4_0 KV is the sweet spot. flash attention mandatory.

built in reasoning mode. the model thinks step by step before it answers. full chain of thought surviving Q4 quant. 1,799+ token thinking chains with self correction loops. on a single consumer GPU.

gave it one prompt: "build a realtime particle galaxy simulation in one HTML file." 3,340 tokens. 95 seconds. one shot. ran on first load. full reasoning and coding in the video below.

optimal config if you want to skip the hours of testing:

llama-server -ngl 99 -c 262144 -fa on --cache-type-k q4_0 --cache-type-v q4_0

this is just the warmup. octopus invaders is next: 10 files, 3,400+ lines, zero steering. the prompt hermes quit at 22%. already more impressed than expected. full results coming soon.

36 · 45 · 559 · 90.3K
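For reproducing a tok/s figure like the ones above, the crude approach is to time one completion against the running server and divide by the generated token count. A minimal sketch, assuming the OpenAI-compatible endpoint on localhost:8080; wall-clock tok/s will read a little lower than the server's own timing because it includes prompt processing and HTTP overhead:

```python
# Rough generation-speed probe against a local llama-server instance.
# Assumption: OpenAI-compatible /v1/chat/completions on localhost:8080.
import json
import time
import urllib.request

def tok_per_sec(prompt: str, max_tokens: int = 256) -> float:
    body = json.dumps({
        "model": "local",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }).encode()
    req = urllib.request.Request(
        "http://localhost:8080/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    t0 = time.time()
    resp = json.load(urllib.request.urlopen(req))
    elapsed = time.time() - t0
    return resp["usage"]["completion_tokens"] / elapsed

print(f"{tok_per_sec('Write a short poem about a GPU.'):.1f} tok/s (wall clock)")
```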
devnulling
devnulling@devnulling·
@ibelings Yes, 8 channels of 500 MHz over 100 Gbit plz
0 · 0 · 3 · 238
devnulling
devnulling@devnulling·
@sudoingX It's kind of crazy how fast all of these models are progressing
1 · 0 · 2 · 347
Sudo su
Sudo su@sudoingX·
was about to start the octopus invaders test on qwen 3.5 27B dense last night. fell asleep. woke up to unsloth dropping updated GGUFs with improved iMatrix tuned for coding and tool use. glad i slept on it. downloading the update now so the test gets the best quant available.
[media]
Unsloth AI@UnslothAI

We're releasing our final update to Qwen3.5 GGUFs for improved performance.
- Qwen3.5 GGUFs now use our new iMatrix data for better chat, coding & tool use.
- New improved quant algorithm
- Re-download 35B, 27B, 122B GGUFs: huggingface.co/collections/un…
Guide: unsloth.ai/docs/models/qw…

5 · 4 · 132 · 12.6K
devnulling
devnulling@devnulling·
@sudoingX @Everlier Very awesome, looking forward to the results. Will be interesting to see how much more space is available for context with the Qwen3.5 27B, and the tok/s, etc.
0 · 0 · 1 · 37
Sudo su
Sudo su@sudoingX·
i spent 3 hours finding the sweet spot for hermes 4.3 36B on a single RTX 3090. saving you the trouble, anon. the model is 21.8GB at Q4_K_M. that leaves 2.2GB free on 24GB VRAM. not much room for KV cache.

here's what actually happened:
4K: 35.3 tok/s
8K: 35.2 tok/s
16K: 34.8 tok/s
32K: 34.6 tok/s
64K: 6.4 tok/s
128K: 1.9 tok/s

flat from 4K to 32K. then it falls off a cliff at 64K. the trick is quantized KV cache. without it you OOM at 16K. with quantized KV cache you get 32K at full speed. all 64 layers on GPU.

at 64K something weird happens. ngl 99 (all layers on GPU) = 3.96 tok/s. the KV cache silently spills to CPU. drop to ngl 55 and speed jumps to 6.37. drop to ngl 48 and it gets worse again (3.46). there's an offload sweet spot where you free just enough VRAM for the cache without losing too much compute to PCIe transfers. 128K works at ngl 32 but you're at 1.95 tok/s. half the model on CPU. usable for batch work, not for interactive.

the sweet spot command:

llama-server -m hermes-4.3-36b-Q4_K_M.gguf -ngl 99 -c 32768 --cache-type-k q4_0 --cache-type-v q4_0

32K context. 34.6 tok/s. all on GPU. this is where dense 36B lives on 24GB.

for comparison, qwen 3.5 (35B MoE, 3B active) holds 112 tok/s from 4K all the way to 262K on the same GPU. no speed drop. same total params, completely different architecture. hybrid linear attention means flat context scaling. dense pays for every token in the KV cache.

code and quality comparison coming next. fast vs slow generation side by side in the videos below.
Sudo su@sudoingX

i've been wanting to run this comparison for weeks. dense vs MoE. same param count. same GPU. completely different architecture.

here's what caught my eye. hermes 4.3. 36B dense. 93.8% on MATH-500. 512K context. every single parameter active on every forward pass. no routing. no sparsity. pure dense transformer.

qwen 3.5 is 35 billion parameters but only activates 3 billion per token. 256 experts, 8 routed + 1 shared per question. the rest sit idle. both fit on a single RTX 3090.

i've been benchmarking qwen 3.5. 112 tok/s at 262K context. built a space shooter game, particle sim, full CLI tools with it. now i want to see what happens when the GPU has to process 12x more active parameters on every single token.

downloading hermes right now. same GPU. same benchmarks. same prompts. dense vs MoE head to head on consumer hardware. which architecture wins on a single consumer GPU? place your bets. nobody's done this comparison yet. first results today.

14 · 16 · 269 · 51.4K
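To redo the -ngl offload sweep described above on your own card, llama-bench (bundled with llama.cpp) can run the same model at several offload levels back to back. A minimal sketch, assuming llama-bench is on PATH and reusing the model filename from the command above; note this measures short-context throughput, so reproducing the 64K-context spill behavior still needs a genuinely long prompt:

```python
# Sketch: sweep GPU layer offload to find the -ngl sweet spot.
# Assumption: llama-bench (from llama.cpp) is on PATH; model file as in the post.
import subprocess

MODEL = "hermes-4.3-36b-Q4_K_M.gguf"

for ngl in (99, 64, 55, 48, 32):
    print(f"--- -ngl {ngl} ---")
    # -p 512: prompt-processing test, -n 128: token-generation test
    out = subprocess.run(
        ["llama-bench", "-m", MODEL, "-ngl", str(ngl), "-p", "512", "-n", "128"],
        capture_output=True, text=True,
    )
    print(out.stdout.strip() or out.stderr.strip())
```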
devnulling
devnulling@devnulling·
@Everlier @sudoingX it would be great to see a comparison of Qwen 3.5 27B maxed out on a 3090 vs hermes 4.3
2 · 0 · 1 · 63
Everlier
Everlier@Everlier·
@sudoingX Hparams search :) Comparing 36B dense to MoE doesn't feel like apples to apples though. Also new Qwen 3.5 27B is likely better while leaving more space for context
2 · 0 · 3 · 443
devnulling
devnulling@devnulling·
@sudoingX can you share the spec / prompt you used for this?
0 · 0 · 2 · 468
Sudo su
Sudo su@sudoingX·
this is what 24GB of VRAM builds in 2026. one prompt. ten files. 3,483 lines of code. zero handholding.

i gave Qwen3.5-35B-A3B a single detailed spec describing the full game architecture and hit enter. enemy types, particle systems, procedural audio, powerups, boss fights, ship upgrades, parallax backgrounds, everything in one message. the model planned the file structure itself, wrote every module in dependency order, wired all the imports, and served the game on port 3001. it ran on first load. when it hit a bug in collision detection it read its own error output, found the issue, fixed it, and kept building. this is pure agent loop running on local hardware.

what you're looking at is pixelated octopus aliens with tentacle animations, 4 layer parallax space background with planets at different depths, a full particle system handling explosions and ink splatter and engine trails and bullet impacts, procedural audio through Web Audio API with zero sound files loaded, unleash mode with combo multiplier, boss fights every 5 levels, ship upgrades that unlock as you progress. no libraries. no frameworks. vanilla JS and Canvas.

3B active parameters. single RTX 3090. llama.cpp with q8_0 KV cache at 262K context. Claude Code pointed at localhost:8080 through the native Anthropic endpoint. no API costs. 112 tok/s. a GPU you can buy used for $800.

game is called Octopus Invaders and i actually like playing it.
Sudo su@sudoingX

testing Qwen3.5-35B-A3B latest optimized version by UnslothAI on a single RTX 3090. one detailed prompt. zero handholding. watch a 3B model scaffold an entire multifile game project autonomously.

the setup:
> model: Qwen3.5-35B-A3B (80B total, only 3B active per token)
> quant: UD-Q4_K_XL by Unsloth (MXFP4 layers removed in latest update)
> speed: 112 tok/s generation, ~130 tok/s prefill
> context: 262K tokens
> flags: -ngl 99 -c 262144 -np 1 --cache-type-k q8_0 --cache-type-v q8_0
> engine: llama.cpp
> agent: Claude Code talking to localhost:8080 (llama.cpp now has native Anthropic API endpoint. no LiteLLM needed)

q8_0 KV cache cuts VRAM usage in half vs f16 at 262K. -np 1 is default but worth noting. parallel slots multiply KV cache and at 262K that's an instant OOM.

the prompt was more detailed than this but you get the idea: build a space shooter with parallax backgrounds, particle systems, procedural audio, 4 enemy types, boss fights, power-up system, and ship upgrades. 8 JavaScript modules. no libraries.

game's called Octopus Invaders. gameplay footage dropping next.

68 · 116 · 1.1K · 152.9K
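Before pointing Claude Code at the local server, a quick smoke test of the Anthropic-style endpoint mentioned above is worth doing. A minimal sketch, assuming llama-server exposes /v1/messages on localhost:8080, doesn't enforce the API key, and returns the usual Anthropic Messages response shape (all assumptions to check against your llama.cpp build); Claude Code can then be aimed at the same server, e.g. via its ANTHROPIC_BASE_URL environment variable (verify against current Claude Code docs):

```python
# Smoke test of a local Anthropic-compatible /v1/messages endpoint.
# Assumptions: llama-server on localhost:8080, API key not enforced locally,
# response shaped like Anthropic's Messages API ({"content": [{"text": ...}]}).
import json
import urllib.request

req = urllib.request.Request(
    "http://localhost:8080/v1/messages",
    data=json.dumps({
        "model": "local",
        "max_tokens": 128,
        "messages": [{"role": "user", "content": "Say hello in one sentence."}],
    }).encode(),
    headers={
        "Content-Type": "application/json",
        "x-api-key": "not-needed-locally",   # placeholder
        "anthropic-version": "2023-06-01",
    },
)
resp = json.load(urllib.request.urlopen(req))
blocks = resp.get("content", [])
print(blocks[0]["text"] if blocks else resp)
```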
Bobby Bobak ☕️🇵🇱
Bobby Bobak ☕️🇵🇱@bobek_balinek·
Introducing Claude Code Lamp 🚨: Get visual feedback whenever your code agent needs input or is busy coding away. It's just a nifty integration of @claudeai Code hooks and a BLE lamp. Full source code in the replies 👀
86 · 59 · 1.6K · 184.1K
devnulling
devnulling@devnulling·
@KD0CQ if you try to build up a consumer-grade NVIDIA-based rig, like 3090s, you end up with about 192GB of RAM and a 3kW power draw
0 · 0 · 0 · 14
devnulling
devnulling@devnulling·
@KD0CQ via OpenRouter and stuff you can get Kimi or Minimax fairly cheap. The main issue is the VRAM required to get a model that is good: 128GB min for gpt-oss-120b, or more realistically 1TB of RAM for Kimi 2.5, for which the cheapest option is 2x $10k Mac Studios.
1 · 0 · 0 · 50
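The memory numbers in the reply above come down to simple arithmetic: weights at the quantized bit width plus headroom for KV cache and runtime. A rough back-of-the-envelope sketch (the 20% overhead allowance and the example bit widths are loose assumptions, not measured values):

```python
# Back-of-the-envelope memory sizing for running a model locally.
def weight_gb(params_b: float, bits_per_weight: float) -> float:
    """GB for the weights alone at a given quantization width."""
    return params_b * bits_per_weight / 8  # billions of params * bytes per param

def rough_total_gb(params_b: float, bits_per_weight: float, overhead: float = 0.20) -> float:
    """Weights plus a loose ~20% allowance for KV cache, activations, runtime."""
    return weight_gb(params_b, bits_per_weight) * (1 + overhead)

# Illustrative only -- plug in the real parameter count and quant width:
print(f"27B  @ ~4.5 bits/weight: ~{rough_total_gb(27, 4.5):.0f} GB")
print(f"120B @ 8 bits/weight:    ~{rough_total_gb(120, 8):.0f} GB")
```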
𓁼 Ozarkian Mad Scientist
I have 8 oldish smartphones I'm going to try setting up as Openclaw agents. Not to mention several RasPi and similar devices. They'll be talking back to local models running on my Linux box. Have a 4070 12GB running Gemini now, and a 3060(I think) 12GB on the shelf. Need to find a way to team those together for shared RAM. Any suggestions?
2 · 0 · 4 · 85
devnulling
devnulling@devnulling·
@KD0CQ also while it might work on a phone, something like virtualbox with a basic ubuntu desktop image is another good way to do it
0 · 0 · 1 · 44
devnulling
devnulling@devnulling·
@KD0CQ tbh you probably only need a single instance of openclaw running. As far as running a model on 12GB, Ministral or a Qwen is probably best, but that's going to be extremely limited. Hooking it up to codex/gpt or claude/sonnet is better, or kimi/minimax through OpenRouter.
2 · 0 · 1 · 112
Matthew Berman
Matthew Berman@MatthewBerman·
Clawd is NOT cheap. How do I reduce the cost here?
[media]
309 · 22 · 989 · 286.8K
cemaxecuter
cemaxecuter@cemaxecuter·
First run and hand off to the DSP engine (not GNU Radio) on a WarDragon kit. The “orchestration” script will output info for DragonSync to turn into CoT and more, and also make it available to the ATAK plugin (all in due time). For now, the baseline WarDragon Pro image is being updated with all core pieces and will be available to owners of the WD. Still work to do to make the integration easier..
[media]
cemaxecuter@cemaxecuter

Heading into the new year stoked — with help from a friend with solid DSP experience, the FPV DSP-based detection is working, and early this morning the wide energy-scan front end came together for fast sweeps + targeted confirmation. Still evolving, but the momentum is real. 🐉

2 · 1 · 24 · 3.9K