ÆON FORGE ✨

5.8K posts

@SpaceTimeViking

𝙼𝚊𝚔𝚒𝚗𝚐 𝚛𝚒𝚙𝚙𝚕𝚎𝚜 𝚏𝚛𝚘𝚖 𝚖𝚢 𝚙𝚕𝚊𝚌𝚎 𝚠𝚒𝚝𝚑𝚒𝚗 𝚂𝚙𝚊𝚌𝚎-𝚃𝚒𝚖𝚎 | #𝟸𝟷𝚎𝟾 | 𝚎/𝚊𝚌𝚌 | 𝚝𝚒𝚖𝚎𝚕𝚒𝚗𝚎 𝚊𝚛𝚌𝚑𝚒𝚝𝚎𝚌𝚝 |

Earth · Joined July 2009
2.4K Following · 2K Followers
Pinned Tweet
ÆON FORGE ✨ @SpaceTimeViking
Light X Space X Time
Replies 4 · Reposts 4 · Likes 53 · Views 9.1K
BridgeMind @bridgemindai
MacBook Pro M5 Max is fully set up and running local models. I have never seen speeds this fast. Qwen 3.6 35B and Gemma 4 31B are running blazing fast on 128GB of unified memory. Faster than both of my stacked NVIDIA DGX Sparks sitting right next to it. Initial impressions: Apple silicon in 2026 is no joke for local inference. The M5 Max handles 30B+ parameter models like they're nothing. Full review and comparison video coming soon.
Replies 111 · Reposts 38 · Likes 808 · Views 49.1K
ÆON FORGE ✨ @SpaceTimeViking
@brah_ddah @mr_r0b0t @bridgemindai It can be fast in single-request use, especially prefill: around 3000 tok/s prefill in some configurations. 40 tok/s on Qwen3.6-27B, 90 tok/s on Qwen3.6-31B-A3B. It’s all about optimization.
Replies 0 · Reposts 0 · Likes 0 · Views 9
ÆON FORGE ✨ @SpaceTimeViking
@brah_ddah @mr_r0b0t @bridgemindai That’s why it would be insane for someone to buy a system like this for single-request use. It’s meant to be an agentic workhorse with a team of active agents and autonomous workloads. Parallel workloads are a good test of its intended target audience.
Replies 0 · Reposts 0 · Likes 2 · Views 79
ÆON FORGE ✨ @SpaceTimeViking
People reporting the DGX Spark is slow just don’t know how to optimize for it. Understandably, that's a common issue given the lack of good information out there. It does require some first-principles understanding of the hardware and software. I was running 256 concurrent sessions on a single DGX Spark, getting nearly 2000 tok/s aggregate.
Replies 5 · Reposts 3 · Likes 18 · Views 672
tmo @tmophoto
@keithofaptos @SpaceTimeViking @mr_r0b0t It’s probably a little higher, 150-175 watts per Spark when running under load. Idle draw is 40 watts. The Spark blows GPUs away in terms of electricity consumption.
Replies 1 · Reposts 0 · Likes 2 · Views 29
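
A rough way to put numbers on the electricity point, as a sketch and not a measurement: watts are joules per second, so dividing power draw by throughput gives energy per token. The 150-175 W load range comes from the post above. Pairing it with the roughly 2000 tok/s aggregate quoted earlier in this feed is an assumption, since nobody here reports power and throughput from the same run. The function names are illustrative.

```python
def joules_per_token(power_watts: float, tokens_per_second: float) -> float:
    """Energy per generated token. Watts are joules per second, so W / (tok/s) = J/tok."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return power_watts / tokens_per_second


def tokens_per_kwh(power_watts: float, tokens_per_second: float) -> float:
    """Tokens generated per kilowatt-hour (1 kWh = 3,600,000 J)."""
    return 3_600_000 / joules_per_token(power_watts, tokens_per_second)


# Assumed pairing: load power from the post above, aggregate throughput
# from a different post in the feed. Treat the results as order-of-magnitude only.
for watts in (150, 175):
    print(watts, "W:", joules_per_token(watts, 2000), "J/token,",
          round(tokens_per_kwh(watts, 2000)), "tokens/kWh")
```

The same two functions work for any rig: plug in wall power under load and measured aggregate tok/s from the same run to get a comparable figure.
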
mr-r0b0t @mr_r0b0t
16 local AI agents streaming at once! MiniMax M2.7 NVFP4, 2x GB10, no cloud APIs.
Replies 61 · Reposts 25 · Likes 332 · Views 149.3K
ÆON FORGE ✨ @SpaceTimeViking
Ironically, a lot of my work was brought into that version, or at least the optimizations I was doing early. I have a vLLM 0.21.0 build locally and am trying to get a DDTree version working for Spark, but it’s such alien tech that it hasn't been easy. vLLM 0.21.0 didn’t seem to add much on its own. I'll be sending up updated containers soon.
Replies 0 · Reposts 0 · Likes 1 · Views 20
0xSero @0xSero
DGX Spark, qwen3.6-27b-nvfp4: acceptable prefill but bad decode. I think I should be able to get 30 tok/s. Any advice?
Replies 28 · Reposts 1 · Likes 94 · Views 10.5K
ÆON FORGE ✨ @SpaceTimeViking
I really appreciate that. The Qwen 3.6 27B model is the only one where I handled the full chain of development starting from the OG base, and it’s also my favorite. More is definitely coming soon. I have been in the trenches attempting to get DDTree working on the Spark, but it is alien tech and it's taking a while to iron out the kinks.
Replies 0 · Reposts 0 · Likes 1 · Views 52
Jun Song @jun_song
Drop your open source project in the comment section. Explain it in 3 bullet points 💪 @bankrbot come back in 24 hours, evaluate each project and send $100 worth of $SUPERGEMMA to top 10 developers.
Replies 41 · Reposts 15 · Likes 147 · Views 19K
ÆON FORGE ✨ @SpaceTimeViking
@0xSero The fact that a 27B model can do the same is even crazier.
Replies 0 · Reposts 0 · Likes 0 · Views 95
0xSero @0xSero
The fact that a 284B parameter model is able to, in a single 400K token session:
- ssh into my Homelab to find docs
- ssh into Lambda / Prime Intellect
- rewrite reap to support the DS4 attention
- download models & datasets
- run tests to check it works
Replies 11 · Reposts 14 · Likes 471 · Views 22.5K
ÆON FORGE ✨ @SpaceTimeViking
You could probably run 32 agents at once on this setup with minimal slowdown; they scale up in concurrency quite well. You can do 100, but it’s going to be at something like 4-5 tok/s per agent. So if you need speed and scale, you would need to run a half-million-dollar rig, or use a smaller, more lightweight model. I’ve run 256 concurrent agents on a Qwen3.6 35B A3B model and it was pumping out 1500 tok/s. I’ve seen it peak at around 2000 tok/s aggregate.
Replies 2 · Reposts 0 · Likes 3 · Views 114
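
The per-agent figures above follow from dividing aggregate throughput across concurrent agents. A minimal sketch of that arithmetic, assuming the scheduler shares decode throughput evenly between agents (real batching rarely splits this cleanly) and using hypothetical function names:

```python
def per_agent_tps(aggregate_tps: float, agents: int) -> float:
    """Average tokens/second each agent sees if throughput is shared evenly."""
    if agents <= 0:
        raise ValueError("agents must be positive")
    return aggregate_tps / agents


def aggregate_tps(per_agent: float, agents: int) -> float:
    """Total tokens/second implied by a per-agent rate."""
    return per_agent * agents


def seconds_for_reply(reply_tokens: int, aggregate: float, agents: int) -> float:
    """Wall-clock seconds for one agent to stream a reply of reply_tokens tokens."""
    return reply_tokens / per_agent_tps(aggregate, agents)


# 256 agents on the smaller model, at the sustained and peak rates from the post.
print(per_agent_tps(1500, 256))   # sustained
print(per_agent_tps(2000, 256))   # peak

# 100 agents at 4-5 tok/s each on the larger setup implies this aggregate range.
print(aggregate_tps(4, 100), aggregate_tps(5, 100))

# Assumed 500-token reply at the peak rate with 256 agents.
print(seconds_for_reply(500, 2000, 256))
```

Read this way, the 256-agent run lands at roughly 6 to 8 tok/s per agent, which is in the same range as the 4-5 tok/s quoted for 100 agents on the larger model.
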
keithofaptos @keithofaptos
Oh hell no. I'm 10000% local, open source, open weights and everything even near these descriptions. Lol. I'm just thinking, if I want 100 agents running, how much I need to win in the lottery currently to afford it. In the meantime I'm planning to build a multi-GPU rig with more than a couple of old Nvidia cards. Do what ya can, they say. Sorry if I came across anti. I'd like to think the prices would be falling back, but now I doubt it. Honestly, if I fell into money I'd probably get a Tinybox GreenV2.
Replies 2 · Reposts 0 · Likes 4 · Views 136
ÆON FORGE ✨ @SpaceTimeViking
@NVIDIAAI is cooking up something “Ultra” and this could be their big break. The post-training model has so much potential, distilling the weights and data down to the purest form possible. Isolating the signal and removing the noise. Scaling that up could be a big deal.
Quoted post, NVIDIA AI @NVIDIAAI: @TheAhmadOsman 👀 "Ultra" ⏳️
Replies 1 · Reposts 1 · Likes 9 · Views 439
mr-r0b0t @mr_r0b0t
It’s official! @NVIDIAAI / MiniMax-M2.7-NVFP4. Optimized specifically for your SM120/121 DGX Spark (GB10) and RTX 6000/5090 Blackwell tensor cores! Full native FlashInfer/CUTLASS. Finalizing the benchmarks and documentation now 😁
Replies 19 · Reposts 8 · Likes 160 · Views 10.6K
ÆON FORGE ✨ @SpaceTimeViking
Appreciate the shoutout! The Qwen3.6-27B model is the first one I hand-abliterated instead of relying on other abliterated models. I converged several techniques and hit the KV drift jackpot, and it’s now smarter than the OG source model. It no longer has self-censorship overhead. github.com/AEON-7/Qwen3.6…
Quoted post, Bitcoin Comfy @BitcoinComfy: @Hikari_07_jp @rifrafgiraffe have a look at @SpaceTimeViking qwen3.6 27b ultimate uncensored (there is also a mixed approach to uncensoring documented). I tried to replicate it on the RTX 6000 and I cannot get anywhere close. It's the best uncensored model out there, have a look at the techniques used.
Replies 6 · Reposts 7 · Likes 167 · Views 14.6K
slow_motion @5low_motion
@SpaceTimeViking @YvetteCipher I am running deterministic data extraction, and currently only the Q6XL quant passes my established baseline at 100%. Even enabling MTP or going to Q8 will induce drift from the baseline (wrong extractions). Regular tests won't show this, and it's not a topic I see mentioned at all.
Replies 1 · Reposts 0 · Likes 1 · Views 43
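
One way to catch this kind of drift is an exact-match comparison against frozen baseline outputs. A minimal sketch, assuming each extraction is stored as a dict keyed by document ID. The data layout, names, and sample records are hypothetical, not the poster's actual harness:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DriftReport:
    total: int              # number of baseline documents checked
    mismatches: list[str]   # document IDs whose extraction differs from baseline

    @property
    def pass_rate(self) -> float:
        """Fraction of baseline documents reproduced exactly."""
        if self.total == 0:
            return 1.0
        return 1.0 - len(self.mismatches) / self.total


def compare_to_baseline(baseline: dict, candidate: dict) -> DriftReport:
    """Exact-match check: any differing or missing extraction counts as drift."""
    mismatches = sorted(
        doc_id for doc_id, expected in baseline.items()
        if candidate.get(doc_id) != expected
    )
    return DriftReport(total=len(baseline), mismatches=mismatches)


# Hypothetical records: baseline from the trusted configuration,
# candidate from a different quant or decoding setting.
baseline = {
    "doc-1": {"invoice_no": "A-1001", "total": "42.50"},
    "doc-2": {"invoice_no": "A-1002", "total": "17.00"},
}
candidate = {
    "doc-1": {"invoice_no": "A-1001", "total": "42.50"},
    "doc-2": {"invoice_no": "A-1002", "total": "17.80"},
}

report = compare_to_baseline(baseline, candidate)
print(report.pass_rate, report.mismatches)
```

Exact matching is deliberately strict: a benchmark score can stay flat while individual fields change, which is the failure mode described above.
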
ÆON FORGE ✨ @SpaceTimeViking
@5low_motion @YvetteCipher Also, I have not personally tested this quant, but this guy validated it on a high-quality benchmark and is running it on a 16 GB VRAM card. You won’t get NVFP4 precision, but it will free up lots of headroom.
Quoted post, Dimitri Krotchlikmioff @elmoche_: @SpaceTimeViking Qwen 3.7 27B AEON ULTIMATE UNCENSORED bf16 > iq3xxs gguf, temp 0.7, just got 91 on hermes-20 bench. huggingface.co/mradermacher/Q… Thanks mradermacher for the best matrix quants also! He is the main reason my 16 GB of VRAM are usable.
Replies 1 · Reposts 0 · Likes 1 · Views 112