A.K.A CS

723 posts

@decapostos

The Candlestick Zone

Dubai, United Arab Emirates · Joined November 2010

498 Following · 72 Followers

A.K.A CS@decapostos·6h

@AtlasInference @AMD Going to join now!

Azeez@AtlasInference·7h

We're way faster than that. MTP-enabled with Dflash rolling out, check out the discord thread for more details!

Azeez@AtlasInference·3d

🚀 Huge thanks to @AMD for sending @AtlasInference a Strix Halo laptop! Excited to squeeze every last drop of compute out of it. Our goal is staying community-first with the simplest stack possible, ROCm here we come 🔥 Join our Discord for early access and help shape what we build next. What should we tackle first? 👇

A.K.A CS@decapostos·8h

@AtlasInference @AMD I would really like to know what the t/s is for Qwen 3.6 27B Q8 with @AtlasInference. I'm running it now on llama.cpp and only getting 11 t/s. Same for vLLM. I have an Asus Ascent GX10, so basically the same as the DGX Spark.

Azeez@AtlasInference·3d

@AMD If you aren't familiar with Atlas, we're building a dependency free inference engine just purely Rust and CUDA. <2 min cold start and blazing fast on models like Qwen-3.6-27B on DGX Spark 💻github.com/Avarok-Cyberse… Discord linked below🔔discord.com/invite/DwF3brB…

A.K.A CS@decapostos·1d

@sudoingX Is it better than Qwen3.6 27B Q8?

Sudo su@sudoingX·1d

nemotron 3 nano omni-30B reasoning at Q8 running autonomously on my dgx spark right now. 58 tok/s. 1 million context. multimodal. hermes agent is using it to research xAI's new grok algorithm that dropped yesterday. pulling repos. scanning code. breaking it down. all while i post this. 30B model. 58 tokens per second. 1M context. reads images and video. locally. for free. nobody is talking about this model and that's insane.

A.K.A CS@decapostos·3d

@s_him88 I have an Asus Ascent GX10, basically a DGX Spark. ComfyUI + video generation is not that good. Dense models aren't very good on the DGX Spark (273 GB/s memory bandwidth; we are talking 10 t/s for Qwen3.6 27B Q8) compared to MoE models, which get way better t/s.
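A rough back-of-the-envelope check on why dense models are slow here, as a sketch rather than a measurement. It assumes decoding is purely memory-bandwidth bound, that every active weight is read once per token, and that Q8 costs about 1 byte per parameter (all simplifications; KV-cache traffic and quantization overhead are ignored). The function name is made up for illustration. The 3B-active figure stands in for an "A3B" MoE model.

```python
def decode_ceiling_tps(bandwidth_gb_s: float,
                       active_params_b: float,
                       bytes_per_param: float = 1.0) -> float:
    """Upper bound on decode tokens/s when each active weight is read once per token.

    Assumption: generation is limited only by memory bandwidth.
    """
    gb_read_per_token = active_params_b * bytes_per_param
    return bandwidth_gb_s / gb_read_per_token


BANDWIDTH_GB_S = 273  # unified memory bandwidth quoted for the GB10

# 27B dense at ~1 byte/param (Q8, approximate)
dense_ceiling = decode_ceiling_tps(BANDWIDTH_GB_S, active_params_b=27)

# MoE with ~3B active params per token at the same precision
moe_ceiling = decode_ceiling_tps(BANDWIDTH_GB_S, active_params_b=3)

print(f"dense 27B Q8 ceiling: {dense_ceiling:.1f} t/s")
print(f"MoE 3B-active Q8 ceiling: {moe_ceiling:.1f} t/s")
```

Under these assumptions the dense ceiling lands close to the 10 t/s reported above. Speculative decoding can go past this kind of estimate because it checks several drafted tokens per pass over the weights.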

Solar Studio@s_him88·4d

Thinking seriously about buying an NVIDIA DGX Spark. Talk me into it, or talk me out of it.

Use case:
• Run Hermes agent
• ComfyUI image + video generation
• AI ad production for Solar Studio__
• Pair it with my M4 Mac mini 24GB RAM using EXO

I want the ugly truth. If you’ve used one: What was amazing? What was disappointing? Was it worth the money? Would you buy it again?

Owners, builders, AI devs: give me the real answer.

A.K.A CS@decapostos·3d

@HermesAgentTips Asus Ascent GX10, basically a DGX Spark. Running Qwen3.6 27B Q8 on llama.cpp. The better choice is a MoE model, but I prefer quality over speed.

Hermes Agent Tips@HermesAgentTips·4d

local LLM people: what are you actually running right now?

everyone talks like they have a DGX Spark under the desk, but I’m curious what the real setups look like

DGX Spark 128GB unified memory?
RTX 6000?
RTX 5090 32GB?
RTX 3090 24GB?
MacBook Pro?
Mac Studio M3 Ultra, if you somehow found one?

or are you running something completely different that people are sleeping on?

A.K.A CS@decapostos·4d

@sudoingX But is it better than Qwen 3.6 27B Dense?

Sudo su@sudoingX·4d

nobody is talking about how good nemotron 3 nano omni 30b-a3b actually is on local. very underrated. multimodal, reasoning, video understanding, image vision, all shipped in one open source release by nvidia. moe architecture 30b total params, 3b active per token, q8 is near lossless and fits comfortably on a single dgx spark with room to breathe. i have been running it for weeks now and the gap between what this model can do and what the conversation says is wide. nvidia is pushing hard on the open-source front. most builders haven't noticed yet because the discourse is locked on closed-source frontier benchmarks and the next viral chart. meanwhile this thing handles agentic loops, processes video inputs, reasons across image context, and stays responsive on consumer tier unified memory hardware. on dgx spark it flies. more content coming, showing all the modalities in action. if you have used it, what is your experience. drop your stack and your findings, curious what other builders are seeing across hardware tiers.

A.K.A CS@decapostos·4 May

@witcheer Well, actually, I was up all night. At the moment I'm getting 23 t/s with vLLM + DFlash using Qwen3.6-27B-FP8. This is actually quite usable now. I still need to do some tweaking and tuning, because some people are getting 30-40 t/s with this setup.

A.K.A CS@decapostos·3 May

@witcheer I'm running Qwen3.6-35B-A3B on vLLM with DFlash and getting over 150 t/s on an Asus Ascent GX10, basically a DGX Spark. On the same GX10 with llama.cpp and the Qwen3.6-27B Q8 dense model, I'm getting 10 t/s max.

witcheer ☯︎@witcheer·2 May

is an NVIDIA DGX Spark the way to go local? which is better, 1-2 RTX 3090s or a DGX Spark? why do I have so many questions?

A.K.A CS@decapostos·3 May

@witcheer If you have a DGX Spark or any of those GB10 machines like the Asus Ascent GX10, MoE is the way to go. Dense models are very slow. Easy as that.

A.K.A CS@decapostos·2 May

@sudoingX Could you please share your llama.cpp build for Qwen3.6-27B? I have an Asus Ascent GX10 (GB10), basically a DGX Spark. I'm running the Q8 version but I only get 8 t/s.

Sudo su@sudoingX·2 May

a week with the dgx spark, here is what is on it and what i have measured so far. nobody is really talking about this machine and it is quietly becoming the workhorse of my whole stack.

hardware: nvidia gb10 sm_121, 124 gb unified lpddr5x at 273 gb/s, cuda 13.0

models on disk (305 gb total, 9 ggufs):
> qwen 3.6 27b q4_k_m / q5_k_m / q8_0 / ud-q4_k_xl
> nemotron 3 omni 30b-a3b q4_k_m / q8_0 / ud-q6_k / ud-q6_k_xl
> deepseek v4-flash 158b q4_k_m (112 gb, flagship 128gb-tier test)

terminal + shell environment:
> zsh + oh-my-zsh + powerlevel10k theme
> modern cli stack: bat, eza, ripgrep, fd, git-delta, tldr, neovim, fzf, autojump
> 6 tmux sessions actively running for parallel agent work

ml + agent stack:
> llama.cpp built sm_121 against cuda 13
> uv + venv ml stack with pytorch 2.11.0+cu130 (aarch64) + transformers + diffusers + accelerate
> hermes agent v0.11 with codex auth bridge
> opencode for free-model overnight research
> telegram gateway routing to nemotron q8 right now

speeds verified so far:
- nemotron 30b-a3b q8: 56 tok/s gen, 1,300 tok/s prefill, 96% gpu, 33gb in unified
- qwen 27b dense q4: 40 tok/s consistent

90+ gb of unified memory still free. deepseek v4-flash 158b loading next as the real flagship test, multimodal omni testing once mmproj pulls, comfyui install in flight for the diffusion lane.

honestly curious what the actual limit is on this box, i have not hit it yet.

A.K.A CS@decapostos·2 May

@spiritbuun I'm on a DGX Spark. Can someone help me, please? I just want to run Qwen3.6-27B Q8 at more than 8.3 t/s 😭. Does this work for GB10? I tried, but I can't seem to make it work.

buun@spiritbuun·26 Apr

Now that the dust has settled, is DFlash real or hype? Today I was able to break the 200 tok/s barrier with 27B on a single RTX 3090:

run 1: 194.6 t/s, accept=155/180
run 2: 205.1 t/s, accept=155/180
run 3: 206.3 t/s, accept=155/180

But what about real world usage? Well,
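To put the `accept=155/180` figure in context, here is a small sketch of the textbook speculative-decoding estimate of tokens produced per target-model pass. It assumes `155/180` is a per-token acceptance rate and that acceptances are independent, which may not match how DFlash counts or behaves. The draft length of 4 is a hypothetical value, not something stated in the post.

```python
def expected_tokens_per_step(accept_rate: float, draft_len: int) -> float:
    """Expected tokens emitted per target-model pass under i.i.d. acceptance.

    Standard geometric-series estimate: 1 + a + a^2 + ... + a^draft_len.
    """
    if not 0.0 <= accept_rate <= 1.0:
        raise ValueError("accept_rate must be between 0 and 1")
    if accept_rate == 1.0:
        # Every drafted token is accepted, plus one token from the target model.
        return float(draft_len + 1)
    return (1.0 - accept_rate ** (draft_len + 1)) / (1.0 - accept_rate)


accept_rate = 155 / 180   # read from the runs above (interpretation is an assumption)
DRAFT_LEN = 4             # hypothetical draft length, for illustration only

tokens_per_step = expected_tokens_per_step(accept_rate, DRAFT_LEN)
print(f"acceptance rate: {accept_rate:.1%}")
print(f"expected tokens per target pass: {tokens_per_step:.2f}")
```

The point of the estimate is only that a high acceptance rate lets one pass over the large model's weights yield several tokens, which is where the speedup over plain autoregressive decoding would come from.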

A.K.A CS@decapostos·30 Apr

@fahdmirza Does anybody know how to disable thinking in this build?

Fahd Mirza@fahdmirza·30 Apr

💥 Luce DFlash just changed local AI inference 🚀 130 tok/s on a single GPU: no vLLM, no llama.cpp, no compromises

🔹 27B model on 24GB VRAM
🔹 3.4x faster than standard autoregressive decoding
🔹 Raw C++ binary, zero Python in the engine
🔹 Speculative decoding with a tiny draft model doing the heavy lifting
🔹 128K context on consumer hardware

🔥 Full step-by-step demo below 👇

A.K.A CS@decapostos·30 Apr

@fahdmirza Absolute fire 🔥 I have an Asus Ascent GX10, basically a DGX Spark. Running Qwen3.6-27B Q5 GGUF was giving me a max of 11 t/s. This setup gives me 56.58 t/s. I was almost going back to Qwen3.6-35B-A3B for the speed, now I don't need to anymore!

A.K.A CS@decapostos·28 Apr

@mercury__agent I have sent you a screenshot in DM.

Mercury@mercury__agent·28 Apr

@decapostos Hey, sorry to hear about this. Can you share the Mercury version, error dump or behaviour in DM? Thanks.

Mercury@mercury__agent·26 Apr

Mercury has been trending at the top all week in total stars, and it’s only been a week since our official release. Thank you all for the support. Our community is growing fast, and this is just the beginning. We’re building agentic AI that feels more personal, more soulful, token-efficient, and fully in your control.

Tapan Sharma@tapansharma04

📈 Rising #All AI/ML #GitHub Trending Repos (This Week)

1. cosmicstack-labs/mercury-agent ⭐1.2K
2. GammaLabTechnologies/harmonist ⭐646
3. earthtojake/text-to-cad ⭐509
4. tashfeenahmed/freellmapi ⭐465
5. future-agi/future-agi ⭐451
6. levelsio/superlevels ⭐409
7. alash3al/stash ⭐249
8. epoko77-ai/im-not-ai ⭐245
9. dezgit2025/auto-memory ⭐219
10. muxprotocol/kalshi-trading-bot ⭐205

Discover more trending AI/ML repos 👇 dataaihub.co/github

A.K.A CS@decapostos·23 Apr

@SpaceTimeViking What I got exactly, compared to 65 t/s with llama.cpp:

Benchmark (5 runs):
Run 1: 1000 tokens in 6.19s = 161.6 t/s
Run 2: 1000 tokens in 7.97s = 125.5 t/s
Run 3: 1000 tokens in 9.13s = 109.5 t/s
Run 4: 1000 tokens in 6.48s = 154.4 t/s
Run 5: 1000 tokens in 6.13s = 163.2 t/s
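The five runs vary quite a bit, so a single summary number is easier to compare against the 65 t/s llama.cpp baseline. A minimal sketch, using only the token counts and times quoted above (the times are rounded in the post, so the per-run rates may differ from the posted ones in the last digit):

```python
from statistics import mean

TOKENS_PER_RUN = 1000
run_seconds = [6.19, 7.97, 9.13, 6.48, 6.13]  # times quoted in the post
BASELINE_TPS = 65.0                            # llama.cpp figure quoted in the post

# Per-run throughput, then two ways of summarizing it.
rates = [TOKENS_PER_RUN / s for s in run_seconds]
mean_rate = mean(rates)                                            # average of per-run rates
aggregate_rate = TOKENS_PER_RUN * len(run_seconds) / sum(run_seconds)  # total tokens / total time
speedup = aggregate_rate / BASELINE_TPS

print(f"mean of runs: {mean_rate:.1f} t/s")
print(f"aggregate:    {aggregate_rate:.1f} t/s")
print(f"vs baseline:  {speedup:.2f}x")
```

The aggregate figure (total tokens over total time) is the more conservative of the two, since slow runs weigh more heavily in it than in a plain average of rates.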

ÆON FORGE ✨@SpaceTimeViking·20 Apr

Finally got this model working with DFlash and quantized to NVFP4, the most optimized quant for the DGX Spark and also the best for any NVIDIA Blackwell or later GPU. Experience even better performance than the base model, no censorship, and full control! huggingface.co/AEON-7/Qwen3.6…

A.K.A CS@decapostos·23 Apr

@SpaceTimeViking I have tried to set this up. I get around 100 t/s, but it's eating up almost 100 GB of RAM. Am I doing something wrong?

A.K.A CS@decapostos·21 Apr

@HeyGen Website

HeyGen@HeyGen·20 Apr

An easter egg skill we hid in hyperframes: /website-to-hyperframes

- create DESIGN.md
- screenshot the page
- download assets
- build logo animations + more

we hope to support the launch of anyone's businesses

RT + comment "Website" for codebase access (must follow)

A.K.A CS@decapostos·15 Apr

@dealignai Do you think you can do this for the DGX Spark? I would love to try it out.

dealign.ai@dealignai·14 Apr

MiniMax m2.7, 56gb. Full speeds, near lossless quality. Welcome JANGTQ. Macs only. huggingface.co/JANGQ-AI/MiniM…

A.K.A CS@decapostos·14 Apr

@leopardracer I think it's cheaper to run openclaw on Raspberry Pis with a MiniMax subscription at 10 dollars a month each. Energy consumption on a Raspberry Pi is super low. A 4GB Raspberry Pi is enough to run openclaw.

leopardracer@leopardracer·14 Apr

I wrote one article. Someone built a data center. No GPUs. Just $599 boxes. Jensen Huang is having a bad year. The new AI data center doesn’t need NVIDIA. It needs a Costco membership and a good Wi-Fi router.

leopardracer@leopardracer

x.com/i/article/2043…
