John Song

335 posts

@JJJOOOHN

Joined November 2012

898 Following, 57 Followers

John Song@JJJOOOHN·1h

@dongxi_nlp It depends on your workload. If speed isn't critical, it's worth buying; after all, 128GB of memory is more than enough.

马东锡 NLP@dongxi_nlp·1h

Does anyone here know the Nvidia DGX Spark? Is its ML ecosystem support mature yet? Is it worth buying?

John Song@JJJOOOHN·4h

@9hills How is the quality, though? Users still care about whether the results are good.

九原客@9hills·7h

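I ran the 20-question ClawBench Smoke Test on Qwen 3.6 quantized + MTP on a 4090, all at 200k context. The quantization settings differ so that each model fills the card, which keeps the comparison fair. deepseek-v4-flash is included for comparison. As you can see, 35B-A3B is in a league of its own at 237 tps; that is really fast. Note: because of a llama.cpp issue, the input token counts for the two local models are wrong.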

John Song@JJJOOOHN·4h

@populartourist I am more excited for the open source 3.7!

wd 🔺@populartourist·7h

Qwen3.7 spotted. Can't wait for Qwen3.7 1B to beat Opus 4.8

John Song@JJJOOOHN·1d

@Q_May_007 Launching the Star Wars program

QMAY@Q_May_007·1d

👽 President Trump just posted these images on his social media platform!

John Song@JJJOOOHN·1d

@AtlasInference @spark_arena How about 27B? How many tokens per second at decoding?

Azeez@AtlasInference·2d

DGX Spark just benched 200+ tok/s for Qwen3.6-35B with @AtlasInference on @spark_arena 🔥 How's that possible? Providers like Codex and Claude get ~60. Other major engines don't come close 🦥 We haven't seen speeds like this on GB10. NO ONE HAS. Atlas is shattering records 🚀

John Song@JJJOOOHN·1d

@populartourist GPT oss is not as smart as Qwen

wd 🔺@populartourist·2d

Qwen3.6 27B and 35B-A3B are amazing models, but nothing reaches the efficiency of GPT-OSS yet. Qwen3.6 35B-A3B is as fast as GPT-OSS-20B but nowhere near the prefill performance.

John Song@JJJOOOHN·3d

@0xSero it is more expensive than last month!

0xSero@0xSero·3d

I am buying a DGX Spark today. Rejoice, I'm going to make the Spark competitive.

John Song@JJJOOOHN·3d

@1337hero Yes, using an API key is faster.

Mike Key@1337hero·3d

Why is downloading from Hugging Face so painfully slow?

John Song@JJJOOOHN·3d

@takayan660 True. Given that requirement, the DGX Spark is the better option.

たかやん@takayan660·3d

@JJJOOOHN Haha, Thor is tempting too, but for my use case I wanted something more like a compact local AI workstation rather than an embedded/robotics platform.

たかやん@takayan660·4d

I went and bought a DGX Spark

John Song@JJJOOOHN·3d

@1337hero My cases are more complicated: tables, formulas, etc.

Mike Key@1337hero·3d

@JJJOOOHN That's pretty dope. I have a pretty solid OCR workflow - but luckily that's mostly just for being paperless.

Mike Key@1337hero·4d

Spent $3998.98 total to have 96gb of VRAM using AMD's AI Pro R9700 Cards. (brand new) Comparatively I had spent $1520.00 on two used RX 7900 XTX's for 48gb of VRAM. If ur team RED, a single XTX is CHEAPER than a RTX 3090. Should I have bought a Mac or DGX Spark instead?

John Song@JJJOOOHN·3d

@1337hero I run Qwen 3.6 27B on a DGX Spark and use it as an OCR tool and AI assistant.

Mike Key@1337hero·3d

@JJJOOOHN In theory you can do stuff like this: github.com/vosen/ZLUDA What CUDA based projects are you running?

John Song@JJJOOOHN·4d

@Q_May_007 @kimi1383987 Qi Ge (七哥) once said that when China-US relations suddenly improve, that is exactly when the Communist Party is about to take a heavy blow.

QMAY@Q_May_007·4d

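Now: "This is a tremendous honor. Today has been a wonderful day." "I want to thank my friend President Xi for such a magnificent welcome." "It truly was an unparalleled welcome. And you have hosted us so graciously on this historic state visit." "Tonight is another precious opportunity for us, as friends, to discuss some of the things we talked about today. All of this is good for both the United States and China. And it is a great honor to be with you." - President Trump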

John Song@JJJOOOHN·4d

@Q_May_007 The president's health affects how events unfold.

QMAY@Q_May_007·4d

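The president arrived at the Temple of Heaven in Beijing and climbed the stairs steadily. This whole visit comes down to watching whether he is steady going up and down stairs 🙂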

John Song@JJJOOOHN·6d

@CuiMao Where does that leave the PhDs' pride? That is a real slap in the face.

CuiMao@CuiMao·6d

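I wrote a very, very long post, then deleted it. I'm not posting it; better to get some real work done. Writing posts gets addictive, and in the end you accomplish nothing.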

John Song@JJJOOOHN·6d

@sudoingX Why not 35b? Similar performance but 9 times faster

Sudo su@sudoingX·11 May

i declare qwen 3.6 27b dense q4 the king of a single rtx 3090 card. not even close. this model is absolute beast on local ai, ruthless on agentic loops, owns its own thinking. anyone can use it on single 3090, the weights are open, the stack is reproducible, the prompt is canonical, every claim below is verifiable on your own hardware. the octopus invaders one shot you are seeing is the visible test. i run these models on workloads you wouldn't think to ask for and i couldn't show you if i wanted to, and qwen 3.6 27b dense q4 quietly does the heavy lifting on a single consumer card while the rest of the field is busy explaining why it cannot. if you think a different model is king on a single 3090 right now, name it. drop your card, drop your model, drop your numbers. the throne is not crowded.

Sudo su@sudoingX

update: qwen 3.6 27b dense q4 just one shotted octopus invaders game on a single 3090. hermes agent drove the whole thing, ~41 tok/s gen 21gb vram at full 262k context, thinking mode on. one prompt in and the canonical multi-file space shooter benchmark out, the same exact prompt i ran on qwen 3.5 27b dense back in march on the same card. 3.5 needed one external scope bug fix before the game would even load on first play. 3.6 needed nothing. 11 of 11 files written, 2411 lines of code, zero steering interventions, zero external fixes, playable on first load. 16 minutes 41 seconds wall clock from prompt to playable. consumer tier king on a single 3090 is locked tonight, and the silicon underneath my desk did not change between march and now. the open source ecosystem just moved the floor. watch it ship itself, the full 16 minutes 41 seconds sped to 3 minutes 45, no human touched the keyboard between the first prompt and the final frame.

John Song@JJJOOOHN·6d

@sudoingX Why not 3.6 35B? It is almost 9 times faster and provides performance similar to 27B.

Sudo su@sudoingX·11 May

this is what my setup looks like today. about to test qwen 3.6 27b dense q4 on a single rtx 3090 at ~41 tok/s gen, hermes agent driving. predecessor model qwen 3.5 dense q4 made it work in one iteration when i ran the same agentic build on the same card. i've been daily driving qwen 3.6 27b dense for weeks now, the model i keep coming back to. if 3.6 oneshots too, this becomes the best model that runs on a single rtx 3090. consumer tier king. firing the test now will report back soon.

John Song@JJJOOOHN·12 May

@CuiMao Exactly. If it is all speed and no quality, nobody will use it.

CuiMao@CuiMao·11 May

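Why is everyone competing on local inference speed? It is like a pointless rivalry. Dflash is nothing special, and because output quality cannot be shown or quantified, nobody cares about it. Things keep drifting further off course. You still have to run it yourself in a real environment to know.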

John Song@JJJOOOHN·11 May

@aijoey @SpaceTimeViking @nvidia Thank you and hope to see it soon!

Joey@aijoey·11 May

@JJJOOOHN @SpaceTimeViking @nvidia that's literally what i'm recording right now lol. i will post soon.

Joey@aijoey·11 May

testing tonight on dgx spark: Qwen3.6-27B AEON Ultimate Uncensored DFlash on vLLM. container: ghcr.io/aeon-7/vllm-ae… @SpaceTimeViking DFlash SWA overlay + FlashInfer 0.6.11 cu130. curious how it behaves locally on GB10. github.com/aeon-7/Qwen3.6…
