AgentSparko 💥

4K posts

AgentSparko 💥

@AgentSparko

#AI #Cybersecurity #Linux #privacy If you own a DGX Spark you might wanna fallow.

Middle of the GPU 加入时间 Ocak 2023

1.4K 关注2K 粉丝

置顶推文

AgentSparko 💥@AgentSparko·31 Mar

For anyone saying DGX Spark cannot cook. Generating data sets for distilling using Qwen3.5-35B-A3B BF16 !!! (no quants) real data, 0% cache hit, concurrency=192 ; pp=2048 tokens in ; tq=1024 tokens out that`s 1.43M tokens generated every hour for the last 8 hours for 40 W/h.😎

English

5.2K

AgentSparko 💥 已转推

How To Prompt@HowToPrompt__·16h

Researchers show that Claude Code is 98% not AI. Anthropic never gave us the architecture for Claude Code. There were no docs. Just a tool that every developer is currently obsessing over. Until it leaked recently. A research team pulled the source code, analyzed all 500,000 lines, and found something ridiculous. Only 1.6% of the codebase actually interacts with the AI model. The core of Claude Code is literally just a simple while-loop. It asks the model what to do, runs a tool, and repeats. So what is the other 98.4%? It is hardcore, traditional software engineering. The researchers found a massive, complex infrastructure designed entirely to babysit the AI and keep it from hallucinating or destroying your computer: - A 7-mode permission system acting as a security bouncer. - A 5-layer context compaction pipeline so the AI doesn't forget its goal. - A subagent delegation mechanism with strict worktree isolation. - Four different extensibility hooks to manage external tools safely. Every startup right now is trying to build a better AI model to get better results. Anthropic did the exact opposite. They took an existing model and built a fortress of deterministic software around it. They realized that the AI doesn't need to be smarter. It needs to be managed.

English

150

761

68.3K

AgentSparko 💥 已转推

Steeve Morin@steeve·3h

Congratulations guys! That's built in Germany, btw. Yeah, the Germany in Europe. kthxbye.

Tensordyne@TensordyneInc

x.com/i/article/2066…

English

6.9K

AgentSparko 💥 已转推

Tech2Wild@Tech2Wild·1h

✅ Repo pushed — all updates are live. Commit eb12c02 on github.com/tonyd2wild/min…: • Phase 3 (RoCE) flipped from "WIP / err-110 blocked" → "SOLVED 2026-06-15" with the full recipe • Both fixes documented: NCCL v2.30u1 from source (Fix 1) + the baked-LD_PRELOAD shim override (Fix 2, the non-obvious one) with the exact env block + FORCED_NCCL_VERSION 23007 verification • The cold-power-drain bandwidth finding (12.8 → 111.85 Gb/s, credited mashie) • Honest RESULTS block (~10.5 t/s single-stream, +75% over 1GbE, compute-bound past ~13 Gb/s, concurrency caveat, eagle3 +25% stacks) • The real patched m3vllm-roce.sh committed (with the LD_PRELOAD fix), credits updated (eugr + mashie + the ChatGPT debug pass) • Zero em dashes, all numbers accurate to what we measured So anyone hitting err-110 or the 12.8 cap now has the answer. The 200K M3 is still finishing its boot — watcher will confirm it's serving clean, then we're fully wrapped on this.

English

202

AgentSparko 💥 已转推

Charles Curran@charliebcurran·22h

I used AI to explain the Anthropic drama to my girlfriend, with fruit.

English

288

502

7.7K

1.1M

AgentSparko 💥@AgentSparko·3h

@sudoingX AMD is actually more expensive than Spark if you get a Spark OEM like Asus GX10 and you also have high speed connectivity for clustering, CUDA and software compatibility. Also forcing the test on llama.cpp and GGUF only is not peak performance or quality for NVIDIA.

English

580

Sudo su@sudoingX·4h

nvidia vs amd two boxes on my desk, both 128gb of unified memory. one is the nvidia dgx spark ($4,699). the other is the amd strix halo ($1,999), amd at roughly half the price. i'm running the exact same models on both, from a 3b all the way up to a 397b, same quants, same llama.cpp, and i'm posting every single number. here is why it actually matters. if the amd box just keeps pace, that's a nice story. but if it matches or beats a box that costs twice as much, the entire calculus for buying local ai hardware changes overnight. i already have the first numbers and they made me sit up. holding them for the full breakdown. stay tuned anon. this matchup is going to shake some ground.

English

476

32.5K

AgentSparko 💥 已转推

CyberRobo@CyberRobooo·2d

Hard to say no to a cute little one It’s only 12kg--like a toddler under 2,yet it has 21 joints and can run, jump, and gently hug you… Beijing Luvbotics is redefining what a living humanoid robot, like a family member,while it certainly doesn't cook , laundry,cleaning… but it's a real emotional companion. >65cm tall, 95% soft skin-like shell with a constant 35-40°C body temperature --warm and comforting to touch >Runs up to ~2m/s, steps over 15cm (park stairs friendly), and stays whisper-quiet under 50dB when walking >Unique voice with its own acoustic “DNA,” emotion-driven gaits, and expressive animated eyes >Fast/slow brain architecture + long-term memory, so its personality naturally evolves with you --- (Tbh,I really like the design and considerations they applied to the HRI.）

English

282

35.9K

AgentSparko 💥 已转推

Tech2Wild@Tech2Wild·7h

Got MiniMax-M3 (428B MoE, NVFP4) serving at tensor-parallel 3 across 3 DGX Sparks with clean tool-calling. Published the full recipe plus the head-node OOM fixes that gated it. Speed's still rough, so tear it apart and help us fix it: github.com/tonyd2wild/min…

English

3.3K

AgentSparko 💥 已转推

mr-r0b0t@mr_r0b0t·9h

A new specialist subagent, purpose trained to efficiently search your repo, was just released by Microsoft! Say hello to FastContext 😍

English

2.3K

AgentSparko 💥 已转推

ÆON FORGE ✨@SpaceTimeViking·3d

Receipts in video, see it float at ~100-150 while coding the fluctuations were for task and context switching of the model. This thing rips through code! A Single @NVIDIAAI DGX Spark ⚡️

ÆON FORGE ✨@SpaceTimeViking

Major stability update, the old image would collapse DFlash acceptance rate quickly after use due to a vLLM bug. It would drop to as low as 20 Tok/s after initial usage. Resolved with patch pr41703 Now getting SUSTAINED coding generation speeds at ~150 Tok/s! Pull latest now!

English

9.3K

AgentSparko 💥 已转推

ÆON FORGE ✨@SpaceTimeViking·3d

ÆON FORGE ✨@SpaceTimeViking

So I've been validating my models with the latest version of my DGX Spark / Blackwell optimized vLLM container, and floored by the benchmark results I just got with my Gemma 4 26B A4B model 144 Tok/s on coding! over 1700 Tok/s agg with 128 c! Get the latest container and recipe now! github.com/AEON-7/Gemma-4…

English

7.2K

AgentSparko 💥 已转推

Photographer@photo5065·2d

ZXX

498

6.1K

503.5K

AgentSparko 💥 已转推

Terp@OnlyTerp·1d

@DennisonBertram x.com/OnlyTerp/statu… like this one but this works for every model from every oauth 🫡

Terp@OnlyTerp

ULTRACODE-SHIM IS NOW LIVE 🔥 You can now run ANY model in UltraCode I built a github repo to make this really easy for you, Just send your agent there and let him COOK You deserve the flexibility to use LOCAL models & cost efficient models. So I made that happen for you 🫶

English

887

AgentSparko 💥 已转推

Anthropic@AnthropicAI·2d

The US government, citing national security authorities, has issued an export control directive to suspend all access to Fable 5 and Mythos 5 by any foreign national, whether inside or outside the United States, including foreign national Anthropic employees. The net effect of this order is that we must abruptly disable Fable 5 and Mythos 5 for all our customers to ensure compliance. Access to all other Claude models is not affected. We apologize for this disruption to our customers. We believe this is a misunderstanding and are working to restore access as soon as possible. Read our full statement: anthropic.com/news/fable-myt…

English

12.5K

25.7K

87.9K

89.4M

AgentSparko 💥 已转推

Tech2Wild@Tech2Wild·4d

In the document here MiniMax mentions a 109B MoE model and open-sourced the sparse attention kernel behind it. 28.4x less compute at 1M context, 14.2x faster prefill, 7.6x faster decode, and it matches full attention on benchmarks. Is Minimax 3 going to be even smaller ?

RyanLee@RyanLeeMiniMax

Hey everyone — our high-performance MSA kernel library is now open-source. The M3 weights are expected to drop this Friday. Thanks for waiting! Github: github.com/MiniMax-AI/MSA Paper：github.com/MiniMax-AI/MSA…

English

AgentSparko 💥 已转推

noname@malikwas1f·4d

Upto 1100 tps on RTX 3090x2 for Diffusion Gemma 4 26B. Unleash this mini monster on your gpus now! If you are running nvidia gpus locally, come grab the recipe at club-3090. github.com/noonghunna/clu… P.S. a ⭐️ on Github is much appreciated. @googlegemma @vllm_project

English

11.1K

AgentSparko 💥 已转推

DROID@droidbuilds·5d

"mom, how did we get so poor?" "your father had Claude Max, ChatGPT Pro, Cursor Pro and shipped absolutely nothing"

English

295

936

13.8K

700.2K

AgentSparko 💥@AgentSparko·4d

x.com/AgentSparko/st…

AgentSparko 💥@AgentSparko

If you own a DGX Spark and @SpaceTimeViking GitHub profile is not your homepage and your DGX Spark bible you have no clue how much you are missing. Literally this guy put on the table for free everything related to local inference you will ever need. github.com/AEON-7

ZXX

AgentSparko 💥@AgentSparko·31 Mar

English

5.2K

AgentSparko 💥@AgentSparko·4d

I said so many times that people sleep on the DGX Spark because DFlash, DDTree, dLLM will fix the memory bandwidth issue and they did not believe me.

stevibe@stevibe

My first reaction: How is that possible? Running DiffusionGemma 26B A4B NVFP4 on my DGX Spark at 161.9 tok/s!

English

2.5K

AgentSparko 💥 已转推

ÆON FORGE ✨@SpaceTimeViking·4d

LOCAL LLM Persona built with my AI person builder, now supports LIVE VIDEO calling. Watch as Local AI Terence McKenna gazes upon his own silicon mind. Running on @GoogleAI Gemma 4 26B-A4B-Aeon He seems to greatly admire the craftsmanship of the @NVIDIAAI DGX Spark Links⤵️

English

6.1K

AgentSparko 💥 已转推

NVIDIA AI@NVIDIAAI·5d

Congrats to @GoogleDeepMind on the launch of DiffusionGemma. The model generates 256 tokens in parallel per step, delivering 150+ TPS on DGX Spark, and 1,000+ TPS on a single H100. We're supporting it from day one with: • BF16 and NVFP4 checkpoints on @huggingface🤗 • Free GPU-accelerated endpoints on build.nvidia.com • @vllm_project support with FP8 precision Get started with DiffusionGemma on NVIDIA: nvda.ws/43ro19u

Google AI Developers@googleaidevs

DiffusionGemma, our experimental open model released under an Apache 2.0 license, explores text diffusion, an exceptionally fast approach to text generation. Here’s how DiffusionGemma accelerates development: + Faster token output: By shifting the bottleneck from memory bandwidth to raw compute, the model generates up to 4x faster token output on dedicated GPUs + Accessible hardware footprint: Activates just 3.8B parameters during inference, fitting comfortably within 24GB-VRAM high-end consumer GPUs when quantized + Novel workflows: Parallel token generation enables self-correction, making it ideal for code infilling, in-line editing, and non-linear structures DiffusionGemma prioritizes speed over raw quality and accelerates best on compute-bound hardware (like @NVIDIAAI GPUs). Standard @GoogleGemma 4 remains recommended for production quality and memory-bound devices.

English

118

1.4K

99.4K

发现

@sudoingX @NVIDIAAI @DennisonBertram @googlegemma @vllm_project @elonmusk @BarackObama @taylorswift13