AgentSparko 💥

3.9K posts

@AgentSparko

#AI #Cybersecurity #Linux #privacy If you own a DGX Spark, you might wanna follow.

Middle of the GPU · Joined January 2023

1.4K Following · 2K Followers

Pinned Tweet

AgentSparko 💥@AgentSparko·31 Mar

For anyone saying the DGX Spark can't cook: generating datasets for distillation using Qwen3.5-35B-A3B in BF16!!! (no quants). Real data, 0% cache hit, concurrency=192; pp=2048 tokens in; tg=1024 tokens out. That's 1.43M tokens generated every hour for the last 8 hours, at 40 W.😎

AgentSparko 💥 retweeted

Photographer@photo5065·1d

AgentSparko 💥 retweeted

Terp@OnlyTerp·1d

@DennisonBertram x.com/OnlyTerp/statu… Like this one, but this works for every model from every OAuth 🫡

Terp@OnlyTerp

ULTRACODE-SHIM IS NOW LIVE 🔥 You can now run ANY model in UltraCode. I built a GitHub repo to make this really easy for you; just send your agent there and let him COOK. You deserve the flexibility to use LOCAL models & cost-efficient models, so I made that happen for you 🫶

AgentSparko 💥 retweeted

Anthropic@AnthropicAI·2d

The US government, citing national security authorities, has issued an export control directive to suspend all access to Fable 5 and Mythos 5 by any foreign national, whether inside or outside the United States, including foreign national Anthropic employees. The net effect of this order is that we must abruptly disable Fable 5 and Mythos 5 for all our customers to ensure compliance. Access to all other Claude models is not affected. We apologize for this disruption to our customers. We believe this is a misunderstanding and are working to restore access as soon as possible. Read our full statement: anthropic.com/news/fable-myt…

AgentSparko 💥 retweeted

Tech2Wild@Tech2Wild·3d

In the document here, MiniMax mentions a 109B MoE model and has open-sourced the sparse attention kernel behind it: 28.4x less compute at 1M context, 14.2x faster prefill, 7.6x faster decode, and it matches full attention on benchmarks. Is MiniMax 3 going to be even smaller?

RyanLee@RyanLeeMiniMax

Hey everyone, our high-performance MSA kernel library is now open-source. The M3 weights are expected to drop this Friday. Thanks for waiting! GitHub: github.com/MiniMax-AI/MSA Paper: github.com/MiniMax-AI/MSA…

AgentSparko 💥 retweeted

noname@malikwas1f·4d

Up to 1,100 tps on 2x RTX 3090 for Diffusion Gemma 4 26B. Unleash this mini monster on your GPUs now! If you are running NVIDIA GPUs locally, come grab the recipe at club-3090. github.com/noonghunna/clu… P.S. A ⭐️ on GitHub is much appreciated. @googlegemma @vllm_project

AgentSparko 💥 retweeted

DROID@droidbuilds·4d

"mom, how did we get so poor?" "your father had Claude Max, ChatGPT Pro, Cursor Pro and shipped absolutely nothing"

AgentSparko 💥@AgentSparko·4d

x.com/AgentSparko/st…

AgentSparko 💥@AgentSparko

If you own a DGX Spark and @SpaceTimeViking's GitHub profile is not your homepage and your DGX Spark bible, you have no clue how much you are missing. This guy has literally put everything related to local inference you will ever need on the table, for free. github.com/AEON-7

AgentSparko 💥@AgentSparko·4d

I've said so many times that people are sleeping on the DGX Spark, because DFlash, DDTree, and dLLM will fix the memory bandwidth issue, and they did not believe me.

stevibe@stevibe

My first reaction: How is that possible? Running DiffusionGemma 26B A4B NVFP4 on my DGX Spark at 161.9 tok/s!

AgentSparko 💥 retweeted

ÆON FORGE ✨@SpaceTimeViking·4d

LOCAL LLM Persona built with my AI person builder, now supports LIVE VIDEO calling. Watch as Local AI Terence McKenna gazes upon his own silicon mind. Running on @GoogleAI Gemma 4 26B-A4B-Aeon. He seems to greatly admire the craftsmanship of the @NVIDIAAI DGX Spark. Links⤵️

AgentSparko 💥 retweeted

NVIDIA AI@NVIDIAAI·4d

Congrats to @GoogleDeepMind on the launch of DiffusionGemma. The model generates 256 tokens in parallel per step, delivering 150+ TPS on DGX Spark and 1,000+ TPS on a single H100. We're supporting it from day one with:
• BF16 and NVFP4 checkpoints on @huggingface🤗
• Free GPU-accelerated endpoints on build.nvidia.com
• @vllm_project support with FP8 precision
Get started with DiffusionGemma on NVIDIA: nvda.ws/43ro19u

Google AI Developers@googleaidevs

DiffusionGemma, our experimental open model released under an Apache 2.0 license, explores text diffusion, an exceptionally fast approach to text generation. Here’s how DiffusionGemma accelerates development:
+ Faster token output: By shifting the bottleneck from memory bandwidth to raw compute, the model generates up to 4x faster token output on dedicated GPUs
+ Accessible hardware footprint: Activates just 3.8B parameters during inference, fitting comfortably within 24GB-VRAM high-end consumer GPUs when quantized
+ Novel workflows: Parallel token generation enables self-correction, making it ideal for code infilling, in-line editing, and non-linear structures
DiffusionGemma prioritizes speed over raw quality and accelerates best on compute-bound hardware (like @NVIDIAAI GPUs). Standard @GoogleGemma 4 remains recommended for production quality and memory-bound devices.

AgentSparko 💥 retweeted

Terp@OnlyTerp·4d

I spent a ton of time having Fable 5 Extra High train and run tests on Nemotron 3 Ultra, and I just dropped MASSIVE improvements on the GitHub 🫡 I had a theory that if I could force proper reasoning chains & steps where he grounds himself with a source of truth (usually web search or memory search) whenever a confidence score is under a certain %, he would perform significantly more reliably on everything. I ran more benchmarks to prove this & the tests look great. Go ahead and try it & tell me what you think; it should be helpful data for fine-tuning. I'm gonna keep adding to it and see where the actual limit is for prompting before a fine-tune is needed. The model gets 1M context, so I will make the prompt as long as I need until I get the best possible results; then the goal is to keep those results and shrink it as much as possible. Once we are min-maxed, it should be good data to fine-tune the model 😙 500 TPS with blackbox for $20 a month is what I'm using for Nemotron 🫣

Terp@OnlyTerp

I had a theory that a lot of Nemotron 3 Ultra's issues could be fixed without fine-tuning. After running tests, I got a reliable & warm personality out of it using just one 353-word system prompt. As usual, I've fully open-sourced my research for you: github.com/OnlyTerp/nemot… 🔥

AgentSparko 💥@AgentSparko·4d

Funny that the whole West wants China to democratize a product of Western democracy. 😂

0xAA@0xAA_Science

DeepSeek has started distilling Fable and Mythos; soon everyone will be able to use them at 1% of the price.

AgentSparko 💥 retweeted

Sayak Paul@RisingSayak·4 Jun

We want to work with kernel developers to help them publish their cool kernels on the @huggingface Hub via 🤗 Kernels. This has several advantages:
* A consistent build structure
* Extreme ease of use
* Standardized distribution
* Reproducibility
Reach out if interested 🤗

AgentSparko 💥 retweeted

jack gamrot@jgamrot·4d

@Snixtp

AgentSparko 💥@AgentSparko·4d

That looks like a dream. Please add me to the sponsorship list if possible. It's so hard to see all these awesome models and not be able to use them, as I have just one DGX Spark.

Nader Khalil🍊@NaderLikeLadder

How it started // how it's going @exolabs @NVIDIAAI

AgentSparko 💥@AgentSparko·4d

@NVIDIAAI I'm here crying that I have just one DGX Spark and cannot work with the largest models. Can you please add me to your sponsorship list?

0xSero@0xSero

The beast is roaring.

AgentSparko 💥 retweeted

mr-r0b0t@mr_r0b0t·5d

x.com/i/article/2064…

AgentSparko 💥@AgentSparko·5d

@SpaceTimeViking Crazy numbers! Imagine this with DDTree; it would feel like we have a @Cerebras supercomputer hidden inside the tiny DGX Spark.

AgentSparko 💥 retweeted

ÆON FORGE ✨@SpaceTimeViking·5d

So I've been validating my models with the latest version of my DGX Spark / Blackwell-optimized vLLM container, and I'm floored by the benchmark results I just got with my Gemma 4 26B A4B model: 144 tok/s on coding! Over 1,700 tok/s aggregate with 128 concurrency! Get the latest container and recipe now! github.com/AEON-7/Gemma-4…

ÆON FORGE ✨@SpaceTimeViking

Major vLLM update: it supports all the DGX Spark + Blackwell optimizations, and now adds NVFP4 KV cache support for a 2x-4x boost in model context capacity! If using NVFP4 KV cache, you will have to opt for MTP instead of DFlash, but it's worth it for large agent swarms. github.com/AEON-7/vllm-ul…

AgentSparko 💥@AgentSparko·5d

This AI shit has to stop. There are enough dumb people who think this is real and then expose themselves to wild creatures and end up dead. How can these people not think about this when they use AI to create such crap? No responsibility whatsoever. Ban these accounts!

Gabriele Corno@Gabriele_Corno

A mountain lioness appears to attack this explorer after a spring snowstorm; a story with many twists and turns…….