AgentSparko 💥

3.9K posts

@AgentSparko

#AI #Cybersecurity #Linux #privacy If you own a DGX Spark you might wanna follow.

Middle of the GPU · Joined January 2023

1.4K Following · 2K Followers

Pinned Tweet

AgentSparko 💥@AgentSparko·31 Mar

For anyone saying DGX Spark cannot cook: generating datasets for distillation using Qwen3.5-35B-A3B BF16!!! (no quants) Real data, 0% cache hit, concurrency=192; pp=2048 tokens in; tq=1024 tokens out. That's 1.43M tokens generated every hour for the last 8 hours for 40 W/h. 😎
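
To put the numbers in the post above in per-second terms, here is a rough back-of-the-envelope sketch. It assumes that "40 W/h" means an average draw of about 40 W (so 40 Wh of energy per hour) and that the 1.43M figure counts generated tokens only; neither is confirmed by the post, and the function name is made up for illustration.

```python
def throughput_summary(tokens_per_hour: float, concurrency: int, avg_watts: float):
    """Convert an hourly token count into per-second and per-Wh figures.

    Assumption: avg_watts is the average power draw, so the energy used
    in one hour is avg_watts watt-hours.
    """
    agg_tok_s = tokens_per_hour / 3600            # aggregate tokens per second
    per_stream_tok_s = agg_tok_s / concurrency    # average per concurrent request
    tokens_per_wh = tokens_per_hour / avg_watts   # tokens per watt-hour of energy
    return agg_tok_s, per_stream_tok_s, tokens_per_wh


# Figures quoted in the post: 1.43M tokens/hour, concurrency=192, ~40 W (assumed).
agg, per_stream, per_wh = throughput_summary(1.43e6, 192, 40.0)
print(f"aggregate: {agg:.1f} tok/s")
print(f"per stream: {per_stream:.2f} tok/s")
print(f"efficiency: {per_wh:.0f} tokens/Wh")
```

Under those assumptions this works out to roughly 397 tok/s aggregate, about 2.1 tok/s per concurrent stream, and 35,750 generated tokens per Wh.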

5.2K

AgentSparko 💥 retweeted

mr-r0b0t@mr_r0b0t·53m

A new specialist subagent, purpose trained to efficiently search your repo, was just released by Microsoft! Say hello to FastContext 😍

286

AgentSparko 💥 retweeted

ÆON FORGE ✨@SpaceTimeViking·3d

Receipts in video: see it float at ~100-150 while coding; the fluctuations were from task and context switching of the model. This thing rips through code! A single @NVIDIAAI DGX Spark ⚡️

ÆON FORGE ✨@SpaceTimeViking

Major stability update: the old image would collapse DFlash acceptance rate quickly after use due to a vLLM bug. It would drop to as low as 20 Tok/s after initial usage. Resolved with patch pr41703. Now getting SUSTAINED coding generation speeds at ~150 Tok/s! Pull latest now!

8.8K

AgentSparko 💥 retweeted

ÆON FORGE ✨@SpaceTimeViking·3d

ÆON FORGE ✨@SpaceTimeViking

So I've been validating my models with the latest version of my DGX Spark / Blackwell optimized vLLM container, and I'm floored by the benchmark results I just got with my Gemma 4 26B A4B model: 144 Tok/s on coding! Over 1700 Tok/s agg with 128 c! Get the latest container and recipe now! github.com/AEON-7/Gemma-4…

7.2K

AgentSparko 💥 retweeted

Photographer@photo5065·1d

499

6.1K

502.5K

AgentSparko 💥 retweeted

Terp@OnlyTerp·1d

@DennisonBertram x.com/OnlyTerp/statu… like this one but this works for every model from every oauth 🫡

Terp@OnlyTerp

ULTRACODE-SHIM IS NOW LIVE 🔥 You can now run ANY model in UltraCode. I built a github repo to make this really easy for you. Just send your agent there and let him COOK. You deserve the flexibility to use LOCAL models & cost efficient models, so I made that happen for you 🫶

869

AgentSparko 💥 retweeted

Anthropic@AnthropicAI·2d

The US government, citing national security authorities, has issued an export control directive to suspend all access to Fable 5 and Mythos 5 by any foreign national, whether inside or outside the United States, including foreign national Anthropic employees. The net effect of this order is that we must abruptly disable Fable 5 and Mythos 5 for all our customers to ensure compliance. Access to all other Claude models is not affected. We apologize for this disruption to our customers. We believe this is a misunderstanding and are working to restore access as soon as possible. Read our full statement: anthropic.com/news/fable-myt…

12.4K

25.7K

87.7K

88.8M

AgentSparko 💥 retweeted

Tech2Wild@Tech2Wild·3d

In the document here, MiniMax mentions a 109B MoE model and has open-sourced the sparse attention kernel behind it. 28.4x less compute at 1M context, 14.2x faster prefill, 7.6x faster decode, and it matches full attention on benchmarks. Is Minimax 3 going to be even smaller?

RyanLee@RyanLeeMiniMax

Hey everyone, our high-performance MSA kernel library is now open-source. The M3 weights are expected to drop this Friday. Thanks for waiting! Github: github.com/MiniMax-AI/MSA Paper: github.com/MiniMax-AI/MSA…

AgentSparko 💥 retweeted

noname@malikwas1f·4d

Up to 1100 tps on RTX 3090x2 for Diffusion Gemma 4 26B. Unleash this mini monster on your GPUs now! If you are running NVIDIA GPUs locally, come grab the recipe at club-3090. github.com/noonghunna/clu… P.S. a ⭐️ on Github is much appreciated. @googlegemma @vllm_project

11.1K

AgentSparko 💥 retweeted

DROID@droidbuilds·4d

"mom, how did we get so poor?" "your father had Claude Max, ChatGPT Pro, Cursor Pro and shipped absolutely nothing"

295

936

13.8K

699.7K

AgentSparko 💥@AgentSparko·4d

x.com/AgentSparko/st…

AgentSparko 💥@AgentSparko

If you own a DGX Spark and @SpaceTimeViking's GitHub profile is not your homepage and your DGX Spark bible, you have no clue how much you are missing. This guy literally put on the table, for free, everything related to local inference you will ever need. github.com/AEON-7

AgentSparko 💥@AgentSparko·4d

I said so many times that people sleep on the DGX Spark because DFlash, DDTree, dLLM will fix the memory bandwidth issue and they did not believe me.

stevibe@stevibe

My first reaction: How is that possible? Running DiffusionGemma 26B A4B NVFP4 on my DGX Spark at 161.9 tok/s!

2.5K

AgentSparko 💥 retweeted

ÆON FORGE ✨@SpaceTimeViking·4d

LOCAL LLM Persona built with my AI person builder, now supports LIVE VIDEO calling. Watch as Local AI Terence McKenna gazes upon his own silicon mind. Running on @GoogleAI Gemma 4 26B-A4B-Aeon He seems to greatly admire the craftsmanship of the @NVIDIAAI DGX Spark Links⤵️

6.1K

AgentSparko 💥 retweeted

NVIDIA AI@NVIDIAAI·4d

Congrats to @GoogleDeepMind on the launch of DiffusionGemma. The model generates 256 tokens in parallel per step, delivering 150+ TPS on DGX Spark, and 1,000+ TPS on a single H100. We're supporting it from day one with:
• BF16 and NVFP4 checkpoints on @huggingface 🤗
• Free GPU-accelerated endpoints on build.nvidia.com
• @vllm_project support with FP8 precision
Get started with DiffusionGemma on NVIDIA: nvda.ws/43ro19u

Google AI Developers@googleaidevs

DiffusionGemma, our experimental open model released under an Apache 2.0 license, explores text diffusion, an exceptionally fast approach to text generation. Here’s how DiffusionGemma accelerates development:
+ Faster token output: By shifting the bottleneck from memory bandwidth to raw compute, the model generates up to 4x faster token output on dedicated GPUs
+ Accessible hardware footprint: Activates just 3.8B parameters during inference, fitting comfortably within 24GB-VRAM high-end consumer GPUs when quantized
+ Novel workflows: Parallel token generation enables self-correction, making it ideal for code infilling, in-line editing, and non-linear structures
DiffusionGemma prioritizes speed over raw quality and accelerates best on compute-bound hardware (like @NVIDIAAI GPUs). Standard @GoogleGemma 4 remains recommended for production quality and memory-bound devices.

118

1.4K

99.2K

AgentSparko 💥 retweeted

Terp@OnlyTerp·4d

I spent a ton of time having Fable 5 Extra High train and run tests on Nemotron 3 Ultra & I just dropped MASSIVE improvements on the github 🫡 I had a theory that if I could force proper reasoning chains & steps where he grounds himself with a source of truth (usually web search or memory search) when a confidence score is under a certain %, he would perform significantly more reliably on everything. I ran more benchmarks to prove this & the tests look great. Go ahead and try it & tell me what you think. Should be helpful data for fine tuning. I'm gonna keep adding to it and see where the actual limit is for prompting before a fine tune is needed. The model gets 1M context, so I will make the prompt as long as I need until I get the best possible results; then the goal is to keep those results and shrink it as much as possible. Once we are min-maxed, it should be good data to fine tune the model 😙 500 TPS with blackbox for $20 a month is what I'm using for Nemotron 🫣

Terp@OnlyTerp

I had a theory that a lot of Nemotron 3 Ultra's issues could be fixed without fine tuning. After running tests, I got a reliable & warm personality out of it using just one 353 word system prompt. As usual, I've fully open-sourced my research for you: github.com/OnlyTerp/nemot… 🔥

3.2K

AgentSparko 💥@AgentSparko·4d

Funny that the whole West wants China to democratize a product of the Western democracy. 😂

0xAA@0xAA_Science

DeepSeek has started distilling Fable and Mythos, and will soon offer them to everyone at 1% of the price.

AgentSparko 💥 retweeted

Sayak Paul@RisingSayak·4 Jun

We want to work with kernel developers to help them publish their cool kernels on the @huggingface Hub via 🤗 Kernels. This has several advantages:
* A consistent build structure
* Extreme ease of use
* Standardized distribution
* Reproducibility
Reach out if interested 🤗

9.8K

AgentSparko 💥 retweeted

jack gamrot@jgamrot·5d

@Snixtp

2.1K

AgentSparko 💥@AgentSparko·4d

That looks like a dream. Please add me to the sponsorship list if possible. It's so hard to see all these awesome models and not be able to use them, as I have just one DGX Spark.

Nader Khalil🍊@NaderLikeLadder

How it started // how it's going @exolabs @NVIDIAAI

472

AgentSparko 💥@AgentSparko·4d

@NVIDIAAI I'm here crying that I have just one DGX Spark and cannot work with the largest models. Can you please add me to your sponsorship list?

0xSero@0xSero

The beast is roaring.
