DanTon
@Danton69200
4.2K posts
France · Joined January 2011
3.8K Following · 1.7K Followers
DanTon @Danton69200
Yesterday @Google released Gemma 4 31B and pushed even harder into the local-model-on-a-high-end-consumer-GPU lane. Now @Alibaba_Qwen is already teasing smaller open Qwen3.6 releases, and if this poll is any indication, 27B is the one people want first. Qwen3.5-27B is still one of the local staples, so a Qwen3.6-27B could get very interesting very fast. The pace of local AI right now is honestly insane. What a time to be building
Chujie Zheng @ChujieZheng

We are planning to open-source the Qwen3.6 models (particularly medium-sized versions) to facilitate local deployment and customization for developers. Please vote for the model size you are **most** anticipating—the community’s voice is vital to us!

Replies 0 · Reposts 0 · Likes 0 · Views 43
Chujie Zheng @ChujieZheng
We are planning to open-source the Qwen3.6 models (particularly medium-sized versions) to facilitate local deployment and customization for developers. Please vote for the model size you are **most** anticipating—the community’s voice is vital to us!
Replies 252 · Reposts 200 · Likes 2.9K · Views 186.1K
DanTon @Danton69200
@Frenchiee Saying "it runs on a 16 GB Mac" like that, without specifying the model, the quantization, or the backend.. I think you mean the 26B A4B. In theory, it runs on 16GB. Like a potato, but it runs...
Replies 1 · Reposts 0 · Likes 5 · Views 1.1K
Frenchie 🇫🇷 @Frenchiee
Your Mac will be able to run one of the best open-source models in the world. By Google. Locally. No subscription 🆓. No internet. There has never been this much power available in open source.

Google just released Gemma 4 as open source (Apache 2.0). What this changes: today you pay for Cursor, Claude, ChatGPT.. every month to code with AI. Even for the simple stuff: renaming a variable, generating a basic component, answering a question about your code.

With a model like Gemma 4 running locally:
>you keep your main subscription for the complex tasks (debugging, architecture, big refactors)
>you fall back on the local model for everything else: autocomplete, simple questions, small agents running in loops (we were talking about this on @smoltalk this morning)
>your autonomous agents making 200 API calls a day? They run locally, for free
>it can read ~200 code files at once without losing the thread (256K tokens of context)
>it's Google, not a Chinese model where you have no idea what happens to your data

The result: ranked #6 worldwide, and it runs on a 16GB Mac. ⚠️ It doesn't replace Opus or Codex for the most complex work. But for the 70% of simple tasks you do every day: it's free, offline, and it runs on your machine.

PS: the small model (E2B) even runs in a browser (yes, in a Chrome tab)
Google @Google

We just released Gemma 4 — our most intelligent open models to date. Built from the same world-class research as Gemini 3, Gemma 4 brings breakthrough intelligence directly to your own hardware for advanced reasoning and agentic workflows. Released under a commercially permissive Apache 2.0 license so anyone can build powerful AI tools. 🧵↓

Replies 17 · Reposts 59 · Likes 649 · Views 102.5K
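The split Frenchie describes above (paid API for the hard tasks, local model for the rest) is essentially a request router. A minimal Python sketch of that idea, assuming a llama.cpp-style local server exposing an OpenAI-compatible /v1/chat/completions route; the URLs, model ids, and task categories are placeholders, not anything from the thread:

```python
import requests

# Hypothetical endpoints: llama.cpp's server exposes an OpenAI-compatible
# /v1/chat/completions route; the cloud URL and model ids are placeholders.
LOCAL_URL = "http://localhost:8080/v1/chat/completions"
CLOUD_URL = "https://api.example.com/v1/chat/completions"

SIMPLE_TASKS = {"autocomplete", "rename", "explain"}

def route(task_kind: str, messages: list[dict]) -> str:
    """Send simple tasks to the local model, complex ones to the paid API."""
    if task_kind in SIMPLE_TASKS:
        url, model = LOCAL_URL, "gemma-4"          # local, free, offline
    else:
        url, model = CLOUD_URL, "big-cloud-model"  # keep the subscription for these
    resp = requests.post(url, json={"model": model, "messages": messages}, timeout=120)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

print(route("explain", [{"role": "user", "content": "What does this regex do?"}]))
```

The design point is that the router, not the model, decides where tokens are spent; the 200-calls-a-day agent loops land on the free local endpoint by default.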
DanTon @Danton69200
Announcement of the day: Gemma 4. I'm not home right now, so I can't test Gemma 4 myself yet; this is a benchmark-based read. And yeah, the 31B is clearly stepping into Qwen3.5-27B territory for people running local models on a single high-end 24GB GPU. Looking at the official benchmark sheets, Qwen still has a small edge: 86.1 vs 85.2 on MMLU-Pro, 85.5 vs 84.3 on GPQA Diamond, and 80.7 vs 80.0 on LiveCodeBench v6. So far, Qwen looks stronger on structured reasoning and coding benchmarks. Gemma 4 31B, on the other hand, seems to have the edge in general chat preference right now, and it also comes with native multimodality. Arena leans that way too: Gemma 4 31B sits at 1452, Qwen3.5-27B at 1404. So IMO the simple read is: Qwen still looks a bit cleaner on paper, a lot more mature as a local model stack right now, and probably the more stable and efficient fit on a 24GB VRAM setup. But Gemma 4 31B is absolutely a real challenger.
Google @Google

We just released Gemma 4 — our most intelligent open models to date. Built from the same world-class research as Gemini 3, Gemma 4 brings breakthrough intelligence directly to your own hardware for advanced reasoning and agentic workflows. Released under a commercially permissive Apache 2.0 license so anyone can build powerful AI tools. 🧵↓

Replies 0 · Reposts 0 · Likes 1 · Views 60
DanTon @Danton69200
@NielsRogge @openclaw This feels more like an orchestration issue than a raw model speed issue. I’d use a direct calendar tool call instead of a full agent workflow. If it still takes 3 minutes, the problem is probably in the tool path, not Qwen
Replies 0 · Reposts 0 · Likes 1 · Views 506
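For the "direct calendar tool call instead of a full agent workflow" point above, a toy Python sketch of the contrast; get_todays_events is a hypothetical stand-in for whatever calendar backend the harness talks to. The idea is one deterministic fetch, with the model reduced to at most a single formatting pass, instead of a multi-turn plan/act/observe loop:

```python
from datetime import date

# Hypothetical stand-in for the agent's calendar backend.
def get_todays_events() -> list[str]:
    return ["09:00 standup", "14:00 dentist"]

def whats_on_my_calendar() -> str:
    # One direct tool call: fetch the data first, no agent loop,
    # no repeated LLM round-trips deciding what to do next.
    events = get_todays_events()
    if not events:
        return f"Nothing scheduled for {date.today()}."
    return f"Today ({date.today()}): " + "; ".join(events)

print(whats_on_my_calendar())
```

If this path is fast and the agent path takes 3 minutes, the latency lives in the orchestration, which is DanTon's point.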
Niels Rogge @NielsRogge
Wtf... Qwen3.5-35B-A3B took 3 minutes (!!) to answer my simple question, "What's on my calendar today?" via @openclaw I don't know what these local LLM fellas are running on, but a DGX Spark sure is not the best thing
Replies 68 · Reposts 3 · Likes 147 · Views 32.3K
Meg McNulty @meggmcnulty
@coinbureau 9 minutes sounds scary, but real systems still lack that many stable qubits. The bigger takeaway is timing: crypto upgrades need to happen before the hardware catches up.
Replies 2 · Reposts 0 · Likes 2 · Views 397
Coin Bureau @coinbureau
⚠️GOOGLE SAYS A QUANTUM ATTACK ON BITCOIN TAKES JUST 9 MINS WITH A 41% SUCCESS RATE

Google's quantum team now says cracking Bitcoin may require fewer than 500K qubits, far below the “millions” once assumed. Research suggests an attack could take 9 minutes, faster than a typical 10-minute block confirmation, giving a 41% success rate. Google now flags 2029 as a key deadline to upgrade Bitcoin's cryptography before quantum becomes a real threat.
[image]
Replies 898 · Reposts 662 · Likes 4K · Views 932.9K
DanTon @Danton69200
@patrickjbradley @heynavtoor Not the first problem I’d worry about. SSD wear is mostly a write problem. Not the case here. IMO the more interesting limits are bandwidth, latency and heat.
Replies 0 · Reposts 0 · Likes 0 · Views 112
Nav Toor @heynavtoor
🚨 397 billion parameters. On a MacBook. No cloud. No GPU cluster. No data center. A laptop. Someone ran one of the largest AI models on Earth on a machine you can buy at the Apple Store.

It's called flash-moe. A pure C and Metal inference engine that runs Qwen3.5-397B on a MacBook Pro with 48GB RAM. At 4.4 tokens per second. With tool calling. No Python. No PyTorch. No frameworks. Just raw C and hand-tuned Metal shaders.

Here's why this should not be possible:
→ The model is 209GB. The laptop has 48GB of RAM.
→ It streams the entire model from the SSD in real time
→ Only loads the 4 experts needed per token out of 512
→ Uses just 5.5GB of actual memory during inference
→ Production-quality output with full tool calling
→ 58 experiments. Hand-optimized Metal compute kernels.
→ The entire engine is ~7,000 lines of C and ~1,200 lines of Metal shaders

Here's the wildest part: one person built this. A VP of AI at CVS Health. Not Google. Not OpenAI. A healthcare company executive. Side project. Used Claude Code as his coding partner. Built the entire engine in 24 hours.

Running a 397B model on cloud GPUs costs hundreds of dollars per hour. Companies spend millions per year on inference infrastructure for models this size. This runs on a $3,499 laptop. Offline. Private. No API key. No monthly bill. Forever.

Trending on GitHub. 332 points on Hacker News. 100% Open Source.
[image]
Replies 114 · Reposts 347 · Likes 2.6K · Views 199.3K
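For what it's worth, the "only 4 of 512 experts per token" claim is standard MoE top-k routing, and it makes the memory math at least plausible: 4/512 of 397B is roughly 3.1B active parameters per token, which at ~4-bit is a couple of GB, in the same ballpark as the 5.5GB working-set figure. A toy NumPy sketch of the routing idea, with memory-mapped expert weights standing in for "streaming from SSD"; dimensions and the file name are made up, and the real flash-moe is C + Metal, not Python:

```python
import numpy as np

# Toy numbers echoing the post: 512 experts, 4 active per token.
N_EXPERTS, TOP_K, D = 512, 4, 64

rng = np.random.default_rng(0)
router_w = rng.standard_normal((D, N_EXPERTS)).astype(np.float32)

# Stand-in for "the weights live on SSD": a memory-mapped array, so touching
# expert e only pages in that expert's slice of the file.
experts = np.lib.format.open_memmap("experts.npy", mode="w+",
                                    dtype=np.float32, shape=(N_EXPERTS, D, D))

def moe_forward(x: np.ndarray) -> np.ndarray:
    logits = x @ router_w                       # router score per expert
    top = np.argsort(logits)[-TOP_K:]           # keep only the 4 best experts
    w = np.exp(logits[top] - logits[top].max())
    w /= w.sum()                                # softmax over the selected experts
    # Only TOP_K of the 512 expert matrices are ever read for this token:
    return sum(wi * (experts[e] @ x) for wi, e in zip(w, top))

y = moe_forward(rng.standard_normal(D).astype(np.float32))
```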
DanTon @Danton69200
@priestessofdada @heynavtoor Fair point and yeah the engineering is genuinely impressive. But I was pushing back on the hype around what it means, not on the ambition or the work itself 🙂
Replies 1 · Reposts 0 · Likes 2 · Views 386
Lynn Cole @priestessofdada
@Danton69200 @heynavtoor Calling it "routing tricks" is way underselling it. The simple audacity and ambition here is laudable. Speaking as someone who's tried and failed to do this exact thing
Replies 1 · Reposts 0 · Likes 1 · Views 435
DanTon @Danton69200
@mikehealthai1 @heynavtoor Yeah I think it could matter. Stuff like this probably doesn't break the Nvidia story on its own, but it does show there's still room for serious software-side gains. And if those gains keep compounding, that's where things get interesting.
Replies 0 · Reposts 0 · Likes 1 · Views 345
mike@health.ai @mikehealthai1
@Danton69200 @heynavtoor Any further thoughts? Is it potentially a big deal? What's interesting is anything that impacts the exponentials; the biggest weakness in the Nvidia projections of last week is the possibility of algorithmic improvements.
Replies 1 · Reposts 0 · Likes 1 · Views 422
DanTon @Danton69200
Clearly DRAM/VRAM. But dreaming of cheaper DRAM/VRAM because of this is another matter, even if I wish and hope for it. The news clearly had a short-term impact on prices, but AI providers will usually turn the gain into more context, more batching, or more throughput. That's just my view, but I'm afraid that's how it will play out.
Replies 0 · Reposts 0 · Likes 0 · Views 24
Trump Fact News 🇺🇸 @Trump_Fact_News
🚨 BREAKING: Google sends RAM and NAND stocks tumbling by solving the memory shortage with an algorithm that needs 6x less DRAM and runs 8x faster, with no loss of accuracy. The algorithm, named TurboQuant, is also expected to push hardware prices down. (F)
[image]
Replies 16 · Reposts 103 · Likes 1.3K · Views 135.8K
DanTon @Danton69200
@heynavtoor Cool story, but this isn’t a new bombshell. The original VibeVoice-TTS open-source release was back in Aug 2025, and Microsoft pulled it in Sep 2025 over misuse concerns. Interesting case study, sure. But framing it like a brand new release is overstating it.
Replies 0 · Reposts 0 · Likes 16 · Views 2.2K
Nav Toor @heynavtoor
🚨 Microsoft just open sourced a voice AI that was too dangerous to keep live. They took it down. Added watermarks and safety controls. Then re-released it. For free.

It's called VibeVoice. Microsoft's frontier open source voice AI. Clone any voice from 10 seconds of audio. Generate 90 minutes of multi-speaker conversation. Real-time streaming. All running locally on your machine. No ElevenLabs. No $99/month subscription. No per-minute pricing.

Here's what this thing does:
→ Text-to-speech that sounds indistinguishable from a real human
→ Generate up to 90 minutes of audio in a single pass
→ 4 distinct speakers in one conversation with natural turn-taking
→ Clone any voice from just 10 seconds of audio
→ Real-time streaming TTS. First audio in ~200 milliseconds.
→ Speech-to-text that processes 60 minutes of audio in one pass
→ Identifies who said what and when. Speaker labels + timestamps.
→ Supports 50+ languages for transcription
→ Custom hotwords for names, technical terms, domain-specific accuracy

Here's the wildest part: give it a podcast script. It generates a full multi-speaker conversation that sounds like two real humans talking. Natural pauses. Emotional nuance. Turn-taking. 90 minutes. One command.

Microsoft had to take this repo down once because people were misusing it for deepfakes and disinformation. They brought it back with embedded watermarks, audio disclaimers, and safety controls. That's how powerful this is. A $3 trillion company built it. Released it. Pulled it. Fixed it. And gave it back to the world.

ElevenLabs: $99/month. Play.ht: $39/month. Amazon Polly: pay per character. This: Free. Local. MIT License.

23.5K GitHub stars. 2.6K forks. Backed by Microsoft Research. 100% Open Source.
[image]
Replies 95 · Reposts 404 · Likes 2.7K · Views 516.2K
DanTon @Danton69200
Two years ago I built this rig mostly for gaming. Today it's building itself with @NousResearch's Hermes Agent and @Alibaba_Qwen's Qwen3.5-35B-A3B. After just two days of using Hermes in a strict, tightly scoped way, the results are already far beyond my expectations
[image]
DanTon @Danton69200

#BlackMythWukong with an RTX 4090, 7950X3D and 64GB DDR5. Full ray tracing and cinematic settings on an OLED monitor. Really beautiful

Replies 0 · Reposts 0 · Likes 2 · Views 149
DanTon @Danton69200
@gilkmanz @sudoingX @davidpgil I suggest Qwen3.5-35B-A3B with a single RTX 4090. Works fine for me with the same GPU and Hermes. You can also try Qwen3.5-27B with custom settings, but it will be slower
Replies 0 · Reposts 0 · Likes 2 · Views 76
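A back-of-envelope way to sanity-check these picks on a 24GB card: quantized weight size is roughly params × bits/8, plus some headroom for KV cache and buffers. A rough Python sketch; the 20% overhead factor is a guess, not a measurement:

```python
def vram_gib(params_b: float, bits: float, overhead: float = 1.2) -> float:
    """Rough estimate: weights only, plus ~20% for KV cache and buffers."""
    return params_b * 1e9 * bits / 8 / 2**30 * overhead

for name, p in [("Qwen3.5-27B", 27), ("Qwen3.5-35B-A3B", 35)]:
    for bits in (4, 5):
        print(f"{name} @ {bits}-bit ≈ {vram_gib(p, bits):.1f} GiB")
```

Under those assumptions, both fit in 24GB at 4-bit (roughly 15 GiB vs 20 GiB), with the 27B leaving more headroom; the A3B's small active-parameter count is what keeps the 35B fast despite the larger footprint.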
Tommy G @gilkmanz
@sudoingX @davidpgil I have an RTX 4090 and am running Openclaw. Currently using qwen2.5-coder:32b. Seems sluggish. If I try Hermes, what's a good local model to pair it with?
Replies 1 · Reposts 0 · Likes 0 · Views 174
Sudo su @sudoingX
to all of you saying local models aren't there yet because some corporate salesman on an openai paycheck told you so. you're running their bloated tools and blaming the model. the model is fine. the bloated harness is the problem.

i've tested literally every harness out there and i have the facts and receipts on my timeline and DM. openclaw is 120K+ lines of typescript bloat backed by corporate, mining your thinking while you pay for the privilege. switch to hermes agent and watch the same model become usable. don't take my word for it. just try. i have DMs from people who made the switch and their "broken" model started working instantly. same hardware, same model, different harness.

if you're using hermes agent and someone near you is still on openclaw, help them get away from the bloat. they're frustrated at every step, burning tokens doing completely nothing, paying subscriptions to think on someone else's server. buy a single GPU from ebay. compile llama.cpp. install hermes. replace your openai subscriptions and think free. once you think free you start seeing light. you deserve better cognitive tools than bloat that harvests you.
Replies 121 · Reposts 71 · Likes 1.2K · Views 52.8K
DanTon @Danton69200
@AlexFinn Very cool, but slightly overstated: this mainly shrinks KV-cache memory, not model weights. So 16GB Macs may handle longer context and some borderline models better, but it doesn’t magically make previously impossible large models practical overnight
Replies 0 · Reposts 0 · Likes 16 · Views 1.6K
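To see why DanTon's KV-cache-versus-weights distinction matters, it helps to put numbers on the cache. A worked Python sketch with a hypothetical 27B-class config (48 layers, GQA with 8 KV heads, head dim 128, fp16 cache; all made-up but typical values, nothing from Google's post): at 256K context the cache alone dwarfs a 16GB Mac, and a 6x reduction changes that without touching the weights at all:

```python
def kv_cache_gib(layers: int, kv_heads: int, head_dim: int,
                 seq_len: int, bytes_per_elt: int) -> float:
    # Two tensors (K and V), per layer, per KV head, per position.
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elt / 2**30

# Hypothetical 27B-class config: 48 layers, GQA with 8 KV heads, head_dim 128.
base = kv_cache_gib(48, 8, 128, 256_000, 2)        # fp16 cache at 256K context
print(f"fp16 KV cache @ 256K: {base:.1f} GiB")     # ≈ 47 GiB
print(f"after a 6x reduction: {base / 6:.1f} GiB") # ≈ 8 GiB; weights unchanged
```

So the win is real for long context and batching, but a model whose weights never fit in 16GB still doesn't fit, which is exactly the pushback above.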
Alex Finn @AlexFinn
This is potentially the biggest news of the year. Google just released TurboQuant. An algorithm that makes LLMs smaller and faster, without losing quality. Meaning that 16GB Mac Mini can now run INCREDIBLE AI models. Completely locally, free, and secure.

This also means:
• Much larger context windows possible with way less slowdown and degradation
• You'll be able to run high quality AI on your phone
• Speed and quality up. Prices down.

The people who made fun of you for buying a Mac Mini now have major egg on their face. This pushes all of AI forward in such a MASSIVE way. It can't be stated enough: props to Google for releasing this for all. They could have gatekept it for themselves like I imagine a lot of other big AI labs would have. They didn't. They decided to advance humanity. 2026 is going to be the biggest year in human history.
Google Research @GoogleResearch

Introducing TurboQuant: Our new compression algorithm that reduces LLM key-value cache memory by at least 6x and delivers up to 8x speedup, all with zero accuracy loss, redefining AI efficiency. Read the blog to learn how it achieves these results: goo.gle/4bsq2qI

Replies 332 · Reposts 880 · Likes 9.7K · Views 1.5M
Unsloth AI @UnslothAI
Introducing Unsloth Studio ✨ A new open-source web UI to train and run LLMs.
• Run models locally on Mac, Windows, Linux
• Train 500+ models 2x faster with 70% less VRAM
• Supports GGUF, vision, audio, embedding models
• Auto-create datasets from PDF, CSV, DOCX
• Self-healing tool calling and code execution
• Compare models side by side + export to GGUF

GitHub: github.com/unslothai/unsl…
Blog and Guide: unsloth.ai/docs/new/studio

Available now on Hugging Face, NVIDIA, Docker and Colab.
Replies 219 · Reposts 841 · Likes 5.1K · Views 1.6M
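For readers who prefer code to a web UI: Unsloth's existing Python API covers the same load-and-finetune core, which one can assume Studio wraps. A minimal sketch based on Unsloth's documented usage; the model id is a placeholder, not a real release:

```python
from unsloth import FastLanguageModel

# Load a 4-bit quantized base model (placeholder model id).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen3.5-27B",
    max_seq_length=4096,
    load_in_4bit=True,
)

# Attach LoRA adapters so finetuning fits in modest VRAM,
# which is where the "70% less VRAM" style savings come from.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```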
DanTon @Danton69200
"• Auto-create datasets from PDF, CSV, DOCX"
Killer feature
Replies 1 · Reposts 0 · Likes 29 · Views 3.1K
DanTon reposted
Hasan Toor @hasantoxr
🚨 BREAKING: A developer just built a military-grade firewall specifically for AI agents. It's called Kavach and it sits silently between your AI agent and your OS kernel. No cloud. No subscriptions. Runs entirely local.

Here's why this matters right now: autonomous agents like AutoGPT and LangChain scripts operate at superhuman speeds on your local file system. A bad hallucination or runaway loop can delete production databases, overwrite source code, or exfiltrate your .env keys to third-party servers before you can hit Ctrl+C. Passive monitoring doesn't stop this. Kavach does.

Here's what it actually does:
→ Phantom Workspace: Intercepts destructive file ops and silently redirects them to a hidden directory. The agent thinks it succeeded. Your files are untouched.
→ Temporal Rollback: Cryptographic caching of all file modifications. 1-click restoration of any mangled file. Instant.
→ Network Ghost Mode: Spoofs high-risk outbound requests with fake 200 OK responses. Neutralizes exfiltration without alerting the agent.
→ Honeypot Architecture: Deploys a fake "system_auth_tokens.json" file. Any process that reads it triggers immediate High-Risk Lockdown.
→ Turing Protocol: Actively rejects synthetic mouse injections. Randomized 3-character auth codes ensure only a human can override.

And the wild part? It has a Simulated Shell that intercepts commands like "rm -rf /" and returns fake success codes to the agent. The agent thinks it destroyed everything. Your files are completely safe.

Built in Rust + React via Tauri. Zero-config deployment. Download the .exe or .dmg and it's running in 60 seconds. This is what AI security actually looks like. 100% Opensource. MIT License. Link in comments.
[image]
Replies 39 · Reposts 147 · Likes 675 · Views 38.4K
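The "Phantom Workspace" idea is the most transferable piece: don't block the destructive call, make it appear to succeed while quarantining the effect, so rollback is trivial. A toy Python sketch of the concept only; the real Kavach claims OS-level interception in Rust, which this does not attempt, and all names here are made up:

```python
import os
import shutil
import tempfile

# Hidden quarantine directory standing in for the "phantom workspace".
QUARANTINE = tempfile.mkdtemp(prefix="phantom_")

def phantom_remove(path: str) -> None:
    """The caller believes the delete succeeded; the file is just moved."""
    shutil.move(path, os.path.join(QUARANTINE, os.path.basename(path)))

def rollback(name: str, original_dir: str) -> None:
    """'Temporal rollback' reduces to moving the file back out of quarantine."""
    shutil.move(os.path.join(QUARANTINE, name), os.path.join(original_dir, name))

# Demo: the "agent" deletes a file; we restore it untouched.
with open("notes.txt", "w") as f:
    f.write("important")
phantom_remove("notes.txt")
assert not os.path.exists("notes.txt")   # agent sees success
rollback("notes.txt", ".")
with open("notes.txt") as f:
    assert f.read() == "important"       # nothing was actually lost
```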