Dmitriy Anderson

57.2K posts

Dmitriy Anderson banner
Dmitriy Anderson

Dmitriy Anderson

@DmitriyAnderson

CIO | Agentic AI Architect | Builder-Strategist

Brooklyn, NY Katılım Aralık 2013
1.1K Takip Edilen792 Takipçiler
Dmitriy Anderson retweetledi
Dmitriy Anderson retweetledi
signüll
signüll@signulll·
my only advice to ppl is that if you come to a fork on the road, you should take it.
English
24
11
287
19.4K
Dmitriy Anderson retweetledi
Teknium 🪽
Teknium 🪽@Teknium·
Couldn't have wrote a better blog in favor of Open Source AI or Hermes Agent's self improvement loop! Thanks for the strong words in support of taking back the power to the individuals and businesses who really ought to own their stack.
Satya Nadella@satyanadella

x.com/i/article/2065…

English
31
24
511
34.9K
Dmitriy Anderson retweetledi
Marc Andreessen 🇺🇸
USA.
Farzad 🇺🇸 🇮🇷@farzyness

My theory is that the American empire is JUST getting started. US has a stranglehold on Space with SpaceX, which is the next frontier for defense/war. It has a comically large lead. No one will be close for at least 20 years. It is the leading power in AI by far - both in models and chips. China is catching up fast, but the US has an inherent mechanism that will increase the likelihood that it will win in the end - a free market + capitalism + free speech. A free market + capitalism allows for brutal competition between companies. Free speech allows for AI models to be maximally truth seeking, which means that AIs CAN and WILL BECOME smarter than humans to the point where they can tell the truth about its leaders. This is literally impossible in China. Try having a Chinese model that says Xi Jinping is corrupt. Good luck with that. Then, you have a country that has more guns than people and surrounded by two massive oceans and two friendly neighbors, which means any sort of kinetic take over of the country is literally impossible. Not to mention the US has BY FAR the best and strongest military. The only way adversaries can hope to defeat the US is by tearing it from within by pitting us against each other. This is why it's virtually guaranteed that all the division/hatred/polarization you see within the country is fomented by China/Russia Psy Ops + propaganda efforts. I'm not saying these aren't naturally happening in spots - America is far from perfect - but it would be naive to think our adversaries aren't pouring millions of gallons of fuel on a fire. As long as the American public a) has the ability to exercise its free speech b) has a protected 2nd amendment c) capitalism and free markets continue to function and d) the populace is aware of how awesome America really is, it is literally impossible to stop the US's trajectory to global domination in the coming decades, especially as China's demographics continue to collapse. It's the bottom of the 9th, the game is tied, and the US has the bases loaded. It's a 3-2 pitch. All we need is a home run, and we win the rest of the century.

QST
16
19
320
19.8K
Dmitriy Anderson retweetledi
Farzad 🇺🇸 🇮🇷
My theory is that the American empire is JUST getting started. US has a stranglehold on Space with SpaceX, which is the next frontier for defense/war. It has a comically large lead. No one will be close for at least 20 years. It is the leading power in AI by far - both in models and chips. China is catching up fast, but the US has an inherent mechanism that will increase the likelihood that it will win in the end - a free market + capitalism + free speech. A free market + capitalism allows for brutal competition between companies. Free speech allows for AI models to be maximally truth seeking, which means that AIs CAN and WILL BECOME smarter than humans to the point where they can tell the truth about its leaders. This is literally impossible in China. Try having a Chinese model that says Xi Jinping is corrupt. Good luck with that. Then, you have a country that has more guns than people and surrounded by two massive oceans and two friendly neighbors, which means any sort of kinetic take over of the country is literally impossible. Not to mention the US has BY FAR the best and strongest military. The only way adversaries can hope to defeat the US is by tearing it from within by pitting us against each other. This is why it's virtually guaranteed that all the division/hatred/polarization you see within the country is fomented by China/Russia Psy Ops + propaganda efforts. I'm not saying these aren't naturally happening in spots - America is far from perfect - but it would be naive to think our adversaries aren't pouring millions of gallons of fuel on a fire. As long as the American public a) has the ability to exercise its free speech b) has a protected 2nd amendment c) capitalism and free markets continue to function and d) the populace is aware of how awesome America really is, it is literally impossible to stop the US's trajectory to global domination in the coming decades, especially as China's demographics continue to collapse. It's the bottom of the 9th, the game is tied, and the US has the bases loaded. It's a 3-2 pitch. All we need is a home run, and we win the rest of the century.
Cynical Publius@CynicalPublius

Something miraculous is happening in the USA and the world thanks to the World Cup and the USA’s 250th. I’m not quite sure yet what it is, I have to give it more thought. I think it has something to do with the world realizing that managed decline is not necessary (with Trump’s America as the example), and I think it has something to do with more Americans not being afraid to be patriots about their nation and their culture. It’s more than that though—I have to ponder. Article coming when I figure it out. Any and all ideas welcome.

English
68
112
1.1K
83.7K
Dmitriy Anderson retweetledi
Nayib Bukele
Nayib Bukele@nayibbukele·
Nunca dejes que las burlas de los demás detengan tus metas. Cuando propuse que el Hospital Rosales, que en ese momento era el peor hospital del país, se convirtiera en el mejor de Centroamérica, hasta mis propios ministros se rieron. Imaginen lo que decían la oposición y los incrédulos. Pero yo sabía que, con esfuerzo, disciplina y sin mirar hacia atrás ni hacia los lados, se podía lograr. Y lo logramos. Hoy, el Hospital Rosales es el mejor hospital de Centroamérica, público o privado. Cuenta con todas las especialidades médicas, el equipamiento más avanzado del mundo, 200 especialistas extranjeros y 3,000 salvadoreños listos para atender cualquier enfermedad de forma gratuita. El siguiente paso es que más hospitales de nuestro país alcancen ese nivel. Pronto tendremos otra sorpresa. Primero Dios.
Español
861
4.6K
20.2K
300K
Dmitriy Anderson retweetledi
signüll
signüll@signulll·
the best heuristic to know when a relationship (any type) is over is if you run out of things to talk about. obviously downstream of a lot of things but it’s a great signal.
English
33
15
731
60.7K
Dmitriy Anderson retweetledi
0xSero
0xSero@0xSero·
This guy did a much better job than me. If you want to run AI at home here’s current SOTA with functional configs.
AlexAImaginator@TraffAlex

🖥️ Best Local LLMs for Consumer GPUs — llama.cpp Guide (June 2026) What I actually run on consumer hardware right now. Every model below runs via llama.cpp with a simple one-liner — no Docker, no Python env, no cloud. ━━━ 8-16GB VRAM ━━━ 🔹 Gemma 4-12B (Google) • Smartest model in this size class — competes with stuff 2× bigger • Unsloth's MTP GGUFs: 162 tok/s vs 52 tok/s normal (3× speedup) • Minimum 8GB VRAM recommended for Q4_K_M quant • GGUF → huggingface.co/unsloth/gemma-… 🔹 LFM2.5-8B-A1B (LiquidAI) • Hybrid MoE, only 1B active params — absurdly fast for its size • Perfect for 8-12GB cards, MacBooks, or anyone on a tight budget • GGUF → huggingface.co/LiquidAI/LFM2.… ━━━ 16-32GB VRAM ━━━ 🔹 Qwen3.6-27B (Qwen) • Scored 1.00 on tool-efficiency benchmarks — best local agent available • 40 deterministic tasks, 32k/128k context needle tests — all passed • GGUF → huggingface.co/unsloth/Qwen3.… • MTP version (faster) → huggingface.co/unsloth/Qwen3.… 🔹 Qwopus3.6-27B-v2 (Jackrong) • Best quantization of Qwen3.6-27B — topped 5 agent & coding benchmarks (1200 samples) • If you're running Q4, this is the one to grab • GGUF → huggingface.co/Jackrong/Qwopu… • MTP version → huggingface.co/Jackrong/Qwopu… 🔹 Gemma 4-31B QAT (Google/Unsloth) • QAT variant with MTP draft head: 76-125 tok/s (1.67× speedup) • Excellent for multi-agent / subagent workflows • GGUF → huggingface.co/unsloth/gemma-… 🔹 Nex-N2-Mini (Nex AGI) • Post-train of Qwen3.5-35B-A3B — MoE with only 3B active params • Fits on 16GB+ VRAM, overflow loads from system RAM • Adaptive thinking saves ~20% tokens with no quality loss • For deep multi-step reasoning, nothing in this size comes close • GGUF → huggingface.co/sjakek/Nex-N2-… ━━━ Quick Picks ━━━ • 16GB all-rounder → Gemma 4-12B with MTP GGUFs • 32GB all-rounder → Qwen3.6-27B / Qwopus-v2 • Agents & tool use → Qwen3.6-27B or Qwopus Q4 • Deep reasoning → Nex-N2-Mini (MoE, fits 16GB+) • Tight budget → LFM2.5-8B-A1B • Cheapest full build: 1× used RTX 3090 (24GB) + rest of PC ≈ $1000-1500 ━━━ Setup on Windows ━━━ 1. Download llama.cpp → github.com/ggml-org/llama… (latest .zip) 2. Extract to any folder (e.g. C:\llama.cpp) 3. Download a .gguf from the links above (Q4_K_M or Q5_K_M for best quality/speed balance) 4. Run one of the commands below depending on your hardware ━━━ Launch Commands ━━━ SINGLE GPU — Standard model (no MTP): llama-server.exe ^ -m C:\models\Qwen3.6-27B-Q5_K_M.gguf ^ --ctx-size 180000 ^ --flash-attn on ^ --cache-type-k q4_0 ^ --cache-type-v q4_0 ^ --batch-size 1024 --ubatch-size 512 ^ -ngl 100 ^ -np 1 ^ --port 8080 ^ --jinja SINGLE GPU — MTP model (faster inference): llama-server.exe ^ -m C:\models\Qwen3.6-27B-MTP-Q5_K_M.gguf ^ --ctx-size 180000 ^ --flash-attn on ^ --cache-type-k q4_0 ^ --cache-type-v q4_0 ^ --batch-size 1024 --ubatch-size 512 ^ --spec-type draft-mtp ^ --spec-draft-n-max 3 ^ -ngl 100 ^ -np 1 ^ --port 8080 ^ --jinja DUAL GPU — Split across two cards: llama-server.exe ^ -m C:\models\Qwen3.6-27B-Q5_K_M.gguf ^ --ctx-size 180000 ^ --flash-attn on ^ --cache-type-k q4_0 ^ --cache-type-v q4_0 ^ --batch-size 1024 --ubatch-size 512 ^ -ngl 100 ^ --tensor-split 0.55,0.45 ^ --main-gpu 0 ^ -np 1 ^ --port 8080 ^ --jinja DUAL GPU + MTP + Vision (multimodal): llama-server.exe ^ -m C:\models\Qwen3.6-27B-MTP-Q5_K_M.gguf ^ --ctx-size 180000 ^ --flash-attn on ^ --cache-type-k q4_0 ^ --cache-type-v q4_0 ^ --batch-size 1024 --ubatch-size 512 ^ --spec-type draft-mtp ^ --spec-draft-n-max 3 ^ -ngl 100 ^ --tensor-split 0.60,0.40 ^ --main-gpu 0 ^ -np 1 ^ --port 8080 ^ --jinja ^ --mmproj C:\models\mmproj-F16.gguf ━━━ Parameter Breakdown ━━━ -m Path to your .gguf model file. Change this to wherever you downloaded it. --ctx-size 180000 Context window in tokens. 180k = huge context for long conversations or big codebases. Reduce to 32768 or 65536 if you don't need long context — uses less VRAM. --flash-attn on Flash Attention — dramatically speeds up inference and reduces VRAM usage. Works on RTX 30xx/40xx/50xx. Always enable this. --cache-type-k q4_0 / --cache-type-v q4_0 Quantizes the KV cache (key/value attention cache) to 4-bit. This is what makes 180k context fit in VRAM. Without it, huge contexts eat all your memory. Quality impact is minimal — this is a free performance win. --batch-size 1024 / --ubatch-size 512 batch-size = how many tokens are processed in one forward pass (throughput). ubatch-size = micro-batch actually sent to the GPU per step. Higher = faster prompt processing but needs more VRAM. If you run out of VRAM, lower these (e.g. 512/256). -ngl 100 Number of layers to offload to GPU. 100 = all layers on GPU (full offload). This is what you want if the model fits in your VRAM. If it doesn't fit, reduce this (e.g. -ngl 40) — remaining layers run on CPU/RAM. --tensor-split 0.55,0.45 How to split model layers across multiple GPUs. Values are ratios. 0.55,0.45 = GPU 0 gets 55% of layers, GPU 1 gets 45%. Adjust based on your VRAM — give more to the card with more memory. Example: 0.70,0.30 for a 24GB + 12GB setup. Not needed for single GPU setups. --main-gpu 0 Which GPU handles the batch computation (the "orchestrator"). Set to 0 (your primary GPU). The other GPU(s) handle their assigned layers. Minor performance impact — usually just leave it at 0. -np 1 Number of parallel slots (concurrent requests). 1 = one user at a time. Increase to 2-4 if you want multiple clients connected simultaneously. Each extra slot uses additional VRAM for its own KV cache. --port 8080 Which port the server listens on. Change if port 8080 is busy. --jinja Enables Jinja2 template processing — required for proper chat formatting. Most modern models expect this. Always include it. --spec-type draft-mtp Enables Multi-Token Prediction (MTP) speculative decoding. Only works with MTP GGUF models (downloaded separately). The model predicts multiple tokens at once and verifies them — big speed boost. --spec-draft-n-max 3 How many tokens the MTP draft head proposes per step. 3 is a good default. Higher = potentially faster but more VRAM and may reduce quality. --mmproj Path to the multimodal projector file (for vision models). Enables image understanding — paste screenshots into the web chat. Only needed if you want vision capabilities. Omit for text-only use. ━━━ Your Hardware → Your Command ━━━ Single GPU (8-24GB VRAM): Use the "Single GPU" command. Change -m to your model path. 8GB card → Gemma 4-12B Q4 or LFM2.5-8B 12GB card → Gemma 4-12B Q5/Q6 16GB card → Gemma 4-31B QAT Q4 or Nex-N2-Mini 24GB card → Qwen3.6-27B Q4/Q5, Qwopus-v2, Gemma 4-31B QAT Q5/Q6 Dual GPU: Use the "Dual GPU" command. Adjust --tensor-split based on your VRAM ratio. 24GB + 24GB → --tensor-split 0.50,0.50 24GB + 12GB → --tensor-split 0.70,0.30 24GB + 8GB → --tensor-split 0.75,0.25 Want speed? Use MTP versions of models with the "MTP" commands. Want vision? Add --mmproj with the projector file from the model's HuggingFace repo. 5. Once running, you get: • Web chat UI → http://localhost:8080 • OpenAI-compatible API → http://localhost:8080/v1 • Playground → http://localhost:8080/playground ━━━ Why /v1 API Is the Killer Feature ━━━ One local endpoint replaces your entire cloud API bill. The /v1 endpoint is drop-in OpenAI-spec compatible — every tool that speaks OpenAI just works. No custom code, no glue layer. Works out of the box with: • IDEs: Cursor, Continue, Windsurf, Cline, Roo Code • CLI tools: aider, Open Interpreter, OpenCode • Frameworks: LangChain, LlamaIndex, LiteLLM • Any OpenAI SDK (Python, Node, Go, Rust) Why this beats cloud APIs: • 100% private — code never leaves your machine • $0 per token — no rate limits, no quotas, no surprise bills • Works fully offline • Zero telemetry, no training on your data • Swap models by dropping in a different .gguf — no app changes needed • Run 32k–128k context windows without burning money Good combos: • Cursor + Qwopus-v2 → near-frontier quality, zero API cost • Continue + Qwen3.6-27B → best local coding agent • aider + Gemma 4-12B MTP → 162 tok/s, feels instant • OpenCode + Nex-N2-Mini → deep reasoning on 16GB Set any OpenAI-compatible client to your local endpoint: set OPENAI_API_KEY=sk-dummy (any non-empty string works) set OPENAI_BASE_URL=http://localhost:8080/v1 # every OpenAI-compatible tool now hits your local GPU Shoutouts: @0xSero @rS_alonewolf @witcheer @UnslothAI @LottoLabs

English
13
17
640
77.7K
Dmitriy Anderson retweetledi
roon
roon@tszzl·
I don’t think I love anything as much as language models love “smoke tests”
English
189
81
2.7K
107.6K
Dmitriy Anderson retweetledi
Aaron Levie
Aaron Levie@levie·
Great post. The companies that are able to get their unique IP, institutional knowledge, and data into a format and architecture that lets them capture all of the gains and progress in AI are going to be in the best position in the future. “the real opportunity is not in picking the best model but instead in building a learning loop on top of models where human capital and token capital compound. You can offload a task, or even a job, but you can never offload your learning. The future of the firm is the ability to compound that learning across people and AI. This requires a new architectural approach where every business is able to build agentic systems that improve over time, while still retaining control over their IP. A company should be able to switch out a “generalist” model without losing the “company veteran” expertise built into their learning system.” We’re all collectively figuring out the right architecture for the future of AI. But it’s clear that so much of the power and value will accrue to wherever can best leverage any AI system against their information. This is also why the applied AI layer will also gain so much value over the coming years.
Satya Nadella@satyanadella

x.com/i/article/2065…

English
53
70
659
132.3K
Dmitriy Anderson retweetledi
Pietro Schirano
Pietro Schirano@skirano·
I basically never write my own /goal anymore. I ask Codex to write one for itself, and one for each agent it spawns. Like this 👇
English
97
99
1.7K
236.2K
Dmitriy Anderson retweetledi
How To Prompt
How To Prompt@HowToPrompt__·
China open-sourced a vector database that destroys Pinecone, Chroma, and Weaviate. It's called Zvec, an in-process vector database that runs directly inside your app. No servers. No config. No $200/month bill. → Searches billions of vectors in milliseconds → pip install zvec and you're done → Battle-tested inside Alibaba at production scale → Works on Linux, macOS, Windows, even iOS 100% Open Source.
How To Prompt tweet media
English
20
102
823
44.9K
Dmitriy Anderson retweetledi
Nico
Nico@nicos_ai·
DEJA DE PROMPTEAR. EMPIEZA A LOOPEAR. Encontré un sitio que recopila los loops más usados por la comunidad → loops.elorm.xyz Los propios creadores de Claude Code lo dicen: el futuro no es promptear, es diseñar loops. No sabes qué son ni cómo crearlos? Este articulo lo explica detalladamente👇
angel@angeldot_

x.com/i/article/2064…

Español
39
482
3.2K
407.6K
Dmitriy Anderson retweetledi
The Kobeissi Letter
The Kobeissi Letter@KobeissiLetter·
Anthropic is in its own league. Just days after the launch of Anthropic's Claude Fable 5, the company suspended access to the model after US authorities raised "security concerns." Anthropic said it was ordered to "suspend foreign nationals" from using Claude Fable 5, and self-described the model as "too powerful." "The net effect of this order is that we must abruptly disable Fable 5 and Mythos 5 for all our customers to ensure compliance," Anthropic said. In other words, Anthropic's latest AI model is so powerful, that the company and US government are now questioning who should even have access to it. Less than 4 years ago, ChatGPT did not exist. Today, AI is arguably the most powerful technology ever created. Truly incredible.
English
295
286
4.4K
455.1K
Dmitriy Anderson retweetledi
Teknium 🪽
Teknium 🪽@Teknium·
It’s really great id highly recommend trying Hermes Agent 😅
YanXbt@IBuzovskyi

HERMES AGENT RUNS MONITORING, RESEARCH, LEAD DETECTION, AND COMPETITIVE ANALYSIS ON AUTOPILOT. AND KNOWS WHEN NOT TO SPEND YOUR TOKENS. the biggest unlock most people skip: Hermes cron jobs can decide ON THEIR OWN whether the LLM should wake up. WAKE AGENT — THE $0 GATE every cron job can run a Python script first. the script checks: did anything actually change? nothing changed: → script outputs {"wakeAgent": false} → LLM stays asleep → zero tokens spent something changed: → script outputs {"wakeAgent": true} → agent wakes up and handles it three gate patterns from official docs: → file-change: compare file mtime to last run. no change? sleep. → external-flag: another process drops a ready file. no flag? sleep. → HTTP-check: ping a URL, diff the response. same as last time? sleep. real example: monitor AWS costs every hour. script pulls current spend from AWS API. no spike? agent sleeps. zero cost. costs jump 40%? agent wakes, reports to Slack, takes action through Stripe MCP. you run 20 monitoring jobs a day. 18 of them find nothing. you pay for 2. NO AGENT — PURE SCRIPT, ZERO LLM some jobs don't need reasoning at all. TLS checks. uptime pings. disk alerts. heartbeats. hermes cron edit --no-agent --script check_health.py script runs. stdout goes straight to Telegram, Discord, or Slack. no LLM involved. flip any job between modes: hermes cron edit --agent # add LLM hermes cron edit --no-agent # remove LLM free monitoring that lives inside the same ecosystem as your agent. 4 MORE USE CASES THIS UNLOCKS: COMPETITIVE ANALYSIS weekly cron with script that diffs competitor pages. agent only analyzes actual changes. updates your tracking file and PRD skill automatically. PRD AS A SKILL save product requirements as a skill, not a document. skills load on demand into fresh context. documents drift. skills stay sharp. CONTENT REPURPOSING hand a video script to the agent. it drafts X and LinkedIn posts in your voice. writes to a review folder. you approve via Telegram. LEAD DETECTION webhook monitors inbox. agent spots potential leads. drafts responses using your business context. schedules meetings from your calendar. the pattern across all of these: scripts handle the mechanical work for free. the agent only spends tokens on reasoning that requires judgment. comment CRON and I'll send you 5 ready-to-paste cron configs with wakeAgent and no_agent patterns. full Hermes SOUL.MD guide 👇

English
17
22
561
47.3K