Kody A
@tech_kody

54 posts
Ranch kid turned cloud engineer turned whatever comes next. I'm a lot of things, and most of them aren't on a résumé.

Salt Lake City, UT · Joined June 2025
162 Following · 6 Followers
Kody A retweeted
Claude@claudeai·
New in Claude Code: agent view. One list of all your sessions, available today as a research preview.
979 replies · 2.2K reposts · 28.6K likes · 5.6M views
AilaunchX@Ai_Tech_tool·
ANDREJ KARPATHY COULD HAVE CHARGED $2,000 FOR THIS COURSE. He put it on YouTube.

The full training stack: tokenization, neural network internals, hallucinations, tool use, reinforcement learning, RLHF, DeepSeek, AlphaGo. 3 hours of the most comprehensive LLM education that exists anywhere at any price.

Not how to use the tools. How the entire system was built from the ground up and why it behaves the way it does.

The engineers who understand this build things the ones who only use the tools cannot even conceive of. The gap between those two groups is not 3 hours. It is everything those 3 hours quietly unlock for the rest of your career.
34 replies · 134 reposts · 639 likes · 55.3K views
Kody A retweeted
Lex Fridman@lexfridman·
Here's my conversation all about @FFmpeg, the legendary open-source software powering most video on the Internet. In the episode, I talk with Jean-Baptiste Kempf and Kieran Kunhya. JB is lead developer of VLC, and Kieran is an FFmpeg contributor, codec engineer, and the person behind the now-infamous @FFmpeg account on X.

VLC (@videolan), by the way, is also a legendary piece of open-source software: a video player that can open basically anything and has been downloaded over 6 billion times. I think FFmpeg and VLC are two of the most important and impactful software systems ever created, both open source, and both created & maintained by volunteers: brilliant engineers from all walks of life.

Thank you to everyone who contributed to FFmpeg and VLC, and in general to all engineers giving their heart & soul to building systems used by millions (or billions) of people, often doing so not for money, status, or fame, but purely for the love of building great software and doing good for the world. Thank you to the builders! 🙏❤️

Shoutouts in this chat to @ID_AA_Carmack @karpathy @elonmusk @TimSweeneyEpic and everyone who is a contributor & fan of open source! It's here on X in full and is up everywhere else (see comment).

Timestamps:
0:00 - Episode highlight
2:17 - Introduction
5:35 - Weirdest things VLC opens
9:59 - How video playback works
19:20 - Video codecs and containers
30:07 - FFmpeg explained
51:07 - Linus Torvalds
55:46 - Turning down millions to keep VLC ad-free
1:10:04 - FFmpeg & Google drama
1:29:18 - FFmpeg developers
1:35:55 - VLC and FFmpeg
1:40:29 - History of FFmpeg
1:43:46 - Reverse engineering codecs
1:57:01 - FFmpeg testing
2:01:08 - Assembly code (handwritten)
2:25:26 - Rust programming language
2:34:42 - FFmpeg and Libav fork
2:43:04 - Open source burnout
2:50:51 - x264 and internet video
3:04:07 - Video compression basics
3:11:04 - CIA and fake VLC
3:21:39 - Ultra low latency streaming
3:39:07 - AV2 codec and video patents
3:48:59 - VLC backdoors
3:59:14 - Video archiving
4:05:51 - Future of FFmpeg and VLC
180 replies · 524 reposts · 4.7K likes · 432.8K views
Kody A retweeted
Ahmad@TheAhmadOsman·
You don't pick an Inference Engine. You pick a Hardware Strategy. Then the Engine follows.

Inference Engines Breakdown (Cheat Sheet at the bottom)

> llama.cpp
runs anywhere: CPU, GPU, Mac, weird edge boxes
best when VRAM is tight and RAM is plenty
hybrid offload, GGUF, ultimate portability
not built for serious multi-node scale

> MLX
Apple Silicon weapon
unified memory = "fits" bigger models than VRAM would allow, but also slower than GPUs
clean dev stack (Python/Swift/C++)
sits on Metal (and expanding beyond); now supports CUDA + distributed too
great for Mac-first workflows, not prod serving

> ExLlamaV2
single RTX box go brrr
EXL2 quant, fast local inference
perfect for 1/2/3/4 GPU setups (4090/3090)
not meant for clusters or non-CUDA

> ExLlamaV3
same idea, but bigger ambition
multi-GPU, MoE, EXL3 quant
consumer rigs pretending to be datacenters
still CUDA-first, still rough edges depending on model

> vLLM
default answer for prod serving
continuous batching, KV cache magic
tensor / pipeline / data parallel
runs on CUDA + ROCm (and some CPUs)
this is your "serve 100s of users" engine

> SGLang
vLLM but more systems-brained
routing, disaggregation, long-context scaling
expert parallel for MoE
built for ugly workloads at scale
lives on top of CUDA / ROCm clusters
this is infra nerd territory

> TensorRT-LLM
maximum NVIDIA performance
FP8/FP4, CUDA graphs, insane throughput
multi-node, multi-GPU, fully optimized
pure CUDA stack, zero portability

(And underneath all of it: Transformers → model architecture layer → CUDA / ROCm / TT-Metal → compute layer)

What actually happens under the hood:
> Transformers defines the model
> CUDA / ROCm executes it
> TT-Metal (if you're insane) lets you write the kernel yourself
The Inference Engine is just the orchestrator (simplified)

When running LLMs locally, the bottleneck isn't just "VRAM size". It isn't even the model. It's:
- memory bandwidth (the real limiter)
- KV cache (explodes with long context)
- interconnect (PCIe vs NVLink vs RDMA)
- scheduler quality (batching + engine design)
- runtime overhead (activations, graphs, etc)
(and your compute stack decides all of this)

P.S. Unified Memory is way slower than VRAM

Cheat Sheet / Rules of Thumb
> laptop / edge / weird hardware → llama.cpp
> Mac workflows → MLX
> 1–4 RTX GPUs → ExLlamaV2/V3
> general serving → vLLM
> complex infra / long context / MoE → SGLang
> NVIDIA max performance → TensorRT-LLM
23 replies · 36 reposts · 363 likes · 18.2K views
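To make the "general serving → vLLM" rule above concrete, here is a minimal sketch using vLLM's offline Python API; the model ID, parallelism degree, and prompts are illustrative placeholders, not anything from the post.

```python
# Minimal vLLM offline-inference sketch. Assumes a CUDA machine with vLLM
# installed; the model ID and tensor_parallel_size are placeholders.
from vllm import LLM, SamplingParams

# The engine handles continuous batching and KV-cache management: hand it
# a batch of prompts and it schedules them together instead of one by one.
llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model
    tensor_parallel_size=2,                    # shard weights across 2 GPUs
)

params = SamplingParams(temperature=0.7, max_tokens=128)
prompts = [
    "Explain what a KV cache is in one paragraph.",
    "Why is memory bandwidth the limiter for local LLM inference?",
]

for out in llm.generate(prompts, params):
    print(out.outputs[0].text)
```

The same engine also backs vLLM's OpenAI-compatible HTTP server for the "serve 100s of users" case, which is where continuous batching pays off.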
Ahmad@TheAhmadOsman·
In the Bay Area for the next couple of weeks.

If you're around and wanna grab food, coffee, or yap about GPUs / local AI / inference engines / the future of owning the stack, hit me up 🤙
24 replies · 4 reposts · 141 likes · 16.6K views
Kody A@tech_kody·
This. Is. Awesome.

> Cowork on third-party (3P) is a deployment mode of Claude… that routes all model inference through a provider you configure: Google Cloud's Vertex AI, Amazon Bedrock, Microsoft Foundry, or any compatible gateway you operate.

claude.com/docs/cowork/3p…
0 replies · 0 reposts · 0 likes · 65 views
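As a rough illustration of what "routes all model inference through a provider you configure" can look like at the API level, here is a sketch using the Anthropic Python SDK's Bedrock client; the region and model ID are placeholders, and this is the plain SDK path, not the Cowork deployment mode itself.

```python
# Sketch: sending Claude requests through AWS Bedrock instead of
# Anthropic's first-party endpoint. Region and model ID are placeholders;
# AWS credentials are picked up from the environment as usual.
from anthropic import AnthropicBedrock

client = AnthropicBedrock(aws_region="us-west-2")

message = client.messages.create(
    model="anthropic.claude-3-5-sonnet-20241022-v2:0",  # Bedrock model ID
    max_tokens=256,
    messages=[{"role": "user", "content": "Say hello from Bedrock."}],
)
print(message.content[0].text)
```

The SDK exposes an analogous AnthropicVertex client for Google Cloud's Vertex AI, so switching providers is mostly a matter of which client you construct.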
Kody A retweeted
Simon Willison@simonw·
Today OpenAI announced that "Revenue share payments from OpenAI to Microsoft continue through 2030, independent of OpenAI’s technology progress" That "independent of OpenAI’s technology progress" fragment appears to mean that the weird AGI clause is now deceased simonwillison.net/2026/Apr/27/no…
15 replies · 6 reposts · 155 likes · 30K views
Kody A@tech_kody·
@simonw This is basically what you've been saying with your lethal trifecta… don't give it a gun and say "don't shoot." Don't give it a gun at all…
0 replies · 0 reposts · 0 likes · 15 views
Simon Willison@simonw·
The conclusions here feel wrong to me. The two lessons I see are:

1. Don't run agents anywhere they might be able to access production environment credentials - it's on you to know which credentials those are
2. Keep tested backups that are independent from your production host
JER@lifeof_jer

x.com/i/article/2048…

166 replies · 120 reposts · 1.5K likes · 220.1K views
Kody A@tech_kody·
ZXX
0 replies · 0 reposts · 0 likes · 8 views
Kody A@tech_kody·
@julsimon @MiniMax_AI @claudeai It says the video is private. Did you take it offline? Loved your post on "What to Buy for Local LLMs (April 2026)" on Medium, btw.
0 replies · 0 reposts · 0 likes · 4 views
Kody A@tech_kody·
@richardnystrom @lennysan @_catwu @AnthropicAI I think in their postmortem for the big issue they had recently, they actually talked about the /feedback command. I'm curious how that's implemented into their dev
0 replies · 0 reposts · 0 likes · 21 views
Lenny Rachitsky@lennysan·
How Anthropic's product team moves faster than anyone else

I sat down with @_catwu, Head of Product for Claude Code at @AnthropicAI, to get a peek into their unprecedented shipping pace, how AI is changing the PM role, and how to be the right amount of AGI-pilled.

We discuss:
🔸 How Anthropic's shipping cadence went from months to weeks to days
🔸 The emerging skills PMs need to develop right now
🔸 Why you should build products that don't work yet, then wait for the model to catch up
🔸 Why a 95% automation isn't really an automation
🔸 Cat's most underrated AI skill (introspection)
🔸 What Cat actually looks for when hiring PMs now (hint: it's not traditional PM skills)

Listen now 👇 youtu.be/PplmzlgE0kg
50 replies · 128 reposts · 951 likes · 2M views
PicoCreator - AI builder @ SF 🌉
RAW notes on DS v4 paper ⚡️

Quick Highlights:
- Between Sonnet and Opus
- Distributed trained model?
- Compared to native attention: ~1% attention compute and KV cache size
- ~50x throughput (to validate)
- 1M context length: only ~5.7 GB KV @ FP8 (Llama-3-405B equivalent: ~504 GB)
DeepSeek@deepseek_ai

🚀 DeepSeek-V4 Preview is officially live & open-sourced! Welcome to the era of cost-effective 1M context length.

🔹 DeepSeek-V4-Pro: 1.6T total / 49B active params. Performance rivaling the world's top closed-source models.
🔹 DeepSeek-V4-Flash: 284B total / 13B active params. Your fast, efficient, and economical choice.

Try it now at chat.deepseek.com via Expert Mode / Instant Mode. API is updated & available today!

📄 Tech Report: huggingface.co/deepseek-ai/De…
🤗 Open Weights: huggingface.co/collections/de…

1/n

8 replies · 4 reposts · 23 likes · 1.6K views
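The "~5.7 GB KV @ FP8" versus "~504 GB" comparison in the notes above is the kind of number you can sanity-check with the standard KV-cache formula. A sketch below, using placeholder shapes in the ballpark of a Llama-3-405B-style config (126 layers, 8 KV heads via GQA, head_dim 128) rather than exact published specs:

```python
# Back-of-envelope KV-cache sizing.
# bytes per token = 2 (K and V) * layers * kv_heads * head_dim * bytes/elem
# total = bytes per token * sequence length

def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                seq_len: int, bytes_per_elem: float) -> float:
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_elem
    return per_token * seq_len / 1e9

# Placeholder Llama-3-405B-style shapes at FP16: ~0.5 MB per token,
# so a 1M-token context needs on the order of 500 GB of KV cache.
print(kv_cache_gb(layers=126, kv_heads=8, head_dim=128,
                  seq_len=1_000_000, bytes_per_elem=2))  # ~516 GB
```

Getting from hundreds of GB down to single-digit GB at the same context length means shrinking the per-token KV footprint itself (fewer cached elements per token and lower precision), not just buying a bigger GPU.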