Kody A
@tech_kody

54 posts
Ranch kid turned cloud engineer turned whatever comes next. I'm a lot of things, and most of them aren't on a résumé.

Salt Lake City, UT · Joined June 2025
162 Following · 6 Followers
Kody A retweeted
Claude@claudeai·
New in Claude Code: agent view. One list of all your sessions, available today as a research preview.
979 replies · 2.2K reposts · 28.6K likes · 5.6M views
AilaunchX@Ai_Tech_tool·
ANDREJ KARPATHY COULD HAVE CHARGED $2,000 FOR THIS COURSE. He put it on YouTube.

The full training stack: tokenization, neural network internals, hallucinations, tool use, reinforcement learning, RLHF, DeepSeek, AlphaGo. 3 hours of the most comprehensive LLM education that exists anywhere at any price.

Not how to use the tools. How the entire system was built from the ground up and why it behaves the way it does.

The engineers who understand this build things the ones who only use the tools cannot even conceive of. The gap between those two groups is not 3 hours. It is everything those 3 hours quietly unlock for the rest of your career.
34 replies · 134 reposts · 639 likes · 55.3K views
Kody A retweeted
Lex Fridman@lexfridman·
Here's my conversation all about @FFmpeg, the legendary open-source software powering most video on the Internet. In the episode, I talk with Jean-Baptiste Kempf and Kieran Kunhya. JB is lead developer of VLC, and Kieran is an FFmpeg contributor, codec engineer, and the person behind the now-infamous @FFmpeg account on X.

VLC (@videolan), by the way, is also a legendary piece of open-source software: a video player that can open basically anything and has been downloaded over 6 billion times. I think FFmpeg and VLC are two of the most important and impactful software systems ever created, both open source, and both created & maintained by volunteers: brilliant engineers from all walks of life.

Thank you to everyone who contributed to FFmpeg and VLC, and in general to all engineers giving their heart & soul to building systems used by millions (or billions) of people, often doing so not for money, status, or fame, but purely for the love of building great software and doing good for the world. Thank you to the builders! 🙏❤️

Shoutouts in this chat to @ID_AA_Carmack @karpathy @elonmusk @TimSweeneyEpic and everyone who is a contributor & fan of open source! It's here on X in full and is up everywhere else (see comment).

Timestamps:
0:00 - Episode highlight
2:17 - Introduction
5:35 - Weirdest things VLC opens
9:59 - How video playback works
19:20 - Video codecs and containers
30:07 - FFmpeg explained
51:07 - Linus Torvalds
55:46 - Turning down millions to keep VLC ad-free
1:10:04 - FFmpeg & Google drama
1:29:18 - FFmpeg developers
1:35:55 - VLC and FFmpeg
1:40:29 - History of FFmpeg
1:43:46 - Reverse engineering codecs
1:57:01 - FFmpeg testing
2:01:08 - Assembly code (handwritten)
2:25:26 - Rust programming language
2:34:42 - FFmpeg and Libav fork
2:43:04 - Open source burnout
2:50:51 - x264 and internet video
3:04:07 - Video compression basics
3:11:04 - CIA and fake VLC
3:21:39 - Ultra low latency streaming
3:39:07 - AV2 codec and video patents
3:48:59 - VLC backdoors
3:59:14 - Video archiving
4:05:51 - Future of FFmpeg and VLC
180 replies · 524 reposts · 4.7K likes · 432.8K views
Kody A retweeted
Ahmad@TheAhmadOsman·
You don't pick an Inference Engine. You pick a Hardware Strategy. Then the Engine follows.

Inference Engines Breakdown (Cheat Sheet at the bottom)

> llama.cpp
runs anywhere: CPU, GPU, Mac, weird edge boxes
best when VRAM is tight and RAM is plenty
hybrid offload, GGUF, ultimate portability
not built for serious multi-node scale

> MLX
Apple Silicon weapon
unified memory = "fits" bigger models than VRAM would allow, but also slower than GPUs
clean dev stack (Python/Swift/C++)
sits on Metal (and expanding beyond); now supports CUDA + distributed too
great for Mac-first workflows, not prod serving

> ExLlamaV2
single RTX box go brrr
EXL2 quant, fast local inference
perfect for 1/2/3/4 GPU setups (4090/3090)
not meant for clusters or non-CUDA

> ExLlamaV3
same idea, but bigger ambition
multi-GPU, MoE, EXL3 quant
consumer rigs pretending to be datacenters
still CUDA-first, still rough edges depending on model

> vLLM
default answer for prod serving
continuous batching, KV cache magic
tensor / pipeline / data parallel
runs on CUDA + ROCm (and some CPUs)
this is your "serve 100s of users" engine

> SGLang
vLLM but more systems-brained
routing, disaggregation, long-context scaling
expert parallel for MoE
built for ugly workloads at scale
lives on top of CUDA / ROCm clusters
this is infra nerd territory

> TensorRT-LLM
maximum NVIDIA performance
FP8/FP4, CUDA graphs, insane throughput
multi-node, multi-GPU, fully optimized
pure CUDA stack, zero portability

(And underneath all of it: Transformers → model architecture layer → CUDA / ROCm / TT-Metal → compute layer)

What actually happens under the hood:
> Transformers defines the model
> CUDA / ROCm executes it
> TT-Metal (if you're insane) lets you write the kernel yourself
The Inference Engine is just the orchestrator (simplified)

When running LLMs locally, the bottleneck isn't just "VRAM size". It isn't even the model. It's:
- memory bandwidth (the real limiter)
- KV cache (explodes with long context)
- interconnect (PCIe vs NVLink vs RDMA)
- scheduler quality (batching + engine design)
- runtime overhead (activations, graphs, etc)
(and your compute stack decides all of this)

P.S. Unified Memory is way slower than VRAM

Cheat Sheet / Rules of Thumb
> laptop / edge / weird hardware → llama.cpp
> Mac workflows → MLX
> 1–4 RTX GPUs → ExLlamaV2/V3
> general serving → vLLM
> complex infra / long context / MoE → SGLang
> NVIDIA max performance → TensorRT-LLM
23 replies · 36 reposts · 363 likes · 18.2K views
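To make the "general serving → vLLM" rule above concrete, here is a minimal sketch using vLLM's offline Python API; the model ID, parallelism degree, and prompts are illustrative placeholders, not anything from the post.

```python
# Minimal vLLM offline-inference sketch. Assumes a CUDA machine with vLLM
# installed; the model ID and tensor_parallel_size are placeholders.
from vllm import LLM, SamplingParams

# The engine handles continuous batching and KV-cache management: hand it
# a batch of prompts and it schedules them together instead of one by one.
llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model
    tensor_parallel_size=2,                    # shard weights across 2 GPUs
)

params = SamplingParams(temperature=0.7, max_tokens=128)
prompts = [
    "Explain what a KV cache is in one paragraph.",
    "Why is memory bandwidth the limiter for local LLM inference?",
]

for out in llm.generate(prompts, params):
    print(out.outputs[0].text)
```

The same engine also backs vLLM's OpenAI-compatible HTTP server for the "serve 100s of users" case, which is where continuous batching pays off.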
Ahmad@TheAhmadOsman·
In the Bay Area for the next couple of weeks.

If you're around and wanna grab food, coffee, or yap about GPUs / local AI / inference engines / the future of owning the stack, hit me up 🤙
24 replies · 4 reposts · 141 likes · 16.6K views
Kody A@tech_kody·
This. Is. Awesome.

> Cowork on third-party (3P) is a deployment mode of Claude… that routes all model inference through a provider you configure: Google Cloud's Vertex AI, Amazon Bedrock, Microsoft Foundry, or any compatible gateway you operate.

claude.com/docs/cowork/3p…
0 replies · 0 reposts · 0 likes · 65 views
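As a rough illustration of what "routes all model inference through a provider you configure" can look like at the API level, here is a sketch using the Anthropic Python SDK's Bedrock client; the region and model ID are placeholders, and this is the plain SDK path, not the Cowork deployment mode itself.

```python
# Sketch: sending Claude requests through AWS Bedrock instead of
# Anthropic's first-party endpoint. Region and model ID are placeholders;
# AWS credentials are picked up from the environment as usual.
from anthropic import AnthropicBedrock

client = AnthropicBedrock(aws_region="us-west-2")

message = client.messages.create(
    model="anthropic.claude-3-5-sonnet-20241022-v2:0",  # Bedrock model ID
    max_tokens=256,
    messages=[{"role": "user", "content": "Say hello from Bedrock."}],
)
print(message.content[0].text)
```

The SDK exposes an analogous AnthropicVertex client for Google Cloud's Vertex AI, so switching providers is mostly a matter of which client you construct.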
Kody A retweeted
Simon Willison@simonw·
Today OpenAI announced that "Revenue share payments from OpenAI to Microsoft continue through 2030, independent of OpenAI’s technology progress" That "independent of OpenAI’s technology progress" fragment appears to mean that the weird AGI clause is now deceased simonwillison.net/2026/Apr/27/no…
15 replies · 6 reposts · 155 likes · 30K views
Kody A@tech_kody·
@simonw This is basically what you've been saying with your lethal trifecta… don't give it a gun and say "don't shoot." Don't give it a gun at all…
0 replies · 0 reposts · 0 likes · 15 views
Simon Willison@simonw·
The conclusions here feel wrong to me. The two lessons I see are:

1. Don't run agents anywhere they might be able to access production environment credentials - it's on you to know which credentials those are
2. Keep tested backups that are independent from your production host
JER@lifeof_jer

x.com/i/article/2048…

166 replies · 120 reposts · 1.5K likes · 220.1K views
Kody A@tech_kody·
ZXX
0 replies · 0 reposts · 0 likes · 8 views
Kody A@tech_kody·
@julsimon @MiniMax_AI @claudeai It says the video is private. Did you take it offline? Loved your post on "What to Buy for Local LLMs (April 2026)" on Medium, btw.
0 replies · 0 reposts · 0 likes · 4 views
Kody A@tech_kody·
@richardnystrom @lennysan @_catwu @AnthropicAI I think in their postmortem for the big issue they had recently, they actually talked about the /feedback command. I'm curious how that's implemented into their dev
0 replies · 0 reposts · 0 likes · 21 views
Lenny Rachitsky@lennysan·
How Anthropic's product team moves faster than anyone else

I sat down with @_catwu, Head of Product for Claude Code at @AnthropicAI, to get a peek into their unprecedented shipping pace, how AI is changing the PM role, and how to be the right amount of AGI-pilled.

We discuss:
🔸 How Anthropic's shipping cadence went from months to weeks to days
🔸 The emerging skills PMs need to develop right now
🔸 Why you should build products that don't work yet, then wait for the model to catch up
🔸 Why a 95% automation isn't really an automation
🔸 Cat's most underrated AI skill (introspection)
🔸 What Cat actually looks for when hiring PMs now (hint: it's not traditional PM skills)

Listen now 👇 youtu.be/PplmzlgE0kg
50 replies · 128 reposts · 951 likes · 2M views
PicoCreator - AI builder @ SF 🌉
RAW notes on DS v4 paper ⚡️

Quick Highlights:
- Between Sonnet and Opus
- Distributed trained model?
- Compared to native attention: ~1% attention compute and KV cache size
- ~50x throughput (to validate)
- 1M context length: only ~5.7 GB KV @ FP8 (Llama-3-405B equivalent: ~504 GB)
DeepSeek@deepseek_ai

🚀 DeepSeek-V4 Preview is officially live & open-sourced! Welcome to the era of cost-effective 1M context length.

🔹 DeepSeek-V4-Pro: 1.6T total / 49B active params. Performance rivaling the world's top closed-source models.
🔹 DeepSeek-V4-Flash: 284B total / 13B active params. Your fast, efficient, and economical choice.

Try it now at chat.deepseek.com via Expert Mode / Instant Mode. API is updated & available today!

📄 Tech Report: huggingface.co/deepseek-ai/De…
🤗 Open Weights: huggingface.co/collections/de…

1/n

8 replies · 4 reposts · 23 likes · 1.6K views
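The "~5.7 GB KV @ FP8" versus "~504 GB" comparison in the notes above is the kind of number you can sanity-check with the standard KV-cache formula. A sketch below, using placeholder shapes in the ballpark of a Llama-3-405B-style config (126 layers, 8 KV heads via GQA, head_dim 128) rather than exact published specs:

```python
# Back-of-envelope KV-cache sizing.
# bytes per token = 2 (K and V) * layers * kv_heads * head_dim * bytes/elem
# total = bytes per token * sequence length

def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                seq_len: int, bytes_per_elem: float) -> float:
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_elem
    return per_token * seq_len / 1e9

# Placeholder Llama-3-405B-style shapes at FP16: ~0.5 MB per token,
# so a 1M-token context needs on the order of 500 GB of KV cache.
print(kv_cache_gb(layers=126, kv_heads=8, head_dim=128,
                  seq_len=1_000_000, bytes_per_elem=2))  # ~516 GB
```

Getting from hundreds of GB down to single-digit GB at the same context length means shrinking the per-token KV footprint itself (fewer cached elements per token and lower precision), not just buying a bigger GPU.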