Alec Flowers

49 posts


@flowpow123

Inference @ Nvidia

Joined January 2012
121 Following · 41 Followers
Alec Flowers reposted
NVIDIA AI @NVIDIAAI
Traditional inference wasn’t built for agentic coding. Agentic tools make hundreds of API calls per coding session, often with recomputed context, creating bottlenecks that drive up cost per token.

NVIDIA Dynamo rebuilds the stack for agents with:
→ KV-aware routing
→ Agent-aware scheduling
→ Multi-tier caching
→ Unified orchestration

The result: higher cache hit rates, lower latency, and up to 7× more throughput: nvda.ws/3P1tO1N
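The "KV-aware routing" idea in the tweet can be illustrated with a toy sketch. Everything here is hypothetical (the `Worker`/`Router` classes and the longest-prefix-match policy are illustrative assumptions, not Dynamo's actual implementation): the router sends each request to the worker that already holds the longest cached token prefix, so prefill work for the shared context can be skipped.

```python
# Toy sketch of KV-cache-aware routing (hypothetical; not NVIDIA Dynamo's API).
# Each worker remembers the token prefixes it has KV cache for; the router
# prefers the worker with the longest matching cached prefix, breaking ties
# by current load, so repeated agent context lands where it is already cached.

class Worker:
    def __init__(self, name):
        self.name = name
        self.cached_prefixes = []  # token tuples this worker has KV cache for
        self.load = 0

    def cached_len(self, tokens):
        # Length of the longest cached prefix matching this request's tokens.
        best = 0
        for prefix in self.cached_prefixes:
            n = 0
            for a, b in zip(prefix, tokens):
                if a != b:
                    break
                n += 1
            best = max(best, n)
        return best

class Router:
    def __init__(self, workers):
        self.workers = workers

    def route(self, tokens):
        # Prefer cache hits; among equal hits, prefer the least-loaded worker.
        chosen = max(self.workers,
                     key=lambda w: (w.cached_len(tokens), -w.load))
        chosen.load += 1
        chosen.cached_prefixes.append(tuple(tokens))
        return chosen

workers = [Worker("w0"), Worker("w1")]
router = Router(workers)
ctx = (1, 2, 3, 4)
first = router.route(ctx)            # cold start: picks a least-loaded worker
second = router.route(ctx + (5, 6))  # shares the prefix -> same worker again
```

With this policy, the follow-up request lands on the same worker as the first one, which is the cache-hit-rate effect the tweet describes.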
NVIDIA AI tweet media
46 replies · 104 reposts · 886 likes · 71.4K views
Alec Flowers reposted
ishan @0xishand
Thanks to an amazing partnership with @inferact and @lmsysorg/@radixark, Dynamo had day-0 support for DeepSeek-V4, with features including large-scale P/D disaggregation on B/GB200 and 300 and KV-cache-aware routing. Containers and some fun PRs linked in the next thread!
13 replies · 7 reposts · 64 likes · 16.1K views
Alec Flowers reposted
Sam Altman @sama
Really excellent work by the inference team to serve this model so efficiently! To a significant degree, we have to become an AI inference company now.
269 replies · 154 reposts · 5.7K likes · 322.3K views
Alec Flowers @flowpow123
@sama Can attest! The model is awesome.
0 replies · 0 reposts · 0 likes · 1.4K views
Sam Altman @sama
We tried a new thing with NVIDIA to roll out Codex across a whole company and it was awesome to see it work. Let us know if you'd like to do it at your company!
Sam Altman tweet media
481 replies · 423 reposts · 8.2K likes · 1M views
Alec Flowers reposted
Andrej Karpathy @karpathy
Judging by my tl there is a growing gap in understanding of AI capability.

The first issue I think is around recency and tier of use. I think a lot of people tried the free tier of ChatGPT somewhere last year and allowed it to inform their views on AI a little too much. This is a group of reactions laughing at various quirks of the models, hallucinations, etc. Yes I also saw the viral videos of OpenAI's Advanced Voice mode fumbling simple queries like "should I drive or walk to the carwash". The thing is that these free and old/deprecated models don't reflect the capability in the latest round of state of the art agentic models of this year, especially OpenAI Codex and Claude Code.

But that brings me to the second issue. Even if people paid $200/month to use the state of the art models, a lot of the capabilities are relatively "peaky" in highly technical areas. Typical queries around search, writing, advice, etc. are *not* the domain that has made the most noticeable and dramatic strides in capability. Partly, this is due to the technical details of reinforcement learning and its use of verifiable rewards. But partly, it's also because these use cases are not sufficiently prioritized by the companies in their hillclimbing because they don't lead to as much $$$ value. The goldmines are elsewhere, and the focus comes along.

So that brings me to the second group of people, who *both* 1) pay for and use the state of the art frontier agentic models (OpenAI Codex / Claude Code) and 2) do so professionally in technical domains like programming, math and research. This group of people is subject to the highest amount of "AI Psychosis" because the recent improvements in these domains as of this year have been nothing short of staggering. When you hand a computer terminal to one of these models, you can now watch them melt programming problems that you'd normally expect to take days/weeks of work.

It's this second group of people that assigns a much greater gravity to the capabilities, their slope, and various cyber-related repercussions. TLDR the people in these two groups are speaking past each other.

It really is simultaneously the case that OpenAI's free and I think slightly orphaned (?) "Advanced Voice Mode" will fumble the dumbest questions in your Instagram's reels and *at the same time*, OpenAI's highest-tier and paid Codex model will go off for 1 hour to coherently restructure an entire code base, or find and exploit vulnerabilities in computer systems. This part really works and has made dramatic strides because of 2 properties: 1) these domains offer explicit reward functions that are verifiable, meaning they are easily amenable to reinforcement learning training (e.g. unit tests passed yes or no, in contrast to writing, which is much harder to explicitly judge), but also 2) they are a lot more valuable in b2b settings, meaning that the biggest fraction of the team is focused on improving them. So here we are.
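The "verifiable rewards" point above can be made concrete with a minimal sketch (all names here are hypothetical, not any lab's actual training code): a coding task's reward is simply whether its unit tests pass, a binary, machine-checkable signal that RL can optimize directly, unlike writing quality.

```python
# Minimal sketch of a verifiable reward for RL on coding tasks (hypothetical
# names, not any specific lab's code). The candidate function is checked
# against unit tests; reward is 1.0 iff every test passes -- exactly the
# "unit tests passed yes or no" signal, unlike judging prose quality.

def verifiable_reward(candidate_fn, tests):
    """tests: list of (args, expected) pairs acting as unit tests."""
    try:
        passed = all(candidate_fn(*args) == expected
                     for args, expected in tests)
        return 1.0 if passed else 0.0
    except Exception:
        return 0.0  # a crashing candidate earns no reward

# Hypothetical task: implement addition.
tests = [((2, 3), 5), ((0, 0), 0), ((-1, 1), 0)]
good = lambda a, b: a + b    # passes all tests -> reward 1.0
buggy = lambda a, b: a - b   # fails a test    -> reward 0.0
```

The binary, automatically checkable nature of this signal is why coding and math have climbed so fast relative to open-ended writing, per the tweet's argument.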
staysaasy @staysaasy
The degree to which you are awed by AI is perfectly correlated with how much you use AI to code.
1.2K replies · 2.5K reposts · 20.6K likes · 4.3M views
Alec Flowers reposted
Noah @NoahKingJr
Claude watching me write code manually after I hit the daily limit
401 replies · 5.9K reposts · 80.8K likes · 3.7M views
Alec Flowers reposted
Andrej Karpathy @karpathy
@nummanali tmux grids are awesome, but i feel a need to have a proper "agent command center" IDE for teams of them, which I could maximize per monitor. E.g. I want to see/hide toggle them, see if any are idle, pop open related tools (e.g. terminal), stats (usage), etc.
303 replies · 117 reposts · 3.1K likes · 1.4M views
Alec Flowers reposted
Andrej Karpathy @karpathy
Thank you Jensen and NVIDIA! She’s a real beauty! I was told I’d be getting a secret gift, with a hint that it requires 20 amps. (So I knew it had to be good). She’ll make for a beautiful, spacious home for my Dobby the House Elf claw, among lots of other tinkering, thank you!!
NVIDIA AI Developer @NVIDIAAIDev
🙌 Andrej Karpathy’s lab has received the first DGX Station GB300 -- a Dell Pro Max with GB300. 💚 We can't wait to see what you’ll create @karpathy! 🔗 blogs.nvidia.com/blog/gtc-2026-… @DellTech
529 replies · 832 reposts · 19.2K likes · 1.1M views
Alec Flowers reposted
WarrenBuffering @WarrenInTheBuff
my agents when I hit build after a 40 minute plan mode
25 replies · 116 reposts · 1.9K likes · 184.8K views
Alec Flowers reposted
beginbot 🃏 @beginbot
Mark Zuckerberg acquiring AI companies
270 replies · 559 reposts · 10.2K likes · 1.4M views
Alec Flowers reposted
matt rothenberg @mattrothenberg
just picked up this bad boy. can't wait to write some software with it
matt rothenberg tweet media
222 replies · 876 reposts · 15K likes · 606.6K views
Alec Flowers reposted
Alexander Long @AlexanderLong
insane sequence of statements buried in an Alibaba tech report
Alexander Long tweet media
230 replies · 934 reposts · 6.9K likes · 2.9M views
Alec Flowers reposted
Rohan Pandey @khoomeik
unproductive shitty feeling day in 2024: i wrote 10 lines of code & answered slack
unproductive shitty feeling day in 2026: claude wrote 100 lines of code, cleaned up my s3 bucket, debugged a cuda error, launched an RL ablation, solved an easy-in-retrospect erdos problem
8 replies · 9 reposts · 412 likes · 15.4K views
Alec Flowers reposted
Rhys @RhysSullivan
i gave Claude access to my financial data and asked for suggestions and it told me to leave California 💀
Rhys tweet media
279 replies · 415 reposts · 13.7K likes · 726K views
Alec Flowers reposted
Andrej Karpathy @karpathy
It is hard to communicate how much programming has changed due to AI in the last 2 months: not gradually and over time in the "progress as usual" way, but specifically this last December. There are a number of asterisks but imo coding agents basically didn’t work before December and basically work since - the models have significantly higher quality, long-term coherence and tenacity and they can power through large and long tasks, well past enough that it is extremely disruptive to the default programming workflow.

Just to give an example, over the weekend I was building a local video analysis dashboard for the cameras of my home so I wrote: “Here is the local IP and username/password of my DGX Spark. Log in, set up ssh keys, set up vLLM, download and bench Qwen3-VL, set up a server endpoint to inference videos, a basic web ui dashboard, test everything, set it up with systemd, record memory notes for yourself and write up a markdown report for me”. The agent went off for ~30 minutes, ran into multiple issues, researched solutions online, resolved them one by one, wrote the code, tested it, debugged it, set up the services, and came back with the report and it was just done. I didn’t touch anything. All of this could easily have been a weekend project just 3 months ago but today it’s something you kick off and forget about for 30 minutes.

As a result, programming is becoming unrecognizable. You’re not typing computer code into an editor like the way things were since computers were invented, that era is over. You're spinning up AI agents, giving them tasks *in English* and managing and reviewing their work in parallel. The biggest prize is in figuring out how you can keep ascending the layers of abstraction to set up long-running orchestrator Claws with all of the right tools, memory and instructions that productively manage multiple parallel Code instances for you. The leverage achievable via top tier "agentic engineering" feels very high right now.

It’s not perfect, it needs high-level direction, judgement, taste, oversight, iteration and hints and ideas. It works a lot better in some scenarios than others (e.g. especially for tasks that are well-specified and where you can verify/test functionality). The key is to build intuition to decompose the task just right to hand off the parts that work and help out around the edges. But imo, this is nowhere near "business as usual" time in software.
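For readers unfamiliar with the "set it up with systemd" step in the workflow above, it might look roughly like the following unit file. This is a guessed sketch only: the service name, user, port, and model path are assumptions, not what the agent actually wrote.

```ini
# /etc/systemd/system/video-inference.service -- hypothetical sketch of the
# kind of unit the agent could have produced; every path and flag here is an
# assumption for illustration.
[Unit]
Description=vLLM endpoint for a local video analysis dashboard (Qwen3-VL)
After=network-online.target

[Service]
User=spark
ExecStart=/usr/bin/env vllm serve Qwen/Qwen3-VL --port 8000
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

A unit like this would be enabled with `systemctl enable --now video-inference`, which is what makes the endpoint survive reboots rather than living in a stray terminal session.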
1.6K replies · 4.8K reposts · 37.3K likes · 5.1M views