Nav Patel
@patelnav

564 posts

Always looking to build the next interesting thing. Ex: @synthesischool @openstore @fieldscope_in @apple @wifislam @microsoft

🇮🇳 | 🇨🇦 | 🇺🇸 Joined January 2010
1.3K Following · 626 Followers
Nav Patel
Nav Patel@patelnav·
"Peaky" behaviour is exactly what the normies have a hard time grokking. They're used to intelligence correlating across domains. Not being able to count "r"s in strawberry but being able to solve complex math problems doesn't make sense in a pre-AI world. Working with idiot savants your whole career helps.
Replies 0 · Reposts 0 · Likes 1 · Views 110
Andrej Karpathy
Andrej Karpathy@karpathy·
Judging by my tl there is a growing gap in understanding of AI capability. The first issue I think is around recency and tier of use. I think a lot of people tried the free tier of ChatGPT somewhere last year and allowed it to inform their views on AI a little too much. This is a group of reactions laughing at various quirks of the models, hallucinations, etc. Yes I also saw the viral videos of OpenAI's Advanced Voice mode fumbling simple queries like "should I drive or walk to the carwash". The thing is that these free and old/deprecated models don't reflect the capability in the latest round of state of the art agentic models of this year, especially OpenAI Codex and Claude Code.

But that brings me to the second issue. Even if people paid $200/month to use the state of the art models, a lot of the capabilities are relatively "peaky" in highly technical areas. Typical queries around search, writing, advice, etc. are *not* the domain that has made the most noticeable and dramatic strides in capability. Partly, this is due to the technical details of reinforcement learning and its use of verifiable rewards. But partly, it's also because these use cases are not sufficiently prioritized by the companies in their hillclimbing because they don't lead to as much $$$ value. The goldmines are elsewhere, and the focus comes along.

So that brings me to the second group of people, who *both* 1) pay for and use the state of the art frontier agentic models (OpenAI Codex / Claude Code) and 2) do so professionally in technical domains like programming, math and research. This group of people is subject to the highest amount of "AI Psychosis" because the recent improvements in these domains as of this year have been nothing short of staggering. When you hand a computer terminal to one of these models, you can now watch them melt programming problems that you'd normally expect to take days/weeks of work.
It's this second group of people that assigns a much greater gravity to the capabilities, their slope, and various cyber-related repercussions.

TLDR the people in these two groups are speaking past each other. It really is simultaneously the case that OpenAI's free and I think slightly orphaned (?) "Advanced Voice Mode" will fumble the dumbest questions in your Instagram's reels and *at the same time*, OpenAI's highest-tier and paid Codex model will go off for 1 hour to coherently restructure an entire code base, or find and exploit vulnerabilities in computer systems. This part really works and has made dramatic strides because of two properties: 1) these domains offer explicit reward functions that are verifiable, meaning they are easily amenable to reinforcement learning training (e.g. unit tests passed yes or no, in contrast to writing, which is much harder to explicitly judge), but also 2) they are a lot more valuable in b2b settings, meaning that the biggest fraction of the team is focused on improving them. So here we are.
staysaasy@staysaasy

The degree to which you are awed by AI is perfectly correlated with how much you use AI to code.

Replies 198 · Reposts 291 · Likes 2.5K · Views 160K
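The "verifiable rewards" mechanic Karpathy describes is easy to make concrete. A minimal sketch (the function name and harness are illustrative, not any lab's actual training code): a coding task's reward is simply whether hidden unit tests pass, which is objective in a way that judging prose is not.

```python
import subprocess
import sys
import tempfile

def verifiable_reward(candidate_code: str, test_code: str) -> float:
    """Binary reward for an RL rollout: 1.0 if the model's code passes
    the hidden unit tests, else 0.0. This cheap, unambiguous yes/no
    signal is what makes coding (unlike open-ended writing) so
    amenable to reinforcement learning."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(candidate_code + "\n" + test_code)
        path = f.name
    result = subprocess.run([sys.executable, path], capture_output=True, timeout=30)
    return 1.0 if result.returncode == 0 else 0.0

# Two sampled completions for "write add(a, b)":
good = "def add(a, b):\n    return a + b"
bad = "def add(a, b):\n    return a - b"
tests = "assert add(2, 3) == 5"
```

In a real RL loop these rewards would weight policy updates across thousands of rollouts; the point here is only that the signal is explicit and verifiable.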
Nav Patel
Nav Patel@patelnav·
@GoogleAIStudio @antigravity @DynamicWebPaige Google's marketing is confounding sometimes. I get that bugs people were complaining about were fixed, but why would you make an ad that says "use us! we won't suck in this way anymore"? Is your target audience dejected AI Studio and Antigravity users? @sundarpichai
Replies 0 · Reposts 0 · Likes 0 · Views 180
Google AI Studio
Google AI Studio@GoogleAIStudio·
the fastest path from prompt to production just got a whole lot smarter - now supercharged with agentic reasoning over your entire stack by the @antigravity coding agent. @DynamicWebPaige breaks down what's new ⬇️
Replies 21 · Reposts 28 · Likes 291 · Views 15.4K
Nav Patel
Nav Patel@patelnav·
Absolutely. The gap between the conceptual breakthrough and a product that wins is the defaults. OpenClaw & Pi made the conceptual breakthroughs, but the defaults are tuned towards Peter's preferences and use-case. It's grown tremendously, but I don't think it's the right-shaped product for most people. (I use it extensively.) The first things to get right are setup, ownership (data + infra), and giving the agent a reliable path to do "self surgery", which @pmarca describes at the end of this video. (WIP but will launch soon)
Replies 0 · Reposts 0 · Likes 1 · Views 156
Nivi
Nivi@nivi·
Marc Andreessen on the Architecture of Agents: “Your Agent Is Just Its Files”
Replies 3 · Reposts 17 · Likes 170 · Views 15.4K
Nav Patel retweeted
Garry Tan
Garry Tan@garrytan·
I’ve tried everything out there for YouTube summary and diarization and the best with my OpenClaw hands down is diarize.io Well done @patelnav
Replies 22 · Reposts 24 · Likes 568 · Views 60.2K
Pierre-Antoine Bannier
sam3.cpp - Meta's SAM 3 in pure C++ with @ggerganov's ggml
- Supports SAM 3.1, 3, 2.1, 2 and EdgeTAM
- FP16, 4-bit quant (EdgeTAM in 15 MB)
- Apple Metal GPU, CUDA, CPU
- Text-prompted: "peach" → every peach
- Single-file C++14

Performance-wise:
- 100ms object detection, segmentation
- Video object segmentation @ 20FPS on M4 Pro with EdgeTAM

github.com/PABannier/sam3…
Replies 10 · Reposts 118 · Likes 857 · Views 54.4K
Nav Patel
Nav Patel@patelnav·
@pmarca Is this just based on token usage? Because the Pro/Max $200 plans from OpenAI (and Anthropic until they nerfed it) gave plenty of tokens not at this high of a cost.
Replies 1 · Reposts 0 · Likes 3 · Views 541
Marc Andreessen 🇺🇸
Magical OpenClaw experiences that use frontier models cost $300-1,000/day today, heading to $10,000/day and more. The future shape of the entire technology industry will be how to drive that to $20/month.
Replies 623 · Reposts 517 · Likes 7.7K · Views 1.6M
Aiden Bai
Aiden Bai@aidenybai·
Introducing React Grab

Select any element on your page → tell Claude Code or Codex what to change

Fully open source: npx react-grab@latest
Replies 251 · Reposts 447 · Likes 7.9K · Views 941.6K
Nav Patel
Nav Patel@patelnav·
@KaiLentit jQuery? there's a new kid on the block called BackboneJS, have you heard of it? MVC is the future man.
Replies 0 · Reposts 0 · Likes 1 · Views 250
Nav Patel
Nav Patel@patelnav·
It's not about apps or the OS layer. It's about what the models support. Even models like Gemini Pro that are multi-modal don't support streaming. So the model can do STT, but you have to send chunks of audio for it to process. The Gemini Live model does handle streaming, but it's a Gemini Flash-class model, so it falls more into the first bucket from my tweet.
Replies 0 · Reposts 0 · Likes 1 · Views 26
superscribe.io
superscribe.io@superscribeio·
@patelnav @WisprFlow the voice problem won't be solved inside individual apps. it gets solved at the OS layer, where dictation streams into every field like a second keyboard
Replies 1 · Reposts 0 · Likes 0 · Views 15
Nav Patel
Nav Patel@patelnav·
Voice Agents aren't a solved problem. You can:

1) Use a native voice-to-voice model that can tool-call: they're responsive but dumb.
2) STT manually into a smart model, like @WisprFlow / ctrlspeak (or the new Voice mode in Claude Code): it's slow and manual.
3) Do something like github.com/patelnav/voca

It runs STT continuously in the background and feeds chunks of transcripts into the Claude Code session. I tried this with a Haiku subagent/team-member in parallel to be more intelligent in clustering the transcript and deciding what to send to the host, but the latencies were too long and the experience was bad.

Give this a whirl and let me know what you think!
Replies 2 · Reposts 0 · Likes 8 · Views 779
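The shape of option 3 can be sketched in a few lines. This is a hypothetical buffer, not voca's actual code: a background STT stream feeds fragments in, and a chunker decides when a piece of transcript is complete enough to hand to the agent session.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class TranscriptChunker:
    """Buffers fragments from a continuously running STT stream and
    flushes a chunk to the agent session once it looks complete.
    The flush heuristics here (sentence-ending punctuation or a word
    cap) are illustrative; a smarter clustering model could replace
    them, at the cost of the latency described above."""
    max_words: int = 40
    _buffer: List[str] = field(default_factory=list)

    def feed(self, fragment: str) -> Optional[str]:
        self._buffer.append(fragment.strip())
        text = " ".join(self._buffer)
        if text.endswith((".", "?", "!")) or len(text.split()) >= self.max_words:
            self._buffer.clear()
            return text  # chunk ready to send to the agent session
        return None      # keep buffering
```

For example, feeding "take a note:" buffers silently, and the follow-up "rename the config module." flushes the joined chunk to the agent.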
Nav Patel
Nav Patel@patelnav·
@threepointone I have an entire startup thesis informed by MacRumors Buyer's Guide
Replies 0 · Reposts 0 · Likes 0 · Views 9
sunil pai
sunil pai@threepointone·
oh yeah today’s the day macrumors.com sigh macrumors.com every damn time
Replies 3 · Reposts 0 · Likes 62 · Views 6.7K
Nav Patel
Nav Patel@patelnav·
@trq212 Any chance y'all will open up the voice API for plugins? I'm using github.com/patelnav/voca. I can have CC sit and listen and take notes. Uses parakeet for STT
Replies 0 · Reposts 0 · Likes 0 · Views 27
Thariq
Thariq@trq212·
Voice mode is rolling out now in Claude Code. It’s live for ~5% of users today, and will be ramping through the coming weeks. You'll see a note on the welcome screen once you have access. /voice to toggle it on!
Replies 1.1K · Reposts 1.3K · Likes 17.2K · Views 3.6M
Chris Power
Chris Power@typecraft_dev·
My submission to be the next @opencode CEO. thank you for your consideration
Replies 52 · Reposts 34 · Likes 1.1K · Views 86.3K
Nav Patel
Nav Patel@patelnav·
@mattpocockuk You know it’s working if your brain hurts from thinking too hard
Replies 0 · Reposts 0 · Likes 0 · Views 88
Matt Pocock
Matt Pocock@mattpocockuk·
My /grill-me skill just asked me 24 consecutive questions. I've been sat here, writing a PRD, for an hour. This is what software development has become (and I love it)
Replies 41 · Reposts 23 · Likes 1K · Views 87.6K
Nav Patel
Nav Patel@patelnav·
@myprasanna Agree. There's lots of noise with models ⨯ harness ⨯ parallelization. Throughput isn't the limiting factor; like you said, each model has unique strengths, and they can collaborate to break the local maxima of each one's intelligence. Building this (with this) right now. 😅
Replies 0 · Reposts 0 · Likes 0 · Views 938
Prasanna S
Prasanna S@myprasanna·
I’m starting a new co in the AI coding space. Hiring the founding team members. I’ve been using the product non-stop for the last month and it’s mind-blowing. I think models have different strengths right now, and an ideal coding harness should run in the cloud and tap them. It should also be super simple for a lay user to configure a harness to fix common model errors you experience day to day, by throwing more model-token combos at the problem. I have burnt $50k last month on this and the emergent intelligence has been epic. For a lot of these, we have good ideas on how to keep the intelligence and lower costs. Will release more product details soon. Join us if you’d like to work on this. Epic time for the intelligence take-off, and coding is driving the singularity now. What a time to be an engineer.
Replies 97 · Reposts 16 · Likes 563 · Views 58K
Nav Patel
Nav Patel@patelnav·
Interesting split in Qwen3.5 397B results:

BullshitBench: #2 after the Anthropic models
AA-Omniscience: 89% hallucination rate, near worst

It appears to be highly capable at reasoning but has been trained (or RLHF'd) to always give an answer. It can detect when your logic is broken, but it can't detect when its own knowledge is insufficient.
Replies 0 · Reposts 0 · Likes 7 · Views 1.2K
Peter Gostev (@aiDotEngineer in London)
BullshitBench v2 is out! It is one of the few benchmarks where models are generally not getting better (except Claude) and where reasoning isn't helping.

What's new: 100 new questions, by domain (coding (40 Q's), medical (15), legal (15), finance (15), physics (15)), 70+ model variants tested. BullshitBench is already at 380 stars on GitHub - all questions, scripts, responses and judgements are there, so check it out.

TL;DR:
- Results replicated
- @AnthropicAI latest models are scoring exceptionally well
- @Alibaba_Qwen is another very strong performer
- OpenAI and Google models are not doing well and are not improving
- Domains do not show much difference - rates of BS detection are about the same across all domains
- Reasoning, if anything, has a negative effect
- Newer models don't do that much better than older ones (except Anthropic)

Links:
- Data explorer: petergpt.github.io/bullshit-bench…
- GitHub: github.com/petergpt/bulls…

Highly recommend the data explorer, where you can study the data and the questions & sample answers.
Replies 48 · Reposts 96 · Likes 793 · Views 238.2K
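One way to read the two benchmark numbers together: hallucination-rate metrics typically count only confident wrong answers, so a model trained to never abstain maximizes its exposure however strong its reasoning is. A toy scorer under that assumed definition (not AA-Omniscience's actual methodology):

```python
from typing import List, Optional, Tuple

def hallucination_rate(results: List[Tuple[Optional[str], str]]) -> float:
    """results: (model_answer, correct_answer) pairs, where None means
    the model abstained ("I don't know"). Only confident wrong answers
    count, so a model RLHF'd to always give an answer can score near
    the bottom even while reasoning benchmarks rate it highly."""
    attempted = [(got, want) for got, want in results if got is not None]
    if not attempted:
        return 0.0
    wrong = sum(1 for got, want in attempted if got != want)
    return wrong / len(attempted)

# A never-abstaining model vs one that abstains when unsure
# (questions and answers are made up for illustration):
always_answers = [("Paris", "Paris"), ("1905", "1921"), ("Zinc", "Iron")]
abstains = [("Paris", "Paris"), (None, "1921"), (None, "Iron")]
```

Both models "know" the same one fact, yet the abstaining model's rate is 0% while the always-answering model's is 67%.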