Nav Patel
@patelnav

564 posts

Always looking to build the next interesting thing. Ex: @synthesischool @openstore @fieldscope_in @apple @wifislam @microsoft

🇮🇳 | 🇨🇦 | 🇺🇸 Joined January 2010
1.3K Following · 626 Followers
Nav Patel
Nav Patel@patelnav·
"Peaky" behaviour is exactly what the normies have a hard time grokking. They're used to intelligence correlating across domains. Not being able to count "r"s in strawberry but being able to solve complex math problems doesn't make sense in a pre-AI world. Working with idiot savants your whole career helps.
Replies 0 · Reposts 0 · Likes 1 · Views 110
Andrej Karpathy
Andrej Karpathy@karpathy·
Judging by my tl there is a growing gap in understanding of AI capability. The first issue I think is around recency and tier of use. I think a lot of people tried the free tier of ChatGPT somewhere last year and allowed it to inform their views on AI a little too much. This is a group of reactions laughing at various quirks of the models, hallucinations, etc. Yes I also saw the viral videos of OpenAI's Advanced Voice mode fumbling simple queries like "should I drive or walk to the carwash". The thing is that these free and old/deprecated models don't reflect the capability in the latest round of state of the art agentic models of this year, especially OpenAI Codex and Claude Code.

But that brings me to the second issue. Even if people paid $200/month to use the state of the art models, a lot of the capabilities are relatively "peaky" in highly technical areas. Typical queries around search, writing, advice, etc. are *not* the domain that has made the most noticeable and dramatic strides in capability. Partly, this is due to the technical details of reinforcement learning and its use of verifiable rewards. But partly, it's also because these use cases are not sufficiently prioritized by the companies in their hillclimbing because they don't lead to as much $$$ value. The goldmines are elsewhere, and the focus comes along.

So that brings me to the second group of people, who *both* 1) pay for and use the state of the art frontier agentic models (OpenAI Codex / Claude Code) and 2) do so professionally in technical domains like programming, math and research. This group of people is subject to the highest amount of "AI Psychosis" because the recent improvements in these domains as of this year have been nothing short of staggering. When you hand a computer terminal to one of these models, you can now watch them melt programming problems that you'd normally expect to take days/weeks of work.
It's this second group of people that assigns a much greater gravity to the capabilities, their slope, and various cyber-related repercussions.

TLDR the people in these two groups are speaking past each other. It really is simultaneously the case that OpenAI's free and I think slightly orphaned (?) "Advanced Voice Mode" will fumble the dumbest questions in your Instagram's reels and *at the same time*, OpenAI's highest-tier and paid Codex model will go off for 1 hour to coherently restructure an entire code base, or find and exploit vulnerabilities in computer systems. This part really works and has made dramatic strides because of two properties: 1) these domains offer explicit reward functions that are verifiable, meaning they are easily amenable to reinforcement learning training (e.g. unit tests passed yes or no, in contrast to writing, which is much harder to explicitly judge), but also 2) they are a lot more valuable in b2b settings, meaning that the biggest fraction of the team is focused on improving them. So here we are.
staysaasy@staysaasy

The degree to which you are awed by AI is perfectly correlated with how much you use AI to code.

Replies 198 · Reposts 291 · Likes 2.5K · Views 160K
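The "verifiable rewards" mechanic Karpathy describes is easy to make concrete. A minimal sketch (the function name and harness are illustrative, not any lab's actual training code): a coding task's reward is simply whether hidden unit tests pass, which is objective in a way that judging prose is not.

```python
import subprocess
import sys
import tempfile

def verifiable_reward(candidate_code: str, test_code: str) -> float:
    """Binary reward for an RL rollout: 1.0 if the model's code passes
    the hidden unit tests, else 0.0. This cheap, unambiguous yes/no
    signal is what makes coding (unlike open-ended writing) so
    amenable to reinforcement learning."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(candidate_code + "\n" + test_code)
        path = f.name
    result = subprocess.run([sys.executable, path], capture_output=True, timeout=30)
    return 1.0 if result.returncode == 0 else 0.0

# Two sampled completions for "write add(a, b)":
good = "def add(a, b):\n    return a + b"
bad = "def add(a, b):\n    return a - b"
tests = "assert add(2, 3) == 5"
```

In a real RL loop these rewards would weight policy updates across thousands of rollouts; the point here is only that the signal is explicit and verifiable.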
Nav Patel
Nav Patel@patelnav·
@GoogleAIStudio @antigravity @DynamicWebPaige Google's marketing is confounding sometimes. I get that bugs people were complaining about were fixed, but why would you make an ad that says "use us! we won't suck in this way anymore"? Is your target audience dejected AI Studio and Antigravity users? @sundarpichai
Replies 0 · Reposts 0 · Likes 0 · Views 180
Google AI Studio
Google AI Studio@GoogleAIStudio·
the fastest path from prompt to production just got a whole lot smarter - now supercharged with agentic reasoning over your entire stack by the @antigravity coding agent. @DynamicWebPaige breaks down what's new ⬇️
Replies 21 · Reposts 28 · Likes 291 · Views 15.4K
Nav Patel
Nav Patel@patelnav·
Absolutely. The gap between the conceptual breakthrough and a product that wins is the defaults. OpenClaw & Pi made the conceptual breakthroughs, but the defaults are tuned towards Peter's preferences and use-case. It's grown tremendously, but I don't think it's the right-shaped product for most people. (I use it extensively.) The first things to get right are setup, ownership (data + infra), and giving the agent a reliable path to do "self surgery", which @pmarca describes at the end of this video. (WIP but will launch soon)
Replies 0 · Reposts 0 · Likes 1 · Views 156
Nivi
Nivi@nivi·
Marc Andreessen on the Architecture of Agents: “Your Agent Is Just Its Files”
Replies 3 · Reposts 17 · Likes 170 · Views 15.4K
Nav Patel retweeted
Garry Tan
Garry Tan@garrytan·
I’ve tried everything out there for YouTube summary and diarization and the best with my OpenClaw hands down is diarize.io Well done @patelnav
Replies 22 · Reposts 24 · Likes 568 · Views 60.2K
Pierre-Antoine Bannier
sam3.cpp - Meta's SAM 3 in pure C++ with @ggerganov's ggml
- Supports SAM 3.1, 3, 2.1, 2 and EdgeTAM
- FP16, 4-bit quant (EdgeTAM in 15 MB)
- Apple Metal GPU, CUDA, CPU
- Text-prompted: "peach" → every peach
- Single-file C++14

Performance-wise:
- 100ms object detection, segmentation
- Video object segmentation @ 20FPS on M4 Pro with EdgeTAM

github.com/PABannier/sam3…
Replies 10 · Reposts 118 · Likes 857 · Views 54.4K
Nav Patel
Nav Patel@patelnav·
@pmarca Is this just based on token usage? Because the Pro/Max $200 plans from OpenAI (and Anthropic until they nerfed it) gave plenty of tokens not at this high of a cost.
Replies 1 · Reposts 0 · Likes 3 · Views 541
Marc Andreessen 🇺🇸
Magical OpenClaw experiences that use frontier models cost $300-1,000/day today, heading to $10,000/day and more. The future shape of the entire technology industry will be how to drive that to $20/month.
Replies 623 · Reposts 517 · Likes 7.7K · Views 1.6M
Aiden Bai
Aiden Bai@aidenybai·
Introducing React Grab

Select any element on your page → tell Claude Code or Codex what to change

Fully open source: npx react-grab@latest
Replies 251 · Reposts 447 · Likes 7.9K · Views 941.6K
Nav Patel
Nav Patel@patelnav·
@KaiLentit jQuery? there's a new kid on the block called BackboneJS, have you heard of it? MVC is the future man.
Replies 0 · Reposts 0 · Likes 1 · Views 250
Nav Patel
Nav Patel@patelnav·
It's not about apps or the OS layer. It's about what the models support. Even models like Gemini Pro that are multi-modal don't support streaming. So the model can do STT, but you have to send chunks of audio for it to process. The Gemini Live model does handle streaming, but it's a Gemini Flash-class model, so it falls more into the first bucket from my tweet.
Replies 0 · Reposts 0 · Likes 1 · Views 26
superscribe.io
superscribe.io@superscribeio·
@patelnav @WisprFlow the voice problem won't be solved inside individual apps. it gets solved at the OS layer, where dictation streams into every field like a second keyboard
Replies 1 · Reposts 0 · Likes 0 · Views 15
Nav Patel
Nav Patel@patelnav·
Voice Agents aren't a solved problem. You can:

1) Use a native voice-to-voice model that can tool-call: they're responsive but dumb.
2) STT manually into a smart model, like @WisprFlow / ctrlspeak (or the new Voice mode in Claude Code): it's slow and manual.
3) Do something like github.com/patelnav/voca

It runs STT continuously in the background and feeds chunks of transcripts into the Claude Code session. I tried this with a Haiku subagent/team-member in parallel to be more intelligent in clustering the transcript and deciding what to send to the host, but the latencies were too long and the experience was bad.

Give this a whirl and let me know what you think!
Replies 2 · Reposts 0 · Likes 8 · Views 779
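The shape of option 3 can be sketched in a few lines. This is a hypothetical buffer, not voca's actual code: a background STT stream feeds fragments in, and a chunker decides when a piece of transcript is complete enough to hand to the agent session.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class TranscriptChunker:
    """Buffers fragments from a continuously running STT stream and
    flushes a chunk to the agent session once it looks complete.
    The flush heuristics here (sentence-ending punctuation or a word
    cap) are illustrative; a smarter clustering model could replace
    them, at the cost of the latency described above."""
    max_words: int = 40
    _buffer: List[str] = field(default_factory=list)

    def feed(self, fragment: str) -> Optional[str]:
        self._buffer.append(fragment.strip())
        text = " ".join(self._buffer)
        if text.endswith((".", "?", "!")) or len(text.split()) >= self.max_words:
            self._buffer.clear()
            return text  # chunk ready to send to the agent session
        return None      # keep buffering
```

For example, feeding "take a note:" buffers silently, and the follow-up "rename the config module." flushes the joined chunk to the agent.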
Nav Patel
Nav Patel@patelnav·
@threepointone I have an entire startup thesis informed by MacRumors Buyer's Guide
Replies 0 · Reposts 0 · Likes 0 · Views 9
sunil pai
sunil pai@threepointone·
oh yeah today’s the day macrumors.com sigh macrumors.com every damn time
Replies 3 · Reposts 0 · Likes 62 · Views 6.7K
Nav Patel
Nav Patel@patelnav·
@trq212 Any chance y'all will open up the voice API for plugins? I'm using github.com/patelnav/voca. I can have CC sit and listen and take notes. Uses parakeet for STT
Replies 0 · Reposts 0 · Likes 0 · Views 27
Thariq
Thariq@trq212·
Voice mode is rolling out now in Claude Code. It’s live for ~5% of users today, and will be ramping through the coming weeks. You'll see a note on the welcome screen once you have access. /voice to toggle it on!
Replies 1.1K · Reposts 1.3K · Likes 17.2K · Views 3.6M
Chris Power
Chris Power@typecraft_dev·
My submission to be the next @opencode CEO. thank you for your consideration
Replies 52 · Reposts 34 · Likes 1.1K · Views 86.3K
Nav Patel
Nav Patel@patelnav·
@mattpocockuk You know it’s working if your brain hurts from thinking too hard
Replies 0 · Reposts 0 · Likes 0 · Views 88
Matt Pocock
Matt Pocock@mattpocockuk·
My /grill-me skill just asked me 24 consecutive questions. I've been sat here, writing a PRD, for an hour. This is what software development has become (and I love it)
Replies 41 · Reposts 23 · Likes 1K · Views 87.6K
Nav Patel
Nav Patel@patelnav·
@myprasanna Agree. There's lots of noise with models ⨯ harness ⨯ parallelization. Throughput isn't the limiting factor; like you said, each model has unique strengths, and they can collaborate to break the local maxima of each one's intelligence. Building this (with this) right now. 😅
Replies 0 · Reposts 0 · Likes 0 · Views 938
Prasanna S
Prasanna S@myprasanna·
I’m starting a new co in the AI coding space. Hiring the founding team members. I’ve been using the product non-stop for the last month and it’s mind-blowing. I think models have different strengths right now, and an ideal coding harness should run in the cloud and tap them. It should also be super simple for a lay user to configure a harness to fix common model errors you experience day to day, by throwing more model-token combos at the problem. I have burnt $50k last month on this and the emergent intelligence has been epic. For a lot of these, we have good ideas on how to keep the intelligence and lower costs. Will release more product details soon. Join us if you’d like to work on this. Epic time for the intelligence take-off, and coding is driving the singularity now. What a time to be an engineer.
Replies 97 · Reposts 16 · Likes 563 · Views 58K
Nav Patel
Nav Patel@patelnav·
Interesting split in Qwen3.5 397B results:

BullshitBench: #2 after the Anthropic models
AA-Omniscience: 89% hallucination rate, near worst

It appears to be highly capable at reasoning but has been trained (or RLHF'd) to always give an answer. It can detect when your logic is broken, but it can't detect when its own knowledge is insufficient.
Replies 0 · Reposts 0 · Likes 7 · Views 1.2K
Peter Gostev (@aiDotEngineer in London)
BullshitBench v2 is out! It is one of the few benchmarks where models are generally not getting better (except Claude) and where reasoning isn't helping.

What's new: 100 new questions, by domain (coding (40 Q's), medical (15), legal (15), finance (15), physics (15)), 70+ model variants tested. BullshitBench is already at 380 stars on GitHub - all questions, scripts, responses and judgements are there, so check it out.

TL;DR:
- Results replicated
- @AnthropicAI latest models are scoring exceptionally well
- @Alibaba_Qwen is another very strong performer
- OpenAI and Google models are not doing well and are not improving
- Domains do not show much difference - rates of BS detection are about the same across all domains
- Reasoning, if anything, has a negative effect
- Newer models don't do that much better than older ones (except Anthropic)

Links:
- Data explorer: petergpt.github.io/bullshit-bench…
- GitHub: github.com/petergpt/bulls…

Highly recommend the data explorer, where you can study the data and the questions & sample answers.
Replies 48 · Reposts 96 · Likes 793 · Views 238.2K
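One way to read the two benchmark numbers together: hallucination-rate metrics typically count only confident wrong answers, so a model trained to never abstain maximizes its exposure however strong its reasoning is. A toy scorer under that assumed definition (not AA-Omniscience's actual methodology):

```python
from typing import List, Optional, Tuple

def hallucination_rate(results: List[Tuple[Optional[str], str]]) -> float:
    """results: (model_answer, correct_answer) pairs, where None means
    the model abstained ("I don't know"). Only confident wrong answers
    count, so a model RLHF'd to always give an answer can score near
    the bottom even while reasoning benchmarks rate it highly."""
    attempted = [(got, want) for got, want in results if got is not None]
    if not attempted:
        return 0.0
    wrong = sum(1 for got, want in attempted if got != want)
    return wrong / len(attempted)

# A never-abstaining model vs one that abstains when unsure
# (questions and answers are made up for illustration):
always_answers = [("Paris", "Paris"), ("1905", "1921"), ("Zinc", "Iron")]
abstains = [("Paris", "Paris"), (None, "1921"), (None, "Iron")]
```

Both models "know" the same one fact, yet the abstaining model's rate is 0% while the always-answering model's is 67%.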