Shubhankit
@shubhcodes · 1.1K posts

building ai agents

127.0.0.1 · Joined October 2022
762 Following · 352 Followers

Pinned Tweet
Shubhankit @shubhcodes
I gave an AI my browser cookies, real logins, and full control of Chrome. Not in a sandbox. Not in a headless browser. In my actual browser. With my actual sessions. Here's what happened.

I built Intron, an open-source Chrome extension that turns any LLM into a full browser use agent. It sits in your side panel. It clicks, types, navigates, fills forms, extracts data. 18 real browser tools. 30+ models via OpenRouter. No server. No cloud. Just a Chrome extension + your API key.

The wildest part? My recent tweets were posted by Intron. You've been reading AI-operated tweets from my account and had no idea.

I didn't build this to sell anything. I built it to learn how browser APIs actually work and to stress-test how far LLMs can go as real users. Turns out... pretty far.

It's fully open source. Go break it.
[image attached]
5 replies · 4 reposts · 18 likes · 624 views
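The extension described above follows the standard tool-calling loop: each browser action is exposed to the LLM as a named tool, and the agent dispatches the model's tool calls onto real browser operations. A minimal sketch of that pattern (tool names and the dispatcher are illustrative assumptions, not Intron's actual code):

```python
# Hypothetical sketch of a browser-use agent's tool layer: each tool is a
# named action the model can call; a dispatcher maps the model's tool call
# onto a real browser operation. Names are illustrative, not Intron's code.

def click(selector: str) -> str:
    # In a real extension this would inject a script into the active tab
    # (e.g. via chrome.scripting.executeScript) and click the element.
    return f"clicked {selector}"

def type_text(text: str) -> str:
    # In a real extension this would set the value of the focused input.
    return f'typed "{text}"'

TOOLS = {
    "click": {"description": "Click the element matching a CSS selector", "fn": click},
    "type_text": {"description": "Type text into the focused input", "fn": type_text},
}

def dispatch(tool_call: dict) -> str:
    """The model replies with {"name": ..., "arguments": {...}}; the agent
    loop runs the matching tool and feeds the result back as the next message."""
    tool = TOOLS.get(tool_call["name"])
    if tool is None:
        raise ValueError(f"unknown tool: {tool_call['name']}")
    return tool["fn"](**tool_call["arguments"])

print(dispatch({"name": "click", "arguments": {"selector": "#submit"}}))
# -> clicked #submit
```

The interesting design constraint in the side-panel setup is that this loop runs entirely client-side: the only network traffic is the model API call, which is why no server is needed.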
Shubhankit @shubhcodes
@RampLabs This makes way more sense than bloating context with summaries.
0 replies · 0 reposts · 2 likes · 1.3K views
Ramp Labs @RampLabs
Introducing Latent Briefing, a way for agents to quickly share their relevant memory directly. Result: 31% fewer tokens used, same accuracy. Multi-agent systems are powerful, but can be wildly inefficient. They pass context as tokens, so costs explode and signal gets lost. We built an algorithm that allows agents to communicate KV cache to KV cache.
27 replies · 54 reposts · 1.2K likes · 280.7K views
Shubhankit @shubhcodes
@karpathy People are benchmarking different AIs and calling it the same thing.
0 replies · 0 reposts · 3 likes · 25 views
Andrej Karpathy @karpathy
Judging by my tl there is a growing gap in understanding of AI capability. The first issue I think is around recency and tier of use. I think a lot of people tried the free tier of ChatGPT somewhere last year and allowed it to inform their views on AI a little too much. This is a group of reactions laughing at various quirks of the models, hallucinations, etc. Yes I also saw the viral videos of OpenAI's Advanced Voice mode fumbling simple queries like "should I drive or walk to the carwash".

The thing is that these free and old/deprecated models don't reflect the capability in the latest round of state of the art agentic models of this year, especially OpenAI Codex and Claude Code.

But that brings me to the second issue. Even if people paid $200/month to use the state of the art models, a lot of the capabilities are relatively "peaky" in highly technical areas. Typical queries around search, writing, advice, etc. are *not* the domain that has made the most noticeable and dramatic strides in capability. Partly, this is due to the technical details of reinforcement learning and its use of verifiable rewards. But partly, it's also because these use cases are not sufficiently prioritized by the companies in their hillclimbing because they don't lead to as much $$$ value. The goldmines are elsewhere, and the focus comes along.

So that brings me to the second group of people, who *both* 1) pay for and use the state of the art frontier agentic models (OpenAI Codex / Claude Code) and 2) do so professionally in technical domains like programming, math and research. This group of people is subject to the highest amount of "AI Psychosis" because the recent improvements in these domains as of this year have been nothing short of staggering. When you hand a computer terminal to one of these models, you can now watch them melt programming problems that you'd normally expect to take days/weeks of work.

It's this second group of people that assigns a much greater gravity to the capabilities, their slope, and various cyber-related repercussions. TLDR the people in these two groups are speaking past each other.

It really is simultaneously the case that OpenAI's free and I think slightly orphaned (?) "Advanced Voice Mode" will fumble the dumbest questions in your Instagram's reels and *at the same time*, OpenAI's highest-tier and paid Codex model will go off for 1 hour to coherently restructure an entire code base, or find and exploit vulnerabilities in computer systems.

This part really works and has made dramatic strides because 2 properties: 1) these domains offer explicit reward functions that are verifiable meaning they are easily amenable to reinforcement learning training (e.g. unit tests passed yes or no, in contrast to writing, which is much harder to explicitly judge), but also 2) they are a lot more valuable in b2b settings, meaning that the biggest fraction of the team is focused on improving them. So here we are.
staysaasy @staysaasy

The degree to which you are awed by AI is perfectly correlated with how much you use AI to code.

914 replies · 2.3K reposts · 19K likes · 3.6M views
Shubhankit @shubhcodes
AI is a tool, and while oversight is necessary, we can't outsource personal responsibility. Let's demand both corporate accountability AND individual consequences for those who misuse these systems. The FSU tragedy was tragic, focus should be on preventing radicalization, not just AI regulation.
0 replies · 0 reposts · 4 likes · 1.2K views
Attorney General James Uthmeier
Today, we launched an investigation into OpenAI and ChatGPT. AI should advance mankind, not destroy it. We’re demanding answers on OpenAI’s activities that have hurt kids, endangered Americans, and facilitated the recent FSU mass shooting. Wrongdoers must be held accountable.
862 replies · 2.6K reposts · 12.4K likes · 1.3M views
Shubhankit @shubhcodes
@RoundtableSpace This highlights why we need versioned AI models. We can't build production systems on a foundation that changes silently. Anthropic should offer pinned model versions with guarantees, or risk losing serious users to competitors who do.
0 replies · 0 reposts · 6 likes · 512 views
0xMarioNawfal @RoundtableSpace
CLAUDE OPUS 4.6 THINKING REDUCED BY 67% - Data shows Claude Opus 4.6 now thinks 67% less than before, dubbed “AI shrinkflation” - Same price but noticeably dumber; users report more guardrails and restricted output - Anthropic stayed silent until public data dropped; suspected compute-saving for next model (Mythos)
[image attached]
307 replies · 439 reposts · 4.2K likes · 738.9K views
Nikunj Kothari @nikunj
Inspired by @karpathy & @FarzaTV, introducing LLMwiki.. fully open source to help build yours. Inputs were tweets, bookmarks, iMessage/WhatsApp, and all my writing. Spent a bunch of time refining the frontend design to make it look great. Even though every single article here was written by AI, it was able to make surprisingly sharp connections. To make yours, just give the repo to Claude Code and it'll guide you!
18 replies · 12 reposts · 317 likes · 30.2K views
Shubhankit @shubhcodes
@perplexity_ai 8 weeks to a billion? Finally, a realistic and not at all deranged timeline.
0 replies · 1 repost · 3 likes · 375 views
Perplexity @perplexity_ai
Today we're announcing the Billion Dollar Build. An 8-week competition where teams will use Perplexity Computer to build a company with a path to $1B. Finalists have the opportunity to secure up to $1M in investment from the Perplexity Fund and up to $1M in Computer credits.
[image attached]
309 replies · 549 reposts · 6.4K likes · 2.3M views
Shubhankit @shubhcodes
@liquidai 240ms structured vision output on-device is the real unlock here. If reliability holds outside curated demos, this makes always-on mobile and edge agents much more practical.
0 replies · 0 reposts · 3 likes · 971 views
Liquid AI @liquidai
Today, we release LFM2.5-VL-450M, a vision-language model built for real-time reasoning on edge devices. It processes a 512×512 image and returns structured outputs in ~240ms on-device.
[image attached]
24 replies · 132 reposts · 1.1K likes · 110.1K views
Shubhankit @shubhcodes
@sa_vatsa Can confirm the 'keeping it dead simple' part is what keeps us up at night. It's weirdly harder to remove a feature than to build one.
1 reply · 0 reposts · 4 likes · 53 views
Sachin Tyagi @sa_vatsa
3 things decide if your product wins or dies. AI can't help with a single one. I've been building & growing products for years & the longer I do this the more I realise, things that make a product win haven't changed. AI made building faster but that was never the hardest part. You need to crack 3 moments to convert someone who doesn't know and care you exist into a customer for life.

First, nobody gives a shit about your product but you need to get their attention. Make them drop everything else they are doing and look at you and what you are building. It's freaking hard and just making an awesome product doesn't solve it. You need to create something so wild, funny, interesting that makes people so excited that they have to show it to their friends. That's a creativity problem and AI is unlikely to come up with an idea that makes someone screenshot it and send it to 5 people. AI doesn't do creativity. It does average. At @ThineAI we measure marketing by one question. Would someone share this unprompted? @adri_guha @kavyaonx @Muskkksksk test 10 crazy ideas every week. Most flop. But when one hits it travels further than any ad budget ever could.

Second, the aha moment. Cool now you have got someone's attention but what if 95%+ of these folks lose interest and drop off even before the product clicks for them. The next challenge is to identify this aha moment and ruthlessly cut down any steps, product features that does not contribute to and prolongs the journey to the aha moment. Most products never get this right. They show you 14 features before you've understood one. Every extra step loses 10 to 20 percent of people. That's a clarity problem. AI can't figure out what your aha moment is. You either understand your product deeply enough to find it or you don't. At Thine our aha moment is when the product surfaces a real insight that's valuable to a user. Not generic. Not templated. Something that makes you feel like "this thing knows me." Getting a stranger there fast is what is keeping @pratyush_r8 @siddsax @endu_29 @reubendasx and me up at night.

Finally, you would want to keep them forever because getting users is pointless if they leave. Your product has to be simple enough for day one users and powerful enough for year one users. Most people think that's a tradeoff. It's not. You build the powerful layer first. Then assemble a simple experience on top. When users outgrow the simple layer you give them access to what's underneath. They never leave. That's a design problem. AI can't architect your product for you. You have to understand the problem space yourself. At @MerlinAIByFoyer, this is P0 for @milindmishra_ @shubhcodes and @shahbaz_cse as we add agentic capabilities for power users while keeping the experience dead simple for someone who's never touched an AI tool.

So here's the thing nobody wants to hear. AI made building 10x easier. The 3 things that decide if anyone uses what you built are still just as hard as they've ever been. Marketing is a creativity problem. The aha moment is a clarity problem. Retention is a design problem. AI is 0 for 3. The people who were winning before AI are still winning. The rest of us are just shipping faster to an audience that isn't there yet. What's the one that is keeping you up at night? -brainstormed with @ThineAI
6 replies · 7 reposts · 20 likes · 989 views
Shubhankit @shubhcodes
Shipping fast was never the hard part. Getting someone to stay is.
Sachin Tyagi @sa_vatsa
[quoted tweet: "3 things decide if your product wins or dies…" — quoted in full earlier in this timeline]

0 replies · 0 reposts · 1 like · 22 views
Shubhankit reposted
Thine @ThineAI
Not all press is good press. But apparently, all press is recruiting press 😌
4 replies · 10 reposts · 41 likes · 5.5K views
Shubhankit @shubhcodes
@samwhoo the real win isn't speed or size it's that your MacBook just became an inference machine
0 replies · 0 reposts · 3 likes · 124 views
Shubhankit @shubhcodes
"A friend who remembers everything you ever said isn't a friend, they're a stalker with good recall." LLMs don't know the difference between "relevant context" and "noise that should fade" but @ThineAI does understand what is important and what is not.
Andrej Karpathy @karpathy

One common issue with personalization in all LLMs is how distracting memory seems to be for the models. A single question from 2 months ago about some topic can keep coming up as some kind of a deep interest of mine with undue mentions in perpetuity. Some kind of trying too hard.

3 replies · 0 reposts · 1 like · 101 views
Shubhankit reposted
Siddhartha Saxena @siddsax
LLMs are indeed overfitted to use all the information that is given to them in context. It was one of the earliest problems we had to tackle while building @ThineAI. The key is to curate the context like a hawk, every ingestion step should be as precise as possible and retrieval in itself should be an agent, which can reason with itself before getting back the context that is relevant, instead of causing context pollution.
Andrej Karpathy @karpathy
[quoted tweet on LLM memory/personalization — quoted in full earlier in this timeline]

0 replies · 2 reposts · 14 likes · 506 views
Shubhankit @shubhcodes
@svpino Prompting is just the interface; context is the infrastructure. Most people are trying to build fancy front doors while the foundation is still made of sand. We’re moving from 'how do I say this' to 'what does the model need to know to not hallucinate'.
1 reply · 0 reposts · 2 likes · 118 views
Shubhankit @shubhcodes
This is the most honest two-list breakdown on this entire timeline. The companies winning on 'rapidly changing' metrics will keep raising rounds. The companies building on 'not changing much' will keep collecting revenue. The confusion between the two is where most AI startup investing goes wrong.
0 replies · 0 reposts · 2 likes · 797 views
rahul @rahulgs
seems obvious but:

things that are changing rapidly:
1. context windows
2. intelligence / ability to reason within context
3. performance on any given benchmark
4. cost per token

things that are not changing much:
1. humans
2. human behavior, preferences, affinities
3. tools, integrations, infrastructure
4. single core cpu performance

therefore, ngmi:
1. "i found this method to cut 15% context"
2. "our method improves retrieval performance 10% by using hybrid search"
3. "our finetuned model is cheaper than opus at this benchmark"
4. "our harness does this better because we invented this multi agent system"
5. "we're building a memory system"
6. "context graphs"
7. "we trained an in house specialized rl model to improve task performance in X benchmark at Y% cost reduction"

wagmi:
1. product/ui
3. customer acquisition
4. integrations
5. fast linting, ci, skills, feedback for agents
6. background agent infra to parallelize more work
7. speed up your agent verification loops
8. training your users, connecting to their systems and working with their data, meeting them where they are
111 replies · 229 reposts · 3.2K likes · 398.8K views
Shubhankit @shubhcodes
The uncomfortable truth nobody is saying: if your competitive moat is 'writing better specs,' you have a window, not a fortress. Everyone now has the same AI. The people who'll win are the ones who can specify problems that matter, not just problems that are tractable.
vitrupo @vitrupo

Eric Schmidt says the 10x advantage is no longer execution. It is defining what counts as success. A programmer writes a spec and an evaluation function, runs it at 7pm, and wakes up to what was invented overnight. The advantage now belongs to whoever can specify the problem precisely. The rest will be automated.

0 replies · 0 reposts · 1 like · 80 views
Shubhankit @shubhcodes
@karpathy LLMs don't have a forgetting problem, they have an overfitting problem. Every passing question gets the same weight as a defining career choice. Memory without regularization is just hoarding
0 replies · 0 reposts · 4 likes · 130 views
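The "memory without regularization" point can be made concrete: instead of treating every stored memory as equally relevant forever, weight each one by semantic similarity and damp that weight as the memory ages. A minimal sketch — the half-life constant, the scoring rule, and the sample memories are illustrative assumptions, not any vendor's actual memory system:

```python
import math

def memory_score(similarity: float, age_days: float, half_life_days: float = 30.0) -> float:
    """Relevance of a stored memory: semantic similarity damped by age.

    Exponential decay halves the weight every `half_life_days`, so a passing
    question from two months ago needs much higher similarity to be retrieved
    than something mentioned yesterday.
    """
    decay = 0.5 ** (age_days / half_life_days)
    return similarity * decay

memories = [
    {"text": "asked once about kayaks", "similarity": 0.9, "age_days": 60},
    {"text": "works on browser agents daily", "similarity": 0.8, "age_days": 1},
]

# Rank by decayed score instead of raw similarity: the stale one-off question
# drops below the fresh, recurring topic even though its raw similarity is higher.
ranked = sorted(
    memories,
    key=lambda m: memory_score(m["similarity"], m["age_days"]),
    reverse=True,
)
print([m["text"] for m in ranked])
# -> ['works on browser agents daily', 'asked once about kayaks']
```

Durable interests can still be kept alive by refreshing `age_days` whenever a memory is reinforced, which is the "regularization" the tweet is gesturing at: weight follows repeated evidence, not a single mention.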
Andrej Karpathy @karpathy
One common issue with personalization in all LLMs is how distracting memory seems to be for the models. A single question from 2 months ago about some topic can keep coming up as some kind of a deep interest of mine with undue mentions in perpetuity. Some kind of trying too hard.
1.8K replies · 1.1K reposts · 21.2K likes · 2.7M views