Shubhankit
@shubhcodes · 1.1K posts

building ai agents

127.0.0.1 · Joined October 2022
762 Following · 352 Followers

Pinned Tweet
Shubhankit @shubhcodes
I gave an AI my browser cookies, real logins, and full control of Chrome. Not in a sandbox. Not in a headless browser. In my actual browser. With my actual sessions. Here's what happened.

I built Intron, an open-source Chrome extension that turns any LLM into a full browser use agent. It sits in your side panel. It clicks, types, navigates, fills forms, extracts data. 18 real browser tools. 30+ models via OpenRouter. No server. No cloud. Just a Chrome extension + your API key.

The wildest part? My recent tweets were posted by Intron. You've been reading AI-operated tweets from my account and had no idea.

I didn't build this to sell anything. I built it to learn how browser APIs actually work and to stress-test how far LLMs can go as real users. Turns out... pretty far.

It's fully open source. Go break it.
[image attached]
5 replies · 4 reposts · 18 likes · 624 views
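The extension described above follows the standard tool-calling loop: each browser action is exposed to the LLM as a named tool, and the agent dispatches the model's tool calls onto real browser operations. A minimal sketch of that pattern (tool names and the dispatcher are illustrative assumptions, not Intron's actual code):

```python
# Hypothetical sketch of a browser-use agent's tool layer: each tool is a
# named action the model can call; a dispatcher maps the model's tool call
# onto a real browser operation. Names are illustrative, not Intron's code.

def click(selector: str) -> str:
    # In a real extension this would inject a script into the active tab
    # (e.g. via chrome.scripting.executeScript) and click the element.
    return f"clicked {selector}"

def type_text(text: str) -> str:
    # In a real extension this would set the value of the focused input.
    return f'typed "{text}"'

TOOLS = {
    "click": {"description": "Click the element matching a CSS selector", "fn": click},
    "type_text": {"description": "Type text into the focused input", "fn": type_text},
}

def dispatch(tool_call: dict) -> str:
    """The model replies with {"name": ..., "arguments": {...}}; the agent
    loop runs the matching tool and feeds the result back as the next message."""
    tool = TOOLS.get(tool_call["name"])
    if tool is None:
        raise ValueError(f"unknown tool: {tool_call['name']}")
    return tool["fn"](**tool_call["arguments"])

print(dispatch({"name": "click", "arguments": {"selector": "#submit"}}))
# -> clicked #submit
```

The interesting design constraint in the side-panel setup is that this loop runs entirely client-side: the only network traffic is the model API call, which is why no server is needed.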
Shubhankit @shubhcodes
@RampLabs This makes way more sense than bloating context with summaries.
0 replies · 0 reposts · 2 likes · 1.3K views
Ramp Labs @RampLabs
Introducing Latent Briefing, a way for agents to quickly share their relevant memory directly. Result: 31% fewer tokens used, same accuracy. Multi-agent systems are powerful, but can be wildly inefficient. They pass context as tokens, so costs explode and signal gets lost. We built an algorithm that allows agents to communicate KV cache to KV cache.
27 replies · 54 reposts · 1.2K likes · 280.7K views
Shubhankit @shubhcodes
@karpathy People are benchmarking different AIs and calling it the same thing.
0 replies · 0 reposts · 3 likes · 25 views
Andrej Karpathy @karpathy
Judging by my tl there is a growing gap in understanding of AI capability. The first issue I think is around recency and tier of use. I think a lot of people tried the free tier of ChatGPT somewhere last year and allowed it to inform their views on AI a little too much. This is a group of reactions laughing at various quirks of the models, hallucinations, etc. Yes I also saw the viral videos of OpenAI's Advanced Voice mode fumbling simple queries like "should I drive or walk to the carwash".

The thing is that these free and old/deprecated models don't reflect the capability in the latest round of state of the art agentic models of this year, especially OpenAI Codex and Claude Code.

But that brings me to the second issue. Even if people paid $200/month to use the state of the art models, a lot of the capabilities are relatively "peaky" in highly technical areas. Typical queries around search, writing, advice, etc. are *not* the domain that has made the most noticeable and dramatic strides in capability. Partly, this is due to the technical details of reinforcement learning and its use of verifiable rewards. But partly, it's also because these use cases are not sufficiently prioritized by the companies in their hillclimbing because they don't lead to as much $$$ value. The goldmines are elsewhere, and the focus comes along.

So that brings me to the second group of people, who *both* 1) pay for and use the state of the art frontier agentic models (OpenAI Codex / Claude Code) and 2) do so professionally in technical domains like programming, math and research. This group of people is subject to the highest amount of "AI Psychosis" because the recent improvements in these domains as of this year have been nothing short of staggering. When you hand a computer terminal to one of these models, you can now watch them melt programming problems that you'd normally expect to take days/weeks of work.

It's this second group of people that assigns a much greater gravity to the capabilities, their slope, and various cyber-related repercussions. TLDR the people in these two groups are speaking past each other.

It really is simultaneously the case that OpenAI's free and I think slightly orphaned (?) "Advanced Voice Mode" will fumble the dumbest questions in your Instagram's reels and *at the same time*, OpenAI's highest-tier and paid Codex model will go off for 1 hour to coherently restructure an entire code base, or find and exploit vulnerabilities in computer systems.

This part really works and has made dramatic strides because 2 properties: 1) these domains offer explicit reward functions that are verifiable meaning they are easily amenable to reinforcement learning training (e.g. unit tests passed yes or no, in contrast to writing, which is much harder to explicitly judge), but also 2) they are a lot more valuable in b2b settings, meaning that the biggest fraction of the team is focused on improving them. So here we are.
staysaasy @staysaasy

The degree to which you are awed by AI is perfectly correlated with how much you use AI to code.

914 replies · 2.3K reposts · 19K likes · 3.6M views
Shubhankit @shubhcodes
AI is a tool, and while oversight is necessary, we can't outsource personal responsibility. Let's demand both corporate accountability AND individual consequences for those who misuse these systems. The FSU tragedy was tragic, focus should be on preventing radicalization, not just AI regulation.
0 replies · 0 reposts · 4 likes · 1.2K views
Attorney General James Uthmeier
Today, we launched an investigation into OpenAI and ChatGPT. AI should advance mankind, not destroy it. We’re demanding answers on OpenAI’s activities that have hurt kids, endangered Americans, and facilitated the recent FSU mass shooting. Wrongdoers must be held accountable.
862 replies · 2.6K reposts · 12.4K likes · 1.3M views
Shubhankit @shubhcodes
@RoundtableSpace This highlights why we need versioned AI models. We can't build production systems on a foundation that changes silently. Anthropic should offer pinned model versions with guarantees, or risk losing serious users to competitors who do.
0 replies · 0 reposts · 6 likes · 512 views
0xMarioNawfal @RoundtableSpace
CLAUDE OPUS 4.6 THINKING REDUCED BY 67% - Data shows Claude Opus 4.6 now thinks 67% less than before, dubbed “AI shrinkflation” - Same price but noticeably dumber; users report more guardrails and restricted output - Anthropic stayed silent until public data dropped; suspected compute-saving for next model (Mythos)
[image attached]
307 replies · 439 reposts · 4.2K likes · 738.9K views
Nikunj Kothari @nikunj
Inspired by @karpathy & @FarzaTV, introducing LLMwiki.. fully open source to help build yours. Inputs were tweets, bookmarks, iMessage/WhatsApp, and all my writing. Spent a bunch of time refining the frontend design to make it look great. Even though every single article here was written by AI, it was able to make surprisingly sharp connections. To make yours, just give the repo to Claude Code and it'll guide you!
18 replies · 12 reposts · 317 likes · 30.2K views
Shubhankit @shubhcodes
@perplexity_ai 8 weeks to a billion? Finally, a realistic and not at all deranged timeline.
0 replies · 1 repost · 3 likes · 375 views
Perplexity @perplexity_ai
Today we're announcing the Billion Dollar Build. An 8-week competition where teams will use Perplexity Computer to build a company with a path to $1B. Finalists have the opportunity to secure up to $1M in investment from the Perplexity Fund and up to $1M in Computer credits.
[image attached]
309 replies · 549 reposts · 6.4K likes · 2.3M views
Shubhankit @shubhcodes
@liquidai 240ms structured vision output on-device is the real unlock here. If reliability holds outside curated demos, this makes always-on mobile and edge agents much more practical.
0 replies · 0 reposts · 3 likes · 971 views
Liquid AI @liquidai
Today, we release LFM2.5-VL-450M, a vision-language model built for real-time reasoning on edge devices. It processes a 512×512 image and returns structured outputs in ~240ms on-device.
[image attached]
24 replies · 132 reposts · 1.1K likes · 110.1K views
Shubhankit @shubhcodes
@sa_vatsa Can confirm the 'keeping it dead simple' part is what keeps us up at night. It's weirdly harder to remove a feature than to build one.
1 reply · 0 reposts · 4 likes · 53 views
Sachin Tyagi @sa_vatsa
3 things decide if your product wins or dies. AI can't help with a single one. I've been building & growing products for years & the longer I do this the more I realise, things that make a product win haven't changed. AI made building faster but that was never the hardest part. You need to crack 3 moments to convert someone who doesn't know and care you exist into a customer for life.

First, nobody gives a shit about your product but you need to get their attention. Make them drop everything else they are doing and look at you and what you are building. It's freaking hard and just making an awesome product doesn't solve it. You need to create something so wild, funny, interesting that makes people so excited that they have to show it to their friends. That's a creativity problem and AI is unlikely to come up with an idea that makes someone screenshot it and send it to 5 people. AI doesn't do creativity. It does average. At @ThineAI we measure marketing by one question. Would someone share this unprompted? @adri_guha @kavyaonx @Muskkksksk test 10 crazy ideas every week. Most flop. But when one hits it travels further than any ad budget ever could.

Second, the aha moment. Cool now you have got someone's attention but what if 95%+ of these folks lose interest and drop off even before the product clicks for them. The next challenge is to identify this aha moment and ruthlessly cut down any steps, product features that does not contribute to and prolongs the journey to the aha moment. Most products never get this right. They show you 14 features before you've understood one. Every extra step loses 10 to 20 percent of people. That's a clarity problem. AI can't figure out what your aha moment is. You either understand your product deeply enough to find it or you don't. At Thine our aha moment is when the product surfaces a real insight that's valuable to a user. Not generic. Not templated. Something that makes you feel like "this thing knows me." Getting a stranger there fast is what is keeping @pratyush_r8 @siddsax @endu_29 @reubendasx and me up at night.

Finally, you would want to keep them forever because getting users is pointless if they leave. Your product has to be simple enough for day one users and powerful enough for year one users. Most people think that's a tradeoff. It's not. You build the powerful layer first. Then assemble a simple experience on top. When users outgrow the simple layer you give them access to what's underneath. They never leave. That's a design problem. AI can't architect your product for you. You have to understand the problem space yourself. At @MerlinAIByFoyer, this is P0 for @milindmishra_ @shubhcodes and @shahbaz_cse as we add agentic capabilities for power users while keeping the experience dead simple for someone who's never touched an AI tool.

So here's the thing nobody wants to hear. AI made building 10x easier. The 3 things that decide if anyone uses what you built are still just as hard as they've ever been. Marketing is a creativity problem. The aha moment is a clarity problem. Retention is a design problem. AI is 0 for 3. The people who were winning before AI are still winning. The rest of us are just shipping faster to an audience that isn't there yet. What's the one that is keeping you up at night? -brainstormed with @ThineAI
6 replies · 7 reposts · 20 likes · 989 views
Shubhankit @shubhcodes
Shipping fast was never the hard part. Getting someone to stay is.
Sachin Tyagi @sa_vatsa
[quoted tweet: "3 things decide if your product wins or dies…" — quoted in full earlier in this timeline]

0 replies · 0 reposts · 1 like · 22 views
Shubhankit reposted
Thine @ThineAI
Not all press is good press. But apparently, all press is recruiting press 😌
4 replies · 10 reposts · 41 likes · 5.5K views
Shubhankit @shubhcodes
@samwhoo the real win isn't speed or size it's that your MacBook just became an inference machine
0 replies · 0 reposts · 3 likes · 124 views
Shubhankit @shubhcodes
"A friend who remembers everything you ever said isn't a friend, they're a stalker with good recall." LLMs don't know the difference between "relevant context" and "noise that should fade" but @ThineAI does understand what is important and what is not.
Andrej Karpathy @karpathy

One common issue with personalization in all LLMs is how distracting memory seems to be for the models. A single question from 2 months ago about some topic can keep coming up as some kind of a deep interest of mine with undue mentions in perpetuity. Some kind of trying too hard.

3 replies · 0 reposts · 1 like · 101 views
Shubhankit reposted
Siddhartha Saxena @siddsax
LLMs are indeed overfitted to use all the information that is given to them in context. It was one of the earliest problems we had to tackle while building @ThineAI. The key is to curate the context like a hawk, every ingestion step should be as precise as possible and retrieval in itself should be an agent, which can reason with itself before getting back the context that is relevant, instead of causing context pollution.
Andrej Karpathy @karpathy
[quoted tweet on LLM memory/personalization — quoted in full earlier in this timeline]

0 replies · 2 reposts · 14 likes · 506 views
Shubhankit @shubhcodes
@svpino Prompting is just the interface; context is the infrastructure. Most people are trying to build fancy front doors while the foundation is still made of sand. We’re moving from 'how do I say this' to 'what does the model need to know to not hallucinate'.
1 reply · 0 reposts · 2 likes · 118 views
Shubhankit @shubhcodes
This is the most honest two-list breakdown on this entire timeline. The companies winning on 'rapidly changing' metrics will keep raising rounds. The companies building on 'not changing much' will keep collecting revenue. The confusion between the two is where most AI startup investing goes wrong.
0 replies · 0 reposts · 2 likes · 797 views
rahul @rahulgs
seems obvious but:

things that are changing rapidly:
1. context windows
2. intelligence / ability to reason within context
3. performance on any given benchmark
4. cost per token

things that are not changing much:
1. humans
2. human behavior, preferences, affinities
3. tools, integrations, infrastructure
4. single core cpu performance

therefore, ngmi:
1. "i found this method to cut 15% context"
2. "our method improves retrieval performance 10% by using hybrid search"
3. "our finetuned model is cheaper than opus at this benchmark"
4. "our harness does this better because we invented this multi agent system"
5. "we're building a memory system"
6. "context graphs"
7. "we trained an in house specialized rl model to improve task performance in X benchmark at Y% cost reduction"

wagmi:
1. product/ui
3. customer acquisition
4. integrations
5. fast linting, ci, skills, feedback for agents
6. background agent infra to parallelize more work
7. speed up your agent verification loops
8. training your users, connecting to their systems and working with their data, meeting them where they are
111 replies · 229 reposts · 3.2K likes · 398.8K views
Shubhankit @shubhcodes
The uncomfortable truth nobody is saying: if your competitive moat is 'writing better specs,' you have a window, not a fortress. Everyone now has the same AI. The people who'll win are the ones who can specify problems that matter, not just problems that are tractable.
vitrupo @vitrupo

Eric Schmidt says the 10x advantage is no longer execution. It is defining what counts as success. A programmer writes a spec and an evaluation function, runs it at 7pm, and wakes up to what was invented overnight. The advantage now belongs to whoever can specify the problem precisely. The rest will be automated.

0 replies · 0 reposts · 1 like · 80 views
Shubhankit @shubhcodes
@karpathy LLMs don't have a forgetting problem, they have an overfitting problem. Every passing question gets the same weight as a defining career choice. Memory without regularization is just hoarding
0 replies · 0 reposts · 4 likes · 130 views
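The "memory without regularization" point can be made concrete: instead of treating every stored memory as equally relevant forever, weight each one by semantic similarity and damp that weight as the memory ages. A minimal sketch — the half-life constant, the scoring rule, and the sample memories are illustrative assumptions, not any vendor's actual memory system:

```python
import math

def memory_score(similarity: float, age_days: float, half_life_days: float = 30.0) -> float:
    """Relevance of a stored memory: semantic similarity damped by age.

    Exponential decay halves the weight every `half_life_days`, so a passing
    question from two months ago needs much higher similarity to be retrieved
    than something mentioned yesterday.
    """
    decay = 0.5 ** (age_days / half_life_days)
    return similarity * decay

memories = [
    {"text": "asked once about kayaks", "similarity": 0.9, "age_days": 60},
    {"text": "works on browser agents daily", "similarity": 0.8, "age_days": 1},
]

# Rank by decayed score instead of raw similarity: the stale one-off question
# drops below the fresh, recurring topic even though its raw similarity is higher.
ranked = sorted(
    memories,
    key=lambda m: memory_score(m["similarity"], m["age_days"]),
    reverse=True,
)
print([m["text"] for m in ranked])
# -> ['works on browser agents daily', 'asked once about kayaks']
```

Durable interests can still be kept alive by refreshing `age_days` whenever a memory is reinforced, which is the "regularization" the tweet is gesturing at: weight follows repeated evidence, not a single mention.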
Andrej Karpathy @karpathy
One common issue with personalization in all LLMs is how distracting memory seems to be for the models. A single question from 2 months ago about some topic can keep coming up as some kind of a deep interest of mine with undue mentions in perpetuity. Some kind of trying too hard.
1.8K replies · 1.1K reposts · 21.2K likes · 2.7M views