Bin Rong

2.5K posts

@brong78

https://t.co/vAatY6RIwn

Joined April 2011
5.9K Following, 197 Followers
Bin Rong
Bin Rong@brong78·
@kwindla @nilesh__hirani Not for complicated workflows — it seems to be an impossible triangle between latency (under 700-800ms), intelligence, and price.
0
0
0
15
kwindla
kwindla@kwindla·
@nilesh__hirani GPT-4.1, Haiku 4.5. Or for simple workflows you can prompt very cleanly, self-hosted Nemotron, Qwen, or Gemma 4.
1
0
0
262
kwindla
kwindla@kwindla·
Gemini 3.5 Flash is out today. Here are numbers from my main voice and task agent benchmarks.

Some notes:

All the Gemini 3 models so far are too slow to work well for voice agents. Gemini 2.5 Flash was a *great* model for voice agents, when it was SOTA. It was fast and good at instruction following. Its big weakness was tool calling. It was quite difficult to prompt Gemini 2.5 Flash to perform tool calling reliably in long context, multi-turn use cases.

With Gemini 3, Google improved the tool calling issues a lot. But time to first token is ~1s. We really need TTFT down below 700ms. Google isn't alone in this. All the SOTA models released this year have been reasoning models that aren't optimized for low latency. Claude Haiku 4.5 (released last October) remains the best-performing model with a TTFT under 700ms.

Gemini 3.5 Flash is the first Flash model in the 3 family to be released as "generally available." It's quite different from gemini-3-flash-preview, which was released last December. That model actually scored a bit better on my voice agent benchmark.

This new model is the new overall top scorer on my task agent benchmark. This benchmark tests a multi-turn task, requiring that models achieve a P50 turn execution time faster than four seconds. Gemini 3.5 Flash with a "high" thinking budget scores significantly better than any other model I've tested. So even though the TTFT isn't what we'd like to see from this model, the overall generation speed makes up for it, and allows us to use the "high" thinking budget and still achieve a per-turn P50 under two seconds. Very impressive.

This performance costs money, though. I had become accustomed to thinking of Gemini models as aggressively priced. But Gemini 3.5 Flash is actually more expensive than GPT-5.4 and Claude Sonnet 4.6 on this benchmark. Also note that lower reasoning settings don't always save money. Gemini 3.5 Flash "minimal" costs more, on this benchmark, than "high," because it makes more mistakes, so it uses more tokens to complete the task.

Please note that performance of this model on your benchmarks might be very different. My voice agent and task agent results are often wildly out of line with the reported results on standard benchmarks in the model cards and release notes.

The voice agent benchmark is 30 turns, and heavily tests tool calling in a long-context scenario. The task agent benchmark injects large streams of structured data events into the context, all tool calls are asynchronous, and the test task takes at least 32 turns to complete. (My motto for evals is "30 turns or it didn't happen.")

Make your own benchmarks! (And post the source code and the results for different models, if you can.)
[3 images attached]
14
9
114
14.1K
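The two latency criteria mentioned in the post above (TTFT below 700ms, and a P50 turn execution time under four seconds) can be checked with a few lines of Python. This is only a minimal sketch: the function name `summarize` and every measurement in it are hypothetical and invented for illustration. They are not taken from the author's benchmark, whose source is not shown here.

```python
from statistics import median

TTFT_BUDGET_MS = 700       # TTFT target quoted in the post
TURN_P50_BUDGET_S = 4.0    # per-turn P50 limit quoted for the task agent benchmark


def summarize(ttft_ms, turn_seconds):
    """Return median (P50) TTFT and turn time, plus pass/fail flags against the budgets."""
    ttft_p50 = median(ttft_ms)
    turn_p50 = median(turn_seconds)
    return {
        "ttft_p50_ms": ttft_p50,
        "turn_p50_s": turn_p50,
        "ttft_ok": ttft_p50 < TTFT_BUDGET_MS,
        "turn_ok": turn_p50 < TURN_P50_BUDGET_S,
    }


# Hypothetical measurements, invented for illustration only.
sample_ttft_ms = [950, 1020, 980, 1100, 990]
sample_turn_s = [1.6, 1.9, 1.7, 2.4, 1.8]

result = summarize(sample_ttft_ms, sample_turn_s)
print(result)
```

With these made-up numbers the model misses the TTFT budget but passes the per-turn P50 budget, which is the kind of trade-off the post describes.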
Desh Raj
Desh Raj@rdesh26·
@krandiash Great analogy! I have a similar mental model of how things will turn out: a small "interaction" model which is voice-native and can operate in real-time, and a larger "problem-solving" model which can do reasoning etc. and operate asynchronously.
1
0
10
1.8K
Karan Goel
Karan Goel@krandiash·
I personally subscribe to the idea that in the near-term model systems will be built with 2 tiers of models. I like to think of these 2 tiers as whales and dolphins (I'm sure there's a better analogy...).

Whales are giant models that run deep inside the data center. They're slow, use massive compute resources and solve hard problems. They can access and use specialized knowledge and execute long-running workflows.

Dolphins run on the user <-> system surface. Their job is to directly interface with humans, collaborate, strategize, carry context, communicate effectively and generally keep humans happy and satisfied. They are good at summarizing information, they are clever at using tools and harnessing compute-intensive whales to get things done. Dolphins need to be fast, have lower power usage, have the option to run on-device or on edge compute, and otherwise must be capable of being run all the time.

Most of the models we have today are whale-ish. Dolphin models are basically non-existent today (not fast enough, not enough context, use too much energy, not multimodal enough, can't interact very effectively with humans, and small models aren't smart enough).

Dolphins offloading work onto whales is similar to humans offloading reasoning, using tools and databases -- computers, notepads, etc etc.

All of this is about the relative intelligence of these two kinds of models and where they might sit. It would be a mistake to assume that dolphin models will be unintelligent (absolute sense), they will be much smarter than today's frontier models. (So yes voice LMs should know what to say.)

To build a single model that can do the job of both, you would need a very big energy source and some pretty big advancements in accelerators and model architectures so everything can be done on your person. That would also change my thinking about this by a lot (and I would be influenced by some subset of those things happening).

(We're working on the dolphins.)
Vinod Khosla@vkhosla

But the voice LLM still has to call a large LLM to have the intelligence to know what to say.

20
11
198
26.5K
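The two-tier idea in the posts above (a fast interaction model that offloads hard problems to a slower model running asynchronously) can be sketched with `asyncio`. Everything here is a stand-in: `whale_solve`, `dolphin_turn`, the `solve:` prefix convention, and the simulated delay are hypothetical names and choices made for illustration, not anything the authors describe building.

```python
import asyncio


async def whale_solve(problem: str) -> str:
    """Stand-in for a slow, large model call (hypothetical)."""
    await asyncio.sleep(0.05)  # simulate long-running work
    return f"solution to {problem!r}"


async def dolphin_turn(user_text: str, pending: list) -> str:
    """Fast interaction tier: reply immediately, offload hard work in the background."""
    if user_text.startswith("solve:"):
        problem = user_text.removeprefix("solve:").strip()
        # Schedule the slow call without waiting for it to finish.
        pending.append(asyncio.create_task(whale_solve(problem)))
        return "Working on it, I'll report back shortly."
    return f"Got it: {user_text}"


async def demo() -> list:
    pending = []
    replies = [
        await dolphin_turn("hello", pending),
        await dolphin_turn("solve: schedule the migration", pending),
    ]
    # The conversation stays responsive; results arrive when the slow tier finishes.
    replies.extend(await asyncio.gather(*pending))
    return replies
```

Calling `asyncio.run(demo())` returns the two immediate replies followed by the background result. The design choice illustrated is that the interaction tier never blocks on the slow tier.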
Bin Rong retweeted
Astro Greek
Astro Greek@astro_greek·
"There are way easier places to work, but nobody ever changed the world on 40 hours a week" - Elon Musk
[image attached]
58
126
846
15.2K
Bin Rong retweeted
Dealroom.co
Dealroom.co@dealroomco·
From Seed to $100M revenue: the funds that pick the winners. Unicorn valuations tell one story. Revenue tells another. Y Combinator leads with 94 companies, ahead of SV Angel (70) and 500 Global (36). Featuring in this top 20 (and ties) list represents 99th percentile performance. Top 20 investors by $100M+ revenue companies backed at Seed 👇
[image attached]
10
29
222
74.6K
Bin Rong retweeted
Dealroom.co
Dealroom.co@dealroomco·
Only 11% of investors have ever backed a Seed winner that crossed $100M in revenue. Just ~1.5% of VC-backed startups ever reach $100M+ in revenue, so backing even one is rare. Of the 19,581 investors in our dataset, 17,445 (89.1%) have never done it; another 8.2% have done it exactly once. Just 25 investors — the top 0.13% — have produced 11 or more such winners. That handful drives a disproportionate share of venture returns. Explore the full ranking: dealroom.co/power-law-inve…
[image attached]
3
12
120
328.1K
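The percentages in the post above follow directly from the counts it quotes, which a short calculation can confirm. The helper name `pct` is hypothetical; the three counts are the ones stated in the post.

```python
TOTAL_INVESTORS = 19_581   # investors in the dataset, per the post
NEVER_BACKED = 17_445      # investors with no Seed winner above $100M revenue
TOP_TIER = 25              # investors with 11 or more such winners


def pct(part, whole, digits=1):
    """Share of `part` in `whole`, as a percentage rounded to `digits` places."""
    return round(100 * part / whole, digits)


never_pct = pct(NEVER_BACKED, TOTAL_INVESTORS)
at_least_once_pct = pct(TOTAL_INVESTORS - NEVER_BACKED, TOTAL_INVESTORS)
top_tier_pct = pct(TOP_TIER, TOTAL_INVESTORS, digits=2)

print(never_pct, at_least_once_pct, top_tier_pct)  # 89.1 10.9 0.13
```

The 10.9% figure is what the post rounds to "11%".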
Bin Rong retweeted
Lenny Rachitsky
Lenny Rachitsky@lennysan·
My biggest takeaways from Claude Code's Head of Product @_catwu:

1. Anthropic’s product development timelines have gone from six months to one month, sometimes one week, sometimes one day. Part of this acceleration is access to the latest models (i.e. Mythos). Another is shipping new products into “research preview,” making clear it's early, experimental, and might not be supported forever. Another is an evergreen "launch room" where engineers post ready features and marketing turns around announcements the next day.

2. The PM role is shifting from coordinating multi-month roadmaps to enabling teams to ship daily. As Cat puts it, “There should be less emphasis on making sure you are aligning your multi-quarter roadmaps with your partner teams and more emphasis on, OK, how can we figure out the fastest way to get something out the door?”

3. The most efficient shipping unit is an engineer with great product taste. On Cat’s team, many engineers go end-to-end—from seeing user feedback on Twitter to shipping a product by the end of the week—without a PM involved. Also, almost all the PMs on the Claude Code team have either been engineers or ship code themselves, and the designers have been front-end engineers. The roles are merging, and the most valuable skill is product taste, not job title.

4. Build products that are on the edge of working. Claude Code’s code review product failed multiple times because earlier models weren’t accurate enough. But because the prototype was already built, they could swap in Opus 4.5 and 4.6 and immediately test whether the gap was closed. Teams that wait for the model to be ready will always be a cycle behind.

5. The most underrated skill for building AI products is asking the model to introspect on its own mistakes. Cat regularly asks the model why it made an unexpected decision. The model will explain that something in the system prompt was confusing, or that it delegated verification to a subagent that didn’t check its work. This reveals what misled the model so the team can fix the harness.

6. Every model release forces their team to revisit existing products and audit their system prompt to remove features the model no longer needs. Claude Code’s to-do list was a crutch for earlier models that couldn’t track their own work. With Opus 4, the model handles it natively. Features built as scaffolding for weaker models become debt when the model catches up—so the team actively strips them.

7. Anthropic employees build custom internal tools instead of buying SaaS products. A sales team member built a web app that pulls from Salesforce, Gong, and call notes to auto-customize pitch decks—work that used to take 20 to 30 minutes now takes seconds. Their core stack is Claude Code, Cowork, and Slack. No Notion, no Linear, no Figma.

8. People underestimate how much Claude’s personality contributes to its success. As Cat describes it, “When you reflect on everyone you’ve worked with, there’s just some people where you’re like, I really like their energy, their vibe.” Claude is designed to be low-ego, positive, competent, and earnest—qualities that make it feel like a great coworker, not just a tool. This isn’t cosmetic; it’s what makes people want to use Claude for hours every day. The team has a dedicated person, Amanda, who “molds Claude’s character,” and it’s one of the hardest roles at the company because success is so subjective.

9. The future of work is managing fleets of AI agents, not doing the work yourself. Cat sees a clear progression: first, individual tasks become successful. Then people start running multiple tasks at the same time (multi-Clauding). Next, people will run 50 or 100 tasks simultaneously, which will require new infrastructure—remote execution, better interfaces for managing tasks, agents that fully verify their work, and self-improving systems that incorporate feedback. The human role shifts from doing the work to knowing which tasks to look into, verifying outputs, and giving feedback that makes the system better over time.

10. Hire people who lean into chaos and face every challenge with a smile. At Anthropic, there are weeks when a P0 on Sunday becomes a P00 by Monday and a P000 by Monday afternoon. If you get too stressed about any one thing, you’ll burn out. Their team looks for people who can look at a hard challenge and say, “Wow, that’s gonna be hard. But I’m excited to tackle it and I’m gonna do the best that I possibly can.” This mindset—optimism, resilience, and comfort with constant change—is increasingly essential as the pace of AI development accelerates.

Don't miss the full conversation: youtube.com/watch?v=Pplmzl…
Lenny Rachitsky@lennysan

How Anthropic’s product team moves faster than anyone else

I sat down with @_catwu, Head of Product for Claude Code at @AnthropicAI, to get a peek into their unprecedented shipping pace, how AI is changing the PM role, and how to be the right amount of AGI-pilled.

We discuss:
🔸 How Anthropic’s shipping cadence went from months to weeks to days
🔸 The emerging skills PMs need to develop right now
🔸 Why you should build products that don't work yet—then wait for the model to catch up
🔸 Why a 95% automation isn't really an automation
🔸 Cat’s most underrated AI skill (introspection)
🔸 What Cat actually looks for when hiring PMs now (hint: it's not traditional PM skills)

Listen now 👇 youtu.be/PplmzlgE0kg

98
296
2.9K
841.4K
Bin Rong retweeted
Damian Player
Damian Player@damianplayer·
this is why Elon’s companies move 10x faster than most. every founder should run their team like this: push back, ask, or execute.
[image attached]
181
1.2K
18.5K
963.3K
Bin Rong retweeted
Aakash Gupta
Aakash Gupta@aakashgupta·
Paul Graham spent 20 years watching founders and found the visionary model was backwards.

Bill Gates built a BASIC interpreter for a machine with a few thousand users. Mark Zuckerberg built a website so Harvard undergrads could stalk each other. Neither one knew what they were going to become.

That is the opposite of how every startup school, pitch deck, and vision statement tells you to operate. You're supposed to walk in with a 10-year roadmap. TAM charts. A precise picture of the future you're building.

The people who built the biggest companies didn't have a precise vision. They had a direction. Gates had "microcomputers are interesting." Zuckerberg had "Harvard undergrads will use this." That was the entire thesis.

The math explains why. If your target is 10 years out and 100x bigger than anything that exists, every assumption in your model compounds error. Interest rates move. Hardware costs collapse. A competitor pivots. By year three the roadmap is a museum piece and you're optimizing for a world that never arrived.

Graham's analogy is Columbus. Columbus didn't have a map of the New World. He had "there's something to the west" and a boat. The destination was wrong, the continent was wrong, the math on how far was wrong. The direction was right, and the direction was enough.

The inversion every founder gets wrong: the popular image of the visionary is someone who sees the future precisely. Empirically, it's someone who sees it blurry and walks toward the blur while everyone else is drawing detailed maps of imaginary places.

Gates didn't set out to dominate microcomputer software for four decades. Zuckerberg didn't set out to build a universal vacuum for human time. They started with something small that worked, and the opportunity to move came later.

The VCs who fund vision decks and the founders who write them are playing the same game. The founders who actually built those companies weren't in the room.
15
70
447
46.9K
Bin Rong retweeted
Ronit Pereira
Ronit Pereira@CAronitpereira·
“The secret of a long life that’s worked for me is, not to expect too much of human nature.” “And to have your life full of resentments and hatreds is counterproductive. You’re punishing yourself.” - Charlie Munger, 2019
3
58
430
17.3K
Bin Rong retweeted
Chris Pisarski
Chris Pisarski@chrispisarski·
one of the best pieces of sales advice we picked up during YC is the "McKinsey Model"

a lot of deals at early-stage startups die for the same reason: your champion is afraid to advocate for your product

if they push for it internally and it doesn't work out, their job is on the line

so they never come back to you and hit you with the "we need to align internally first"

that's why you need to be their McKinsey consultant: instead of them pitching, you personally take the blame

after every demo, send them:
- a one-pager
- a security doc
- an ROI calculator with their numbers
- useful context/overview of your industry that can help with what they're struggling with right now
- a pre-written slack message they can forward

make it as easy as possible for your champion to forward your material without them feeling responsible for integrating your solution or "fighting" for it
52
106
2.4K
354K
Bin Rong retweeted
Lenny Rachitsky
Lenny Rachitsky@lennysan·
.@rabois: Most companies hire ammunition when they really need barrels. "Most companies raise money, and then they hire a lot of people. And then the CEO, almost without exception, gets frustrated because they've hired a lot of people, their burn rate has increased, and they don't feel like more is getting accomplished per unit of time. The fundamental driver of this is the number of people that can independently drive an initiative from inception to success is very limited within most companies. If you hire more people without expanding the number of what I call 'barrels,' that can drive ideas from inception to success, all you're doing is stacking people behind the same initiatives." The ratio of barrels to ammunition is what determines the number of important things a company can pursue simultaneously.
Lenny Rachitsky@lennysan

"High performance machines don't have psychological safety. They're about winning."

Keith Rabois (@rabois) was COO of Square, part of the PayPal Mafia, an early investor in Stripe, Palantir, Airbnb, DoorDash, and Ramp, and a 2x founder. He's spent 25 years obsessing over how to build world-class teams.

In our in-depth conversation, we discuss:
🔸 How to identify undiscovered talent
🔸 Keith's barrels vs. ammunition hiring framework
🔸 The three traits of the best-performing companies right now
🔸 Why talking to customers is actively harmful for consumer products
🔸 Why the PM role is dying
🔸 The specific interview question he asks every senior candidate
🔸 Why CMOs (not engineers) are becoming the #1 consumer of AI tokens

Watch now 👇 youtu.be/xCd9ykretlg

14
19
167
66.5K
Bin Rong retweeted
Masato Ota
Masato Ota@ottamm_190·
Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering arxiv.org/abs/2604.08224
[image attached]
7
79
428
34.3K
Bin Rong retweeted
Molly O’Shea
Molly O’Shea@MollySOShea·
BREAKING: Palantir CTO Shyam Sankar's "Hamburger Rule" Finally Revealed. "The hamburger rule is: bureaucrats & really anyone at a big company, they just want to do whatever it takes to go back and eat their lunch, eat their hamburger. So you need to make it easier for them to do what you want, so that they can get back to their hamburger than doing nothing. If you're dealing with bureaucrats, government, or large organizations, how do you make it easier for them to get back to eating their hamburger? Which is actually like a fundamental lesson in enterprise sales that is a really helpful articulation." @ssankar @CobiBGantz
Molly O’Shea@MollySOShea

ICYMI: Chapter (@askchapter) Raised $100M Series E at a $3B Valuation Led By Al Gore’s Generation Investment Management

- 2X valuation <1 year
- Total funding ~$285M
- 3x revenue growth in 2025, surpassing $100M in ARR
- "Past 4 rounds have been preempted"

Founded by @CobiBGantz, Chapter is building the leading AI retirement platform.

New investors: Fifth Down Capital & @8vc
Existing investors: Stripes, XYZ VC (@fubini), Addition, Narya Capital (@JDVance), @SusaVentures, & Maverick Ventures.

PS - Cobi also shares key lessons from Palantir about talent density.. including @ssankar's “hamburger rule.”

00:00 Big Funding Announcement
00:18 How Al Gore Joined
01:01 Bipartisan Backing
01:19 JD Vance Investor Era
01:48 Total Raised & Valuation
02:00 Vision for Retirees
02:53 What Makes Chapter Different
04:16 Hot Take: Healthcare > Defense
04:59 LA Medicare Fraud Explained
06:13 Cutting Waste in Medicare
07:05 Palantir Talent Lessons
08:10 Hamburger Rule Wisdom
09:19 Looking Ahead This Year

24
78
1.3K
394.2K
Anjney Midha
Anjney Midha@AnjneyMidha·
Yesterday @HarryStebbings asked me what it takes to win in frontier technology

1. Technical insight
2. Religious mission
3. Militaristic execution

Only a handful of businesses in every generation will ever accomplish this at scale

But when they do, they transform humanity
[image attached]
15
31
279
50K
Bin Rong retweeted
Gili Raanan // Cyberstarts
Gili Raanan // Cyberstarts@giliraanan·
Consider it more of a high bar recommendation rather than a secret formula. If your company's NARR grows from $1m -> $4m -> $16m -> $64m -> $144m, you have a very nice company.... btw starting with $3m and following the same trend is even nicer...
Harry Stebbings@HarryStebbings

The Secret Formula to $144M ARR in 5 Years: @giliraanan

4x, 4x, 3x, 3x. Net new ARR.

First year: $1M.
Second Year: $4M.
Third Year: $12M
Fourth Year: $36M
Fifth Year: $144M

How have growth rate expectations on companies changed in the last 12 months and is the above even growing fast enough @kirbyman01 @chetanp @mmurph @jasonlk

4
5
49
17.9K
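The "4x, 4x, 3x, 3x" rule quoted above is plain compounding, so it can be worked through in a few lines. This is a sketch only; the function name `project` is hypothetical, and the starting values ($1M, and the $3M variant mentioned in the reply) come from the posts.

```python
from itertools import accumulate
from operator import mul


def project(start_musd, multiples):
    """Apply yearly growth multiples in order; return the figure for each year in $M."""
    return list(accumulate(multiples, mul, initial=start_musd))


MULTIPLES = [4, 4, 3, 3]  # the sequence quoted in the post

from_1m = project(1, MULTIPLES)
from_3m = project(3, MULTIPLES)

print(from_1m)  # [1, 4, 16, 48, 144]
print(from_3m)  # [3, 12, 48, 144, 432]
```

Applied in the stated order from $1M, the multiples give $1M, $4M, $16M, $48M, $144M. The endpoint matches both posts, although the intermediate years listed in each post differ from this sequence.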
Bin Rong
Bin Rong@brong78·
@reah_ai Congrats Reah. Maybe make Gemini 3 Flash faster; it has slowed down recently.
0
0
1
45
reah miyara
reah miyara@reah_ai·
two months ago, I (re)joined Google! as Senior Director, Google Models, my team's mission is to bridge frontier intelligence for the enterprise and transform work. simply put, shipping industry leading models built for real world use cases. the days of clapping for hillclimbing academic benchmarks are behind us :) DMs open for feedback on Gemini, NanoBanana, Veo3, Live, Lyria, AlphaGenome, and all other models.
38
9
499
31.2K
Bin Rong retweeted
Peter Walker
Peter Walker@PeterJ_Walker·
Base salary at a VC-backed company worth $25M vs one worth $250M...honestly not that different. Equity? Often very different. Is the $250M company more likely to actually be viable in the end? Yes. But the risk/reward ratio is worth thinking about. Plus small startups are fun
[image attached]
17
20
181
30.7K