Sam Selvanathan

2.7K posts

@samselvanathan_

I build AI agents, ship browser LLMs, and let AI drive my kid's toy car. engineer turned product manager, still building software, ex-PayPal, startups. SF.

San Francisco, CA · Joined December 2009
370 Following · 237 Followers
Sam Altman@sama·
you can sign in to openclaw with your chatgpt account now and use your subscription there! happy lobstering.
1.1K replies · 1K reposts · 21K likes · 2.2M views
Sam Selvanathan@samselvanathan_·
@GaryMarcus @kareem_carr A naive undergrad with good tooling, human-in-the-loop checkpoints, and clearly scoped tasks outperforms what most orgs had before. The ceiling matters less than what you build around it.
0 replies · 0 reposts · 0 likes · 173 views
Sam Selvanathan@samselvanathan_·
@emollick Benchmarks measure capability ceilings. Running models on-device under memory pressure, benchmark rank and actual production rank diverge completely. Kimi's MoE (1T total, 32B active) is fast in a datacenter, but that tradeoff becomes a real liability once you hit constrained hardware.
0 replies · 0 reposts · 0 likes · 649 views
Ethan Mollick@emollick·
I find that open weights models over-perform on benchmarks compared to actual real-world usage, and Kimi feels like no exception. For example, a small amount of use will show that Kimi is not as good as Claude Opus 4.6, which it beats on the benchmarks. Still a good model, tho!
Artificial Analysis@ArtificialAnlys

Moonshot’s Kimi K2.6 is the new leading open weights model. Kimi K2.6 lands at #4 on the Artificial Analysis Intelligence Index (54), behind only Anthropic, Google, and OpenAI (all 57). Key takeaways:

➤ Increase in performance on agentic tasks: @Kimi_Moonshot's Kimi K2.6 achieves an Elo of 1520 on our GDPval-AA evaluation, a marked improvement over Kimi K2.5’s Elo of 1309. GDPval-AA is our leading metric for general agentic performance, measuring performance on knowledge work tasks such as preparing presentations and analysis. Models are given code execution and web browsing tools in an agentic loop via our open source reference agentic harness, Stirrup. Kimi K2.6 continues the line's strength in tool use, maintaining a 96% score on τ²-Bench Telecom, placing it among other frontier models in this category.

➤ Low hallucination rate: Kimi K2.6 scores 6 on the AA-Omniscience Index, our knowledge evaluation measuring both accuracy and hallucination rate. This score is primarily driven by a comparatively low hallucination rate of 39% (down from Kimi K2.5’s 65%), indicating a greater capability to abstain rather than fabricate knowledge when the model is uncertain. Kimi K2.6’s hallucination rate places it near models such as Claude Opus 4.7 (36%) and MiniMax-M2.7 (34%).

➤ High token usage: Kimi K2.6 demonstrates high token usage, but is in line with other frontier models in the same intelligence tier. To run the full Artificial Analysis Intelligence Index, Kimi K2.6 used ~160M reasoning tokens. This is slightly lower than Claude Sonnet 4.6 (~190M reasoning tokens) but much higher than GPT 5.4 (~110M reasoning tokens).

➤ Open weights: Kimi K2.6 is a Mixture-of-Experts (MoE) model with 1T total parameters and 32B active, the same as the previous two generations, Kimi K2 Thinking and Kimi K2.5. Kimi K2.6 again pushes the open weights frontier in intelligence.

➤ Third Party Access: Kimi K2.6 is accessible through Moonshot’s first party API as well as third party API providers Novita, Baseten, Fireworks, and Parasail.

➤ Multimodality: Kimi K2.6 supports image and video input and text output natively. The model’s max context length remains 256k.

Further analysis in the threads below.

68 replies · 14 reposts · 458 likes · 58.8K views
Sam Selvanathan@samselvanathan_·
Access to builders was the real bottleneck, not ideas or capital. Wabi CEO Eugenia Kuyda at a16z: software was gated by 20M professional developers until last year. Good ideas died waiting for engineers. Curious what product orgs look like when that gate disappears.
Rohan Paul@rohanpaul_ai

Software used to be gated by roughly 20 million professional developers up until last year. Good ideas still needed engineers, co-founders, time, and months of app work. Now, anyone can build. ~ Wabi CEO Eugenia Kuyda

0 replies · 0 reposts · 0 likes · 121 views
Sam Selvanathan@samselvanathan_·
Skills as markdown files that teach agents their own tooling is the right abstraction. Two npm commands add live web search, browser control, and URL fetch to Claude Code with no backend. via @svpino
0 replies · 0 reposts · 0 likes · 74 views
Sam Selvanathan@samselvanathan_·
The unlock in physical AI isn't locomotion. Gemini Robotics-ER 1.6 with Spot reading complex industrial gauges and answering spatial queries with chain-of-thought reasoning is what actually bridges "robot follows path" to "robot understands space." @GoogleDeepMind
0 replies · 0 reposts · 0 likes · 45 views
Sam Selvanathan@samselvanathan_·
@emollick The bottleneck being released is usually eval, not capability. Most teams ship with the same model for months, then suddenly ship a 10x product because someone figured out what "correct" actually means for that task. The leap is internal. The model just waited.
0 replies · 0 reposts · 0 likes · 436 views
Ethan Mollick@emollick·
Soon, at each gradual improvement level of AI, you will start to see large discrete jumps in ability in economically important areas, because the previous AI ability level in some aspect of the job bottlenecked progress. When bottlenecks are released, it looks like a leap forward
52 replies · 28 reposts · 546 likes · 31.9K views
Sam Selvanathan@samselvanathan_·
Curious how you're handling stale trace rot. Shared memory across sessions sounds great until a wrong decision from last week outranks the right one this week. We hit this building internal agents. Without a TTL or confidence decay on traces, the "searchable brain" becomes a confidence trap.
0 replies · 0 reposts · 0 likes · 137 views
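The TTL / confidence-decay idea can be sketched in a few lines. This is a toy Python sketch under stated assumptions: `TraceStore`, the one-week half-life, and the scoring rule are all illustrative inventions, not any real memory product's API. The point is just that ranking by decayed confidence lets a fresh, lower-confidence trace outrank a stale, high-confidence one.

```python
import math
import time

# Illustrative: halve a trace's effective confidence every week.
DECAY_HALF_LIFE = 7 * 24 * 3600  # seconds


class TraceStore:
    """Toy shared-memory store where old traces lose ranking weight."""

    def __init__(self):
        # Each trace: (text, base_confidence, created_at_unix_seconds)
        self.traces = []

    def add(self, text, confidence, created_at=None):
        self.traces.append((text, confidence, created_at or time.time()))

    def effective_confidence(self, base, created_at, now=None):
        """Exponentially decay confidence by trace age."""
        now = now if now is not None else time.time()
        age = max(0.0, now - created_at)
        return base * math.exp(-math.log(2) * age / DECAY_HALF_LIFE)

    def search(self, min_confidence=0.1, now=None):
        """Rank traces by decayed confidence; prune anything below the floor."""
        scored = [
            (self.effective_confidence(conf, ts, now), text)
            for text, conf, ts in self.traces
        ]
        return [(s, t) for s, t in sorted(scored, reverse=True) if s >= min_confidence]


# Usage: a two-week-old trace at 0.9 decays to 0.225 and loses to a fresh 0.5.
now = 1_000_000_000.0
store = TraceStore()
store.add("old decision", 0.9, created_at=now - 14 * 24 * 3600)
store.add("fresh decision", 0.5, created_at=now)
results = store.search(now=now)
```

A real system would likely decay per trace type (decisions rot faster than facts) and record why a trace was superseded, but even this crude floor keeps last week's wrong call from outranking this week's right one.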
Santiago@svpino·
My agent already forgot everything we did last week. That sucks. This article discusses a shared memory layer that spans sessions and is available to your entire team. Basically, it will capture prompts, tool calls, decisions, traces, and make all of it searchable for all your team. We need infrastructure like this across the board now.
Davit@DBuniatyan

x.com/i/article/2043…

25 replies · 18 reposts · 176 likes · 34.7K views
Sam Selvanathan@samselvanathan_·
@AndrewYNg Calling it a PM bottleneck undersells what's actually broken. The constraint isn't deciding what to build. It's knowing what "correct" looks like after an agent ships your spec in 2 hours, perfectly wrong. Acceptance criteria just became the hardest part of the job.
0 replies · 0 reposts · 0 likes · 544 views
Andrew Ng@AndrewYNg·
As AI agents accelerate coding, what is the future of software engineering? Some trends are clear, such as the Product Management Bottleneck: the idea that we are more constrained by deciding what to build than by the actual building. But many implications, like AI’s impact on the job market and how software teams will be organized, are still being sorted out. The theme of our AI Developer Conference on April 28-29 in San Francisco is The Future of Software Engineering. I look forward to speaking about this topic there, hearing from other speakers on this theme, and chatting with attendees about it. We’re shaping the future, and I hope you will join me there!

It is currently trendy in some technology and policy circles to forecast massive job losses due to AI. Even if they have not yet materialized, these losses certainly must be just over the horizon! I have a contrarian view that the AI jobpocalypse (the notion that AI will lead to massive unemployment, perhaps even rioting in the streets) won’t be nearly as bad as dire forecasts by pundits, especially pundits who are trying to paint a picture of how powerful their AI technology is.

Among professions, AI is accelerating software engineering most, given the rise of coding agents. According to a new report by Citadel Research, software engineering job postings are rising rapidly. So if software engineering is a harbinger of the impact AI will have on other professions, this expansion of software engineering jobs is encouraging.

Yes, fresh college graduates are having a hard time finding jobs. And yes, there have been layoffs that CEOs have attributed to AI, even if a large fraction of this was “AI washing,” where businesses attribute layoffs to AI even though AI has not changed their internal operations much yet. And yes, there is a subset of job roles, such as call center operator, that are more heavily impacted.

Many people are feeling significant job insecurity, and I feel for everyone struggling with employment, whether or not the cause is AI-related. Many other factors, such as over-hiring during the pandemic and high interest rates, have contributed to the slowdown in the labor market, and the notion that AI is leading to unemployment is oversimplified.

In software engineering, I see a lot of exciting work ahead to adapt our workflows. It is already clear that:
(i) As AI makes coding easier, a lot more people will be doing it.
(ii) Writing code by hand, and even reading (generated) code, is not that important, because we can ask an LLM about the code and operate at a higher level than the raw syntax (although how high we can or should go is rapidly changing).
(iii) There will be a lot more custom applications, because now it’s economical to write software for smaller and smaller audiences.
(iv) Deciding what to build, more than the actual building, is becoming a bottleneck.
(v) The cost of paying down technical debt is decreasing (since AI can refactor for you).

At the same time, there are also a lot of open questions for our profession, such as:
- In the future, what will be the key skills of a senior software engineer? And for junior levels, what should be the new Computer Science curriculum?
- If everyone can build features, what skills, strategies, or resources create competitive advantage for individuals and for businesses?
- What are the new building blocks (libraries, SDKs, etc.) of software? How do we organize coding agents to create software?
- What should a software team look like? For example, how many engineers, product managers, and designers, and what tooling do we need to manage their workflow?
- How do AI agents change the workflow of machine learning engineers and data scientists? For example, how can we use agents to accelerate exploring data, identifying hypotheses, and testing them?

I’m excited to explore these and other questions about the future of software engineering at AI Dev. I expect this to be an exciting event. Please join us! [Original text: The Batch newsletter.] ai-dev.deeplearning.ai
141 replies · 158 reposts · 886 likes · 110.1K views
Sam Selvanathan@samselvanathan_·
The race shifted from companies to countries. Sundar on 60 Minutes framing AI as a national imperative means regulation, talent, and infrastructure decisions now happen at a different level. via @60Minutes
0 replies · 0 reposts · 3 likes · 28 views
Sam Selvanathan@samselvanathan_·
Claude inside Word as a native sidebar is the right product call. Edits surface as tracked changes you accept or reject with normal Word controls, no separate chat window. The shared context across Word, Excel, and PowerPoint is the real unlock. via @rohanpaul_ai
0 replies · 0 reposts · 2 likes · 51 views
WitnezMe@DC_CowboyHouse·
@samselvanathan_ @svpino Yes, such inaccurate comparisons lead to a lot of wasted time and frustration. Tired of reading marketing.
1 reply · 0 reposts · 0 likes · 16 views
Santiago@svpino·
If you ask me, Gemma 4 is one of the best models out there for a single reason: You can run it locally and it’s really, really good (probably Sonnet level?) And, of course, ya can now use it to power OpenClaw and show a middle finger to the company who doesn’t like it.
atomic.chat@atomic_chat_hq

Run OpenClaw with Gemma 4 and Atomic Chat
MacBook Air M4 · 16 GB RAM · 25 tok/s
No cloud! No subscription fees!
Open-source local model. Runs on your regular device.

34 replies · 22 reposts · 444 likes · 48.7K views