Advait Raykar

960 posts

@AdvaitRaykar

Wrangling LLMs to help create more ethical supply chains. Cofounder and CEO at Elm AI | @Cornell CS

NYC · Joined June 2012
369 Following · 305 Followers
Advait Raykar @AdvaitRaykar
was encouraging my phd friend to bail on whatever he is working on in ai and re-focus on evals. I have become desensitized to evals, because I really don't know if they mean a model is good. e.g. I don't know if M2.7 or Composer 2 is really as good as the evals say; there is unfortunately only one way to find out (and I don't have time for that, sadly)
Federico Cassano @ellev3n11

we also released swe bench and terminal bench this time. pls no more obama medal memes

Advait Raykar @AdvaitRaykar
1. anthropic cut off access to openai; will they do that to cursor? is the revenue from cursor more valuable than the revenue they will cannibalize?
2. the only conclusion i can draw is that this model exists to save business, rather than capture new business. As a primary codex user, with some claude code, I don't think the cheap pricing matters (I am already heavily subsidized)
Bloomberg @business

Cursor is taking on Anthropic and OpenAI with a new AI coding model bloomberg.com/news/articles/…

Advait Raykar @AdvaitRaykar
for people in replies asking:
- it's easy to set up workflows and playbooks in devin
- the VM and browser use work quite well
- multi-repository work
- really good linear integration

common workflows are:
- bug triage (sees sentry, langfuse, etc)
- pm review
- well scoped tasks
Advait Raykar @AdvaitRaykar
Our Devin usage is soaring too. The product is improving rapidly and is very good right now. I tried it last year, churned in a week. I even got on a call with an engineer to give them feedback, but didn't even know where to begin. Imo, it's the best product in the category right now. Claude Code and Codex are not comparable, since they don't have feature parity or the maturity Devin has for autonomous development. Maybe they will get there, but at the moment, Devin is in a league of its own.
Scott Wu @ScottWu46

Interesting stat - our enterprise customers have already done more Devin sessions (and more merged Devin PRs) in 2026 than in all of 2025. Not bad for 2-ish months into the year!

Advait Raykar @AdvaitRaykar
One of the things I have learned running a company: it's incredibly easy to fall into the trap of "fake work" that feels like real work. Big boy acronyms, fancy metrics. I also fall for it. But I am learning to see through it more clearly. I am also learning, from first principles, why big companies function the way they do. So much fake work.
Advait Raykar @AdvaitRaykar
real world ai usage at early stage startup #6

with the adoption of ai increasing we needed to agree on some principles, so we did an internal session on some important things. attaching the slides from the deck:
1. when ai output looks polished, people stop checking
2. without domain intuition, AI output drops exponentially, and so does your work output
3. conversely, with domain intuition, AI output can improve exponentially
4. the barrier to producing has collapsed; curation and iteration is the new skill that needs to be honed when using ai
Convex @convex
the slop is starting to really slop out there. @mikeysee wonders: can the code review bots like coderabbit and greptile save us?
Advait Raykar @AdvaitRaykar
a message I sent internally, in a similar vein (with zero proofreading): I do think today AI is a slot machine, but the odds are good, and you have some influence over the outcomes. Of course, the only thing that matters is moving the needle on company outcomes; with or without AI (but don't do it without AI -- it's stupid and you will handicap yourself.)
Karri Saarinen @karrisaarinen

I think we have lost some sense of judgment and moderation when it comes to product building currently. The moment you turn something into a universally celebrated metric, whether that is token burn, prototype count, or percentage of agent-written code, you start losing sight of what actually matters. I have felt the same way for a long time about overusing data and A/B testing to build products. The moment you reduce product quality or productivity to a metric, you stop shipping value and start shipping numbers.

A lot of what people are doing with AI makes directional sense. The missing piece is counterbalance:

1. AI should help engineers build better products. Leaderboards and adoption metrics can be useful as directional signals. They do not tell you what is being built, whether it is good, or whether it should exist at all.

2. Users do not care what percentage of your code was written by agents. They care about the outcome. Faster output is useful, but faster alone doesn't add to the quality, clarity, or stability of products. The power to build should not become an excuse to lower quality bars.

3. LLM-generated prototypes can feel like late-night whiteboarding sessions. They look exciting in the moment and feel productive very quickly. Then a few days later you realize the idea was shallow, distracting, or simply wrong. The same trap shows up in jumping straight to code and solutions more broadly. You may just be building the wrong thing more efficiently. Prototyping has its place. So do clear thinking, good design, and a real understanding of the user's problem. In terms of activities or momentum, the main quest and the side quest can both feel productive, but only one actually moves the mission forward.

4. Adding more to products is as dangerous as ever, even if the time or effort to add it has gone down. Every addition creates complexity, maintenance cost, and user confusion. New features should be pushed back on unless they clearly show why they should exist and how they improve the product.

5. Not everything needs to be agent-shaped. A simple scheduled task does not need a full LLM sandbox. Making something agentic because it feels current or impressive does not make it right-sized, correct, or effective.

The core ideas are:
- even if you can, maybe you should not.
- the more power we have to build should not reduce our need to think; it should increase it.

Advait Raykar @AdvaitRaykar
@ironcarbs my favorites are the short stories by ted chiang. I'd start with "Stories of Your Life and Others"
jan @ironcarbs
What are the most forward looking and inspiring scifi books you’ve read recently? Interested in novel ideas regarding AI, the future of work, robots and nature.
Advait Raykar @AdvaitRaykar
@karrisaarinen @thdxr how is ai code gen used at linear? what gains have you seen and how do you reconcile it with your zero bugs policy? which tools are the most popular? we use linear at my company, big fan!
Karri Saarinen @karrisaarinen
@thdxr 1) What? An AI coding founder openly acknowledging some challenges that come with AI coding. What is this. I thought we're all here to make the token machine go brrr. The more tokens we burn, the better the world.
dax @thdxr
we spoke to a company today whose security team is so concerned by ai code they're considering banning ai tools. your first reaction might be "they're gonna get left behind", but if you are practical their concerns aren't invalid. if you are a huge multinational org with tens of thousands of employees and they just got a button that appears to do their work, it's gonna get pushed a lot, and the process around knowing what is making it to production is totally melting. being honest, we're all getting a bit lazier. see that kiro-related aws outage as a real life example. so they're genuinely arguing over how much this is going to be allowed, esp since the net productivity gains for the average dev seem to be pretty low
Advait Raykar @AdvaitRaykar
Even human intelligence is surprisingly jagged
Advait Raykar @AdvaitRaykar
real world ai usage at early stage startup #5

we were not very ai native, even as early as august last year -- getting the team to a mindset of using the tools takes deliberate effort. to make this systemic, including removing the friction to experiment, we now give all our engineers their own spend card to "experiment" with ai tools, no questions asked!

some fun facts:
- cursor usage has dropped drastically in the past 2 months (3.5x reduction)
- two factions: team codex and team claude code; team claude code hasn't tried codex yet. I am team codex (for now, i have no loyalty to corporations)
- one of our engineers is an opencode stan!
Advait Raykar @AdvaitRaykar
For those on the "cutting edge", it's blatantly obvious that "software engineering" as a job is practically over when viewed through the lens of the median* definition. Slow diffusion of this technology and people not shifting their mindsets is going to cause mass layoffs. This essay puts it really nicely: "You sit in a meeting where someone describes a vague problem, and you're the one who figures out what they actually need. You look at a codebase and decide which parts to change and which to leave alone. You push back on a feature request because you know it'll create technical debt that'll haunt the team for years. You review a colleague's PR and catch a subtle bug that would've broken production. You make a call on whether to ship now or wait for more testing. ... [about layoffs] But think about who stays and who goes in that scenario. It's not random. The engineers who understand that programming isn't the job, the ones who bring judgment, context, and the ability to figure out what to build, those are the ones who stay. The ones who only brought code output might be at risk" Read the entire essay here: terriblesoftware.org/2025/12/11/ai-… *based on a vibes based survey I did
Advait Raykar @AdvaitRaykar
@deedydas Cognition is scaling to the billions in ARR too, which is quite impressive!
Deedy @deedydas
To round out the space: there are some rumors Codex is at $1B. The next big players are Cognition / Copilot / Lovable / Replit at $300-600M. A conservative estimate would put the current AI coding market at $7.5-10B. Of course, a lot of API spend on models is for coding even though it isn't directly attributable, which would push it even higher.
Deedy @deedydas
Narrative violation. Cursor goes $1B to $2B in 3mos. Claude Code went $0 to $2.5B in 8mos. Everyone in the tech/X bubble thinks people are wholesale ditching Cursor, but enterprise diffusion is glacial. Most of the world just got a hold of it.
Advait Raykar @AdvaitRaykar
@swyx > the Cognition for Governments business is scaling to billions in ARR what?
swyx @swyx
# unconstrained Opus-tier koding at ~1000tok/s

if you read The Agent Labs Thesis, custom model training is the smallest part of what an ideal Agent Lab should do. i was initially a skeptic that cog should invest a lot more in swe-x other than as a defensive maneuver - can't out-bitter-lesson the big labs, but for "most" or routing tasks you could still do it. I've genuinely been turned around on this. the Cognition for Governments business is scaling to billions in ARR. and having your own model, based on being the first and largest cloud coding agents biz, is offense, not defense.
Cognition @cognition

We are sharing an early preview of our ongoing SWE-1.6 training run. It significantly improves upon SWE-1.5 while being post-trained on the same pre-trained model - and it runs just as fast, at 950 tok/s. On SWE-Bench Pro it exceeds top open-source models. The preview model still exhibits some undesirable behaviors like overthinking and excessive self-verification, which we aim to improve. We are rolling out early access to a small subset of users in Windsurf.
