Pinned Tweet
maksim
313 posts

maksim
@ivanovm_
maksim @ agentic labs
fightertown us-east-1 · Joined March 2024
476 Following · 145 Followers

@___4o____ @Austen What is an "institutional lead investor"? A lot of companies just close a party round with a few 6 fig checks

Despite the propaganda an overwhelming majority of YC companies don’t ever find a lead investor for their seed round.
first check $500k-1M pre-seed @ajhodls
most of the YC founders I talked to hadn't come close to their fundraise target by demo day. i suspect this can be extrapolated to the median company in the batch as well.

Yeah, frontier models + automatic prompt and harness optimization via e.g. GEPA gives you a lot of the same hill-climbing without touching the weights
For RLaaS, starting from weaker open models + factoring in inference hosting costs, you need a *massive* improvement to justify the investment for most buyers
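A minimal sketch of the prompt-level hill-climbing this describes (not GEPA's actual API; the `mutate` and `score` callables below are hypothetical stand-ins for a real mutation strategy and eval harness):

```python
import random

def hill_climb_prompt(base_prompt, mutate, score, iterations=20):
    """Greedy prompt-level hill climbing: keep any mutation that
    improves the eval score. No model weights are touched."""
    best_prompt, best_score = base_prompt, score(base_prompt)
    for _ in range(iterations):
        candidate = mutate(best_prompt)
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best_prompt, best_score = candidate, candidate_score
    return best_prompt, best_score

# Toy usage: mutate by appending a random instruction; in practice
# `score` would run the candidate prompt through a real eval harness.
random.seed(0)
suffixes = [" Be concise.", " Think step by step.", " Cite sources."]
best, s = hill_climb_prompt(
    "Answer the question.",
    mutate=lambda p: p + random.choice(suffixes),
    score=lambda p: len(set(p.split())),  # stand-in metric
)
```

Real optimizers like GEPA do something far more structured (reflective mutation, Pareto selection), but the weights-untouched hill-climbing loop is the shared core.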

There were a few RL-as-a-service companies that emerged this year (with one recently acquired by DoorDash for a very generous price). Some aspects of custom post-training are exciting, but the economics seem tricky here.
Fundamentally, the RLaaS thesis is that enterprises should post-train models in-house to:
1. Outperform base models on uncommon, domain-specific tasks.
- For DoorDash, this includes optimizing ad relevance and order recommendation models. Enterprises at DoorDash's scale have petabytes of proprietary data, and are evidently willing to spend hundreds of millions for talent alone.
- Tangentially, these complex domain-specific tasks also make for great training data, which is why some of these companies make most of their revenue from data sales, not enterprise RL.
- This is usually only necessary if the model is the bottleneck, which isn't the case for almost all agent deployments. Context/harness engineering go a long way. However, if you have a measurable objective + tight feedback loop (like Decagon), improvements start to become nonlinear and significant.
2. Derisk from frontier labs (long-term).
- Owning your own model insulates you from OpenAI/Anthropic API pricing, which is why Cursor is racing for a SoTA coding model.
- Regulated industries may also prefer local models for data security (especially if training truly becomes commoditized and they can justify the spend).
In reality, as Michael mentions, there are ~zero businesses that are immediately ready for RL. Teams like Applied Compute spend most of their time transforming data and mapping processes (which can spiral into months of work) and even building agents for the enterprises before they can train. This feels slightly distracting—closer to McKinsey than OpenAI—but is also the only way to get AI into businesses today.
But is this a good business? In some ways, it feels too early and hard to commoditize. Michael's write-up seems a bit frustrated. Also, there are a few existential questions here:
- Is a six-figure training run + continual improvement costs worth it for most businesses?
- It's almost as if (beyond compute) hiring is the bottleneck to scale here. Sure, AC is innovating on RL infra, but why won't businesses hire AI talent internally instead of paying AC millions on top of already high training costs? This doesn't seem sticky.
- If Opus 5 is substantially better than Kimi K2.5 (which is already distilled from Opus 4.6 lol) does the fine-tuned model become obsolete? Also, the cost of the training run is amortized across the time period between runs (e.g. Cursor has to "pay off" Composer 1 in the period between launch and the Composer 2 release), which applies additional pressure.
- Or, if they re-train on the next open-source model, will any of the initial post-training investment transfer? Maybe the data doesn't have to be re-processed, but unsure if training is cheap enough for this to be dismissible yet.
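The amortization pressure raised in these questions can be made concrete with toy numbers (all figures below are hypothetical, not from any actual deal):

```python
def amortized_monthly_cost(training_run_cost: float, months_until_obsolete: int) -> float:
    """Spread a one-off post-training run over the window before the next
    frontier or open-source release forces a re-train."""
    return training_run_cost / months_until_obsolete

# Hypothetical: a $500k run that stays competitive for 6 months must
# "pay off" roughly $83k of value per month before it is obsoleted.
monthly = amortized_monthly_cost(500_000, 6)
```

The shorter the gap between frontier releases, the higher the monthly bar the fine-tuned model has to clear.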
Also, part of the reason these are still open questions is because the impact of post-training is difficult to quantify—there's no reliable way to attribute business outcomes to post-training improvements.
- Sure, you can create a benchmark to track model improvement on some tasks.
- But there's still a gap between [progress on an ad-recommendation benchmark] and [direct revenue growth for the business] that is latent and hard to measure, since no single variable in a company's strategy isolates the model's contribution. Realistic benchmarks are still an unsolved problem.
In-house post-training is a no-brainer for some businesses. Maybe DoorDash will benefit, and it's obvious Cursor and Decagon will. But broadly, I'm unsure how large or sticky this market really is.
Michael Chen @michaelzchen5

@katieruthmishra it’s the only modality with clean handoffs between agents and non-technical users. similar to tsla autopilot that asks the driver to take over in risky situations. underpriced
maksim retweeted

I am surprised more VCs aren't talking about this. But if you are a NYC founder with any type of liquidity event happening soon, consider relocating NOW!
CC: @ethdaly @MaxwellAbram @ChanniGreenwall @SandroChess @Bfaviero @evanbfish @jackmmcclelland
jdsupra.com/legalnews/new-…

@JoshPurtell @LakshyAAAgrawal @hensapir @SeanZCai @PrimeIntellect How does it do with continuous rewards? One of the biggest challenges we’re facing is calibrating partial credit

@hensapir @SeanZCai @PrimeIntellect Thanks for the shoutout!
Yeah GEPA works great for bootstrapping verifiers, one of the cleanest apps imo. GEPA on a single prompt or RLM backend is what we use nowadays

Running @PrimeIntellect Lab GRPO on a hard-to-verify action-matching task. Judge inconsistency was generating phantom reward variance (same model output, different scores across rollouts), and step 0 kept winning.
Fixed it with a stronger judge (a better SOTA OAI model, using up my thousands of old OAI hackathon credits) + response caching, and got zero phantom variance and a clean gradient signal.
Does anybody know the best off-the-shelf judge for semantic action matching in RL training, short of post-training a purpose-built one? What are people actually shipping with? Is anybody working on purpose-built judge models for GRPO?
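A minimal sketch of the response-caching fix described above, assuming a hypothetical `judge_fn(output, reference)` callable (e.g. an API call to a strong judge model). Caching on exact output text guarantees identical rollout outputs receive identical scores within a run, which removes the phantom reward variance:

```python
import hashlib

class CachedJudge:
    """Wraps a (possibly nondeterministic) LLM judge so identical
    model outputs always receive identical scores within a run."""

    def __init__(self, judge_fn):
        self.judge_fn = judge_fn  # hypothetical: calls the judge model
        self.cache = {}

    def score(self, output: str, reference: str) -> float:
        # Key on (output, reference) so the same output against a
        # different reference is still judged independently.
        key = hashlib.sha256(f"{output}\x00{reference}".encode()).hexdigest()
        if key not in self.cache:
            self.cache[key] = self.judge_fn(output, reference)
        return self.cache[key]
```

Note this only removes *within-run* variance for repeated outputs; it does nothing about the judge scoring semantically equivalent but textually different outputs inconsistently, which is where the stronger judge comes in.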
maksim retweeted

Healthcare software was designed for humans. Multi-step, nuanced workflows: prior auth submissions, EHR note creation, eligibility verification. The kind of work that can't be reduced to an API call.
That's what AI agents in healthcare are being asked to automate. And the infrastructure to do it reliably doesn't exist off the shelf.
We build it: A coding agent to generate automation scripts, fully managed infrastructure to run them at scale, and a maintenance agent to keep them working as portals and EHRs change.
Today, we're announcing our $5M seed round, backed by Floating Point, @MeridianStCap, Twine Ventures, @refractvc and angels like @zacharylipton (CTO, Abridge) and @dps (fmr. CTO, Stripe).
If you're building AI agents that need to operate payer portals or EHRs, we'd love to talk. And we're hiring!


gundo should host live-fire war games for uas/cuas companies to compete in
cc @jakobdiepen

@cxgonzalez It’s a 2-mile-wide choke point, the ships passing through are giant bombs, and Iran can launch cruise missiles cheaper than a Honda Civic for 2k miles in any direction. They can be launched off a truck and manufactured in a shed. They really figured out asymmetric warfare

@skeptrune @RhysSullivan I miss tab, it was really good for flow state. Now the options are:
1. stare at the agent while it works, feels like a waste of time
2. go turbo-adhd with multiple agents doing different tasks
3. scroll x or go for a walk
deep focus is an unsolved problem in this age

@RhysSullivan those cursor tabs keys are going to be like the vhs tapes of vibecoding

RL envs are a subset of useful data, and they generate training tuples (state, action, reward, next state). But they only work when the world can be simulated.
Most high-value domains (healthcare, enterprise workflows, multimodal reasoning) can’t be faithfully simulated, so models still need real-world datasets and evaluation benchmarks.
RL envs are also mainly for post-training and sit a layer above in abstraction, whereas mid-/pre-training require other real-world data and domain adaptation.
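To make the tuple structure concrete, here's a toy sketch of an environment emitting (state, action, reward, next_state) transitions under a random policy: a generic gym-style rollout loop on an invented 1-D chain world, not any specific vendor's environment:

```python
import random
from typing import List, Tuple

def collect_transitions(n_steps: int = 10, seed: int = 0) -> List[Tuple[int, int, float, int]]:
    """Roll out a random policy in a trivial 1-D chain environment
    (states 0..5, goal at 5) and record (state, action, reward,
    next_state) training tuples."""
    rng = random.Random(seed)
    state = 0
    transitions = []
    for _ in range(n_steps):
        action = rng.choice([-1, +1])               # random policy
        next_state = max(0, min(5, state + action))  # clamp to the chain
        reward = 1.0 if next_state == 5 else 0.0     # sparse goal reward
        transitions.append((state, action, reward, next_state))
        state = 0 if next_state == 5 else next_state  # reset on goal
    return transitions
```

The point of the tweet is exactly that this loop only exists when `next_state` and `reward` can be computed programmatically; for a prior-auth workflow or a patient chart there is no faithful transition function to call.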

Ohhh good point!!
Since RL mostly works now, RL data might be a thing people wanna sell.
Is anyone doing this? Selling RL envs? Is there even a single company doing this?
Bobby Samuels @BobbySamuels

We’re renaming the YC spring batches from X25 and (what was going to be) X26 to P25 and P26 — P for Primavera, which literally means “first spring” in Latin-derived languages.
The original X was a cute programmer in-joke, but people kept asking “what does X stand for?”, so we’re switching to something that actually says “spring” while still keeping it to a single letter.

@skeptrune @beaversteever the real rag was the friends we made along the way
