maksim

313 posts

@ivanovm_

maksim @ agentic labs

fightertown us-east-1 · Joined March 2024
476 Following · 145 Followers

Pinned Tweet
maksim
maksim@ivanovm_·
if you’re not on a payment plan with the IRS by the time you’re 25 - you didn’t take enough risk
Leonard Tang
Leonard Tang@leonardtang_·
Best Startup Shoes Tier List
Leonard Tang tweet media
maksim
maksim@ivanovm_·
@___4o____ @Austen What is an "institutional lead investor"? A lot of companies just close a party round with a few 6 fig checks
SPEC
SPEC@___4o____·
@Austen What’s the truth then? I know dozens of yc founders from W24+ and have done my own independent research pointing me to the conclusion that only about 10-15% close their seed round with an institutional lead investor.
maksim
maksim@ivanovm_·
Yeah, frontier models + automatic prompt and harness optimization via e.g. gepa gives you a lot of the same hill-climbing without touching the weights.

For RLaaS, starting from weaker open models + factoring in inference hosting costs, you need a *massive* improvement to justify the investment for most buyers
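The "hill-climbing without touching the weights" idea can be sketched as a greedy loop over prompt variants. This is only a toy: GEPA itself uses reflective mutation and Pareto selection, and `score()` here is a hypothetical stand-in for a real eval that would call the frontier model on held-out tasks.

```python
import random

def score(prompt: str) -> float:
    # Hypothetical eval: in practice, the fraction of held-out tasks the
    # model solves with this prompt. Here, a deterministic stand-in metric.
    return (sum(len(w) for w in prompt.split()) % 10) / 10

# Canned mutations standing in for optimizer-proposed prompt edits.
MUTATIONS = [
    lambda p: p + " Think step by step.",
    lambda p: p + " Answer concisely.",
    lambda p: p.replace("You are", "You are an expert"),
]

def hill_climb(prompt: str, iters: int = 20, seed: int = 0) -> str:
    rng = random.Random(seed)
    best, best_score = prompt, score(prompt)
    for _ in range(iters):
        candidate = rng.choice(MUTATIONS)(best)
        s = score(candidate)
        if s > best_score:  # greedy: keep only strict improvements
            best, best_score = candidate, s
    return best

print(hill_climb("You are a helpful assistant."))
```

The greedy acceptance rule guarantees the returned prompt never scores below the starting one, which is the whole pitch: measurable gains with zero training cost.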
abhijay
abhijay@abhijaymrana·
There were a few RL-as-a-service companies that emerged this year (with one recently acquired by DoorDash for a very generous price). Some aspects of custom post-training are exciting, but the economics seem tricky here.

Fundamentally, the RLaaS thesis is that enterprises should post-train models in-house to:

1. Outperform base models on uncommon, domain-specific tasks.
- For DoorDash, this includes optimizing ad relevance and order recommendation models. Enterprises at DoorDash's scale have petabytes of proprietary data, and are evidently willing to spend hundreds of millions on talent alone.
- Tangentially, these complex domain-specific tasks also make for great training data, which is why some of these companies make most of their revenue from data sales, not enterprise RL.
- This is usually only necessary if the model is the bottleneck, which isn't the case for almost all agent deployments. Context/harness engineering goes a long way. However, if you have a measurable objective + tight feedback loop (like Decagon), improvements start to become nonlinear and significant.

2. Derisk from frontier labs (long-term).
- Owning your own model insulates you from OpenAI/Anthropic API pricing, which is why Cursor is racing for a SoTA coding model.
- Regulated industries may also prefer local models for data security (especially if training truly becomes commoditized and they can justify the spend).

In reality, as Michael mentions, there are ~zero businesses that are immediately ready for RL. Teams like Applied Compute spend most of their time transforming data and mapping processes (which can spiral into months of work) and even building agents for the enterprises before they can train. This feels slightly distracting—closer to McKinsey than OpenAI—but is also the only way to get AI into businesses today.

But is this a good business? In some ways, it feels too early and hard to commoditize. Michael's write-up seems a bit frustrated.

Also, there are a few existential questions here:
- Is a six-figure training run + continual improvement costs worth it for most businesses?
- It's almost like (beyond compute) hiring is the bottleneck to scale here. Sure, AC is innovating on RL infra, but why won't businesses hire AI talent internally instead of paying AC millions on top of already-high training costs? This doesn't seem sticky.
- If Opus 5 is substantially better than Kimi K2.5 (which is already distilled from Opus 4.6 lol), does the fine-tuned model become obsolete? Also, the cost of the training run is amortized across the time period between runs (e.g. Cursor has to "pay off" Composer 1 in the period between launch and the Composer 2 release), which applies additional pressure.
- Or, if they re-train on the next open-source model, will any of the initial post-training investment transfer? Maybe the data doesn't have to be re-processed, but I'm unsure if training is cheap enough for this to be dismissible yet.

Part of the reason these are still open questions is that the impact of post-training is difficult to quantify—there's no reliable way to attribute business outcomes to post-training improvements.
- Sure, you can create a benchmark to track model improvement on some tasks.
- But there's still a gap between [progress on an ad-recommendation benchmark] and [direct revenue growth for the business] that is latent (and hard to measure, given the lack of a single variable in a company's strategy). Realistic benchmarks are still an unsolved problem.

In-house post-training is a no-brainer for some businesses. Maybe DoorDash will benefit, and it's obvious Cursor and Decagon will. But broadly, I'm unsure how large or sticky this market really is.
Michael Chen@michaelzchen5

x.com/i/article/2037…

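The amortization point above can be made concrete with toy numbers. All figures below are hypothetical, chosen only to show the shape of the math: the one-off training cost is spread over the window until the model is superseded, on top of recurring hosting.

```python
def monthly_amortized_cost(training_cost: float,
                           months_until_next_run: int,
                           monthly_hosting: float) -> float:
    """Effective monthly cost: a one-off training run spread over its
    useful lifetime, plus ongoing inference hosting."""
    return training_cost / months_until_next_run + monthly_hosting

# Hypothetical: a $500k run obsoleted in 6 months, with $40k/mo hosting.
print(monthly_amortized_cost(500_000, 6, 40_000))  # ~123333.33 per month
```

For the post-trained model to pay off, its monthly value uplift over just calling a frontier API has to clear this bar, and a shorter gap between runs raises the bar directly.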
maksim
maksim@ivanovm_·
@skeptrune when I grow up I want to delete linkedin
Nick Khami
Nick Khami@skeptrune·
not posting on social media is truly the ultimate status symbol
maksim
maksim@ivanovm_·
@katieruthmishra it’s the only modality with clean handoffs between agents and non-technical users. similar to tsla autopilot that asks the driver to take over in risky situations. underpriced
Katie Mishra // Khosla Ventures
ppl are way too high on computer use

tested this wknd and turns out I can automate 80% of my work with vibecoded data scraping and api calls. don't need to wait for some general model to bitter-lesson me

rly gotta learn fundamentals and change how you work asap to survive
maksim retweeted
dax
dax@thdxr·
rlhf stands for reinforcement learning have fun
maksim
maksim@ivanovm_·
@vhmth he did, unfortunately a lot of his other content is ccp shilling
Vinay Hiremath
Vinay Hiremath@vhmth·
This guy predicted everything that's happening in the Strait of Hormuz right now
maksim
maksim@ivanovm_·
@corsaren the men who don’t mind Indianapolis obviously do not move to New York City
corsaren
corsaren@corsaren·
The nyc dating market is warped largely because women are dreamers and men are not

Young women flock to nyc with visions of self-discovery and collective effervescence, and in doing so skew the gender ratio

Meanwhile their male peers are like "what's wrong with Indianapolis?"
Josh
Josh@JoshPurtell·
@hensapir @SeanZCai @PrimeIntellect Thanks for the shoutout! Yeah GEPA works great for bootstrapping verifiers, one of the cleanest apps imo. GEPA on a single prompt or RLM backend is what we use nowadays
Sean Cai
Sean Cai@SeanZCai·
Running @PrimeIntellect Lab GRPO on a hard-to-verify action-matching task. Judge inconsistency was generating phantom reward variance (same model output, different scores across rollouts) and step 0 kept winning.

Fixed it with a stronger judge (better SOTA OAI model, using all my thousands of old OAI hackathon credits) + response caching and got zero phantom variance, clean gradient signal.

Anybody know what's the best off-the-shelf judge for semantic action matching in RL training without post-training a purpose-built one? What are people actually shipping with? Is there anybody working on purpose-built judge models for GRPO?
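The response-caching fix described above amounts to memoizing the judge on each (output, reference) pair, so identical rollout outputs always receive the identical reward. A minimal sketch, where `judge_raw` is a hypothetical stand-in for the LLM judge call:

```python
import hashlib

_cache: dict[str, float] = {}

def judge_raw(output: str, reference: str) -> float:
    # Stand-in for a (possibly nondeterministic, sampled) LLM judge call.
    return float(output.strip().lower() == reference.strip().lower())

def judge_cached(output: str, reference: str) -> float:
    # Key on the exact pair; the first score becomes authoritative.
    key = hashlib.sha256(f"{output}\x00{reference}".encode()).hexdigest()
    if key not in _cache:
        _cache[key] = judge_raw(output, reference)
    return _cache[key]  # repeats reuse it: zero phantom variance
```

The stand-in judge here is deterministic, so caching is a no-op; with a real sampled judge, the cache pins each unique output to one score, which is exactly what removes the phantom variance from the GRPO advantage estimates.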
maksim retweeted
radbro
radbro@radbro·
Stepped into a faraday cage and my internal monologue disappeared
Adrian Ziegler
Adrian Ziegler@adrmtu·
Healthcare software was designed for humans. Multi-step, nuanced workflows: prior auth submissions, EHR note creation, eligibility verification. The kind of work that can't be reduced to an API call.

That's what AI agents in healthcare are being asked to automate. And the infrastructure to do it reliably doesn't exist off the shelf.

We build it: a coding agent to generate automation scripts, fully managed infrastructure to run them at scale, and a maintenance agent to keep them working as portals and EHRs change.

Today, we're announcing our $5M seed round, backed by Floating Point, @MeridianStCap, Twine Ventures, @refractvc and angels like @zacharylipton (CTO, Abridge) and @dps (fmr. CTO, Stripe).

If you're building AI agents that need to operate payer portals or EHRs, we'd love to talk. And we're hiring!
Adrian Ziegler tweet media
maksim
maksim@ivanovm_·
Even after the current Hormuz crisis ends, it's clear that neither the US Navy nor anyone else can fully secure global trade anymore. Pending a military innovation, anyone with many drones, a naval chokepoint, and a dream can disrupt shipping for whatever reason they want
maksim
maksim@ivanovm_·
gundo should host live-fire war games for uas/cuas companies to compete in cc @jakobdiepen
maksim
maksim@ivanovm_·
@cxgonzalez It's a 2-mile-wide choke point, the ships passing through are giant bombs, and Iran can launch cruise missiles cheaper than a Honda Civic for 2k miles in any direction. They can be launched off a truck and manufactured in a shed. They really figured out asymmetric warfare
christian
christian@cxgonzalez·
can someone monitoring the situation ELI5 how "the world's most powerful navy" can't break a naval blockade by a middling power? what is going on
maksim
maksim@ivanovm_·
@skeptrune @RhysSullivan I miss tab, it was really good for flow state. Now the options are:
1. stare at the agent while it works, feels like a waste of time
2. go turbo-adhd with multiple agents doing different tasks
3. scroll x or go for a walk

deep focus is an unsolved problem in this age
Nick Khami
Nick Khami@skeptrune·
@RhysSullivan those cursor tabs keys are going to be like the vhs tapes of vibecoding
Rhys
Rhys@RhysSullivan·
remember pressing tab
matt turk
matt turk@TurkMatthew·
RL envs are a subset of useful data, and they generate training tuples (state, action, reward, next state). But they only work when the world can be simulated. Most high-value domains (healthcare, enterprise workflows, multimodal reasoning) can't be faithfully simulated, so models still need real-world datasets and evaluation benchmarks. RL envs are also mainly for post-training and sit a layer above in abstraction, whereas mid/pre-training requires other real-world data and domain adaptation.
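The (state, action, reward, next state) tuples described above are what a simulable environment emits on every step. A minimal sketch with a toy corridor env (all names here are illustrative, not any particular RL library's API):

```python
from dataclasses import dataclass

@dataclass
class Transition:
    state: int
    action: int
    reward: float
    next_state: int

class ToyEnv:
    """A 5-cell corridor: action 1 moves right, anything else moves left.
    Reward 1.0 for reaching cell 4."""
    def __init__(self):
        self.state = 0

    def step(self, action: int) -> Transition:
        next_state = max(0, min(4, self.state + (1 if action == 1 else -1)))
        reward = 1.0 if next_state == 4 else 0.0
        t = Transition(self.state, action, reward, next_state)
        self.state = next_state
        return t

env = ToyEnv()
rollout = [env.step(1) for _ in range(4)]  # always move right
print(rollout[-1])  # Transition(state=3, action=1, reward=1.0, next_state=4)
```

The point of the tweet is that `step()` is only trustworthy when the transition dynamics are known; for healthcare or enterprise workflows there is no faithful `step()` to write, which is why real-world datasets and benchmarks remain necessary.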
Garry Tan
Garry Tan@garrytan·
We’re renaming the YC spring batches from X25 and (what was going to be) X26 to P25 and P26 — P for Primavera, which literally means “first spring” in Latin-derived languages. The original X was a cute programmer in-joke, but people kept asking “what does X stand for?”, so we’re switching to something that actually says “spring” while still keeping it to a single letter.
Andrew Carr 🤸
Andrew Carr 🤸@andrew_n_carr·
LGTM on AI PRs should mean let's gamble, try merging
Nick Khami
Nick Khami@skeptrune·
@beaversteever bro i founded a RAG api company in 2023 😭
Nick Khami tweet media
San Jose, CA 🇺🇸
Steve the Beaver
Steve the Beaver@beaversteever·
incredible that we built all this RAG and vector database stuff and it turns out that grep from 1973 works better than all that
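The grep-vs-RAG contrast above boils down to plain substring search over a corpus: no embeddings, no vector index, no retrieval pipeline. A minimal sketch (the `corpus` dict and `grep` helper are illustrative, not a real library):

```python
def grep(corpus: dict[str, str], query: str) -> list[tuple[str, str]]:
    """Return (doc_id, line) pairs whose line contains the query,
    case-insensitively. Pure linear scan, like `grep -i`."""
    q = query.lower()
    hits = []
    for doc_id, text in corpus.items():
        for line in text.splitlines():
            if q in line.lower():
                hits.append((doc_id, line))
    return hits

docs = {
    "a.md": "Vector DBs store embeddings.\ngrep scans text.",
    "b.md": "Agents often just grep the repo.",
}
print(grep(docs, "grep"))
```

For agent use the appeal is exactness and zero infrastructure: when the agent knows a good keyword, a literal scan beats approximate nearest-neighbor retrieval, though it finds nothing when only a paraphrase of the query appears.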