Sunil Rao

1.7K posts

@sunilkrao

founder @ Tribble, ex-GM @Salesforce & @SAP building products. 🇮🇳🇸🇦🇨🇦🇺🇸

San Francisco, CA. Joined February 2009
1.5K Following, 862 Followers
Sunil Rao@sunilkrao·
Vendor Roulette is the new Silent Churn
your vendor quietly swaps the underlying model. your answers change. your eval was one-shot at pilot and nobody runs it anymore
you don't get notified. you don't get an SLA. you just notice your agent 'feels worse' and spend 3 sprints debugging your own code
the vendor calls it 'continuous improvement'. the buyer is just spinning the wheel
Sunil Rao@sunilkrao·
DeepSeek V4 just reset every enterprise inference budget
V4 Flash at 284B benches within ~2pt of opus 4.6 on reasoning. V4 Pro at 1.6T goes toe-to-toe with the frontier
that's per-token pricing at 1/10th. self-host on an 8xH100 matches the throughput you're paying $60-80k/mo for via API
every RFP with '2025 per-token rates' is now a renegotiation waiting to happen
the hard part isn't the model. it's telling legal you want to self-host a chinese-origin weights file in prod
Sunil Rao@sunilkrao·
the fix isn't better vendor trust. it's continuous eval on your own workload. run the same 100 prompts every monday, alert when pass-rate drops 5%. this is 2 hours of work that most enterprise ai teams still won't build
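A minimal sketch of the weekly check described in this post, under stated assumptions: `EvalCase`, `pass_rate`, and `should_alert` are hypothetical names, `ask` stands in for whatever function calls your vendor's model, and "drops 5%" is read here as a drop of 5 percentage points against a stored baseline. Scheduling (the "every monday" part) is left to cron or a CI job.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class EvalCase:
    """One fixed prompt plus a check that decides whether the answer passes."""
    prompt: str
    check: Callable[[str], bool]


def pass_rate(cases: Sequence[EvalCase], ask: Callable[[str], str]) -> float:
    """Run every case through `ask` and return the fraction that pass."""
    if not cases:
        raise ValueError("eval set is empty")
    passed = sum(1 for case in cases if case.check(ask(case.prompt)))
    return passed / len(cases)


def should_alert(baseline: float, current: float, max_drop: float = 0.05) -> bool:
    """True when the pass rate fell by at least `max_drop` (absolute) vs. baseline."""
    # Small epsilon so a drop of exactly 5 points is not lost to float rounding.
    return (baseline - current) >= max_drop - 1e-9


if __name__ == "__main__":
    # Stand-in for a real model call; replace with your vendor client.
    def fake_model(prompt: str) -> str:
        return "4" if "2+2" in prompt else "Lyon"

    cases = [
        EvalCase("What is 2+2?", lambda answer: "4" in answer),
        EvalCase("Capital of France?", lambda answer: "Paris" in answer),
    ]
    rate = pass_rate(cases, fake_model)
    if should_alert(baseline=1.0, current=rate):
        print(f"pass rate dropped to {rate:.0%}")
```

The point of keeping the prompt set fixed is that any change in pass rate can be attributed to the model or the pipeline rather than to the test set.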
Sunil Rao@sunilkrao·
'scale to mass intelligence' reads as cover for 'we figured out how to train cheaper than anthropic'. the interesting eng isn't the 862B pro model, it's the 158B flash that gets within 2pt of opus 4.6 on reasoning. that's the enterprise deployable
Sunil Rao@sunilkrao·
cheap and frontier-tier is fine for devs. for enterprise buyers the bigger question is SLAs and data residency on a china-hosted model. most regulated industries can't touch V4 without a self-host plan. which is where unsloth-style tooling gets interesting
Sunil Rao@sunilkrao·
a 284B Flash that benches near opus 4.6 max is the enterprise inference bombshell. anyone with an 8xH100 or 8xMI300X can self-host frontier-tier without burning $80k/mo on API. the self-host ROI math just flipped again
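The "ROI math" in this post comes down to a break-even calculation. A rough sketch, assuming self-hosting is a one-time hardware cost plus a monthly operating cost; `breakeven_months` is a hypothetical helper, and the hardware and opex figures in the example are placeholders, not quotes. Only the $80k/mo API figure comes from the post.

```python
from typing import Optional


def breakeven_months(
    hardware_cost: float,
    selfhost_monthly_opex: float,
    api_monthly_cost: float,
) -> Optional[float]:
    """Months until self-hosting pays for itself, or None if it never does."""
    monthly_savings = api_monthly_cost - selfhost_monthly_opex
    if monthly_savings <= 0:
        # Running your own box costs at least as much as the API bill.
        return None
    return hardware_cost / monthly_savings


if __name__ == "__main__":
    # Placeholder inputs: substitute your own hardware quote and opex.
    months = breakeven_months(
        hardware_cost=300_000,         # assumed, not a real quote
        selfhost_monthly_opex=10_000,  # assumed power, hosting, staff share
        api_monthly_cost=80_000,       # figure from the post
    )
    print(f"break-even after {months:.1f} months")
```

This ignores utilization, depreciation, and the engineering time needed to run the stack, all of which push the break-even point later.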
Sunil Rao@sunilkrao·
26 tool calls without rails falling off is the real number here. most enterprise agent pilots die at 3-4 tool calls because the eval never covered the 5th. unsloth showed what a tuned eval loop looks like
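One way to see whether an eval "never covered the 5th" tool call is to count how many eval cases reach each step. A small sketch, assuming you can record the number of tool calls each eval case makes; `cases_reaching_step` and `first_uncovered_step` are hypothetical helper names.

```python
from typing import Dict, Sequence


def cases_reaching_step(eval_depths: Sequence[int], max_step: int) -> Dict[int, int]:
    """For each step 1..max_step, count eval cases with at least that many tool calls."""
    return {
        step: sum(1 for depth in eval_depths if depth >= step)
        for step in range(1, max_step + 1)
    }


def first_uncovered_step(eval_depths: Sequence[int]) -> int:
    """The first tool-call step that no eval case exercises."""
    return max(eval_depths, default=0) + 1


if __name__ == "__main__":
    # Tool-call counts per eval case (illustrative values only).
    depths = [1, 2, 2, 3, 4]
    print(cases_reaching_step(depths, max_step=5))
    print("first uncovered step:", first_uncovered_step(depths))
```

If production traces routinely go past the first uncovered step, the eval is silent about exactly the region where the agent is most likely to fail.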
Sunil Rao@sunilkrao·
the enterprise version is smaller and already here. procurement portals, sales intake forms, CRM notes. one poisoned document in the retrieval index and your agent makes decisions the attacker wrote. most RAG deployments are wide open
Sunil Rao@sunilkrao·
@levelsio enterprise buyers treat this as a feature, not a bug. RFPs never ask for 'model consistency over time'. vendors can silently A/B their production models and your pilot quietly degrades. nobody's eval is continuous
Sunil Rao@sunilkrao·
the real story isn't peak tok/s. it's that AMD consumer cards are 2x-price competitive with nvidia only when the driver is custom. stock rocm still leaves 40% on the table. most enterprise vendors skip AMD because of that driver gap
Sunil Rao@sunilkrao·
@__tinygrad__ enterprise procurement teams have been asking for 'AMD vs nvidia tco' for 18 months and the answer was 'same compute, worse tools'. SQTT kills the tooling excuse. the on-prem story now has real alternatives
the tiny corp@__tinygrad__·
Giving us the MI300X boxes marked the turning point. Since then, AMD open sourced their SQTT (low level profiling) format, which is a commitment to open source beyond what I expected. If good decisions like this keep being made, this is just the beginning. Hyped for RDNA5.
Sunil Rao@sunilkrao·
the time arbitrage is wild. labs compress 6 months of decision into 6 hours. enterprise procurement expands 6 weeks of decision into 6 quarters. vendors who help buyers catch up win the next cycle
Sunil Rao@sunilkrao·
@patio11 opus 4.7 doing literature survey well is the quiet unlock. most enterprise 'research copilot' vendors are about to find out their moat was just 'gpt-4 couldn't reliably ground-truth'. that's evaporating
Patrick McKenzie@patio11·
An anecdote for you from LLM land: I sent Opus 4.7 out to do a literature survey of various government law enforcement internal guidelines w/r/t AML and KYC usage in prosecutions. It was much more successful at finding these docs than I would have expected. I then told it:
Sunil Rao@sunilkrao·
the Engagement Layer Tax, second time around
sap ran this play when salesforce became the engagement layer over their records in 2015. flipped to api-metered pricing, bills jumped for the same workflow
headless 360 is the setup. the "agent access to records" sku is the punchline. 12 months max
Sunil Rao@sunilkrao·
Pilot Purgatory isn't a bug, it's the product
6-month pilot. nobody defined "done". the champion rotates. the tool stays in quarterly review forever
the vendor calls it "land and expand". the buyer calls it "still evaluating"
Sunil Rao@sunilkrao·
pricing A/Bs run worldwide by accident is the new 'we found a bug in prod'. procurement teams just watched their forecast model rip up twice in 4 hours
Sunil Rao@sunilkrao·
waitlists work for consumer hype. for enterprise buyers it's a red flag. 'can't give it to everyone' really means 'can't price it yet'
Sunil Rao@sunilkrao·
AI Seat Tax is the stealth version of shelfware
pay for 50 seats. 3 users actually touch the thing. procurement gets CYA. the vendor gets a line item that never churns
the tool does not have to work. it just has to stay bought
Sunil Rao@sunilkrao·
@neural_avb taxonomy debate is fun but 99% of teams shipping RLHF can't articulate their reward signal in one sentence. call it whatever. without that it's just expensive vibes
AVB@neural_avb·
My next video is on RLHF and preference tuning. Seeing allegations around that: “RLHF isn’t RL” “RLHF is RL” “RLHF is inverse RL” I do have a clear favorite answer/opinion, but I really wanna understand what everyone’s perspective is. Share your intuition!