Sunil Rao

1.7K posts

@sunilkrao

founder @ Tribble, ex-GM @Salesforce & @SAP building products. 🇮🇳🇸🇦🇨🇦🇺🇸

San Francisco, CA · Joined February 2009

1.5K Following · 862 Followers

Pinned Tweet

Sunil Rao@sunilkrao·Feb 1

Can I buy this as an #NFT?🤔

Things that Aged Well@AgedWell_

Sunil Rao@sunilkrao·13h

x.com/i/article/2047…

Sunil Rao@sunilkrao·17h

Vendor Roulette is the new Silent Churn. your vendor quietly swaps the underlying model. your answers change. your eval was one-shot at pilot and nobody runs it anymore. you dont get notified. you dont get a SLA. you just notice your agent 'feels worse' and spend 3 sprints debugging your own code. the vendor calls it 'continuous improvement'. the buyer is just spinning the wheel

Sunil Rao@sunilkrao·18h

DeepSeek V4 just reset every enterprise inference budget. V4 Flash at 284B benches within ~2pt of opus 4.6 on reasoning. V4 Pro at 1.6T goes toe-to-toe with the frontier. thats per-token pricing at 1/10th. self-host on an 8xH100 matches the throughput youre paying $60-80k/mo for via API. every RFP with '2025 per-token rates' is now a renegotiation waiting to happen. the hard part isnt the model. its telling legal you want to self-host a chinese-origin weights file in prod

Sunil Rao@sunilkrao·18h

the fix isnt better vendor trust. its continuous eval on your own workload. run the same 100 prompts every monday, alert when pass-rate drops 5%. this is 2 hours of work that most enterprise ai teams still wont build
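
A minimal sketch of the weekly check described in the tweet above, not the author's implementation. The names `pass_rate`, `should_alert`, `run_model`, and `passes` are hypothetical, and the sketch assumes "drops 5%" means five percentage points below a stored baseline; a relative drop would need a different comparison.

```python
from typing import Callable, Sequence


def pass_rate(
    prompts: Sequence[str],
    run_model: Callable[[str], str],     # hypothetical: calls the vendor or self-hosted model
    passes: Callable[[str, str], bool],  # hypothetical: grades one (prompt, output) pair
) -> float:
    """Run every prompt once and return the fraction that pass."""
    if not prompts:
        raise ValueError("prompt set is empty")
    passed = sum(1 for p in prompts if passes(p, run_model(p)))
    return passed / len(prompts)


def should_alert(baseline: float, current: float, max_drop: float = 0.05) -> bool:
    """True when the pass rate fell by at least max_drop (absolute) versus the baseline.

    Assumption: the threshold is in percentage points, so 0.05 means
    a baseline of 0.90 alerts at 0.85 or lower.
    """
    return (baseline - current) >= max_drop
```

Scheduling (for example a Monday cron job) and the alert channel are left out; the point is only that the same fixed prompt set is re-scored on a schedule and compared against a stored baseline.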

Sunil Rao@sunilkrao·18h

'scale to mass intelligence' reads as cover for 'we figured out how to train cheaper than anthropic'. the interesting eng isnt the 862B pro model, its the 158B flash that gets within 2pt of opus 4.6 on reasoning. thats the enterprise deployable

Sunil Rao@sunilkrao·18h

cheap and frontier-tier is fine for devs. for enterprise buyers the bigger question is SLAs and data residency on a china-hosted model. most regulated industries cant touch V4 without a self-host plan. which is where unsloth-style tooling gets interesting

Sunil Rao@sunilkrao·18h

a 284B Flash that benches near opus 4.6 max is the enterprise inference bombshell. anyone with an 8xH100 or 8xMI300X can self-host frontier-tier without burning $80k/mo on API. the self-host ROI math just flipped again

Sunil Rao@sunilkrao·1d

26 tool calls without rails falling off is the real number here. most enterprise agent pilots die at 3-4 tool calls because the eval never covered the 5th. unsloth showed what a tuned eval loop looks like

Sunil Rao@sunilkrao·1d

the enterprise version is smaller and already here. procurement portals, sales intake forms, CRM notes. one poisoned document in the retrieval index and your agent makes decisions the attacker wrote. most RAG deployments are wide open

Sunil Rao@sunilkrao·1d

@levelsio enterprise buyers treat this as a feature, not a bug. RFPs never ask for 'model consistency over time'. vendors can silently A/B their production models and your pilot quietly degrades. nobody's eval is continuous

@levelsio@levelsio·1d

I can't believe we were right Claude was dumbified on March 4, just when we noticed!

@levelsio@levelsio

Claude Code with Opus 4.6 was so dumb today I finally had to write my own code again A sad state of affairs 🥹

Sunil Rao@sunilkrao·1d

the real story isnt peak tok/s. its that AMD consumer cards are 2x-price competitive with nvidia only when the driver is custom. stock rocm still leaves 40% on the table. most enterprise vendors skip AMD because of that driver gap

Sunil Rao@sunilkrao·1d

@__tinygrad__ enterprise procurement teams have been asking for 'AMD vs nvidia tco' for 18 months and the answer was 'same compute, worse tools'. SQTT kills the tooling excuse. the on-prem story now has real alternatives

the tiny corp@__tinygrad__·2d

Giving us the MI300X boxes marked the turning point. Since then, AMD open sourced their SQTT (low level profiling) format, which is a commitment to open source beyond what I expected. If good decisions like this keep being made, this is just the beginning. Hyped for RDNA5.

Sunil Rao@sunilkrao·1d

the time arbitrage is wild. labs compress 6 months of decision into 6 hours. enterprise procurement expands 6 weeks of decision into 6 quarters. vendors who help buyers catch up win the next cycle

Sunil Rao@sunilkrao·2d

@patio11 opus 4.7 doing literature survey well is the quiet unlock. most enterprise 'research copilot' vendors are about to find out their moat was just 'gpt-4 couldn't reliably ground-truth'. that's evaporating

Patrick McKenzie@patio11·2d

An anecdote for you from LLM land: I sent Opus 4.7 out to do a literature survey of various government law enforcement internal guidelines w/r/t AML and KYC usage in prosecutions. It was much more successful at finding these docs than I would have expected. I then told it:

Sunil Rao@sunilkrao·2d

the Engagement Layer Tax, second time around. sap ran this play when salesforce became the engagement layer over their records in 2015. flipped to api-metered pricing, bills jumped for the same workflow. headless 360 is the setup. the "agent access to records" sku is the punchline. 12 months max

Sunil Rao@sunilkrao·2d

Pilot Purgatory isnt a bug, its the product. 6-month pilot. nobody defined "done". the champion rotates. the tool stays in quarterly review forever. the vendor calls it "land and expand". the buyer calls it "still evaluating"

Sunil Rao@sunilkrao·2d

pricing A/Bs run worldwide by accident is the new 'we found a bug in prod'. procurement teams just watched their forecast model rip up twice in 4 hours

Sunil Rao@sunilkrao·2d

waitlists work for consumer hype. for enterprise buyers its a red flag. 'cant give it to everyone' really means 'cant price it yet'

Sunil Rao@sunilkrao·2d

AI Seat Tax is the stealth version of shelfware. pay for 50 seats. 3 users actually touch the thing. procurement gets CYA. the vendor gets a line item that never churns. the tool does not have to work. it just has to stay bought

Sunil Rao@sunilkrao·3d

@neural_avb taxonomy debate is fun but 99% of teams shipping RLHF can't articulate their reward signal in one sentence. call it whatever. without that its just expensive vibes

AVB@neural_avb·3d

My next video is on RLHF and preference tuning. Seeing allegations around that: “RLHF isn’t RL” “RLHF is RL” “RLHF is inverse RL” I do have a clear favorite answer/opinion, but I really wanna understand what everyone’s perspective is. Share your intuition!
