Usman Ghani

2.2K posts


@usmanghani

CTO @avencard. CTO @Scotty_Labs (acq by DoorDash). Eng Director @Zenefits. distro sys @Platfora. founding engineer @Azure @Microsoft.

San Francisco, CA · Joined January 2009
5K Following · 1.4K Followers
Usman Ghani retweeted
Sadi @SadiSKhan
Machine Banking is coming. I spoke with @rabois about how fintech is moving to software-driven systems. We discuss automation and agentic systems as a core advantage, underwriting and infrastructure as products, and why talent density is what will matter most. Watch the full interview: youtube.com/watch?v=NlXfXT…
Usman Ghani retweeted
TFTC @TFTC21
TFTC 740 w/ @SadiSKhan: "The tools that have been only for the uber wealthy... We're bringing that to everyday consumers."
We discuss:
⚡ Aven's Bitcoin-backed Visa card
⚡ Borrowing against BTC without selling
⚡ Why BTC should be the lowest-cost collateral
Usman Ghani retweeted
Aven @AvenCard
Can you use your Bitcoin without selling? Yes. Now you can. Introducing The Aven Bitcoin Card.
Up to 10-year fixed plans. Starting at 7.99%. Variable & fixed, no teaser APR. Up to $1M line size.
Apply now at aven.com/bitcoin. Terms apply. See the Aven website for more details and disclosures.
Usman Ghani retweeted
Aven @AvenCard
Even in the tech industry, young talent is sometimes overlooked. Keith Rabois, an Aven board member and managing director at Khosla Ventures, and Sadi Khan, founder and CEO of Aven Financial, Inc., weigh in. View the full conversation here: bit.ly/4vLOifC
Usman Ghani retweeted
Sadi @SadiSKhan
Just had a powerful convo with former Fed Governor Kevin Warsh about AI, growth, and what founders should really be focused on right now. Kevin explains why most founders are thinking about AI backwards, how crypto got its branding wrong and why they took 10 years to fix it, and how AI can expand businesses and ship new products now. Watch the full interview if you’re building now: youtube.com/watch?v=6LtRcC…
Rohan Paul @rohanpaul_ai
The image sums up the idea that alternating prover-verifier updates transform opaque logic into proofs both humans and small models can audit. The picture tracks a math word-problem solution as it passes through checkability training. Each column is a different training round. You can see the answer stays at 45, yet the explanation grows from a terse LaTeX-style note at "init" to a clear, step-by-step derivation in round 5, where every computation is bracketed like <<3*3=9>>. Those brackets are breadcrumbs a tiny verifier can scan, so the image literally shows legibility rising with each training pass.

"Prover-Verifier Games Improve Legibility of LLM Outputs" argues that output clarity matters as much as raw correctness. The authors pit a strong prover against a weaker verifier and alternate updates: the verifier learns to spot errors, and the prover learns to write answers the verifier accepts.

The left panel shows the raw chain of thought with dense math markup and no step tags. The middle and right panels come after 1 and 5 training rounds; each rewrite breaks the reasoning into short lines and inserts bracketed calculations that a lightweight verifier can parse. The numeric answer stays 45, yet readability jumps because every sub-result, like <<3*3=9>>, is spelled out.

The same idea now drives broader reinforcement-learning work. RLPR (Reward from Language Model Probability) swaps the external checker for the model's own token-probability score, keeping the reward signal while cutting cost.

Overall, this is what the paper says as well: iterative feedback forces the model to write proofs that even a pocket-size checker can follow, turning opaque chains of thought into straightforward, verifiable steps.
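The bracketed annotations are concrete enough to sketch. Below is a minimal, hypothetical version of the "tiny verifier" scan the tweet describes: it extracts each <<expr=result>> calculator bracket and checks that the arithmetic actually holds. The regex, function names, and use of eval() are illustrative assumptions, not the paper's implementation.

```python
import re

# Match GSM8K-style calculator brackets like <<3*3=9>>.
CALC = re.compile(r"<<([^<>=]+)=([^<>]+)>>")

def audit_steps(solution: str) -> bool:
    """Return True only if every bracketed computation checks out."""
    for expr, claimed in CALC.findall(solution):
        try:
            # eval() restricted to bare arithmetic; a real checker would
            # use a proper expression parser instead.
            if abs(eval(expr, {"__builtins__": {}}) - float(claimed)) > 1e-9:
                return False
        except Exception:
            return False  # an unparseable step counts as illegible
    return True

print(audit_steps("3 trays of 3 rolls is <<3*3=9>>9, times 5 days: <<9*5=45>>45."))  # True
print(audit_steps("so we get <<3*3=10>>10 rolls"))  # False: 3*3 != 10
```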
Rohan Paul @rohanpaul_ai
From various reports, OpenAI really did bolt a "Universal Verifier" onto the GPT-5 training loop. And here's that paper OpenAI published earlier, "Prover-Verifier Games Improve Legibility of LLM Outputs", showing a production-ready pipeline where a verifier model scores each reasoning chain and feeds that reward back into policy updates. The paper is explicit that the verifier is small enough for large-scale rollout and is "designed for future GPT deployments".

🔄 How the prover-verifier game works

Think of two personalities living in one model. The "helpful" persona solves a problem and tries to convince a lightweight verifier network that the answer is sound. The "sneaky" persona deliberately sneaks in wrong conclusions yet still aims to fool the same verifier. By alternating roles, the big model learns to write solutions that are harder to fake, while the small verifier sharpens its ability to flag errors.

---

An Aug-2024 Wired article explained how OpenAI swapped some human feedback for model-based critics when fine-tuning GPT-4 code helpers, noting the system "will be folded into RLHF for future mainline models". GPT-5 is that next mainline model.
Rohan Paul @rohanpaul_ai:
📐 OpenAI GPT-5 will be a steady step that lifts coding, math, and agent control, not another giant jump like the one from GPT-3 to GPT-4, according to The Information's report.

OpenAI hit three big snags at once: fresh data dried up, reinforcement learning runs kept wobbling, and the Orion model never lived up to its hype. To keep things on track, the team built a universal verifier, an extra model that grades every answer during reinforcement learning and only lets the solid ones loop back into training, so the next model starts from cleaner, more reliable examples.

OpenAI spent early 2024 training a bigger model called Orion to replace GPT-4, but the tweaks that helped small test runs failed to scale and clean new data was scarce, so results stayed close to GPT-4 while costs kept rising. Because of that, the company rebranded Orion as GPT-4.5 rather than GPT-5 and shifted its focus to other training tricks.

Teams pivoted to o-series reasoning models, adding more NVIDIA GPUs and code search; raw problem solving rose, but quality dipped once the model had to chat in plain English. GPT-5 folds those lessons together: it scales compute per query, writes cleaner interfaces, handles tricky refunds, and lets the universal verifier grade thousands of synthetic answers.

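The game loop is easy to caricature in code. Here is a toy, runnable sketch of the alternation described above; the "prover" and "verifier" are stand-in functions plus a scalar strictness knob, so only the control flow mirrors the paper. Nothing here is OpenAI's pipeline.

```python
import random

random.seed(0)

def prover(problem, role):
    """Helpful persona answers correctly; sneaky persona is off by one but
    formats its solution just as convincingly."""
    answer = sum(problem)
    return answer if role == "helpful" else answer + 1

def verifier_accepts(problem, answer, strictness):
    """Stand-in verifier: accepts correct answers, and is fooled by wrong
    ones with probability (1 - strictness)."""
    return answer == sum(problem) or random.random() > strictness

problems = [[random.randint(1, 9) for _ in range(3)] for _ in range(300)]
strictness = 0.3

for rnd in range(1, 6):
    # Verifier update: every sneaky sample it catches makes it stricter.
    caught = sum(not verifier_accepts(p, prover(p, "sneaky"), strictness)
                 for p in problems)
    strictness = min(0.95, strictness + 0.5 * caught / len(problems))

    # Prover step against the frozen verifier: in the real game this is an
    # RL update; here we just measure how often sneaky still gets through.
    fooled = sum(verifier_accepts(p, prover(p, "sneaky"), strictness)
                 for p in problems)
    print(f"round {rnd}: strictness={strictness:.2f}, "
          f"sneaky acceptance={fooled / len(problems):.0%}")
```

Round over round, the sneaky acceptance rate falls, which is the whole point: solutions that survive are the ones a small checker can't be talked past.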
DSPy @DSPyOSS
Yes, this is a description of how the dspy.SIMBA optimizer works.

> a review/reflect stage along the lines of "what went well? what didn't go so well? what should I try next time?" etc. and the lessons from this stage feel explicit, like a new string to be added to the system prompt for the future, optionally to be distilled into weights (/intuition) later a bit like sleep

Distilling through the weights is the dspy.BetterTogether strategy.
Andrej Karpathy @karpathy:
Scaling up RL is all the rage right now, I had a chat with a friend about it yesterday. I'm fairly certain RL will continue to yield more intermediate gains, but I also don't expect it to be the full story. RL is basically "hey this happened to go well (/poorly), let me slightly increase (/decrease) the probability of every action I took for the future". You get a lot more leverage from verifier functions than explicit supervision, this is great.

But first, it looks suspicious asymptotically - once the tasks grow to be minutes/hours of interaction long, are you really going to do all that work just to learn a single scalar outcome at the very end, to directly weight the gradient?

And second, beyond asymptotics, this doesn't feel like the human mechanism of improvement for the majority of intelligence tasks. There are significantly more bits of supervision we extract per rollout via a review/reflect stage along the lines of "what went well? what didn't go so well? what should I try next time?" etc., and the lessons from this stage feel explicit, like a new string to be added to the system prompt for the future, optionally to be distilled into weights (/intuition) later, a bit like sleep. In English, we say something becomes "second nature" via this process, and we're missing learning paradigms like this. The new Memory feature is maybe a primordial version of this in ChatGPT, though it is only used for customization, not problem solving. Notice that there is no equivalent of this for e.g. Atari RL, because there are no LLMs and no in-context learning in those domains.

Example algorithm: given a task, do a few rollouts, stuff them all into one context window (along with the reward in each case), use a meta-prompt to review/reflect on what went well or not to obtain a string "lesson", to be added to the system prompt (or, more generally, to modify the current lessons database). Many blanks to fill in, many tweaks possible, not obvious.

Example of lesson: we know LLMs can't super easily see letters due to tokenization and can't super easily count inside the residual stream, hence 'r' in 'strawberry' being famously difficult. The Claude system prompt had a "quick fix" patch - a string was added along the lines of "If the user asks you to count letters, first separate them by commas and increment an explicit counter each time and do the task like that". This string is the "lesson", explicitly instructing the model how to complete the counting task. The question is how this might fall out of agentic practice instead of being hard-coded by an engineer, how it can be generalized, and how lessons can be distilled over time so they don't bloat the context window indefinitely.

TLDR: RL will lead to more gains because, when done well, it is a lot more leveraged, bitter-lesson-pilled, and superior to SFT. It doesn't feel like the full story, especially as rollout lengths continue to expand. There are more S curves to find beyond, possibly specific to LLMs and without analogues in game/robotics-like environments, which is exciting.

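Karpathy's "example algorithm" is concrete enough to sketch. Below is a minimal, assumption-laden version: llm and rollout are hypothetical stubs standing in for a real chat API and a real environment; only the rollout → reflect → lesson → system-prompt loop is the point.

```python
def llm(prompt: str) -> str:
    """Hypothetical completion call; swap in any real chat client."""
    return "Lesson: spell out each letter and keep an explicit counter."

def rollout(task: str, system_prompt: str) -> tuple[str, float]:
    """Hypothetical attempt at the task; returns (trajectory, scalar reward)."""
    return f"attempted: {task}", 0.0

def improve(task: str, lessons: list[str], n_rollouts: int = 4) -> list[str]:
    system_prompt = "You are a careful problem solver.\n" + "\n".join(lessons)

    # 1. Do a few rollouts under the current lessons.
    results = [rollout(task, system_prompt) for _ in range(n_rollouts)]

    # 2. Stuff all rollouts (with rewards) into one reflection context.
    review = "\n\n".join(f"Rollout {i} (reward={r}):\n{t}"
                         for i, (t, r) in enumerate(results))

    # 3. Meta-prompt: review/reflect, emit one explicit string "lesson".
    lesson = llm("What went well? What didn't? What should be tried next "
                 "time? Reply with one reusable lesson.\n\n" + review)

    # 4. The lesson goes into the lessons database / system prompt now;
    #    distilling it into weights ("second nature") would happen offline.
    return lessons + [lesson]

print(improve("count the r's in 'strawberry'", lessons=[]))
```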
Usman Ghani @usmanghani
@noahmacca How do you detect from the realtime output that you need the supervisor?
Noah MacCallum @noahmacca
I've added a full demo and documentation to our open-source openai-realtime-agents repo, so you can try it out yourself and adapt it to your own prompts. Please let me know what you think! github.com/openai/openai-…
Noah MacCallum @noahmacca
I've been building voice agents for the last 6 months and I think the chat-supervisor pattern is a game changer. A stitched pipeline (STT → LLM → TTS) is slow, but realtime audio models aren't (yet) as smart as text models. This pattern has the best of both worlds. Here's how it works:
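A rough sketch of the pattern, under stated assumptions: a fast realtime voice model handles easy turns directly and defers hard ones to a smarter text model. In the actual repo the escalation is a tool call the realtime agent is prompted to make (which is also one answer to the "how do you detect you need the supervisor" question above); the keyword heuristic and both model stubs here are illustrative stand-ins only.

```python
HARD_TOPICS = ("refund", "account", "policy")  # assumed escalation triggers

def realtime_model(turn: str) -> str:
    """Stand-in for the fast speech-native model: low latency, less capable."""
    return f"[instant voice reply] {turn}"

def supervisor_model(turn: str) -> str:
    """Stand-in for the smarter text model: higher latency, better reasoning."""
    return f"[carefully reasoned reply] {turn}"

def needs_supervisor(turn: str) -> bool:
    # In the real pattern the realtime agent is prompted to call an
    # escalation tool on hard topics; a keyword check stands in for that.
    return any(topic in turn.lower() for topic in HARD_TOPICS)

def handle_turn(turn: str) -> str:
    if needs_supervisor(turn):
        # The realtime agent can speak a filler phrase ("let me check...")
        # while the supervisor works, keeping perceived latency low.
        return supervisor_model(turn)
    return realtime_model(turn)

print(handle_turn("what's the weather like today"))
print(handle_turn("I want a refund for last month"))
```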
Sheel Mohnot @pitdesi
I installed @comma_ai in my car. Their tagline is “make driving chill”, which is accurate! I have pretty good confidence relying on it to drive most of the time; it feels way “chiller” than Tesla Autopilot (I haven’t driven the new FSD).
It’s awesome that you can self-install (~30 min) a $1,450 device with 3 cameras from a company of ~20 people and mostly be able to let it drive for you. It taps into the car’s existing driver-assist system and takes it over.
There are a bunch of different forks; I’ve been using Sunnypilot. It’s not plug and play and there is stuff to figure out, but it’s INCREDIBLE what this little device and small team can do. It’s actually quite disappointing that it just isn’t built into most cars.
Dino Becirovic @dinobecirovic
@gokulr @jaminball The irony is a lot of the people saying this now were themselves complicit in 2021. Actions > words
Gokul Rajaram @gokulr
“An acquisition at $200m can be life changing for founders, or it can be worth nothing if they raised $250m at $1B valuation back in 2021.” Sobering, wise words from @jaminball.
Corollary: “Founders should always be aware of what metrics would be required to exit at the valuation they’re raising at under standard exit multiple assumptions.”
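The arithmetic behind the quote is worth making explicit. A small sketch, assuming a standard 1x non-participating liquidation preference (an assumption; actual deal terms vary):

```python
def common_proceeds(sale_price: float, total_raised: float,
                    investor_ownership: float) -> float:
    """Investors take max(preference, pro-rata share); common gets the rest."""
    investor_take = max(total_raised, investor_ownership * sale_price)
    return max(0.0, sale_price - min(investor_take, sale_price))

# Raised $250M at $1B in 2021, later sold for $200M: the preference stack
# exceeds the sale price, so common (founders, employees) gets nothing.
print(common_proceeds(200e6, 250e6, 0.25))  # 0.0

# Raised $20M for 25%, sold for $200M: common keeps $150M.
print(common_proceeds(200e6, 20e6, 0.25))   # 150000000.0

# Corollary check: exiting at a $1B raise valuation under an assumed 10x
# revenue multiple implies needing roughly $100M in revenue.
print(1e9 / 10)  # 100000000.0
```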
Sarah Catanzaro @sarahcat21
Why is Apple Notes far superior to every other app?
Ohad @ohadsamet
is there a digital signing service that makes a doc look like it was signed with a wet signature?