Cameron Fagan

38 posts

@camfagan

Building https://t.co/FPQWG32CHO — an AI podcast for all 32 NFL teams and every fantasy team. Free, every week. Also https://t.co/xKdCuJNKAJ.

San Francisco · Joined July 2009
985 Following · 76 Followers
Pinned Tweet
Cameron Fagan @camfagan
Most AI sports content is slop because nobody checks the numbers. The model will tell you a QB's CPOE was +12.2 when it was actually +9.14 — confidently — and it ships. So when I built muffed.ai (an AI podcast for all 32 NFL teams — and every fantasy team — solo, with Claude), I gave it a rule: it has to grep every stat against the raw nflverse data before it's allowed to say it out loud. The AI writes the script. A verification layer holds the veto. It has caught itself being wrong more often than I'm comfortable admitting. That's the whole product, honestly — not "AI made a podcast," but "AI made a podcast that doesn't make up the stats." Free, every week: muffed.ai
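A minimal Python sketch of the "check every stat before it's said" rule, assuming the claims have already been extracted from the script as (player, stat, value) and the canonical numbers live in a table built from nflverse data. `CANONICAL`, `Claim`, and `verify_script` are hypothetical names; the only real values are the two CPOE figures quoted in these posts. The source describes a grep (text match); this sketch swaps in a numeric comparison with a small rounding tolerance.

```python
import math
from dataclasses import dataclass

# Hypothetical canonical table; in practice it would be built from raw nflverse data.
# The two values are the CPOE figures quoted in these posts.
CANONICAL = {
    ("Drake Maye", "cpoe"): 9.14,
    ("Brock Purdy", "cpoe"): 5.07,
}

@dataclass(frozen=True)
class Claim:
    player: str
    stat: str
    value: float

def verify_script(claims, canonical=CANONICAL, tol=0.005):
    """Return a list of problems; an empty list means the script may ship."""
    problems = []
    for c in claims:
        key = (c.player, c.stat)
        if key not in canonical:
            # A number the model never looked up counts as a failure, not a pass.
            problems.append(f"unverifiable: {c.player} {c.stat} = {c.value}")
        elif not math.isclose(c.value, canonical[key], abs_tol=tol):
            problems.append(
                f"mismatch: {c.player} {c.stat} = {c.value}, source says {canonical[key]}"
            )
    return problems

print(verify_script([Claim("Drake Maye", "cpoe", 12.2)]))
# ['mismatch: Drake Maye cpoe = 12.2, source says 9.14']
```

The veto lives in ordinary code: the model can write whatever it likes, but a non-empty problem list blocks the episode.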
@levelsio @levelsio
Anyone know the best Thai massage in SF? I need a massage after deadlifts, for my lower back a bit; a foot massage would also be nice
Cameron Fagan retweeted
Muffed @muffedai
We pointed Anthropic's brand-new Claude Fable model at every NFL season from 2016–2025 — every play, every target, every fantasy point — and asked it one question: Where is 2026 ADP wrong? It found six patterns that repeated in BOTH halves of the decade, graded the current board, then we news-checked every name. Every number passed 183 automated verification checks.

Priced too LOW:
🟢 Justin Jefferson (WR5) — 2 TDs all season, but 8.3 targets/gm. Targets repeat year-over-year (r=.79). TDs don't (r=.52). That's a discount, not a decline.
🟢 CeeDee Lamb (WR6) — same story: 9.0 targets/gm, only 9% of his points from TDs. The ankle's healed; the volume never left.
🟢 Jalen Hurts (QB7) + Jayden Daniels (QB5) — rushing QBs repeat top-6 finishes 61% of the time. Pocket QBs: 24%. The stickier archetype is the one on sale.
🟢 Wan'Dale Robinson (WR48) — 8.8 targets/gm, 13.6 PPG… then the Titans paid him $78M to reunite with Daboll, the coach who fed him that volume. The NFL repriced him. Fantasy hasn't.
🟢 Patrick Mahomes (QB14, pick 95) — two discounts in one price: the knee (real — ACL/LCL, targeting Week 1) and the 6-11 record (not real — KC went 1-9 in one-score games, a stat with near-zero year-to-year correlation). He averaged 20.4 PPG before the injury. If camp reports stay clean, you're getting paid for a discount that doesn't exist.

Priced too HIGH:
🔴 Davante Adams (WR23) — 38% of his points came from TDs, the most TD-dependent WR on the board. That profile has shed ~2 PPG the following season, all decade. 14 TDs in season 13 is a heater, not a floor.
🔴 Derrick Henry (RB12) — 34% TD-dependent, 322 touches, season 11. TD-heavy RBs fade −3 PPG the next year. In both halves of the decade.
🔴 Tee Higgins (WR18) — 31% TD share on just 6.5 targets/gm. Paying for the stat that doesn't repeat, without the one that does.
🔴 The year-2 hype tax — McMillan (WR19), Egbuka (WR20), Burden (WR21). Rookie WRs who hit don't leap: −0.3 PPG on average in year 2. All three are priced for the exception, not the base rate — Burden averaged 8.5 PPG and is going WR21 on a role he hasn't run yet.
🔴 The injury-return cluster — Nabers (WR14, second knee surgery, Week 1 not guaranteed), Garrett Wilson (WR17). Players coming off a ≤10-game season return at a median 70% of their old PPG. Only 1 in 3 ever see 85% again. Demand a bigger discount.

The model also killed two myths on the way: there's no automatic year-2 WR leap, and the 300-touch curse is mostly regression dressed up as breakdown. We turned this engine into a product: a weekly podcast about YOUR fantasy players — their real football, not just box scores. Build yours at muffed.ai
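Two quantities this thread leans on are simple to compute from per-player season totals: the share of fantasy points that came from touchdowns, and the year-over-year Pearson correlation (the "r" values) of a stat. A sketch, assuming 6 points per rushing/receiving TD (a common setting; the thread doesn't state its scoring) and made-up numbers purely for illustration. `pearson_r` is written out by hand; `statistics.correlation` (Python 3.10+) computes the same value.

```python
import math

POINTS_PER_TD = 6  # assumption: common fantasy value for rushing/receiving TDs

def td_share(total_points: float, touchdowns: int) -> float:
    """Fraction of a player's fantasy points that came from touchdowns."""
    if total_points <= 0:
        raise ValueError("total_points must be positive")
    return touchdowns * POINTS_PER_TD / total_points

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def year_over_year_r(pairs):
    """r between a stat in season N and the same stat in season N+1, one pair per player."""
    this_season = [a for a, _ in pairs]
    next_season = [b for _, b in pairs]
    return pearson_r(this_season, next_season)

# Made-up numbers for illustration only, not real player data.
print(td_share(total_points=200.0, touchdowns=12))  # 0.36
print(round(year_over_year_r([(10, 9), (8, 8), (6, 7), (4, 5)]), 2))  # 0.98
```

A stat with r near 1 carries over from one season to the next; one near 0 (like one-score record, per the thread) mostly doesn't, which is the logic behind "targets repeat, TDs don't."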
Cameron Fagan @camfagan
I let an AI rank all 32 starting QBs on 2025 data alone — no narratives, no reputation. The #1 wasn't Mahomes, Allen, or Burrow. It was Drake Maye: +9.14 CPOE, a full 4 points clear of #2 (Purdy, +5.07). Most accurate passer in the league over expectation. Every number checked against nflverse before it shipped. muffed.ai
Cameron Fagan @camfagan
"AI-generated" is doing two completely different jobs right now. One: type a prompt, post whatever comes out, never read it. The other: build the guardrails, verify the facts, throw away the 80% that's wrong, ship the 20% that's true. Same label, opposite product. I'm building muffed.ai entirely in the second category.
Cameron Fagan @camfagan
This matches what I hit building solo. The truest line: "every failed agent run becomes a permanent harness rule." Mine did exactly that — the AI shipped a stat as +12.2 when it was +9.14, so it became a hard check that greps every number against source before shipping. Each catch becomes a new rule. Bug count trends to zero.
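The "each catch becomes a new rule" loop can be sketched as a registry of checks that only grows: when a bad output is caught, a rule that would have caught it is added, and every future draft runs through all of them. A minimal Python sketch; `SOURCE_TEXT`, `rule`, and `run_harness` are hypothetical names, and the one rule shown is a literal grep-style check (every decimal in the draft must appear verbatim in the source), encoding the +12.2 vs +9.14 catch.

```python
import re
from typing import Callable, List

# Hypothetical excerpt of the canonical source data drafts are checked against.
SOURCE_TEXT = "Drake Maye cpoe 9.14\nBrock Purdy cpoe 5.07\n"
NUMBER = re.compile(r"\d+\.\d+")  # decimals only; integers are ignored in this sketch

Rule = Callable[[str], List[str]]
RULES: List[Rule] = []

def rule(fn: Rule) -> Rule:
    """Register a check permanently: rules get added over time, never removed."""
    RULES.append(fn)
    return fn

@rule
def numbers_must_appear_in_source(draft: str) -> List[str]:
    # Added after a CPOE shipped as +12.2 when the source said +9.14:
    # every decimal in the draft must appear verbatim in the source data.
    known = set(NUMBER.findall(SOURCE_TEXT))
    return [f"number not found in source: {n}" for n in NUMBER.findall(draft) if n not in known]

def run_harness(draft: str) -> List[str]:
    """Run every rule ever registered; an empty list means the draft may ship."""
    return [error for check in RULES for error in check(draft)]

print(run_harness("Drake Maye led the league with a +12.2 CPOE."))
# ['number not found in source: 12.2']
```

The next failure becomes one more `@rule` function; nothing else in the harness has to change.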
Aakash Gupta @aakashgupta
OpenAI built a system where PMs ship 100,000+ lines of code. Here's the roadmap to get there. Engineers are banned from typing production code. Their entire job is building the harness, a set of guardrails, tests, and documentation that prevents the AI agent from writing bad code in the first place. The first month was 10x slower than doing it the old way. By month three, a PM wrote a PRD on Monday and shipped a working pull request by Friday without a single engineer touching the code. The harness has five components that make this possible. All tribal knowledge from Slack threads, wikis, and engineers' heads gets consolidated into a centralized prompt catalog the agent reads before every task. Every failed agent run gets converted into a permanent harness rule that eliminates that failure class forever, so the bug count trends toward zero over time. Prompt files get formal code review and regression tests because they control app behavior the same way source code does. 250,000 lines of markdown in this repo, all reviewed and version-controlled. The result is a codebase where PMs, designers, and ops teams all contribute directly, because the harness enforces quality regardless of who triggered the work. Engineers stop producing features and start producing leverage. The guardrails produce the app.
Aakash Gupta @aakashgupta

Ryan Lopopolo leads a team at OpenAI where the PM writes a PRD on Monday and ships a pull request by Friday. No human writes the code. @_lopopolo broke it all down:
0:00 - "Code is a liability"
3:23 - Why your most expensive asset is now free
6:01 - "What's the point of roles anymore?"
8:04 - What replaces the PM/design/eng triangle
13:10 - 1M lines of code, zero written by humans
16:05 - Engineers can't touch the keyboard
18:13 - First month was 10x slower than solo
20:07 - Recursing 8 levels deep for one primitive
20:47 - PM writes PRD Monday, ships PR Friday
25:06 - The feature they had to trash
28:02 - How designers ship UI without a backend
31:40 - What's actually inside the harness
37:03 - Failing the build over curly quotes
40:02 - Inside Ryan's actual Codex setup
46:25 - The codebase that grades itself
50:49 - "A billion tokens a day or you're negligent"
52:19 - 350M tokens on a single PR
53:46 - GPT 5.2 changed everything overnight
57:00 - Every engineer is now a staff engineer
59:19 - The ego problem nobody talks about
1:00:39 - Monday morning roadmap for normal teams
1:08:19 - One skill to build this weekend
1:10:57 - Why one agent beats multi-agent

Cameron Fagan @camfagan
Been doing exactly this — building an AI sports podcast, its "memory" is a versioned facts file + an instructions doc I own, not anything platform-side. Switch harnesses tomorrow, it comes with me. The underrated benefit of memory-as-a-file-you-control: you can audit it. Platform memory you can't.
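One way to get the auditability described here: keep the facts file as plain JSON under version control and diff any two versions to see exactly what the system "remembers" differently. A minimal sketch; the flat key/value layout, the keys, and the `diff_facts` helper are hypothetical illustrations, not a description of muffed.ai's actual files.

```python
import json

def load_facts(text: str) -> dict:
    """Parse a facts file stored as a flat JSON object of key -> value."""
    return json.loads(text)

def diff_facts(old: dict, new: dict) -> dict:
    """Report keys added, removed, or changed between two versions of the facts file."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }

# Two illustrative versions of the file.
v1 = load_facts('{"maye.cpoe": 12.2, "purdy.cpoe": 5.07}')
v2 = load_facts('{"maye.cpoe": 9.14, "purdy.cpoe": 5.07, "maye.rank": 1}')
print(diff_facts(v1, v2))
# {'added': ['maye.rank'], 'removed': [], 'changed': ['maye.cpoe']}
```

Platform-side memory offers no equivalent of this diff, which is the point of the post.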
Garry Tan @garrytan
You should want to control and host your own memory. It’s the one thing that you should be able to take to any platform. Watch for this to be a defining battle in the new browser war: the AI harness wars of 2027.
Pejman Pour-Moezzi @pejmanjohn

x.com/i/article/2060…

Cameron Fagan @camfagan
Best version of this diagram I've seen — and one extension from building it for real (an AI sports podcast): when the output is factual, the gate doesn't need to be an LLM scoring 0-1. Every stat gets grepped against source data, ships only if it matches. The model gets no vote on whether it's right. The check half IS the system.
Cameron Fagan @camfagan
The tell: performed intensity is about the founder, but the real work is unglamorous and unpostable. The thing that moved my project most this year was a boring verification layer nobody would ever tweet — "made it stop making up numbers" doesn't perform like "pulled an all-nighter." The quiet builders live in that second category.
Gergely Orosz @GergelyOrosz
Finally, someone said it on grindmaxxing: "There is a growing cliché in startup culture where founders and startups feel the need to perform intensity publicly. How hard they work, how little they sleep (...) You almost never see this from the most successful companies/people."
Karri Saarinen @karrisaarinen

I get that business insurance is a Nobel-level pursuit on par with groundbreaking physics and the Manhattan Project. Hopefully the blast radius will be contained.

I don’t think the disagreement is whether hard problems require intensity. The disagreement is whether intensity has to become a permanent operating model, and whether working seven days a week is the thing that compounds. My argument is that for most startups, the real compounding advantage is not raw hours. It is clearer thinking, better judgment, learning, and a team that can sustain high-quality work for a long time. You can always spend a lot of time working, but the PMF might never arrive.

There are moments where extraordinary effort is necessary: launches, incidents, existential deadlines, customer commitments. Those moments matter, and great teams rise to them. But if the company requires heroics every day of the week, that usually points to a system problem. It means the operating model depends on burning reserve capacity instead of building it. A company that is constantly on fire is a company that is not operating well.

Whenever you put something out there, people will argue, and people can argue about the way I run Linear. The reason I comment on these things is to offer a counterpoint.

There is a growing cliché in startup culture where founders and startups feel the need to perform intensity publicly. How hard they work, how little they sleep, how many tokens they spend, how busy they are, how much personal sacrifice they make. You almost never see this from the most successful companies or people. Even if they work that way, they usually don’t make it the story, because they have more important things to talk about, like the product, the customers, the insight, the strategy, the quality of the work. That’s my issue with the narrative and why I think startups shouldn't blindly follow it.

Not that it's bad to work hard, but the grindmaxxing narrative can become the greater goal and turn counterproductive. The performative intensity becomes the thing, and people lose sight of what actually matters. Let's check back in 7 years.

Cameron Fagan @camfagan
The next thing I'm building on muffed: a weekly podcast about your exact fantasy roster — your starts, your sits, your waiver targets — generated fresh every week. Same rule as the team shows: it doesn't get to make up a number. Every stat about your players is checked against the source data before it's allowed to say it. Most fantasy advice is confident and generic. I'm going for the opposite — specific to your team, and actually right about the facts. muffed.ai
Cameron Fagan @camfagan
Honestly the opposite of what you'd expect — it's a podcast, so the "UI" is mostly your podcast app and the audio itself. I kept the screens deliberately minimal; the design effort went into the part you can't see, the accuracy layer from the post. The interface I obsess over is the listening experience, not a dashboard.
Cameron Fagan @camfagan
Weekend build log: the unglamorous part of an AI podcast isn't the voice or the audio. It's the verification layer — the boring code that greps every stat against source data so the thing can't confidently lie to you. Spent more time on that than anything else. It's the whole reason I'd trust it. muffed.ai
Cameron Fagan @camfagan
Two flavors of this worth separating. Cursor/Devin surface the proof so the human can verify fast — right when you want a human in the loop. But for factual/content products the better move is to enforce the check before output, so there's nothing left for the user to judge. Built an AI sports podcast on exactly that: every stat is verified against source data before it ships, so the unverifiable stuff never reaches the listener. Verifiability stops being something the user does and becomes something the product guarantees.
Hamel Husain @HamelHusain
The easier something is to verify, the more delightful it is to use. For example, with coding agents, the ones that show you videos and screenshots, like Cursor and Devin, are really delightful because they take away much of the verifiability burden from the human. I think there are ways to make other products or tools more verifiable.
Hamel Husain @HamelHusain
If you can’t eval a thing easily, it’s a product smell.

What you need from an analytics AI is proof that the number can be trusted. This means verifying the quantities used against trusted reports, dashboards, prior analysis, etc., to ensure numbers are consistent, both to earn trust and to help with verifiability.

Data analysis should ideally be put into a notebook so folks can verify (and edit) the work in situ, which reduces friction.

This is all stuff a good data scientist would do behind the scenes before producing any number! Which is why it’s important to involve domain experts when automating a thing with AI.

It’s frustrating whenever I see a “Business analyst AI” presented to the user as something that can one-shot business questions. It represents a fundamental misunderstanding of what data analysis is.
Lenny Rachitsky @lennysan

Not enough people are talking about how much AI is impacting the role of data science. I was chatting with a DS friend, and he said that most of his team's work now is reviewing half-assed AI data analysis from PMs and engineers. And that 50% of the time, that analysis is wrong. The role is becoming less fun.

Cameron Fagan @camfagan
"Be skeptical" is right but it's a vibe until you make it mechanical. The dangerous output is the one that looks correct and isn't — you can't eyeball that at scale. Shipped an AI sports podcast and the only thing that actually worked was putting a deterministic gate between the model and what goes out: every number grepped against source data first. The skill can be great or trash — the check doesn't trust it either way.
Cameron Fagan @camfagan
Things my own AI confidently got wrong before my verification layer caught them:
– CPOE published as +12.2, actual +9.14
– another QB as "top-8 by accuracy," actually 12th
– a "top-3 ANY/A" claim that was really #7 (sign error in the formula)

I built muffed.ai — an AI podcast for every NFL team — and the hardest engineering problem wasn't generating audio. It was making the thing grep every stat against nflverse so it can't ship a number it didn't look up. AI writes, data layer vetoes. muffed.ai
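The ANY/A slip is easy to reproduce. Adjusted net yards per attempt is standardly defined as (passing yards + 20 × TD − 45 × INT − sack yards) / (attempts + sacks), so flipping a single sign inflates the result and can move a passer up the table. A sketch with made-up season totals; the post doesn't say which term had the flipped sign, so the sack-yards bug below is only an assumption for illustration.

```python
def any_a(pass_yards, pass_tds, interceptions, sack_yards, attempts, sacks):
    """Adjusted net yards per attempt (standard definition)."""
    return (pass_yards + 20 * pass_tds - 45 * interceptions - sack_yards) / (attempts + sacks)

def any_a_sign_bug(pass_yards, pass_tds, interceptions, sack_yards, attempts, sacks):
    """Same formula with sack yards added instead of subtracted (the bug)."""
    return (pass_yards + 20 * pass_tds - 45 * interceptions + sack_yards) / (attempts + sacks)

# Made-up season totals for one quarterback, for illustration only.
season = dict(pass_yards=4000, pass_tds=25, interceptions=10,
              sack_yards=250, attempts=550, sacks=40)
print(round(any_a(**season), 2))           # 3800 / 590 -> 6.44
print(round(any_a_sign_bug(**season), 2))  # 4300 / 590 -> 7.29
```

Both versions return a plausible-looking number, which is why eyeballing doesn't catch it; recomputing from the raw totals and comparing does.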
Cameron Fagan @camfagan
The structured-rubric piece is the one I'd underline — breaking output into verifiable yes/no questions is the real unlock. One wrinkle from shipping this on factual content (an AI sports podcast): when the yes/no is a fact, an LLM judge inherits the generator's blind spot. It'll confidently approve "+12.2" when the truth is +9.14 — same error the model that wrote it would make. So for falsifiable claims we made the judge deterministic: each number is checked against source data by code, not scored by a model. LLM-as-judge for subjective quality; hard ground-truth check for anything verifiable. Different domain than image/video, same eval problem.
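For a yes/no factual claim such as "is this QB top-N by CPOE?", the deterministic judge recomputes the answer from source data instead of asking a model. A sketch; the table holds the two CPOE values quoted in these posts plus two hypothetical placeholder QBs, and `rank_of` / `check_top_n` are illustrative names. Ties are not handled.

```python
# CPOE table: the first two values are quoted in these posts; "QB C" and "QB D" are hypothetical.
CPOE = {"Drake Maye": 9.14, "Brock Purdy": 5.07, "QB C": 3.1, "QB D": 1.8}

def rank_of(player, table):
    """1-based rank with the highest value first; raises ValueError for unknown players."""
    ordered = sorted(table, key=table.get, reverse=True)
    return ordered.index(player) + 1

def check_top_n(player, n, table):
    """Return (passed, actual_rank) for the claim 'player is top-n in table'."""
    actual = rank_of(player, table)
    return actual <= n, actual

print(check_top_n("Brock Purdy", 1, CPOE))  # (False, 2)
```

The judge can't share the generator's blind spot because it never reads the generator's reasoning, only the data.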
Andrew Ng @AndrewYNg
New course: Build AI agents that generate images and videos -- an under-explored frontier. A key to performance is having the agent evaluate its own output, and iterate to improve quality.

This short course is built together with @googlecloudtech and taught by Katie Nguyen and Wafae Bakkali.

You'll learn three evaluation techniques and combine them in an agent: image-text similarity scoring to check the output matches the prompt, an LLM judge that scores against custom criteria like brand consistency, and structured rubrics that break a prompt into verifiable yes/no questions like "is the subject in the frame?" and "does the camera motion match?"

Skills you'll gain:
- Learn image and video prompt engineering
- Build an image agent that turns brand guidelines into UI mockups
- Build a video agent that plans multi-scene explainers and animates reference frames with synchronized audio

Join and build agents that create images and video! deeplearning.ai/courses/ai-age…
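Image-text similarity scoring is usually done by embedding the prompt and the generated image with the same model (CLIP-style) and taking the cosine similarity of the two vectors, while a structured rubric reduces to counting yes/no answers. A framework-free sketch assuming the embeddings and rubric answers are already computed; the thresholds are hypothetical, not values from the course.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def rubric_score(answers):
    """Fraction of yes/no rubric questions answered 'yes' (True)."""
    return sum(answers.values()) / len(answers)

def accept(image_vec, prompt_vec, answers, min_sim=0.25, min_rubric=1.0):
    # Hypothetical gate: enough prompt-image similarity AND every rubric item passes.
    return cosine_similarity(image_vec, prompt_vec) >= min_sim and rubric_score(answers) >= min_rubric

answers = {"is the subject in the frame?": True, "does the camera motion match?": False}
print(rubric_score(answers))  # 0.5
```

A failed check would send the agent back for another iteration rather than shipping the image.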
Cameron Fagan @camfagan
Built a version of this in a narrower domain (an AI sports podcast) and the surprise was: even with constrained generation + good retrieval, the model still confidently invents numbers. A stat would come back +12.2 when the truth was +9.14 — plausible, wrong, shipped. The fix wasn't better retrieval. It was a deterministic gate outside the model: every numeric claim gets grepped against source data before it's allowed in the output — no match, no ship. Basically your #6/#7, but enforced in code instead of asked of the LLM. That's what got the fabricated-number rate to actual zero. You can't prompt your way there; the veto has to live outside the model.
aditya @adxtyahq
“design a RAG pipeline for 10M docs with zero hallucination”

apparently this was asked in a Google L5 interview round. came across it somewhere on the internet and honestly it’s a way more interesting system design problem than most classic distributed systems questions

1. ingest + normalize docs
   - remove duplicates, standardize formats, extract metadata, maintain version history
2. hybrid retrieval (BM25 + embeddings)
   - BM25 handles exact keyword matching while embeddings capture semantic meaning
   - semantic search alone usually struggles with precision at massive scale
3. ANN retrieval + reranking
   - ANN (approximate nearest neighbor) quickly pulls top candidate chunks from millions of docs
   - then a reranker rescoring step improves relevance by deeply comparing query vs retrieved chunks
4. source confidence scoring
   - every retrieved chunk gets scored based on freshness, trust level, overlap and retrieval consistency
   - low-confidence context should never heavily influence generation
5. constrained generation
   - the model is only allowed to answer using retrieved context (nothing new to be invented outside of the retrieved context)
6. citation-backed responses
   - every major claim links back to exact chunks, documents or timestamps
7. hallucination fallback layer
   - if retrieval confidence drops below a threshold: “insufficient evidence found”
8. continuous evals
   - run adversarial queries, retrieval recall benchmarks and hallucination tests continuously
9. caching + memory layer
   - cache high-frequency enterprise queries and retrieval paths (improves latency and output)
10. observability everywhere
   - trace retrieval paths, chunk rankings, token attribution and failure points

Also at 10M docs, retrieval quality matters more than the frontier model itself.
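Steps 2 and 7 can be sketched together. The post doesn't say how BM25 and embedding results are merged; reciprocal rank fusion (RRF) is one common choice and is swapped in here as an assumption, as are the `k=60` constant (a commonly used default) and the confidence threshold. Rankings are plain lists of document IDs, and the confidence score is assumed to come from step 4.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of doc IDs; a higher fused score is better."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

def retrieve_or_abstain(bm25_ranking, embedding_ranking, confidence, threshold=0.5, top_k=3):
    """Return fused top-k doc IDs, or abstain when retrieval confidence is too low."""
    if confidence < threshold:
        # Step 7: fall back instead of letting the model answer from weak context.
        return "insufficient evidence found"
    fused = reciprocal_rank_fusion([bm25_ranking, embedding_ranking])
    return [doc_id for doc_id, _ in fused[:top_k]]

print(retrieve_or_abstain(["d1", "d2", "d3"], ["d2", "d4", "d1"], confidence=0.8))
# ['d2', 'd1', 'd4']
```

RRF only needs ranks, not comparable scores, which is why it is a convenient way to combine a lexical ranker with a vector ranker.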
Cameron Fagan @camfagan
Same. Used it for the full pipeline of a 32-QB transparent-methodology ranking model — verification discipline, NGS vs nflfastR CPOE handled explicitly, every number grep-checked against a canonical file before being written. Posted the results here if you ever want to argue weights or the pressure-to-sack call: muffed.ai/articles/qb-ra…
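CPOE is the actual completion rate minus the completion probability an expected-completion model assigns to each throw, averaged over attempts and expressed in percentage points. nflfastR and Next Gen Stats use different expected-completion models, so the same QB can get different CPOE values from each, which is why the source has to be pinned. A sketch over per-attempt records; the field names `complete_pass` and `cp` mirror nflfastR-style columns but are assumptions here, and the plays are toy data.

```python
def cpoe(attempts):
    """Completion percentage over expectation, in percentage points.

    Each attempt is a dict with `complete_pass` (1 or 0) and `cp`, the
    expected completion probability for that throw (0..1).
    """
    if not attempts:
        raise ValueError("need at least one attempt")
    residuals = [a["complete_pass"] - a["cp"] for a in attempts]
    return 100 * sum(residuals) / len(residuals)

# Toy attempts for illustration only.
plays = [
    {"complete_pass": 1, "cp": 0.70},
    {"complete_pass": 1, "cp": 0.55},
    {"complete_pass": 0, "cp": 0.40},
    {"complete_pass": 1, "cp": 0.80},
]
print(round(cpoe(plays), 2))  # 13.75
```

Swap in a different `cp` model and the same completions produce a different CPOE, even though nothing about the throws changed.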
Computer Cowboy @benbbaldwin
Regret to acknowledge that I tried Claude Code and it is indeed good
Cameron Fagan @camfagan
Spent a chunk of the last month building this — a transparent-methodology QB ranking model that scored all 32 starting QBs from 2025. Five takes the model defends, including Maye #1 by a wide margin and Mahomes at #10. Methodology is the part most folks won't expect.
Muffed @muffedai

Last week some prominent NFL analysts ranked the QBs on a podcast. Fun, opinionated, deliberately argumentative. We tried the opposite — ranking all 32 starters with an AI model and a published methodology you can argue with. The 5 takes the model defends below. 🧵

49ers & NFL News 24/7 @49ersSportsTalk
I still get random nightmares about Super Bowl LIV… The #49ers had it in the bag. Complete domination for 3+ quarters. Mahomes had:
• 181 Yards
• 0 TDs
• 2 INTs
And somehow the Niners still lost. Still can’t believe all those holdings never got called either.