Frex

996 posts

Frex

@Dirtyfrecks

Former cloud slave currently focused on GTM at a startup. Occasionally investing in DeFi. 🇨🇦

San Francisco, CA Katılım Haziran 2016

688 Takip Edilen986 Takipçiler

Frex@Dirtyfrecks·2d

Been slowly observing this. Top 1% of PM's are arguably some of the most valuable people at the orgs. Seeing more traditional CS backgrounds who are more customer facing make this shift. Best are also able to dive into the trenches themselves to ship and fix bugs.

andrew chen@andrewchen

bullish on the PM role quietly becoming the most important role in tech again when anyone can build, the person who decides WHAT to build becomes the bottleneck

English

Frex@Dirtyfrecks·27 Nis

@andruyeung I’m sure they’d love to have you in this role

English

678

Andrew Yeung@andruyeung·27 Nis

Anthropic is paying up to $400,000 a year for an events role. They're looking for someone to own the execution of brand experiences that translate Anthropic's values into physical moments. This person will produce everything from intimate thought-leadership gatherings to large-scale industry activations. The top AI research lab in the world recognizes that to cross the chasm and reach everyday consumers, they need to lean into hospitality. They need to create visceral, unforgettable IRL experiences that make complex technology feel accessible and human. They understand that digital channels are getting increasingly saturated. Every feed is flooded with AI content... every inbox is overflowing. The massive opportunity now is offline, analog, in-person. The companies that win in the next decade won't just have the best product but the most emotional in-person presence and the most compelling storytelling. If you're in events, experiential marketing, or brand activations, this is your moment. The biggest tech companies in the world are betting on you.

English

136

127

2.2K

1.6M

Frex@Dirtyfrecks·22 Nis

@AnjneyMidha Surely this is Recursive super intelligence

English

Anjney Midha@AnjneyMidha·19 Nis

it’s quite amazing literally the same ppl who missed the seed/A/B into Anthropic just passed on another team behind one of the generational frontier breakthroughs of the decade meanwhile, a bunch of other folks invested on the spot i don’t fully get why skill issue?

English

545

188.4K

Frex@Dirtyfrecks·20 Nis

@AnjneyMidha periodic labs coded

English

141

Frex retweetledi

andrew chen@andrewchen·13 Nis

“ok this startup is cool but …” 1980: … what if IBM builds this? 1995 … what if Microsoft builds this? 2010 … what if Google builds this? Today … what if builds this? reality is, if founders listened to the “what if” pessimists we’d never have any startups or new products. That’s why they’re building and the pundits aren’t My observation: When these huge waves happen, these new markets are so damn big there will be tens of thousands of new viable companies, hundreds of unicorns, and a few iconic companies that become generational. The big cos play a role but can never compete with the glorious open market known as capitalism So for all the “what if” people - sit down, log off X for a bit, and let the founders do their thing. And let’s cheer them on when they do

English

142

126

1.1K

83.8K

Frex@Dirtyfrecks·9 Nis

@andrewchen Great choice sohn is awesome

English

220

andrew chen@andrewchen·9 Nis

RSVP links for to speedrun cafe pop-up on... Mon (4/20): partiful.com/e/eviRxQ31mknU… Tues: partiful.com/e/T0y2y4MGMBVy… Weds: partiful.com/e/M8uh738PEN2j… Thurs: partiful.com/e/gmR1KZ2QEfR2… Fri (4/24): partiful.com/e/PPsV8hSnTBwu… There are limited spots + merch so RSVP now to attend! thank you @immad @mercury for partnering up on this!

English

5.5K

andrew chen@andrewchen·9 Nis

we're opening up an invite-only cafe (lol) to celebrate the kickoff of a16z speedrun 7 applications starting in two weeks Come hang with us in San Francisco and grab matchas/espresso/etc -- we'll also have merch, food, etc RSVP below

English

270

38.8K

Frex@Dirtyfrecks·5 Nis

Interesting that research backs the importance of a good harness but this is already so obvious. System design is incredibly important when the frontier models are already quite good at coding. So many people out there think they can build a CRM with Claude now but don’t understand how complex it is to build a scalable system.

Alex Prompter@alex_prompter

Holy shit. Stanford just showed that the biggest performance gap in AI systems isn't the model it's the harness. The code wrapping the model. And they built a system that writes better harnesses automatically than humans can by hand. > +7.7 points. 4x fewer tokens. > #1 ranking on an actively contested benchmark. The harness is the code that decides what information an AI model sees at each step what to store, what to retrieve, what context to show. Changing the harness around a fixed model can produce a 6x performance gap on the same benchmark. Most practitioners know this empirically. What nobody had done was automate the process of finding better harnesses. Stanford's Meta-Harness does exactly that: it runs a coding agent in a loop, gives it access to every prior harness it has tried along with the full execution traces and scores, and lets it propose better ones. The agent reads raw code and failure logs not summaries, not scalar scores and figures out why things broke. The key insight is about information. Every prior automated optimization method compressed feedback before handing it to the optimizer. > Scalar scores only. > LLM-generated summaries. > Short templates. Stanford's finding is that this compression destroys exactly the signal you need for harness engineering. A single design choice about what to store in memory can cascade through hundreds of downstream steps. You cannot debug that from a summary. Meta-Harness gives the proposer a filesystem containing every prior harness's source code, execution traces, and scores up to 10 million tokens of diagnostic information per evaluation and lets it use grep and cat to read whatever it needs. Prior methods worked with 100 to 30,000 tokens of feedback. Meta-Harness works with 3 orders of magnitude more. The TerminalBench-2 search trajectory reveals what this actually looks like in practice. The agent ran for 10 iterations on an actively contested coding benchmark. In iterations 1 and 2, it bundled structural fixes with prompt rewrites and both regressed. In iteration 3, it explicitly identified the confound: the prompt changes were the common failure factor, not the structural fixes. It isolated the structural changes, tested them alone, and observed the smallest regression yet. Over the next 4 iterations it kept probing why completion-flow edits were fragile citing specific tasks and turn counts from prior traces as evidence. By iteration 7 it pivoted entirely: instead of modifying the control loop, it added a single environment snapshot before the agent starts, gathering what tools and languages are available in one shell command. That 80-line additive change became the best candidate in the run and ranked #1 among all Haiku 4.5 agents on the benchmark. The numbers across all three domains: → Text classification vs best hand-designed harness (ACE): +7.7 points accuracy, 4x fewer context tokens → Text classification vs best automated optimizer (OpenEvolve, TTT-Discover): matches their final performance in 4 evaluations vs their 60, then surpasses by 10+ points → Full interface vs scores-only ablation: median accuracy 50.0 vs 34.6 raw execution traces are the critical ingredient, summaries don't recover the gap → IMO-level math: +4.7 points average across 5 held-out models that were never seen during search → IMO math: discovered retrieval harness transfers across GPT-5.4-nano, GPT-5.4-mini, Gemini-3.1-Flash-Lite, Gemini-3-Flash, and GPT-OSS-20B → TerminalBench-2 with Haiku 4.5: 37.6% #1 among all reported Haiku 4.5 agents, beating Goose (35.5%) and Terminus-KIRA (33.7%) → TerminalBench-2 with Opus 4.6: 76.4% #2 overall, beating all hand-engineered agents except one whose result couldn't be reproduced from public code → Out-of-distribution text classification on 9 unseen datasets: 73.1% average vs ACE's 70.2% The math harness discovery is the cleanest demonstration of what automated search actually finds. Stanford gave Meta-Harness a corpus of 535,000 solved math problems and told it to find a better retrieval strategy for IMO-level problems. What emerged after 40 iterations was a four-route lexical router: combinatorics problems get deduplicated BM25 with difficulty reranking, geometry problems get one hard reference plus two raw BM25 neighbors, number theory gets reranked toward solutions that state their technique early, and everything else gets adaptive retrieval based on how concentrated the top scores are. Nobody designed this. The agent discovered that different problem types need different retrieval policies by reading through failure traces and iterating on what broke. The ablation table is the most important result in the paper. > Scores only: median 34.6, best 41.3. > Scores plus LLM-generated summary: median 34.9, best 38.7. > Full execution traces: median 50.0, best 56.7. Summaries made things slightly worse than scores alone. The raw traces the actual prompts, tool calls, model outputs, and state updates from every prior run are what drive the improvement. This is not a marginal difference. The full interface outperforms the compressed interface by 15 points at median. Harness engineering requires debugging causal chains across hundreds of steps. You cannot compress that signal. The model has been the focus of the entire AI industry for the last five years. Stanford just showed the wrapper around the model matters just as much and that AI can now write better wrappers than humans can.

English

Frex retweetledi

Andrej Karpathy@karpathy·4 Nis

Wow, this tweet went very viral! I wanted share a possibly slightly improved version of the tweet in an "idea file". The idea of the idea file is that in this era of LLM agents, there is less of a point/need of sharing the specific code/app, you just share the idea, then the other person's agent customizes & builds it for your specific needs. So here's the idea in a gist format: gist.github.com/karpathy/442a6… You can give this to your agent and it can build you your own LLM wiki and guide you on how to use it etc. It's intentionally kept a little bit abstract/vague because there are so many directions to take this in. And ofc, people can adjust the idea or contribute their own in the Discussion which is cool.

Andrej Karpathy@karpathy

LLM Knowledge Bases Something I'm finding very useful recently: using LLMs to build personal knowledge bases for various topics of research interest. In this way, a large fraction of my recent token throughput is going less into manipulating code, and more into manipulating knowledge (stored as markdown and images). The latest LLMs are quite good at it. So: Data ingest: I index source documents (articles, papers, repos, datasets, images, etc.) into a raw/ directory, then I use an LLM to incrementally "compile" a wiki, which is just a collection of .md files in a directory structure. The wiki includes summaries of all the data in raw/, backlinks, and then it categorizes data into concepts, writes articles for them, and links them all. To convert web articles into .md files I like to use the Obsidian Web Clipper extension, and then I also use a hotkey to download all the related images to local so that my LLM can easily reference them. IDE: I use Obsidian as the IDE "frontend" where I can view the raw data, the the compiled wiki, and the derived visualizations. Important to note that the LLM writes and maintains all of the data of the wiki, I rarely touch it directly. I've played with a few Obsidian plugins to render and view data in other ways (e.g. Marp for slides). Q&A: Where things get interesting is that once your wiki is big enough (e.g. mine on some recent research is ~100 articles and ~400K words), you can ask your LLM agent all kinds of complex questions against the wiki, and it will go off, research the answers, etc. I thought I had to reach for fancy RAG, but the LLM has been pretty good about auto-maintaining index files and brief summaries of all the documents and it reads all the important related data fairly easily at this ~small scale. Output: Instead of getting answers in text/terminal, I like to have it render markdown files for me, or slide shows (Marp format), or matplotlib images, all of which I then view again in Obsidian. You can imagine many other visual output formats depending on the query. Often, I end up "filing" the outputs back into the wiki to enhance it for further queries. So my own explorations and queries always "add up" in the knowledge base. Linting: I've run some LLM "health checks" over the wiki to e.g. find inconsistent data, impute missing data (with web searchers), find interesting connections for new article candidates, etc., to incrementally clean up the wiki and enhance its overall data integrity. The LLMs are quite good at suggesting further questions to ask and look into. Extra tools: I find myself developing additional tools to process the data, e.g. I vibe coded a small and naive search engine over the wiki, which I both use directly (in a web ui), but more often I want to hand it off to an LLM via CLI as a tool for larger queries. Further explorations: As the repo grows, the natural desire is to also think about synthetic data generation + finetuning to have your LLM "know" the data in its weights instead of just context windows. TLDR: raw data from a given number of sources is collected, then compiled by an LLM into a .md wiki, then operated on by various CLIs by the LLM to do Q&A and to incrementally enhance the wiki, and all of it viewable in Obsidian. You rarely ever write or edit the wiki manually, it's the domain of the LLM. I think there is room here for an incredible new product instead of a hacky collection of scripts.

English

1.1K

2.8K

26.6K

6.9M

Frex@Dirtyfrecks·3 Nis

@techsaleshackz Bret is cracked

English

Techsaleshackz@techsaleshackz·3 Nis

Imagine fading this founder Join seed stage Don't think twice Sierra over $150m ARR now

English

249

56.5K

Frex@Dirtyfrecks·2 Nis

@johncoogan @sama That’s dope.

English

John Coogan@johncoogan·2 Nis

TBPN has been acquired by OpenAI! The show is staying the same and we’ll continue to go live at 11am pacific every weekday. This is a full circle moment for me as I’ve worked with @sama for well over a decade. He funded my first company in 2013. Then helped us fix a serious logjam during a critical funding round a few years later. When I took my second company through YC, he was president at the time, and then when I joined Founders Fund, the first deal I saw in motion was the post-ChatGPT round in late 2022. And as we started growing TBPN last year, he was the very first lab lead to join the show. Thank you to everyone that has been a part of TBPN until now. The last year has been the most fun and rewarding part of my career and we’re excited to have more resources than ever going forward.

English

1.3K

421

8.8K

3.1M

Frex@Dirtyfrecks·30 Mar

@hiddnest Just went through this process lol. Shit was like 20% cheaper in Feb.

English

271

Chanhee@hiddnest·30 Mar

sf housing is so bad that 2b2b is now 7.5k+/mo

English

455

101.8K

Frex retweetledi

Eric Glyman@eglyman·25 Mar

Since 2023, the top quartile of AI spenders on @tryramp have more than doubled their revenue. Bottom quartile? Flat A roofing company in Texas. A window installer in Utah. A construction firm in Florida that grew 65% The gap is accelerating and most companies don't feel it yet

Eric Glyman@eglyman

x.com/i/article/2036…

English

597

407.8K

Frex@Dirtyfrecks·28 Oca

So simple but creative. Immediately forwarded this to my fantasy football leagues. Am sure some friendships would get destroyed over this but used properly can add some fun flares to a group chat.

Neesh 🥭@Neesh774

turn that "i told you so" into 20 bucks introducing Wager • prediction markets with your friends

English

183

Frex@Dirtyfrecks·19 Oca

@Steve_Yegge AI slop rug from a washed up programmer for $300k.

English

2.1K

Steve Yegge@Steve_Yegge·19 Oca

Update post for my imminent b-day: steve-yegge.medium.com/steveys-birthd… Shit is basically happening so fast that I have to post a gigantic goddamn post every week just to catch you up. So, enjoy.

English

243

360

629.8K

Frex@Dirtyfrecks·17 Oca

@tapdotfun The logo is clean

English

tap.fun@tapdotfun·17 Oca

just spent $2000 on a logo what do we think?

English

258

569

40.6K

Frex@Dirtyfrecks·12 Oca

Really interesting moment at the intersection of crypto and AI. Foundational models are getting genuinely good now, especially video generation. This is where consumer creativity actually unlocks. Tools like @yapper_so are early signals of what happens when infra quality crosses a usability threshold. Last cycle’s meme meta was “AI memes”, mostly vaporware riding the narrative. This time feels different. Vibe-coding tools are becoming truly consumer-friendly. Products like Lovable and Claude lower the barrier enough that anyone with basic coding intuition can ship. My take: we’re entering an AI 2.0 meta where minimum viability actually means delivering value. Not enterprise scale, but a wave of AI engineers and vibe coders shipping small, opinionated, visually polished products that can move fast and capture attention. Best recent example and the first meme I’ve cared about in a while is @psyopanime. They’re rethinking news as native, anime-style content that’s directly consumable by a younger, internet-first audience. This feels closer to real product market fit than narrative chasing.

English