Frex
@Dirtyfrecks

996 posts

Former cloud slave currently focused on GTM at a startup. Occasionally investing in DeFi. 🇨🇦

San Francisco, CA · Joined June 2016
688 Following · 986 Followers
Frex@Dirtyfrecks·
@andruyeung I’m sure they’d love to have you in this role
0 replies · 0 reposts · 0 likes · 678 views
Andrew Yeung@andruyeung·
Anthropic is paying up to $400,000 a year for an events role. They're looking for someone to own the execution of brand experiences that translate Anthropic's values into physical moments. This person will produce everything from intimate thought-leadership gatherings to large-scale industry activations.

The top AI research lab in the world recognizes that to cross the chasm and reach everyday consumers, they need to lean into hospitality. They need to create visceral, unforgettable IRL experiences that make complex technology feel accessible and human.

They understand that digital channels are getting increasingly saturated. Every feed is flooded with AI content... every inbox is overflowing. The massive opportunity now is offline, analog, in-person.

The companies that win in the next decade won't just have the best product but the most emotional in-person presence and the most compelling storytelling.

If you're in events, experiential marketing, or brand activations, this is your moment. The biggest tech companies in the world are betting on you.
Andrew Yeung tweet media
136 replies · 127 reposts · 2.2K likes · 1.6M views
Frex@Dirtyfrecks·
@AnjneyMidha Surely this is recursive superintelligence
0 replies · 0 reposts · 0 likes · 45 views
Anjney Midha@AnjneyMidha·
it’s quite amazing

literally the same ppl who missed the seed/A/B into Anthropic just passed on another team behind one of the generational frontier breakthroughs of the decade

meanwhile, a bunch of other folks invested on the spot

i don’t fully get why. skill issue?
52 replies · 10 reposts · 545 likes · 188.4K views
Frex retweeted
andrew chen@andrewchen·
“ok this startup is cool but …”

1980: … what if IBM builds this?
1995: … what if Microsoft builds this?
2010: … what if Google builds this?
Today: … what if OpenAI builds this?

Reality is, if founders listened to the “what if” pessimists we’d never have any startups or new products. That’s why they’re building and the pundits aren’t.

My observation: when these huge waves happen, these new markets are so damn big there will be tens of thousands of new viable companies, hundreds of unicorns, and a few iconic companies that become generational. The big cos play a role but can never compete with the glorious open market known as capitalism.

So for all the “what if” people: sit down, log off X for a bit, and let the founders do their thing. And let’s cheer them on when they do.
142 replies · 126 reposts · 1.1K likes · 83.8K views
Frex@Dirtyfrecks·
@andrewchen Great choice, Sohn is awesome
0 replies · 0 reposts · 0 likes · 220 views
andrew chen@andrewchen·
we're opening up an invite-only cafe (lol) to celebrate the kickoff of a16z speedrun 7. applications starting in two weeks.

Come hang with us in San Francisco and grab matchas/espresso/etc -- we'll also have merch, food, etc.

RSVP below
andrew chen tweet media
47 replies · 10 reposts · 270 likes · 38.8K views
Frex@Dirtyfrecks·
Interesting that research backs the importance of a good harness, but this is already so obvious. System design is incredibly important when the frontier models are already quite good at coding. So many people out there think they can build a CRM with Claude now, but don’t understand how complex it is to build a scalable system.
Alex Prompter@alex_prompter

Holy shit. Stanford just showed that the biggest performance gap in AI systems isn't the model, it's the harness: the code wrapping the model. And they built a system that automatically writes better harnesses than humans can by hand.

> +7.7 points. 4x fewer tokens.
> #1 ranking on an actively contested benchmark.

The harness is the code that decides what information an AI model sees at each step: what to store, what to retrieve, what context to show. Changing the harness around a fixed model can produce a 6x performance gap on the same benchmark. Most practitioners know this empirically. What nobody had done was automate the process of finding better harnesses.

Stanford's Meta-Harness does exactly that: it runs a coding agent in a loop, gives it access to every prior harness it has tried along with the full execution traces and scores, and lets it propose better ones. The agent reads raw code and failure logs (not summaries, not scalar scores) and figures out why things broke.

The key insight is about information. Every prior automated optimization method compressed feedback before handing it to the optimizer:

> Scalar scores only.
> LLM-generated summaries.
> Short templates.

Stanford's finding is that this compression destroys exactly the signal you need for harness engineering. A single design choice about what to store in memory can cascade through hundreds of downstream steps. You cannot debug that from a summary.

Meta-Harness gives the proposer a filesystem containing every prior harness's source code, execution traces, and scores (up to 10 million tokens of diagnostic information per evaluation) and lets it use grep and cat to read whatever it needs. Prior methods worked with 100 to 30,000 tokens of feedback. Meta-Harness works with 3 orders of magnitude more.

The TerminalBench-2 search trajectory reveals what this actually looks like in practice. The agent ran for 10 iterations on an actively contested coding benchmark. In iterations 1 and 2, it bundled structural fixes with prompt rewrites, and both regressed. In iteration 3, it explicitly identified the confound: the prompt changes were the common failure factor, not the structural fixes. It isolated the structural changes, tested them alone, and observed the smallest regression yet. Over the next 4 iterations it kept probing why completion-flow edits were fragile, citing specific tasks and turn counts from prior traces as evidence. By iteration 7 it pivoted entirely: instead of modifying the control loop, it added a single environment snapshot before the agent starts, gathering what tools and languages are available in one shell command. That 80-line additive change became the best candidate in the run and ranked #1 among all Haiku 4.5 agents on the benchmark.
The numbers across all three domains:

→ Text classification vs the best hand-designed harness (ACE): +7.7 points accuracy, 4x fewer context tokens
→ Text classification vs the best automated optimizers (OpenEvolve, TTT-Discover): matches their final performance in 4 evaluations vs their 60, then surpasses them by 10+ points
→ Full interface vs scores-only ablation: median accuracy 50.0 vs 34.6; raw execution traces are the critical ingredient, and summaries don't recover the gap
→ IMO-level math: +4.7 points average across 5 held-out models that were never seen during search
→ IMO math: the discovered retrieval harness transfers across GPT-5.4-nano, GPT-5.4-mini, Gemini-3.1-Flash-Lite, Gemini-3-Flash, and GPT-OSS-20B
→ TerminalBench-2 with Haiku 4.5: 37.6%, #1 among all reported Haiku 4.5 agents, beating Goose (35.5%) and Terminus-KIRA (33.7%)
→ TerminalBench-2 with Opus 4.6: 76.4%, #2 overall, beating all hand-engineered agents except one whose result couldn't be reproduced from public code
→ Out-of-distribution text classification on 9 unseen datasets: 73.1% average vs ACE's 70.2%

The math harness discovery is the cleanest demonstration of what automated search actually finds. Stanford gave Meta-Harness a corpus of 535,000 solved math problems and told it to find a better retrieval strategy for IMO-level problems. What emerged after 40 iterations was a four-route lexical router: combinatorics problems get deduplicated BM25 with difficulty reranking, geometry problems get one hard reference plus two raw BM25 neighbors, number theory gets reranked toward solutions that state their technique early, and everything else gets adaptive retrieval based on how concentrated the top scores are. Nobody designed this. The agent discovered that different problem types need different retrieval policies by reading through failure traces and iterating on what broke.

The ablation table is the most important result in the paper:

> Scores only: median 34.6, best 41.3.
> Scores plus LLM-generated summary: median 34.9, best 38.7.
> Full execution traces: median 50.0, best 56.7.

Summaries made things slightly worse than scores alone. The raw traces (the actual prompts, tool calls, model outputs, and state updates from every prior run) are what drive the improvement. This is not a marginal difference: the full interface outperforms the compressed interface by 15 points at median. Harness engineering requires debugging causal chains across hundreds of steps. You cannot compress that signal.

The model has been the focus of the entire AI industry for the last five years. Stanford just showed that the wrapper around the model matters just as much, and that AI can now write better wrappers than humans can.
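The loop the thread describes is easy to picture in code: propose a harness, run it on the benchmark, dump the source, trace, and score to disk uncompressed, and let the next proposal round grep through all of it. Here is a minimal sketch of that shape; `propose` and `evaluate` are hypothetical stand-ins (a coding agent with shell access to the archive, and a benchmark runner), not the paper's actual API:

```python
# Sketch of a Meta-Harness-style search loop (illustrative only).
from pathlib import Path
from typing import Callable, Tuple

def meta_harness_search(
    propose: Callable[[Path], str],                # agent: archive dir -> harness source
    evaluate: Callable[[str], Tuple[float, str]],  # harness source -> (score, raw trace)
    archive: Path,
    iterations: int = 10,
) -> Tuple[str, float]:
    archive.mkdir(parents=True, exist_ok=True)
    best_code, best_score = "", float("-inf")

    for i in range(iterations):
        # The proposer sees the raw filesystem of all prior attempts
        # (source, traces, scores) and reads it with grep/cat; it never
        # receives a compressed summary of past runs.
        code = propose(archive)
        score, trace = evaluate(code)

        # Persist everything uncompressed for future proposal rounds.
        run = archive / f"iter_{i:03d}"
        run.mkdir(exist_ok=True)
        (run / "harness.py").write_text(code)
        (run / "trace.log").write_text(trace)
        (run / "score.txt").write_text(str(score))

        if score > best_score:
            best_code, best_score = code, score

    return best_code, best_score
```

The design choice the thread stresses lives in the persistence step: prior optimizers would collapse `trace.log` into a score or a summary before the next round, while Meta-Harness keeps it raw.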

0 replies · 0 reposts · 0 likes · 40 views
Frex retweeted
Andrej Karpathy@karpathy·
Wow, this tweet went very viral! I wanted to share a possibly slightly improved version of the tweet in an "idea file". The idea of the idea file is that in this era of LLM agents, there is less of a point/need in sharing the specific code/app; you just share the idea, then the other person's agent customizes & builds it for your specific needs. So here's the idea in a gist format: gist.github.com/karpathy/442a6… You can give this to your agent and it can build you your own LLM wiki and guide you on how to use it, etc. It's intentionally kept a little bit abstract/vague because there are so many directions to take this in. And ofc, people can adjust the idea or contribute their own in the Discussion, which is cool.
Andrej Karpathy@karpathy

LLM Knowledge Bases

Something I'm finding very useful recently: using LLMs to build personal knowledge bases for various topics of research interest. In this way, a large fraction of my recent token throughput is going less into manipulating code, and more into manipulating knowledge (stored as markdown and images). The latest LLMs are quite good at it. So:

Data ingest: I index source documents (articles, papers, repos, datasets, images, etc.) into a raw/ directory, then I use an LLM to incrementally "compile" a wiki, which is just a collection of .md files in a directory structure. The wiki includes summaries of all the data in raw/, backlinks, and then it categorizes data into concepts, writes articles for them, and links them all. To convert web articles into .md files I like to use the Obsidian Web Clipper extension, and then I also use a hotkey to download all the related images to local so that my LLM can easily reference them.

IDE: I use Obsidian as the IDE "frontend" where I can view the raw data, the compiled wiki, and the derived visualizations. Important to note that the LLM writes and maintains all of the data of the wiki; I rarely touch it directly. I've played with a few Obsidian plugins to render and view data in other ways (e.g. Marp for slides).

Q&A: Where things get interesting is that once your wiki is big enough (e.g. mine on some recent research is ~100 articles and ~400K words), you can ask your LLM agent all kinds of complex questions against the wiki, and it will go off, research the answers, etc. I thought I had to reach for fancy RAG, but the LLM has been pretty good about auto-maintaining index files and brief summaries of all the documents, and it reads all the important related data fairly easily at this ~small scale.

Output: Instead of getting answers in text/terminal, I like to have it render markdown files for me, or slide shows (Marp format), or matplotlib images, all of which I then view again in Obsidian. You can imagine many other visual output formats depending on the query. Often, I end up "filing" the outputs back into the wiki to enhance it for further queries. So my own explorations and queries always "add up" in the knowledge base.

Linting: I've run some LLM "health checks" over the wiki to e.g. find inconsistent data, impute missing data (with web searches), find interesting connections for new article candidates, etc., to incrementally clean up the wiki and enhance its overall data integrity. The LLMs are quite good at suggesting further questions to ask and look into.

Extra tools: I find myself developing additional tools to process the data, e.g. I vibe coded a small and naive search engine over the wiki, which I both use directly (in a web ui), but more often I want to hand it off to an LLM via CLI as a tool for larger queries.

Further explorations: As the repo grows, the natural desire is to also think about synthetic data generation + finetuning to have your LLM "know" the data in its weights instead of just context windows.

TLDR: raw data from a given number of sources is collected, then compiled by an LLM into a .md wiki, then operated on by various CLIs by the LLM to do Q&A and to incrementally enhance the wiki, and all of it viewable in Obsidian. You rarely ever write or edit the wiki manually; it's the domain of the LLM. I think there is room here for an incredible new product instead of a hacky collection of scripts.
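As a rough sketch of the ingest/compile step above, under stated assumptions: `llm_complete(prompt) -> str` is a generic stand-in for whatever LLM client or agent you use, the raw/ and wiki/ layout follows the post, and none of this is Karpathy's actual tooling:

```python
# Sketch of the raw/ -> wiki/ "compile" pass: raw/ holds clipped sources,
# wiki/ holds the LLM-maintained markdown plus a brief index file.
from pathlib import Path
from typing import Callable

def compile_wiki(raw: Path, wiki: Path, llm_complete: Callable[[str], str]) -> None:
    wiki.mkdir(parents=True, exist_ok=True)
    index = []
    for src in sorted(raw.glob("*.md")):
        page = wiki / src.name
        # Incremental compile: only (re)write pages whose source changed.
        if not page.exists() or page.stat().st_mtime < src.stat().st_mtime:
            prompt = (
                "Rewrite this source as a wiki article: summarize it, add "
                "[[backlinks]] to related concepts, keep markdown.\n\n"
                + src.read_text()
            )
            page.write_text(llm_complete(prompt))
        # A brief index is what lets an agent answer questions without a
        # RAG stack at the post's ~100-article scale.
        title = (page.read_text().splitlines() or [src.stem])[0].lstrip("# ")
        index.append(f"- [{title}]({page.name})")
    (wiki / "INDEX.md").write_text("\n".join(index) + "\n")
```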

1.1K replies · 2.8K reposts · 26.6K likes · 6.9M views
Techsaleshackz@techsaleshackz·
Imagine fading this founder
Join seed stage
Don't think twice
Sierra over $150m ARR now
Techsaleshackz tweet media
10 replies · 0 reposts · 249 likes · 56.5K views
John Coogan@johncoogan·
TBPN has been acquired by OpenAI! The show is staying the same and we’ll continue to go live at 11am pacific every weekday.

This is a full circle moment for me as I’ve worked with @sama for well over a decade. He funded my first company in 2013. Then helped us fix a serious logjam during a critical funding round a few years later. When I took my second company through YC, he was president at the time, and then when I joined Founders Fund, the first deal I saw in motion was the post-ChatGPT round in late 2022. And as we started growing TBPN last year, he was the very first lab lead to join the show.

Thank you to everyone that has been a part of TBPN until now. The last year has been the most fun and rewarding part of my career and we’re excited to have more resources than ever going forward.
1.3K replies · 421 reposts · 8.8K likes · 3.1M views
Frex@Dirtyfrecks·
@hiddnest Just went through this process lol. Shit was like 20% cheaper in Feb.
0 replies · 0 reposts · 1 like · 271 views
Chanhee@hiddnest·
sf housing is so bad that 2b2b is now 7.5k+/mo
39 replies · 3 reposts · 455 likes · 101.8K views
Frex retweeted
Eric Glyman@eglyman·
Since 2023, the top quartile of AI spenders on @tryramp have more than doubled their revenue. Bottom quartile? Flat.

A roofing company in Texas. A window installer in Utah. A construction firm in Florida that grew 65%.

The gap is accelerating and most companies don't feel it yet
Eric Glyman tweet media
Eric Glyman@eglyman

x.com/i/article/2036…

30 replies · 68 reposts · 597 likes · 407.8K views
Frex@Dirtyfrecks·
@Steve_Yegge AI slop rug from a washed-up programmer for $300k.
0 replies · 0 reposts · 10 likes · 2.1K views
tap.fun@tapdotfun·
just spent $2000 on a logo. what do we think?
tap.fun tweet media
258 replies · 21 reposts · 569 likes · 40.6K views
Frex@Dirtyfrecks·
Really interesting moment at the intersection of crypto and AI. Foundational models are getting genuinely good now, especially video generation. This is where consumer creativity actually unlocks. Tools like @yapper_so are early signals of what happens when infra quality crosses a usability threshold.

Last cycle’s meme meta was “AI memes”, mostly vaporware riding the narrative. This time feels different. Vibe-coding tools are becoming truly consumer-friendly. Products like Lovable and Claude lower the barrier enough that anyone with basic coding intuition can ship.

My take: we’re entering an AI 2.0 meta where minimum viability actually means delivering value. Not enterprise scale, but a wave of AI engineers and vibe coders shipping small, opinionated, visually polished products that can move fast and capture attention.

Best recent example, and the first meme I’ve cared about in a while, is @psyopanime. They’re rethinking news as native, anime-style content that’s directly consumable by a younger, internet-first audience. This feels closer to real product market fit than narrative chasing.
1 reply · 0 reposts · 1 like · 91 views
Frex@Dirtyfrecks·
CT just learned about vibe coding. Cue an upcoming memecoin AI meta, Part II
0 replies · 0 reposts · 0 likes · 58 views
Frex@Dirtyfrecks·
@cz_binance Grand Hirafu. I’m also here 👀
0 replies · 0 reposts · 1 like · 172 views
CZ 🔶 BNB@cz_binance·
Got recognized a few times on the mountain. I wonder why. 😂
CZ 🔶 BNB tweet media
2K replies · 448 reposts · 6.9K likes · 981.3K views
Frex@Dirtyfrecks·
@mert plot twist: you want hair
0 replies · 0 reposts · 0 likes · 12 views
mert@mert·
if SOL is not above $600 by 2026 year end, I get a hair transplant
mert tweet media
566 replies · 61 reposts · 2.2K likes · 306.4K views
Frex@Dirtyfrecks·
@TyneeWorld Fair, I’m no help there then 😂
0 replies · 0 reposts · 0 likes · 10 views
Ty@TyneeWorld·
@Dirtyfrecks Spamming postings hahah
1 reply · 0 reposts · 1 like · 70 views
Ty@TyneeWorld·
Damn I got my LinkedIn banned for trying some growth hacks. Rip. Anyone work at LinkedIn that can help me reclaim my account? 😭😭😭
10 replies · 0 reposts · 21 likes · 2.9K views