Tim Michaud

1.2K posts

@TimGMichaud

Founder @ New thing - (YC Alum) still a Security Nerd.

Middle of Freakin Nowhere · Joined March 2012

923 Following · 1.3K Followers

Tim Michaud @TimGMichaud · Jun 10

If you build or maintain fuzzers for a living: what's the part you'd happily never deal with again? Harness rot, fleet babysitting, triaging noise, something else? Not selling anything, just trying to figure out if the boring grind is a real enough problem to solve.

Tim Michaud @TimGMichaud · May 19

Thinking effort doesn't fix hallucination. Even the best frontier model at matched HIGH still gets 24.2% of fields wrong on adversarial insurance docs. Going from default to HIGH buys 0-2pp per model. aginor.ai/extraction-tes…

Tim Michaud @TimGMichaud · Apr 27

What makes this different: the generator emits the rendered document AND the ground-truth JSON in the same pass. No annotation step. Ground truth is authoritative by construction. Full writeup, raw outputs, repo, 25-doc sample packet: aginor.ai/extraction-tes…
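
One way to read "same pass" is that a single sampled record drives both the rendered text and the JSON, so the two cannot drift apart. The sketch below is only an illustration of that idea: the function name `generate_document`, the field names, and the template are all hypothetical and not taken from the writeup or repo.

```python
import json
import random


def generate_document(seed: int) -> tuple[str, str]:
    """Return (rendered_text, ground_truth_json) built from one sampled record."""
    rng = random.Random(seed)  # seeded so the same seed reproduces the same pair

    # Hypothetical schema; the real fields are not described in the post.
    truth = {
        "insured_name": rng.choice(["Acme Freight LLC", "Northwind Bakery Inc"]),
        "policy_number": f"POL-{rng.randint(100000, 999999)}",
        "annual_revenue_usd": rng.randint(1, 500) * 1_000_000,
    }

    # The rendered document is produced from the same record as the JSON,
    # so no separate annotation step is needed.
    rendered = (
        f"Named insured: {truth['insured_name']}\n"
        f"Policy no.: {truth['policy_number']}\n"
        f"Annual revenue: ${truth['annual_revenue_usd'] / 1_000_000:.0f}M\n"
    )
    return rendered, json.dumps(truth, sort_keys=True)
```

Calling `generate_document(7)` twice returns the same pair, which makes a failing extraction easy to reproduce.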

Tim Michaud @TimGMichaud · Apr 27

Across all five models, 37% of extractions scored below 0.5 composite without ever tripping a catastrophic-error flag. Production pipelines don't break loudly on these documents. They degrade silently, underneath whatever review threshold you trained your reviewers on.
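
A rough sketch of how a "silent failure" share like this could be computed, assuming each extraction carries a composite score in [0, 1] and a boolean catastrophic-error flag. The `Extraction` type and `silent_failure_rate` function are hypothetical names, and the actual scoring method is not described in the post.

```python
from dataclasses import dataclass


@dataclass
class Extraction:
    composite: float    # assumed composite score between 0.0 and 1.0
    catastrophic: bool  # True if a catastrophic-error flag was tripped


def silent_failure_rate(results: list[Extraction], threshold: float = 0.5) -> float:
    """Share of extractions scoring below the threshold without any flag."""
    if not results:
        return 0.0
    silent = sum(
        1 for r in results if r.composite < threshold and not r.catastrophic
    )
    return silent / len(results)
```

Low-scoring extractions that did trip a flag are excluded from the count, since those fail loudly and would be caught by a pipeline that checks the flag.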

Tim Michaud @TimGMichaud · Apr 27

GPT-5.5 reported $405.9M of revenue on a document that says $95M. GPT-5.4 said $40.6M on the same page. I built 148 adversarial insurance documents to test five frontier models. The numbers got weird.

Tim Michaud retweeted

@daviddiaul · Apr 22

I’m #hiring an individual contributor for a fully remote, global role at the intersection of vulnerability research, exploit development, and ML/AI — with a focus on fine-tuning open-weight #LLMs. 🧠 I’m not looking for an “LLM whisperer” or an “LLM pilot.” 🚫 I’m looking for someone who deeply understands post-training, data, evaluation, and how to make models reliable in real-world environments. 🔐 The application link is in the first comment. 🌍 #Hiring #LLM #AI #ML #FineTuning #CyberSecurity #llmwhisperer #llmpilot

Tim Michaud @TimGMichaud · Apr 16

@GergelyOrosz Yeah I had this turn off on me before; SUPER annoying cause it's not obvious that it's off (or on!) :|

Gergely Orosz @GergelyOrosz · Apr 16

Claude just keeps regressing for me, day after day. I swear that until a few days ago, when Claude did not know something, it kicked off a web search, figured it out, and answered. Now it just refuses to do the work that I pay for. It's like showing you the middle finger. Really?

Tim Michaud @TimGMichaud · Apr 16

@b1ack0wl Started a few companies (2 bootstrapped, 1 VC-backed; the new one is bootstrapped but will very likely go raise) - happy to chat about it if it helps!

b1ack0wl @b1ack0wl · Apr 15

I did some light homework into this and it's a very risky move with a high degree of failure. I would have to start with the following:
* Figure out a name and register it
* Register an LLC
* Come up with a business plan + portfolio
* Obtain a SMB loan
* Figure out the rest

b1ack0wl @b1ack0wl

ngl I think about this every so often. after decades of looking at embedded, mobile, cloud, windows (userland), and linux (userland+kernel) I feel like I have a foundation to create something of my own. but at the same time throwing myself at a high risk idea is a bit spooky

Tim Michaud @TimGMichaud · Apr 12

I think this is a mix of what @susantejuosho (x.com/susantejuosho/…) said, and also the changing demographic. YC used to target "older" founders who were used to the way things worked at big companies; the "you can just do things"/"go fast"/"do things that don't scale" was to help re-orient people from how things worked at big tech. But as they start having younger and younger people join, who do not have that context, the messaging is heavily muddled and distorted.

Zack Korman @ZackKorman · Apr 12

New video: Y Combinator lets you cross the line. The adults in the room aren't going to stop you from doing something seriously wrong. Young founders need to be aware of that. youtube.com/watch?v=ptT_LG…

Tim Michaud @TimGMichaud · Apr 12

@GergelyOrosz Happened to us last year; was such a PITA we ended up cancelling.

Gergely Orosz @GergelyOrosz · Apr 11

Damn annoying how a subscription service like Netflix deliberately doesn’t support offline mode. Got on a plane, wanted to watch my downloaded series, and could not. Netflix has an A+ eng team, so this is deliberate. But eg Apple TV doesn’t have this silly restriction.

Tim Michaud @TimGMichaud · Apr 12

@HackingDave Honestly if 5.5 is an improved 5.3xhigh I think we might see a switch back towards OAI.

Dave Kennedy @HackingDave · Apr 12

I understand there’s a ton of Claude fans out there. I was there too 4-5 weeks ago. Then it got way way way worse and without explanation. What’s worse is that I would consider myself a heavy power user. What about the folks that aren’t and have no idea it was dumbed down over 60% or more since its initial release and are using this day to day? Codex is outperforming Claude in every way right now. Additionally, OpenAI is much more transparent, cheaper, and produces much better code than Claude in every way at the moment. Claude has massive outages, lack of transparency to users, some really bad operational and security practices. They’ve lost me. I’m done. What happened?

Tim Michaud @TimGMichaud · Apr 12

@thefineprintesq Oh that's interesting; thanks for the info :)!

Pavin @thefineprintesq · Apr 12

@TimGMichaud My brother in Christ, hell no. Every big law firm thinks it’s reinvented the wheel. Surprise, clients share it with their other firms and it cycles into a fairly amorphous market template. Some firms have decent data, but it gets swallowed up by the same

Pavin @thefineprintesq · Apr 10

This kills what, 50% of legal AI startups?

Claude @claudeai

Claude for Word is now in beta. Draft, edit, and revise documents directly from the sidebar. Claude preserves your formatting, and edits appear as tracked changes. Available on Team and Enterprise plans.

Tim Michaud @TimGMichaud · Apr 12

I think a lot of people are letting contexts grow close to 300k+ tokens, which is where capabilities start to drop off. But I think there's also a good chance of an "I built my early project on AI and it was FAST; it's now way more complex" effect, where the added complexity gives them more issues that add further complexity.

martin @martinlindenn · Apr 11

Am I the only one who isn't shitting on Anthropic right now? Claude is working perfectly fine for me today.

Tim Michaud @TimGMichaud · Apr 12

@spiritbuun Forcibly setting the effort level + forcing compaction well before 300k tokens and using subagents for many things has definitely kept things closer to how they used to be.

buun @spiritbuun · Apr 11

If Claude's quality has been falling off a cliff for you over the past few days, try:
CLAUDE_CODE_DISABLE_1M_CONTEXT=1
CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING=1
ANTHROPIC_DEFAULT_OPUS_MODEL="claude-4-6-opus"
Have your agent save state, /clear, restart.

Tim Michaud @TimGMichaud · Apr 12

Codex (5.3xhigh) is a lot closer to CC than when I first used it; hope the gap continues to close.

Tim Michaud @TimGMichaud · Apr 10

@MartinGTobias Latency, I think, is the bigger win for SLMs, and as companies get better data (or buy it) to train the models, why rely on a third party when your own model is better/faster/cheaper?

Martin Tobias (Pre-Seed VC) @MartinGTobias · Apr 10

Does anyone believe local task-specific SLMs will have a place in a world where general LLMs are battling on costs and improving at a rapid pace?
