Tim Michaud

1.2K posts

@TimGMichaud

Founder @ New thing - (YC Alum) still a Security Nerd.

Middle of Freakin Nowhere · Joined March 2012
920 Following · 1.3K Followers
Tim Michaud
Tim Michaud@TimGMichaud·
Thinking effort doesn't fix hallucination. Even the best frontier model at matched HIGH still gets 24.2% of fields wrong on adversarial insurance docs. Going from default to HIGH buys 0-2pp per model. aginor.ai/extraction-tes…
English
0
0
0
75
Tim Michaud
Tim Michaud@TimGMichaud·
What makes this different: the generator emits the rendered document AND the ground-truth JSON in the same pass. No annotation step. Ground truth is authoritative by construction. Full writeup, raw outputs, repo, 25-doc sample packet: aginor.ai/extraction-tes…
English
0
0
0
27
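A minimal sketch of the same-pass idea from the post above: the rendered text and the ground-truth JSON are both built from one set of values, so no annotation step is needed. The field names, the `generate_document` and `field_accuracy` helpers, and the exact-match scoring are all hypothetical illustrations, not the author's actual generator or composite score.

```python
import json
import random


def generate_document(seed: int) -> tuple[str, dict]:
    """Build a synthetic policy summary and its ground truth from the same values."""
    rng = random.Random(seed)
    # Hypothetical fields; the real documents are not described in this detail.
    truth = {
        "insured_name": rng.choice(["Acme Freight LLC", "Northwind Foods Inc"]),
        "annual_revenue_usd": rng.randrange(10, 500) * 1_000_000,
        "policy_limit_usd": rng.randrange(1, 20) * 1_000_000,
    }
    # The rendered text is derived from `truth`, so the two cannot disagree.
    rendered = (
        f"Named insured: {truth['insured_name']}\n"
        f"Annual revenue: ${truth['annual_revenue_usd']:,}\n"
        f"Policy limit: ${truth['policy_limit_usd']:,}\n"
    )
    return rendered, truth


def field_accuracy(extracted: dict, truth: dict) -> float:
    """Fraction of ground-truth fields that the extraction matched exactly."""
    correct = sum(1 for key, value in truth.items() if extracted.get(key) == value)
    return correct / len(truth)


document, truth = generate_document(seed=7)
# Simulate a model that inflates revenue, as in the examples quoted in this thread.
wrong = dict(truth, annual_revenue_usd=truth["annual_revenue_usd"] * 4)
print(document)
print(json.dumps(truth, indent=2))
print(field_accuracy(truth, truth), field_accuracy(wrong, truth))
```

Exact-match scoring is the simplest choice here; a real benchmark would likely need tolerance rules for numbers and names.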
Tim Michaud
Tim Michaud@TimGMichaud·
Across all five models, 37% of extractions scored below 0.5 composite without ever tripping a catastrophic-error flag. Production pipelines don't break loudly on these documents. They degrade silently, underneath whatever review threshold you trained your reviewers on.
English
1
0
0
36
Tim Michaud
Tim Michaud@TimGMichaud·
GPT-5.5 reported $405.9M of revenue on a document that says $95M. GPT-5.4 said $40.6M on the same page. I built 148 adversarial insurance documents to test five frontier models. The numbers got weird.
English
1
0
0
126
Tim Michaud reposted
diaul@infosec.exchange
diaul@infosec.exchange@daviddiaul·
I’m #hiring an individual contributor for a fully remote, global role at the intersection of vulnerability research, exploit development, and ML/AI — with a focus on fine-tuning open-weight #LLMs. 🧠 I’m not looking for an “LLM whisperer” or an “LLM pilot.” 🚫 I’m looking for someone who deeply understands post-training, data, evaluation, and how to make models reliable in real-world environments. 🔐 The application link is in the first comment. 🌍 #Hiring #LLM #AI #ML #FineTuning #CyberSecurity #llmwhisperer #llmpilot
English
2
20
70
25.7K
Tim Michaud
Tim Michaud@TimGMichaud·
@GergelyOrosz Yeah I had this turn off on me before; SUPER annoying cause it's not obvious that it's off (or on!) :|
English
0
0
1
55
Gergely Orosz
Gergely Orosz@GergelyOrosz·
Claude just keeps regressing for me, day after day. I swear that until a few days ago, when Claude did not know something, it kicked off a web search, figured out, and answered. Now it just refuses to do the work that I pay for. It's like showing you the middle finger. Really?
English
246
72
2.2K
198.9K
Tim Michaud
Tim Michaud@TimGMichaud·
@b1ack0wl Started a few companies (2 bootstrapped, 1 VC-backed; the new one is bootstrapped but will very likely go raise) - happy to chat about it if it helps!
English
0
0
1
67
b1ack0wl
b1ack0wl@b1ack0wl·
I did some light homework into this and it's a very risky move with a high degree of failure. I would have to start with the following:
* Figure out a name and register it
* Register an LLC
* Come up with a business plan + portfolio
* Obtain a SMB loan
* Figure out the rest
b1ack0wl@b1ack0wl

ngl I think about this every so often. after decades of looking at embedded, mobile, cloud, windows (userland), and linux (userland+kernel) I feel like I have a foundation to create something of my own. but at the same time throwing myself at a high risk idea is a bit spooky

English
7
1
16
3.2K
Tim Michaud
Tim Michaud@TimGMichaud·
I think this is a mix of what @susantejuosho (x.com/susantejuosho/…) said, and also the changing demographic. YC used to target "older" founders who were used to the way things worked at big companies; the "you can just do things"/"go fast"/"do things that don't scale" was to help re-orient people from how things worked at big tech. But as they start having younger and younger people join, who do not have that context, the messaging is heavily muddled and distorted.
English
0
0
2
111
Zack Korman
Zack Korman@ZackKorman·
New video: Y Combinator lets you cross the line. The adults in the room aren't going to stop you from doing something seriously wrong. Young founders need to be aware of that. youtube.com/watch?v=ptT_LG…
English
15
13
152
14.6K
Tim Michaud
Tim Michaud@TimGMichaud·
@GergelyOrosz Happened to us last year; was such a PITA we ended up cancelling.
English
0
0
0
33
Gergely Orosz
Gergely Orosz@GergelyOrosz·
Damn annoying how a subscription service like Netflix deliberately doesn’t support offline mode. Got on a plane, wanted to watch my downloaded series, and could not. Netflix has an A+ eng team, so this is deliberate. But eg Apple TV doesn’t have this silly restriction.
English
87
5
514
127.8K
Tim Michaud
Tim Michaud@TimGMichaud·
@HackingDave Honestly if 5.5 is an improved 5.3xhigh I think we might see a switch back towards OAI.
English
0
0
0
962
Dave Kennedy
Dave Kennedy@HackingDave·
I understand there’s a ton of Claude fans out there. I was there too 4-5 weeks ago. Then it got way way way worse and without explanation. What’s worse is that I would consider myself a heavy power user. What about the folks that aren’t and have no idea it was dumbed down over 60% or more since its initial release and are using this day to day. Codex is outperforming Claude in every way right now. Additionally OpenAI is much more transparent, cheaper, and produces much better code in every way to Claude at the moment. Claude has massive outages, lack of transparency to users, some really bad operational and security practices. They’ve lost me. I’m done. What happened ?
English
164
62
1K
93.4K
Pavin Kang
Pavin Kang@thefineprintesq·
@TimGMichaud My brother in Christ, hell no. Every big law firm thinks it’s reinvented the wheel. Surprise, clients share it with their other firms and it cycles into a fairly amorphous market template. Some firms have decent data, but it gets swallowed up by the same
English
1
0
2
89
Tim Michaud
Tim Michaud@TimGMichaud·
I think a lot of people are letting contexts grow too close to 300k+ tokens, which is where capabilities start to drop off. But I think there's also a good chance of an "I built my early project on AI and it was FAST; it's now way more complex" effect, where the added complexity gives them more issues that add further complexity.
English
0
0
1
178
martin
martin@martinlindenn·
Am I the only one who isn't shitting on Anthropic right now? Claude is working perfectly fine for me today.
English
71
1
134
12K
Tim Michaud
Tim Michaud@TimGMichaud·
@spiritbuun Forcibly setting the effort level + forcing compaction well before 300k tokens and using subagents for many things has definitely kept things closer to how they used to be.
English
0
0
2
2.1K
buun
buun@spiritbuun·
If Claude's quality has been falling off a cliff for you over the past few days, try:
CLAUDE_CODE_DISABLE_1M_CONTEXT=1
CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING=1
ANTHROPIC_DEFAULT_OPUS_MODEL="claude-4-6-opus"
Have your agent save state, /clear, restart.
English
53
79
2.2K
194.2K
Tim Michaud
Tim Michaud@TimGMichaud·
Codex (5.3xhigh) is a lot closer to CC than when I first used it; hope the gap continues to close.
English
1
0
0
151
Tim Michaud
Tim Michaud@TimGMichaud·
@MartinGTobias Latency I think is the bigger win for SLMs, and as companies have better data (or buy it) to train the models why rely on a third party when your own model is better/faster/cheaper.
English
0
0
1
95
Martin Tobias (Pre-Seed VC)
Martin Tobias (Pre-Seed VC)@MartinGTobias·
does anyone believe local task specific SLMs will have a place in a world where general LLMS are battling on costs and improving at a rapid pace?
English
29
1
35
4K
Tim Michaud
Tim Michaud@TimGMichaud·
@HackingDave Not my experience on the Claude side, though neither of them have ever had anything more than ~mid level engineer FMPOV.
English
1
0
1
308
Dave Kennedy
Dave Kennedy@HackingDave·
Dude Claude is total trash - seen massive degrading of code quality, bugs, and more over the past several weeks. This week, I can’t even use it or rely on it to complete basic bug fixes or implementations. Codex has been performing substantially better. Anyone else ?
English
356
27
837
100.8K