Pinned Tweet
Jeff Noël | OSCP
543 posts

Jeff Noël | OSCP
@jeeacheff
Offensive Security Certified Professional (OSCP) passionate about option trading, cybersecurity, AI and new technologies.
Canada · Joined February 2009
410 Following · 156 Followers

@bridgemindai Who's ready to discover the usage/rate limits on Grok 😂

Elon just confirmed Grok CLI launching next week.
Grok 4.20 is already #1 on BridgeBench Reasoning.
Now it's getting its own CLI and desktop app.


@kimmonismus @AnthropicAI A usage reset would be amazing... Max 20X that reset Sunday night, and here we are Wednesday morning at 65% weekly usage without even using Agent Teams or /batch.

@trq212 Reverse-engineered claude.exe's .bun section and found root causes for the Windows memory leaks: Buffer.slice view retention in MCP IPC + chokidar kernel NPP exhaustion. Details + workarounds (see my recent comments from today for more details):
github.com/anthropics/cla…

@bcherny @HackingDave I'm curious whether you guys are aware of the /ultraplan issue where the remote workspace doesn't change across repos or doesn't update the repo state, which makes it unusable? See: github.com/anthropics/cla…

@HackingDave What issues do you have in mind? Here to help if there’s a specific bug you’re running into

Think about all the orgs using Claude right now that have no idea how bad it has become over the past 4 weeks.
No statement from Claude - but a total revert to where the model was a year ago - which in comparison to when 4.6 got released is effectively last year's AI model.
The amount of bugs, security issues, and complete destruction of production applications is going to be felt for quite a long time due to this.
Claude: nothing to see here.
Jeff Noël | OSCP retweeted

@bridgebench Not to mention Gemini-CLI is an absolutely terrible vibe coding experience. It can't even edit files correctly, it corrupts them, tries to fix them and just corrupts them even more.

Gemini 3.1 Pro ranks dead last among frontier models on BridgeBench Reasoning.
Behind Grok 4.20, GPT 5.4, Claude Opus 4.6, Qwen 3.6 Plus, MiniMax M2.7, Claude Sonnet 4.6, and GLM 5.1.
Google's flagship model can't even beat a free Chinese model on grounded reasoning.
This is why I cancelled my $250/month Google AI Ultra subscription.
Gemini CLI was unreliable.
The model is mid.
The infrastructure is worse.
Google has the compute.
Google has the data.
They just can't ship a competitive coding model.


CLAUDE OPUS 4.6 IS NERFED.
BridgeBench just proved it.
Last week Claude Opus 4.6 ranked #2 on the Hallucination benchmark with an accuracy of 83.3%.
Today Claude Opus 4.6 was retested and it fell to #10 on the leaderboard with an accuracy of only 68.3%.
A 98% increase in hallucination.
bridgebench.ai just confirmed that Claude Opus 4.6 has reduced reasoning levels and is nerfed.
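
A quick way to sanity-check the percentage in the post above. This is a minimal sketch that assumes the hallucination rate is simply 100% minus the reported accuracy; BridgeBench may define it differently, and the function names are illustrative only.

```python
def hallucination_rate(accuracy_pct: float) -> float:
    """Assumed definition: every answer that is not accurate counts as a hallucination."""
    return 100.0 - accuracy_pct


def relative_increase_pct(old_accuracy: float, new_accuracy: float) -> float:
    """Relative change in the assumed hallucination rate, in percent."""
    old_rate = hallucination_rate(old_accuracy)
    new_rate = hallucination_rate(new_accuracy)
    return (new_rate - old_rate) / old_rate * 100.0


if __name__ == "__main__":
    # Accuracy figures quoted in the post: 83.3% last week, 68.3% on retest.
    print(round(relative_increase_pct(83.3, 68.3), 1))
```

Under that assumption the rate goes from 16.7% to 31.7%, a rise of 15 percentage points, or roughly 90% in relative terms. The 98% quoted in the post would need a different definition that the post does not state.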


@bridgemindai Man we need the full suite now 😭 I've been wondering how bad Opus 4.6 would be now on all the state-of-the-art benchmarks as well.
Jeff Noël | OSCP retweeted

@thsottiaux @ai_for_success Can we expect it soon, sooner, or soonest? ;)

@ai_for_success that was the small plan, big plan is still coming

@thsottiaux Please pretty please don't lock the good stuff for big companies only.

@bryan_johnson You still shed some light on the microplastic reduction even below that specific threshold though. Which is super cool (or hot).

I think I need to be fired.
I've done 232 dry sauna sessions.
Last week I confirmed, for the first time (by swallowing a pill), whether the core temperature threshold that gates the primary cellular repair mechanism was actually being reached in my protocol.
The threshold is 102.2°F (39.0°C). For me, that takes 33 min at 195°F. With ice on face and neck, 38 min.
My standard daily protocol was 20 minutes. That wasn’t enough time to get my core body temp to the heat shock threshold of 102.2°F (39.0°C).
Causing me to ask, did I just waste 77 hours and 20 min?
It's possible my heat threshold has increased and the heat shock protein release was happening previously, but I doubt it based upon the subjective feeling I now understand as being 102.2°F (39.0°C). It’s brutal.
For these 232 sessions, I measured the temperature of the air, humidity, duration, frequency, the sweat output, blood biomarkers, vascular response, toxin clearance and fertility markers. There is no human body in history that has been more measured in sauna than mine.
Nevertheless, I did not confirm the one number that determines whether the primary mechanism was activating.
My goal wasn't to be a sauna bro. It was to saunamaxx. I was doing the former while thinking I was doing the latter.
I rest my case. I should probably be fired.
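
The "77 hours and 20 min" figure in the post above follows directly from the numbers it gives (232 sessions at 20 minutes each). A small sketch of that arithmetic, with an illustrative helper name; the 33-minute comparison applies the post's own time-to-threshold to the same session count and is not a figure from the post.

```python
def total_time(sessions: int, minutes_per_session: int) -> tuple[int, int]:
    """Return the cumulative time as (hours, minutes)."""
    return divmod(sessions * minutes_per_session, 60)


if __name__ == "__main__":
    # 232 sessions at the 20-minute standard protocol described in the post.
    print(total_time(232, 20))
    # Same number of sessions at the 33 minutes the post says is needed
    # to reach the 102.2°F (39.0°C) threshold.
    print(total_time(232, 33))
```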

@LeMikaelF @bcherny @GergelyOrosz Keep in mind that if you use plan mode or /plan, it'll switch to medium effort automatically every time.

@bcherny @GergelyOrosz I just had Claude Code switch itself from Max effort to Medium twice this morning. Now I'm going to check more often to see if it's still on Max.

Anthropic really is burning more and more dev goodwill
Claude Code is suddenly getting unusable for stuff you could use it for before (as in a day before!) and the AI now refuses to do stuff that it doesn’t think is strictly to do with software development.
No transparency why ofc
Theo - t3.gg @theo
Claude Code is basically unusable at this point. I give up.

@grenierdev @claudeai 100%, that's why I also got an OpenAI subscription. Just sucks to have so many 529s (server overloaded) when you're paying for a product - especially after the OpenClaw and all-third-party ban they just did.

@claudeai The upcoming model better be a banger, ~98.8-99.2% uptime is quite subpar right now.
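
To put the uptime range in the post above into hours, here is a minimal sketch that assumes a 30-day month (720 hours); the function name and the 30-day window are illustrative choices, not something the post specifies.

```python
HOURS_PER_30_DAY_MONTH = 30 * 24  # assumption: 30-day measurement window


def downtime_hours(uptime_pct: float, window_hours: float = HOURS_PER_30_DAY_MONTH) -> float:
    """Hours of downtime implied by an uptime percentage over the window."""
    return window_hours * (100.0 - uptime_pct) / 100.0


if __name__ == "__main__":
    # The ~98.8-99.2% range quoted in the post.
    for uptime in (98.8, 99.2):
        print(uptime, round(downtime_hours(uptime), 2))
```

Under that assumption the quoted range corresponds to roughly 5.8 to 8.6 hours of downtime per month.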



@bryan_johnson @OldeWorldTales Makes me wonder about the effect it would have on you to breathe "deep in the forest" air for a bit compared to your heavily filtered in-house air.

Guys, I’m an idiot. All this time I’ve spent trying not to die, I had toxic turf in my backyard. Artificial turf contains crumb rubber infill made from recycled tires, which leaches chemicals including PFAS, heavy metals, and polycyclic aromatic hydrocarbons. These compounds are linked to hormone disruption, carcinogenicity, and systemic inflammation.
I don’t know how I missed it. It makes me question my basic competence in life.
What gets me is that I try so hard to survey the world of potential idiocy. Then I find out there’s a monument to idiocy sitting right in front of my face that I was blind to.
I’m removing the turf, yet I’m still stuck with this seemingly unsolvable problem of how to not be an idiot.

@bcherny Does that mean the peak hours restrictions will be removed now that openclaw isn't there?
Jeff Noël | OSCP retweeted

🚨EXCLUSIVE: Leaked benchmark scores for Anthropic's upcoming huge flagship model, Mythos. It will launch standalone, not as part of the Claude 4.x/5 series.
Benchmark (vs Opus 4.6):
Terminal-Bench 2.0: 78.4% (+13.0%)
SWE-bench Verified: 87.4% (+6.6%)
OSWorld: 79.6% (+6.9%)
𝜏²-bench: Retail 95.1% (+3.2%), Telecom 99.9% (+0.6%)
MCP Atlas: 75.7% (+16.2%)
BrowseComp: 92.3% (+8.3%)
Humanity's Last Exam: 52.3% (w/o tools, +12.3%), 71.5% (w/ tools, +18.5%)
Finance Agent: 82.1% (+21.4%)
GDPVal-AA-Elo: 2668 (+1062)
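
The deltas in the post above imply a baseline for Opus 4.6 on each benchmark. The sketch below backs those out, assuming each "+x%" is an absolute difference in percentage points (or Elo points for GDPVal-AA) rather than a relative change; the post does not say which, the scores themselves are unverified, and only a subset of rows is shown.

```python
# (claimed score, claimed delta vs Opus 4.6), copied from the post.
CLAIMED = {
    "Terminal-Bench 2.0": (78.4, 13.0),
    "SWE-bench Verified": (87.4, 6.6),
    "OSWorld": (79.6, 6.9),
    "MCP Atlas": (75.7, 16.2),
    "BrowseComp": (92.3, 8.3),
    "Finance Agent": (82.1, 21.4),
    "GDPVal-AA-Elo": (2668, 1062),
}


def implied_baselines(claimed: dict) -> dict:
    """Subtract each delta from its score (assumes deltas are absolute points)."""
    return {name: round(score - delta, 1) for name, (score, delta) in claimed.items()}


if __name__ == "__main__":
    for name, baseline in implied_baselines(CLAIMED).items():
        print(f"{name}: {baseline}")
```

Comparing these implied baselines against published Opus 4.6 results would be one way to check whether the leaked table is internally consistent.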
