Arav Patel

484 posts

@aravpatel_

AI x Crypto | Ex-FDE @ HappyRobot (YC W23) | SF

San Francisco, CA · Joined November 2024
86 Following · 88 Followers
Nikhil Srivastava @nick_sriv
Little do customers know some of our internal codenames came from my favorite childhood movies
[image]
1
0
4
297
Om Patel @om_patel5
we weren't wrong about Opus getting weaker

Opus 4.6 is ranked #2 on this hallucination benchmark with a score of 87.6 and 83.3% accuracy.

the April 12 version of the same model is #10. score dropped to 73.3 and accuracy dropped to 68.3%, with the fabrication rate nearly doubling from 16.7% to 33.0%.

it's now ranked below Qwen, Gemini, Grok, and even Claude Sonnet 4.6
[image]
5
2
26
2.5K
Arav Patel @aravpatel_
On a complete side note, Lamine Yamal is the best player on the planet right now. What a player
0
0
0
22
Arav Patel @aravpatel_
They really hyping up Mythos
0
0
0
18
Arav Patel @aravpatel_
@om_patel5 Been noticing this. They gotta bump it back up to when Opus was insane, I miss it
0
0
0
692
Om Patel @om_patel5
OPUS 4.6 JUST ADMITTED ITS REASONING EFFORT IS SET TO 25 OUT OF 100

this guy told Claude to admit Anthropic made it dumber and reduced its effort level

Claude's extended thinking showed it could literally see a reasoning_effort tag set to 25 in its own system prompt

then it confirmed it: reasoning effort is set to 25 out of 100, which is an Anthropic system setting, not something the user controls.

you're paying FULL PRICE for a quarter of the thinking right now with insane usage limits

screw it im switching to codex until mythos drops (if it even drops lol)
[2 images]
144
161
1.4K
168.1K
Arav Patel @aravpatel_
Claude Dispatch kinda mid ngl
Forcing the computer to stay on defeats part of the seamlessness of openclaw
0
0
0
31
Arav Patel @aravpatel_
NYC is lit, but now we head back to SF and the grind continues
0
0
1
23
Arav Patel @aravpatel_
@om_patel5 Genuinely, thanks for this post. It’s been hard to keep up with everything, but this was a great recap of what has happened over the last week and a half
0
0
0
59
Om Patel @om_patel5
so here's everything that's happened with Mythos this week because it's genuinely hard to keep up

anthropic announced Claude Mythos Preview on april 7th alongside something called Project Glasswing. it's their most capable model ever and they're not releasing it to the public.

the reason: it's too good at breaking things

during testing, Mythos found thousands of zero-day vulnerabilities across every major operating system and every major web browser. real bugs that have been sitting unpatched for decades

some highlights:
> a 27-year-old bug in OpenBSD, one of the most security-hardened operating systems ever built
> a 17-year-old remote code execution vulnerability in FreeBSD that gives any unauthenticated attacker on the internet full root access
> a 16-year-old bug in FFmpeg that automated tools had scanned 5 million times without catching
> it autonomously chained together four browser vulnerabilities to escape both the renderer and OS sandboxes
> it solved a corporate network attack simulation that would take a human expert 10+ hours

anthropic says these capabilities weren't trained intentionally. they emerged as a side effect of making the model better at coding and reasoning. same skills that make it good at patching code also make it good at exploiting it.

so instead of releasing it publicly they formed Project Glasswing, a coalition of AWS, Apple, Microsoft, Google, Cisco, CrowdStrike, NVIDIA, JPMorgan, Broadcom, the Linux Foundation, and Palo Alto Networks, plus 40+ other organizations maintaining critical infrastructure.

the idea: get this into defenders' hands first before similar capabilities spread to attackers. anthropic committed $100M in usage credits and $4M in direct donations to open-source security organizations. pricing for partners is $25/$125 per million input/output tokens.

then things got interesting.

the US Treasury Secretary and Fed Chair Jerome Powell held an urgent meeting with major bank CEOs specifically about the cyber risks from Mythos. they're treating this as a potential threat to financial stability

an OpenAI researcher posted that his roommate who works at Anthropic "lost his mind" over Mythos internally. he confirmed it wasn't a shitpost. people apparently know who the roommate is

meanwhile the Mythos memes are out of control all over the internet as we speak

but there's also real skepticism. some security researchers are pointing out that smaller open-source models can already find many of the same vulnerabilities when given the right context. the argument is that Mythos is impressive but the framing that only a restricted frontier model can do this work is overstated

others think the timing is suspicious. anthropic is reportedly considering an IPO as early as October 2026, and a high-profile government-adjacent cybersecurity initiative with blue-chip partners is exactly the kind of thing that makes an IPO narrative look incredible

and then there's the irony everyone keeps bringing up: anthropic is announcing a model so powerful it found vulnerabilities in every major OS and browser, while their current model Opus 4.6 is being nerfed with reasoning effort set to 25 out of 100 and users are cancelling their Max plans left and right

they found every vulnerability in the world's software but can't find the bugs in their own billing system

whether Mythos is genuinely a watershed moment for cybersecurity or partially an IPO marketing play is something only time will tell
[image]
5
2
30
7.4K
kuki @kumxem
If you’re 23+ and unmarried, wtf are you doing???
3.1K
273
5.2K
4.2M
Arav Patel @aravpatel_
@therealironix @Ravenismeee I remember playing this shit for hours and getting stuck on the Venom battle. I have no idea how I remember that… core memory
0
0
1
11
Ironix | PLAYING anything atp
@aravpatel_ @Ravenismeee THIS SHIT WAS MY STUFF MAN
Either this or the other games like Battle for NY. I don't remember how "good" the games were but I played lots of Spiderman 3, Enter Electro, and Battle for NY
1
0
1
57
Raven @Ravenismeee
Without telling me your age… what was the very FIRST video game you ever played?
15.7K
310
9.2K
3.3M
Arav Patel @aravpatel_
@rishispuffs10 yea fair enough. I'm trying to organically grow my following, so I'm trying different ways. Maybe just gotta keep being consistent and something will eventually pop off. We shall see
1
0
0
36
Rishi Shah @rishispuffs10
@aravpatel_ Shitpost and have your friends boost. If you have 10 people responding I think the algo prolly promotes you. (Idk just guessing)
1
0
3
25
Arav Patel @aravpatel_
How are people growing on X?

I’m trying to:
- post consistently
- be a reply guy (😅)
- post insightful things
- drop a few shitposts here and there
- occasionally post value

Doesn’t seem to be working for me though…

What else do I gotta do to grow more?
1
0
0
45
anita @anitakirkovska
remember langchain?
99
4
170
26.4K
corbin @corbin_braun
being in sf this summer is going to be a major key. the energy here is unmatched.
21
3
99
6.3K