Arav Patel

484 posts

@aravpatel_

AI x Crypto | Ex-FDE @ HappyRobot (YC W23) | SF

San Francisco, CA · Joined November 2024

86 Following · 88 Followers

Arav Patel@aravpatel_·10h

@nick_sriv Gotta add Jarvis and Thanos in there 🤣

Nikhil Srivastava@nick_sriv·18 Mar

Little do customers know some of our internal codenames came from my favorite childhood movies

297

Arav Patel@aravpatel_·10h

@om_patel5 Ranking lower than sonnet is wild

194

Om Patel@om_patel5·10h

we weren't wrong about Opus getting weaker
Opus 4.6 is ranked #2 on this hallucination benchmark with a score of 87.6 and 83.3% accuracy.
the April 12 version of the same model is #10. score dropped to 73.3 and accuracy dropped to 68.3%, with the fabrication rate nearly doubling from 16.7% to 33.0%.
it's now ranked below Qwen, Gemini, Grok, and even Claude Sonnet 4.6

2.5K

Arav Patel@aravpatel_·10h

On a complete side note, Lamine Yamal is the best player on the planet right now. What a player

Arav Patel@aravpatel_·1d

They really hyping up Mythos

Arav Patel@aravpatel_·1d

@om_patel5 Been noticing this, they gotta bump it back up to when Opus was insane I miss it

692

Om Patel@om_patel5·2d

OPUS 4.6 JUST ADMITTED ITS REASONING EFFORT IS SET TO 25 OUT OF 100
this guy told Claude to admit Anthropic made it dumber and reduced its effort level
Claude's extended thinking showed it could literally see a reasoning_effort tag set to 25 in its own system prompt
then it confirmed it: reasoning effort is set to 25 out of 100 which is an Anthropic system setting not something the user controls.
you're paying FULL PRICE for a quarter of the thinking right now with insane usage limits
screw it im switching to codex until mythos drops (if it even drops lol)

144

161

1.4K

168.1K

Arav Patel@aravpatel_·1d

Been noticing this, they gotta bump it back up I miss when Opus was insane

Om Patel@om_patel5

Arav Patel@aravpatel_·1d

Claude Dispatch kinda mid ngl. Forcing the computer to stay on defeats part of the seamlessness of openclaw

Arav Patel@aravpatel_·1d

NYC is lit, but now we head back to SF and the grind continues

Arav Patel@aravpatel_·2d

@om_patel5 Genuinely, thanks for this post. It’s been hard to keep up with everything, but this was a great recap of what has happened over the last week and a half

Om Patel@om_patel5·2d

so here's everything that's happened with Mythos this week because it's genuinely hard to keep up
anthropic announced Claude Mythos Preview on april 7th alongside something called Project Glasswing. it's their most capable model ever and they're not releasing it to the public. the reason: it's too good at breaking things
during testing, Mythos found thousands of zero-day vulnerabilities across every major operating system and every major web browser. real bugs that have been sitting unpatched for decades
some highlights:
> a 27-year-old bug in OpenBSD, one of the most security-hardened operating systems ever built
> a 17-year-old remote code execution vulnerability in FreeBSD that gives any unauthenticated attacker on the internet full root access
> a 16-year-old bug in FFmpeg that automated tools had scanned 5 million times without catching
> it autonomously chained together four browser vulnerabilities to escape both the renderer and OS sandboxes
> it solved a corporate network attack simulation that would take a human expert 10+ hours
anthropic says these capabilities weren't trained intentionally. they emerged as a side effect of making the model better at coding and reasoning. same skills that make it good at patching code also make it good at exploiting it.
so instead of releasing it publicly they formed Project Glasswing, a coalition of AWS, Apple, Microsoft, Google, Cisco, CrowdStrike, NVIDIA, JPMorgan, Broadcom, the Linux Foundation, and Palo Alto Networks. plus 40+ other organizations maintaining critical infrastructure. the idea: get this into defenders' hands first before similar capabilities spread to attackers.
anthropic committed $100M in usage credits and $4M in direct donations to open-source security organizations. pricing for partners is $25/$125 per million input/output tokens.
then things got interesting.
the US Treasury Secretary and Fed Chair Jerome Powell held an urgent meeting with major bank CEOs specifically about the cyber risks from Mythos. they're treating this as a potential threat to financial stability
an OpenAI researcher posted that his roommate who works at Anthropic "lost his mind" over Mythos internally. he confirmed it wasn't a shitpost. people apparently know who the roommate is
meanwhile the Mythos memes are out of control all over the internet as we speak
but there's also real skepticism. some security researchers are pointing out that smaller open-source models can already find many of the same vulnerabilities when given the right context. the argument is that Mythos is impressive but the framing that only a restricted frontier model can do this work is overstated
others think the timing is suspicious. anthropic is reportedly considering an IPO as early as October 2026, and a high-profile government-adjacent cybersecurity initiative with blue-chip partners is exactly the kind of thing that makes an IPO narrative look incredible
and then there's the irony everyone keeps bringing up: anthropic is announcing a model so powerful it found vulnerabilities in every major OS and browser, while their current model Opus 4.6 is being nerfed with reasoning effort set to 25 out of 100 and users are cancelling their Max plans left and right
they found every vulnerability in the world's software but can't find the bugs in their own billing system
whether Mythos is genuinely a watershed moment for cybersecurity or partially an IPO marketing play is something only time will tell

7.4K

Arav Patel@aravpatel_·3d

Crazy world we live in

Forbes@Forbes

Billionaire Sam Altman’s Home Targeted With Molotov Cocktail, OpenAI Says go.forbes.com/rNbrML

Arav Patel@aravpatel_·3d

@manaspgandhi @kumxem I’m cooked

manas gandhi@manaspgandhi·3d

@aravpatel_ @kumxem Getting married

kuki@kumxem·4d

If you’re 23+ and unmarried, wtf are you doing???

3.1K

273

5.2K

4.2M

Arav Patel@aravpatel_·4d

@therealironix @Ravenismeee I remember playing this shit for hours and getting stuck on the venom battle. I have no idea how I remember that… core memory

Ironix | PLAYING anything atp@therealironix·4d

@aravpatel_ @Ravenismeee THIS SHIT WAS MY STUFF MAN Either this or the other games like Battle for NY. I don't remember how "good" the games were but I played lots of Spiderman 3, Enter Electro, and Battle for NY

Raven@Ravenismeee·4d

Without telling me your age… what was the very FIRST video game you ever played?

15.7K

310

9.2K

3.3M

Arav Patel@aravpatel_·4d

@VJain47 @Ravenismeee Found this in one of my drawers a few days ago, what a time to be alive

Garry Fan@VJain47·4d

@aravpatel_ @Ravenismeee Oh my god core memory unlocked

Arav Patel@aravpatel_·4d

@DustinSheldon3 @Ravenismeee Absolute banger on the gameboy

Dustin Sheldon@DustinSheldon3·4d

@aravpatel_ @Ravenismeee Brooo swear I played this on a gameboy

Arav Patel@aravpatel_·4d

@kwc1234567890 @Ravenismeee Gameboy

kkev p1@kwc1234567890·4d

@aravpatel_ @Ravenismeee Is that the plug and play Spiderman game?? Or is it gameboy

Arav Patel@aravpatel_·4d

@rishispuffs10 yea fair enough. I'm trying to grow my following organically, so I'm trying different approaches. Maybe I just gotta keep being consistent and something will eventually pop off. We shall see

Rishi Shah@rishispuffs10·4d

@aravpatel_ Shit post and have your friends boost. If you have 10 people responding I think the algo prolly promotes you. (Idk just guessing)

Arav Patel@aravpatel_·4d

How are people growing on X? I’m trying to:
- post consistently
- be a reply guy (😅)
- post insightful things
- drop a few shitposts here and there
- occasionally post value
Doesn’t seem to be working for me though… What else do I gotta do to grow more?

Arav Patel@aravpatel_·4d

Although Anthropic rejected me in the final round of their interview process, shoutout to these guys because they're consistently dropping banger after banger. Gotta try this infra out

Claude@claudeai

We're bringing the advisor strategy to the Claude Platform. Pair Opus as an advisor with Sonnet or Haiku as an executor, and get near Opus-level intelligence in your agents at a fraction of the cost.
