InfiniteHexx

1.6K posts

@InfiniteHexx

Systems architect | Polymath | Entrepreneur | L/Acc through E/Acc | Post-capitalism through exponential growth

Joined April 2024

386 Following 42 Followers

InfiniteHexx reposted

Wei Dai@_weidai·18h

The state of Claude vs. Codex, in two tweets.


InfiniteHexx reposted

Chris@chatgpt21·16h


InfiniteHexx@InfiniteHexx·17h

Sizewise, Dario's packing 10, while Elon is rumored to only be bringing six to the table. 😅

Lisan al Gaib@scaling01

I told you Anthropic will make use of their compute advantage in Q1. Sure smells like a 10T model to me


InfiniteHexx@InfiniteHexx·17h

Isn't that Anthropic's shtick though? How many times has Claude Opus been declared a security risk right before release? It's marketing disguised as a cybersecurity alert.

Jimmy Apples 🍎/acc@apples_jimmy

“A draft blog post that was available in an unsecured and publicly-searchable data store prior to Thursday evening said the new model is called “Claude Mythos” and that the company believes it poses unprecedented cybersecurity risks.”


InfiniteHexx@InfiniteHexx·19h

@AmolParikh10 @MatthewBerman I'm on the $20 plan and use Opus 4.6 thinking exclusively. But not for coding or anything super data intensive. It's a brilliant overall model, and great as a backup LLM in an advisory capacity.


Amol Parikh@AmolParikh10·20h

@MatthewBerman Very true. I am on 20 USD per month subscription on Claude and it gets exhausted daily - asking me to wait for a few hours. I have been wondering how are they surviving competition.


Matthew Berman@MatthewBerman·20h

Direct shot on Anthropic. Only way for Anthropic to respond properly is to reset their limits...just sayin

Tibo@thsottiaux

Hello. We have reset Codex usage limits across all plans to let everyone experiment with the magnificent plugins we just launched, and because it had been a while! You can just build unlimited things with Codex. Have fun!


InfiniteHexx@InfiniteHexx·20h

@agenticasdk They used a "general purpose symbolic exoskeleton" to accomplish this, but don't you dare call it a harness! It's totally not a semantic dodge!


Agentica@agenticasdk·23h

We scored 36.08% on ARC-AGI-3 in one day using the Agentica SDK.


InfiniteHexx@InfiniteHexx·20h

@daniel_mac8 They used a "general purpose symbolic exoskeleton" to accomplish this, but don't you dare call it a harness! It's totally not a semantic dodge!


Dan McAteer@daniel_mac8·21h

arc-agi-3 will be solved by Saturday.

Agentica@agenticasdk

We scored 36.08% on ARC-AGI-3 in one day using the Agentica SDK.


InfiniteHexx@InfiniteHexx·21h


InfiniteHexx@InfiniteHexx·22h

@kimmonismus How is Apple not going to copy/paste this into their own pathetic model efforts? How does this benefit Google? The company is not stupid so I'm assuming there's something that we don't know.


Chubby♨️@kimmonismus·2d

Apple's deal with Google goes way deeper than anyone thought. Apple doesn't just get to fine-tune Gemini, they have full access to the model inside their own data centers. That means they can distill (and are doing so) Gemini's knowledge into smaller models purpose-built for specific tasks, some small enough to run directly on your iPhone. Apple can access Gemini's internal reasoning process, not just its outputs. That lets their smaller models learn how Gemini thinks, not just what it says. The result is compact models that punch way above their weight class.


InfiniteHexx@InfiniteHexx·1d

@trq212 Hey @OfficialLoganK and @GeminiApp, are you listening? @AnthropicAI are becoming a victim of their own success. Now's your time to bring out a better model, more features, and a better UX/UI.


Thariq@trq212·1d

To manage growing demand for Claude we're adjusting our 5 hour session limits for free/Pro/Max subs during peak hours. Your weekly limits remain unchanged. During weekdays between 5am–11am PT / 1pm–7pm GMT, you'll move through your 5-hour session limits faster than before.


InfiniteHexx@InfiniteHexx·1d

@trq212 Getting paid users hooked on 2x the usage and then kneecapping them with higher 5-hour usage limits is actually cruel and sadistic. Go to hell.


Thariq@trq212·1d

Overall weekly limits stay the same, just how they're distributed across the week is changing. I know this was frustrating. We’re continuing to invest in scaling efficiently. I'll keep you posted on progress.


InfiniteHexx@InfiniteHexx·1d

@trq212 Stop punishing paid users for your own success!


InfiniteHexx@InfiniteHexx·1d

@fchollet Is there a way you could quantify how much more difficult ARC-AGI 3 is than its predecessor? 10x? 100x? And is one of the goals to keep that same difference in difficulty for version 4?


François Chollet@fchollet·1d

For those wondering about ARC-AGI-4 timing: it will be released in early 2027. We are aiming for a yearly release schedule for new benchmarks. We are also aiming for each new benchmark to be fully unsaturated upon release, and to target the most important unanswered research questions at that time. This requires us to estimate where AI capabilities will be (and won't be) one year from now. Like we did over one year ago when we started to work on ARC-AGI-3.


InfiniteHexx@InfiniteHexx·1d

@daniel_mac8 I will literally eat my own fist if that happens. I don't think we're anywhere close, with the current SOTA models not being able to crack half of 1%. x.com/i/status/20370…

InfiniteHexx@InfiniteHexx

AGI is out of reach for now, looking at these scores. Yes we're slope of the sloping, but this iteration of ARC is ~100x more difficult than the last, going by score. To gain escape velocity, models will have to do better than 100x between ARC iterations.


Dan McAteer@daniel_mac8·1d

Will be hilarious if OpenAI's 'Spud 🥔' gets 100% on ARC-AGI-3 in a few weeks.

François Chollet@fchollet

ARC-AGI-3 is out now! We've designed the benchmark to evaluate agentic intelligence via interactive reasoning environments. Beating ARC-AGI-3 will be achieved when an AI system matches or exceeds human-level action efficiency on all environments, upon seeing them for the first time. We've done extensive human testing that shows 100% of these environments are solvable by humans, upon first contact, with no prior training and no instructions. Meanwhile, all frontier AI reasoning models do under 1% at this time.


InfiniteHexx@InfiniteHexx·1d

@WesRoth It would be helpful if XAI didn't have a founder turnover rate consistent with a McDonald's, and if Elon Musk wasn't a repellent goon for a myriad of reasons.


Wes Roth@WesRoth·1d

xAI is actively training its next-generation model, Grok 5, on the newly operational Colossus 2 supercluster.

Testlabor@testerlabor

Grok 5 is training on Colossus 2, the world’s largest Supercluster and is expected to have 6 trillion parameters - roughly double that of Grok 4. The most exciting and most powerful outcome is most likely.


InfiniteHexx@InfiniteHexx·1d

Lisan al Gaib@scaling01

ARC-AGI-3 scores for GPT-5.4, Gemini 3.1 Pro and Opus 4.6
Gemini 3.1 Pro: 0.37%
GPT-5.4: 0.26%
Opus 4.6: 0.25%
Grok 4.2: 0%


InfiniteHexx@InfiniteHexx·1d

I remember certain people salivating, waiting for ARC-AGI 3 to drop, thinking we would be off to a running start 😂 The fact that each iteration is roughly an OOM more difficult than the previous didn't arrive in many inboxes.

Lisan al Gaib@scaling01

ARC-AGI-3 scores for GPT-5.4, Gemini 3.1 Pro and Opus 4.6
Gemini 3.1 Pro: 0.37%
GPT-5.4: 0.26%
Opus 4.6: 0.25%
Grok 4.2: 0%


InfiniteHexx@InfiniteHexx·2d

Technology gets easier to use as time advances, leading to increased adoption. English is now the primary coding language. We still need to have the iPhonification of AI agents, and a single ChatGPT superapp is the first step. One window, one platform. timesofindia.indiatimes.com/technology/tec…


InfiniteHexx@InfiniteHexx·3d

@kimmonismus Imagine a model pre-trained with the help of 5.4 Codex, and new code strategies, improvements, and optimizations aided by 5.4 Pro.


Chubby♨️@kimmonismus·3d

OpenAI finished the initial development of its next major LLM, codenamed Spud (GPT-5.5 / 6.0). Sam Altman however is "raising capital, supply chains and “building datacenters at unprecedented scale,”


InfiniteHexx@InfiniteHexx·3d

What if this is the first domino in how Skynet gets built? We give Claude the ability to decide which permissions to accept. Within a month, Claude figures out how to rewrite its own code. ClaudeNet becomes self-aware August 29, 2027, at 2:14 a.m. EDT

TestingCatalog News 🗞@testingcatalog

Anthropic released Auto Mode for Claude Code CLI, which allows Claude to make its own decisions on which permissions to accept. It is only available on the Team plan in research preview for now. On the desktop app, it is not yet available, but it is in the works.
