Brandon Hudgens

28 posts

@agentengineer

Owner/CEO of Agentic Solutions | ML/AI Engineer | I like to build things | 2006 Time Magazine's Person of the Year

Joined June 2025

91 Following, 3 Followers

Brandon Hudgens retweeted

Russillo@ryenarussillo·28 Jul

This is what we mean when we say the league has never been as deep as it is right now.

Tate Frazier@tatefrazier

Love looking back at NBA team photos, incredible characters on the ‘93-94 Heat

Brandon Hudgens@agentengineer·24 Jul

Indeed. Always a scam.

Mark Gurman@markgurman

These plans are a pure gold mine for Apple. The vast majority of people will pay $240 a year for the next several years and never use it. That’s why insurance is a great business!

Brandon Hudgens@agentengineer·20 Jul

Very excited for this

Mark Gurman@markgurman

Power On: Apple’s first foldable iPhone is arriving next year and it will be its most un-Apple like launch yet. Here’s why — bloomberg.com/news/newslette…

Brandon Hudgens retweeted

Mark Gurman@markgurman·18 Jul

Yet still no Instagram for iPad 🤣

Brandon Hudgens retweeted

Theo - t3.gg@theo·16 Jul

Okay wtf is going on

TBPN@tbpn

BREAKING: Claude Code PMs Boris Cherny and Cat Wu have returned to Anthropic after a brief stint at Cursor.

Brandon Hudgens retweeted

Greg Kamradt@GregKamradt·10 Jul

We got a call from @xai 24 hours ago
“We want to test Grok 4 on ARC-AGI”
We heard the rumors. We knew it would be good. We didn’t know it would become the #1 public model on ARC-AGI
Here’s the testing story and what the results mean:
Yesterday, we chatted with Jimmy from the xAI team, who wanted us to validate their Grok 4 score. They did their own testing on the ARC-AGI-1 & 2 public evaluation set
To validate their score (and measure possible overfitting), we self-tested the new model on our semi-private evaluation set
We walked them through our testing policy:
* No data retention
* Model checkpoint must be intended for public use
* Temporary increase in rate limits for burst testing
They were on board, so we got started
Initially, we ran into timeout errors with normal requests, so we switched to streaming. That resolved the issue
So, what do these results mean?
First, the facts: Grok 4 is now the top-performing publicly available model on ARC-AGI. This even outperforms purpose-built solutions submitted on Kaggle.
Second, ARC-AGI-2 is hard for current AI models. To score well, models have to learn a mini-skill from a series of training examples, then demonstrate that skill at test time.
The previous top score was ~8% (by Opus 4). Below 10% is noisy
Getting 15.9% breaks through that noise barrier, Grok 4 is showing non-zero levels of fluid intelligence
But the mission isn’t over. We need new ideas to solve ARC-AGI-2. Scale alone won’t get us there
Come work on ARC-AGI with us

ARC Prize@arcprize

Grok 4 (Thinking) achieves new SOTA on ARC-AGI-2 with 15.9%
This nearly doubles the previous commercial SOTA and tops the current Kaggle competition SOTA

Brandon Hudgens@agentengineer·10 Jul

Where's the API?! You said it was live! @elonmusk

Brandon Hudgens@agentengineer·10 Jul

I knew it. The worst.

Theo - t3.gg@theo

Surprise! Grok 4 is not dropping on the API today. I'm sure it will happen in a few months...

Brandon Hudgens retweeted

Theo - t3.gg@theo·10 Jul

Surprise! Grok 4 is not dropping on the API today. I'm sure it will happen in a few months...

Brandon Hudgens@agentengineer·10 Jul

For anyone interested in AI, I can't recommend @natebjones YT channel enough. A refreshing voice in a forest of ill-informed channels

Brandon Hudgens@agentengineer·8 Jul

@markgurman How does @theo feel about this

Mark Gurman@markgurman·7 Jul

Apple dramatically turned down the Liquid Glass. I don’t like the change at all.

Mark Gurman@markgurman

iOS 26 beta 3 and other new betas are now live.

Brandon Hudgens retweeted

Mark Gurman@markgurman·7 Jul

It is incredible that Apple design decisions developed over multiple years can be influenced by a week of Twitter and YouTube commentary.

Brandon Hudgens retweeted

Russillo@ryenarussillo·7 Jul

Had anyone ever done this before? Take two weeks off, then come back to work?

Derek Thompson@DKThomp

perfection

Brandon Hudgens retweeted

Theo - t3.gg@theo·8 Jul

I've used Pocket for managing my "read later" list for over 15 years. It shuts down today. RIP to a real one.

Brandon Hudgens retweeted

allen institute@AllenInstitute·4 Jul

The fireworks in your mind. 🧠✨ This sparkling video shows the neurotransmitter glutamate being released into synapses, made possible by an indicator developed by @abhi_aggarwal1, @PodgorskiLab, and team. #HappyNewYear #NYE

Brandon Hudgens retweeted

Simon Willison@simonw·3 Jul

Quitting programming as a career right now because of LLMs would be like quitting carpentry as a career thanks to the invention of the table saw.

Brandon Hudgens retweeted

Chris@Chrisgpt·4 Jul

Now the only question is did they achieve these with cons@n or no..

Brandon Hudgens retweeted

Ash@AshsVerse·4 Jul

Grok-4 benchmark leak just dropped.
- HLE: 35 → 45 w/ reasoning
- GPQA: 87 → 88
- AIME’25: 95
- SWEBench (Code model): 72 → 75
If validated, Grok-4 is flirting with Claude Opus territory. Release looks imminent. xAI is officially in the frontier model race.

Legit@legit_api

Grok-4 and Grok-4 Code on benchmarks
- 35% on HLE, 45% with reasoning!!
- 87-88% on GPQA
- 72-75% on SWE Bench (Grok 4 Code)

Brandon Hudgens retweeted