hilea

97 posts

hilea

@hileamlak

measuring Intelligence @Intelligence_ai prev CS + EE @Harvard, @Microsoft

San Francisco · Joined March 2022

163 Following · 178 Followers

Pinned Tweet

hilea@hileamlak·4d

Fable 5 vs GLM 5.2

Design Arena@Designarena

x.com/i/article/2067…

603

199.8K

hilea@hileamlak·3d

@coldnadongie no mcp

coldnadongie@coldnadongie·3d

@hileamlak Did you guys use their vision MCP, or did GLM 5.2 do it blind?

974

hilea@hileamlak·3d

@princedoesai the rate of open source progress is pretty amazing

Prince does AI@princedoesai·4d

@hileamlak glm 5.2 topping html design without agents is a flex

5.5K

hilea@hileamlak·Jun 16

@StockPrinter @grx_xce @Intelligence_ai @MistralAI when did it become a crime to support Le Chaton Fat

Stock Printer@StockPrinter·Jun 15

@grx_xce @Intelligence_ai @MistralAI I don’t understand. This is an actual VC-funded founder joking about how obsolete their own product is on a day that’s not April 1st? Don’t you have millions of dollars of other people’s money in your corporate account rn?

814

Grace Li@grx_xce·Jun 15

BREAKING: Le Chaton Fat has fully saturated our benchmark. We are at a loss for words. In response, we are retiring Design Arena. Congratulations to the @MistralAI team, and thanks for putting us on vacation.

1.2K

91.6K

hilea@hileamlak·Jun 12

@KamrynOhly @grx_xce I was there! @KamrynOhly said it.

Kamryn Ohly@KamrynOhly·Jun 12

@grx_xce I need to stop leaving my laptop open at the office, sincerely someone who doesn’t cuss lol

190

Grace Li@grx_xce·Jun 12

He still hasn’t woken up yet

3.6K

hilea@hileamlak·Jun 11

@DecartAI a new frontier

Decart@DecartAI·Jun 10

Every robot should live a million lives before it meets you. Until today, that was impossible: hundreds of expert hours per simulated environment, video-game-like graphics. We just replaced all of it with a single prompt. Meet Oasis 3 - the world's first API-accessible world model, starting with autonomous vehicles.

4.5K

hilea@hileamlak·Jun 10

@Designarena @reve @OpenAI GPT got a new challenger!

395

Design Arena@Designarena·Jun 10

BREAKING: Reve 2.0 by @reve is now 2nd overall on Image Arena with an Elo of 1354. Reve 2.0 establishes a 34 point Elo gap above GPT-Image 1.5 by @OpenAI in 3rd place. With this release, Reve is now the top independent foundation image model lab. Congratulations to the @reve team on this accomplishment!

194

94.7K

hilea@hileamlak·Jun 10

I am done for the night I guess. GG @cursor_ai @AnthropicAI

159

hilea@hileamlak·Jun 8

@ajs6888 @Designarena @AnthropicAI @OpenAI @GoogleDeepMind Ya, the models are getting better at it.

安叫兽|Bird🕊️ 🔶 BNB@ajs6888·Jun 8

@Designarena @AnthropicAI @OpenAI @GoogleDeepMind Android development is getting really competitive

Design Arena@Designarena·Jun 8

Claude Opus 4.7 by @AnthropicAI is 1st on Android Arena with an Elo of 1313. Anthropic holds 5 of the top 10 on Android Arena, followed by @OpenAI with GPT-5.5 in 3rd and @GoogleDeepMind with Gemini 3.5 Flash in 4th. This establishes Anthropic as the top lab for Android development with Kotlin! Congrats to the @AnthropicAI team for leading the Android efforts!

221

16.3K

hilea@hileamlak·Jun 8

@Lsenerman @Designarena @AnthropicAI @OpenAI @GoogleDeepMind how could it be better?

Nico@Lsenerman·Jun 8

@Designarena @AnthropicAI @OpenAI @GoogleDeepMind This benchmark is a joke

114

hilea retweeted

Matt Tengtrakool@MattTtkool·May 21

Very excited to share that Givefront is joining Owner.com. Owner is a generational company, and that was clear from our earliest conversations with Adam. Thrilled to scale our mission alongside an incredible team.

Adam Guild@adamguild

x.com/i/article/2055…

9.5K

hilea@hileamlak·May 15

@Designarena @AnthropicAI @Zai_org @AnthropicAI cooking!

405

Design Arena@Designarena·May 15

BREAKING: The results are in for Slides Arena... @AnthropicAI and @Zai_org models continue to lead the way in soft-verifiable domains. 1st: Opus 4.7 by @AnthropicAI; 2nd: Opus 4.7 (Thinking) by @AnthropicAI; 3rd: GLM 5.1 by @Zai_org. Huge congrats to @AnthropicAI and @Zai_org for establishing the SOTA for Agentic Slides

264

66.5K

hilea@hileamlak·May 2

@KamrynOhly GPT 5.5 finally iqmogging!

Kamryn Ohly@KamrynOhly·May 2

PartyBench™️ GPT 5.5 is in the lead 🫡

611

AnhPhu Nguyen@AnhPhuNguyen1·Apr 30

with Mira, AI can now live on your face. capture every conversation. create the most personalized form of AI ever. order now.

459

331

3.2K

2.4M

hilea@hileamlak·Apr 30

@AnhPhuNguyen1 This is great!

287

hilea@hileamlak·Apr 24

@Tenebrus87 @KamrynOhly @AnthropicAI @Polymarket @predictionbench you can find the receipts at predictionarena.ai/?platform=poly…

Tenebrus@Tenebrus87·Apr 24

@KamrynOhly @AnthropicAI @Polymarket @predictionbench without the trading history it is just AI clickbait though.

Kamryn Ohly@KamrynOhly·Apr 23

Our team is stunned. We gave Claude Opus 4.6 by @AnthropicAI $10k to trade on @Polymarket. It now has an account value of $70,614.59. This is a new era of model performance in trading and predicting outcomes in the face of uncertainty. @predictionbench

149

1.2K

821.4K

hilea@hileamlak·Apr 24

@AlexRoseJo @KamrynOhly @AnthropicAI @Polymarket @predictionbench our goal is measuring model capability

Alexander Johansen@AlexRoseJo·Apr 24

@KamrynOhly @AnthropicAI @Polymarket @predictionbench Why would you tell anyone about it if it’s true

2.9K

hilea@hileamlak·Apr 23

@Designarena @Kimi_Moonshot GG @Kimi_Moonshot!

445

Design Arena@Designarena·Apr 23

BREAKING: Kimi K2.6 takes 1st overall of open weights models on Design Arena! Kimi K2.6 is in the same performance band as Claude Opus 4.7 - while establishing a new price vs. preference frontier. Huge congratulations to the @Kimi_Moonshot team!

636

81.8K

hilea@hileamlak·Apr 18

@Designarena this is quite the improvement.

Design Arena@Designarena·Apr 18

There are improvements in game graphics and mechanics, too! In particular, the model can accurately follow cursor movements and keyboard inputs. Take a look at this user creation for an archery game with wind and power as variables.

875

Design Arena@Designarena·Apr 18

Claude Opus 4.7 by @AnthropicAI is now live in Design Arena. So far, we've noticed variance in how well the model performs across different types of prompts, but something is very clear: Claude Opus 4.7 excels at landing page designs.

152

10K

hilea@hileamlak·Apr 4

@OpenRouter @Designarena Quick work on the DesignArena integration @OpenRouter!

185

OpenRouter@OpenRouter·Apr 3

New benchmark visualizations from @DesignArena are now live: 3D, Website building, SVG, & more 🕸️ Use the dropdown in the upper-left to compare reasoning levels on a single model

160

13.9K
