hilea

97 posts

@hileamlak

measuring Intelligence @Intelligence_ai prev CS + EE @Harvard, @Microsoft

San Francisco · Joined March 2022
163 Following · 178 Followers

coldnadongie (@coldnadongie)
@hileamlak Did you guys use their vision MCP, or did GLM 5.2 do it blind?
[GIF]
Replies 1 · Reposts 0 · Likes 3 · Views 974

hilea (@hileamlak)
@princedoesai the rate of open source progress is pretty amazing
Replies 0 · Reposts 0 · Likes 9 · Views 4K

Stock Printer (@StockPrinter)
@grx_xce @Intelligence_ai @MistralAI I don’t understand. This is an actual vc funded founder joking about how obsolete their own product is on a day that’s not April 1st? Don’t you have millions of dollars of other people’s money in your corporate account rn?
Replies 1 · Reposts 0 · Likes 2 · Views 814

Grace Li (@grx_xce)
BREAKING: Le Chaton Fat has fully saturated our benchmark. We are at a loss for words. In response, we are retiring Design Arena. Congratulations to the @MistralAI team, and thanks for putting us on vacation.
[media]
Replies 46 · Reposts 55 · Likes 1.2K · Views 91.6K

Kamryn Ohly (@KamrynOhly)
@grx_xce I need to stop leaving my laptop open at the office, sincerely someone who doesn’t cuss lol
[media]
Replies 1 · Reposts 0 · Likes 6 · Views 190

Grace Li (@grx_xce)
He still hasn’t woken up yet
[media]
Replies 4 · Reposts 0 · Likes 30 · Views 3.6K

Decart (@DecartAI)
Every robot should live a million lives before it meets you. Until today, that was impossible: hundreds of expert hours per simulated environment, video-game-like graphics. We just replaced all of it with a single prompt. Meet Oasis 3 - the world's first API-accessible world model, starting with autonomous vehicles.
Replies 7 · Reposts 7 · Likes 54 · Views 4.5K

Design Arena (@Designarena)
BREAKING: Reve 2.0 by @reve is now 2nd overall on Image Arena with an Elo of 1354. Reve 2.0 establishes a 34-point Elo gap above GPT-Image 1.5 by @OpenAI in 3rd place. With this release, Reve is now the top independent foundation image model lab. Congratulations to the @reve team on this accomplishment!
[media]
Replies 10 · Reposts 34 · Likes 194 · Views 94.7K

Design Arena (@Designarena)
Claude Opus 4.7 by @AnthropicAI is 1st on Android Arena with an Elo of 1313. Anthropic holds 5 of the top 10 on Android Arena, followed by @OpenAI with GPT-5.5 in 3rd and @GoogleDeepMind with Gemini 3.5 Flash in 4th. This establishes Anthropic as the top lab for Android development with Kotlin! Congrats to the @AnthropicAI team for leading the Android efforts!
[media]
Replies 15 · Reposts 12 · Likes 221 · Views 16.3K

Kamryn Ohly (@KamrynOhly)
PartyBench™️ GPT 5.5 is in the lead 🫡
[media]
Replies 1 · Reposts 0 · Likes 15 · Views 611

AnhPhu Nguyen (@AnhPhuNguyen1)
with Mira, AI can now live on your face. capture every conversation. create the most personalized form of AI ever. order now.
Replies 459 · Reposts 331 · Likes 3.2K · Views 2.4M

Kamryn Ohly (@KamrynOhly)
Our team is stunned. We gave Claude Opus 4.6 by @AnthropicAI $10k to trade on @Polymarket. It now has an account value of $70,614.59. This is a new era of model performance in trading and predicting outcomes in the face of uncertainty. @predictionbench
[media]
Replies 149 · Reposts 52 · Likes 1.2K · Views 821.4K

Design Arena (@Designarena)
BREAKING: Kimi K2.6 takes 1st overall among open-weights models on Design Arena! Kimi K2.6 is in the same performance band as Claude Opus 4.7, while establishing a new price vs. preference frontier. Huge congratulations to the @Kimi_Moonshot team!
[media]
Replies 23 · Reposts 53 · Likes 636 · Views 81.8K

Design Arena (@Designarena)
There are improvements in game graphics and mechanics, too! In particular, the model can accurately follow cursor movements and keyboard inputs. Take a look at this user creation for an archery game with wind and power as variables.
Replies 2 · Reposts 1 · Likes 8 · Views 875

Design Arena (@Designarena)
Claude Opus 4.7 by @AnthropicAI is now live in Design Arena. So far, we've noticed variance in how well the model performs across different types of prompts, but one thing is very clear: Claude Opus 4.7 excels at landing page designs.
Replies 12 · Reposts 3 · Likes 152 · Views 10K

OpenRouter (@OpenRouter)
New benchmark visualizations from @DesignArena are now live: 3D, Website building, SVG, & more 🕸️ Use the dropdown in the upper-left to compare reasoning levels on a single model
[media]
Replies 4 · Reposts 14 · Likes 160 · Views 13.9K