Pinned Tweet
hilea
97 posts

hilea
@hileamlak
measuring Intelligence @Intelligence_ai prev CS + EE @Harvard, @Microsoft
San Francisco · Joined March 2022
163 Following · 178 Followers


@hileamlak glm 5.2 topping html design without agents is a flex

@StockPrinter @grx_xce @Intelligence_ai @MistralAI when did it become a crime to support Le Chaton Fat

@grx_xce @Intelligence_ai @MistralAI I don’t understand. This is an actual vc funded founder joking about how obsolete their own product is on a day that’s not April 1st?
Don’t you have millions of dollars of other people’s money in your corporate account rn?

BREAKING: Le Chaton Fat has fully saturated our benchmark.
We are at a loss for words.
In response, we are retiring Design Arena.
Congratulations to the @MistralAI team, and thanks for putting us on vacation.


@grx_xce I need to stop leaving my laptop open at the office, sincerely someone who doesn’t cuss lol


Every robot should live a million lives before it meets you. Until today, that was impossible: hundreds of expert hours per simulated environment, video-game-like graphics.
We just replaced all of it with a single prompt.
Meet Oasis 3 - the world's first API-accessible world model, starting with autonomous vehicles.

BREAKING: Reve 2.0 by @reve is now 2nd overall on Image Arena with an Elo of 1354.
Reve 2.0 establishes a 34 point Elo gap above GPT-Image 1.5 by @OpenAI in 3rd place.
With this release, Reve is now the top independent foundation image model lab.
Congratulations to the @reve team on this accomplishment!


Claude Opus 4.7 by @AnthropicAI is 1st on Android Arena with an Elo of 1313.
Anthropic holds 5 of the top 10 on Android Arena, followed by @OpenAI with GPT-5.5 in 3rd and @GoogleDeepMind with Gemini 3.5 Flash in 4th. This establishes Anthropic as the top lab for Android development with Kotlin!
Congrats to the @AnthropicAI team for leading the Android efforts!

hilea retweeted

Very excited to share that Givefront is joining Owner.com.
Owner is a generational company, and that was clear from our earliest conversations with Adam. Thrilled to scale our mission alongside an incredible team.
Adam Guild @adamguild

BREAKING: The results are in for Slides Arena... @AnthropicAI and @Zai_org models continue to lead the way in soft-verifiable domains
1st: Opus 4.7 by @AnthropicAI
2nd: Opus 4.7 (Thinking) by @AnthropicAI
3rd: GLM 5.1 by @Zai_org
Huge congrats to @AnthropicAI and @Zai_org for establishing the SOTA for Agentic Slides


@Tenebrus87 @KamrynOhly @AnthropicAI @Polymarket @predictionbench you can find the receipts at predictionarena.ai/?platform=poly…

@KamrynOhly @AnthropicAI @Polymarket @predictionbench without the trading history it is just AI clickbait though.

Our team is stunned.
We gave Claude Opus 4.6 by @AnthropicAI $10k to trade on @Polymarket.
It now has an account value of $70,614.59.
This is a new era of model performance in trading and predicting outcomes in the face of uncertainty.
@predictionbench


@AlexRoseJo @KamrynOhly @AnthropicAI @Polymarket @predictionbench our goal is measuring model capability

@KamrynOhly @AnthropicAI @Polymarket @predictionbench Why would you tell anyone about it if it’s true

BREAKING: Kimi K2.6 takes 1st overall of open weights models on Design Arena!
Kimi K2.6 is in the same performance band as Claude Opus 4.7 - while establishing a new price vs. preference frontier.
Huge congratulations to the @Kimi_Moonshot team!


Claude Opus 4.7 by @AnthropicAI is now live in Design Arena.
So far, we've noticed variance between how well the model performs across different types of prompts, but something is very clear:
Claude Opus 4.7 excels at landing page designs.

New benchmark visualizations from @DesignArena are now live: 3D, Website building, SVG, & more 🕸️
Use the dropdown in the upper-left to compare reasoning levels on a single model








