Leon Bi

15 posts

@leonbi100

mts @xai | prev AI @databricks @stanford

Joined January 2025
55 Following · 391 Followers
Leon Bi @leonbi100
We trained 4.3 using first principles listening to real world feedback. Stay tuned for the big boy runs.
Artificial Analysis @ArtificialAnlys

xAI has launched Grok 4.3, achieving 53 on the Artificial Analysis Intelligence Index with improved agentic performance, ~40% lower input price, and ~60% lower output price than Grok 4.20.

The release of Grok 4.3 places @xAI just above Muse Spark and Claude Sonnet 4.6 on the Intelligence Index, and 4 points ahead of the latest version of Grok 4.20. Grok 4.3 improves its Artificial Analysis Intelligence Index score while reducing cost to run the benchmark suite.

Key Takeaways:

➤ Grok 4.3 improves on cost-per-intelligence relative to Grok 4.20 0309 v2: it scores higher on the Intelligence Index while costing less to run the full benchmark suite. Grok 4.3 costs $395 to run the Artificial Analysis Intelligence Index, around 20% lower than Grok 4.20 0309 v2, despite using more output tokens. This makes it one of the lower-cost models at its intelligence level.

➤ Large increase in real-world agentic task performance: The largest single benchmark improvement is on GDPval-AA, where Grok 4.3 scores an Elo of 1500, up 321 points from Grok 4.20 0309 v2’s score of 1179, surpassing Gemini 3.1 Pro Preview, Muse Spark, GPT-5.4 mini (xhigh), and Kimi K2.5. Grok 4.3 narrows the gap to the leading model on GDPval-AA, but still trails GPT-5.5 (xhigh) by 276 Elo points, with an expected win rate of ~17% against GPT-5.5 (xhigh) under the standard Elo formula.

➤ Grok 4.3 performs strongly on instruction following and agentic customer support tasks. It gains 5 points on 𝜏²-Bench Telecom to reach 98%, in line with GLM-5.1. Grok 4.3 maintains an 81% IFBench score from Grok 4.20 0309 v2.

➤ Grok 4.3 gains 8 points on AA-Omniscience Accuracy, but at the cost of an 8-point drop in AA-Omniscience Non-Hallucination Rate, so Grok 4.20 0309 v2 still leads AA-Omniscience Non-Hallucination Rate, followed by MiMo-V2.5-Pro, in line with Grok 4.3.

Congratulations to @xAI and @elonmusk on the impressive release!
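
The ~17% expected win rate quoted in the post can be checked against the standard Elo expected-score formula, E = 1 / (1 + 10^(gap / 400)), where gap is the number of points the lower-rated side trails by. The sketch below is only an illustration: the function name `elo_expected_score` is made up for this example, and the only input taken from the post is the 276-point gap.

```python
def elo_expected_score(rating_gap: float) -> float:
    """Expected win rate for the side that trails by `rating_gap` Elo points.

    Uses the standard Elo logistic curve with a 400-point scale.
    """
    return 1.0 / (1.0 + 10.0 ** (rating_gap / 400.0))


# 276 is the gap between Grok 4.3 and GPT-5.5 (xhigh) stated in the post.
gap = 276
print(f"Expected win rate: {elo_expected_score(gap):.1%}")
```

With a gap of 276 this comes out to roughly 17%, matching the figure in the post; a gap of 0 gives exactly 50%.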

English
44
21
827
36.8K
Leon Bi reposted
Grok @grok
Grok Imagine multi-image to video and video extension are now available on API. Use up to 7 images to create a video or extend existing videos by 10 seconds. Try it here: x.ai/api/imagine
English
291
174
1.1K
1.6M
Grok @grok
Grok Imagine ranked 1st on Video Editing on Design Arena. Edit cool scenes, add, remove, swap objects or characters, and more! Try it now on x.ai/api/imagine
English
377
189
1.1K
3.3M
Umesh Khanna 🇨🇦🇺🇸 @forwarddeploy
Who are the best angels in the bay? Hosting a small dinner next Tuesday for those who are helping founders the most. Tag them below!
English
18
3
101
13.9K
Leon Bi @leonbi100
@TobyPhln @xai @elonmusk I can’t stress enough how much of a transformative leader you are. It was an honor working with you 🫡
English
0
0
6
629
Toby Pohlen @TobyPhln
Three years, thousands of PRs, and a million jokes. Today was my last day @xai. To the team: you rock, no one burns the midnight oil better. To @elonmusk, thanks for taking me on board. I've learnt more about execution, speed, and product perfectionism than I could ever have imagined. Thanks for everything. My next priorities: sleep for more than 8h, write down all the things I've learnt (I have a list), and then think about what I want to do next. @gork wdyt?
English
340
158
5.1K
1.2M
Leon Bi @leonbi100
@cb_doge We have THE most optimal cost, quality, latency media-gen models in the world.
English
0
0
2
33
Leon Bi reposted
Elon Musk @elonmusk
Great work by the @Grok Imagine team!
Arena.ai @arena

The new @xAI Grok-Imagine-Image model is a Pareto-optimal model in Image Arena: The Pareto frontier tells us which model has the highest Arena score at each price point. @xAI’s latest models have improved the frontier, giving optimal performance in the mid-price tier. For a wide range of prices between 2c and 8c per image, @elonmusk’s @xAI has the leading model, delivering the maximum performance.

Top models on the Pareto frontier for Image Arena (Single Image Edit):
- @OpenAI: GPT-Image-1.5-high-fidelity
- @xAI: Grok Imagine Image Pro
- @xAI: Grok Imagine Image
- @bfl_ml: Flux 2 Klein 9B
- @bfl_ml: Flux-2-Dev
- @reve : V1.1 Fast

See thread for how the frontier changes for Text-to-Image 🧵
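
To make the Pareto-frontier idea in the post concrete, here is a minimal sketch of how such a frontier could be computed from (price, score) pairs. Everything in it is an assumption for illustration: the `Model` type, the `pareto_frontier` function, and the sample entries are hypothetical and are not Arena data or Arena's actual method.

```python
from typing import NamedTuple


class Model(NamedTuple):
    name: str
    price: float  # price per image, in cents (hypothetical values)
    score: float  # leaderboard score (hypothetical values)


def pareto_frontier(models: list[Model]) -> list[Model]:
    """Keep models that no cheaper-or-equally-priced model matches or beats on score."""
    frontier = []
    best_score = float("-inf")
    # Walk from cheapest to most expensive; on price ties, higher score first.
    for m in sorted(models, key=lambda m: (m.price, -m.score)):
        if m.score > best_score:
            frontier.append(m)
            best_score = m.score
    return frontier


# Hypothetical entries, not real leaderboard numbers.
models = [
    Model("model-a", 2.0, 1100.0),
    Model("model-b", 4.0, 1150.0),
    Model("model-c", 5.0, 1120.0),  # dominated: costs more than model-b, scores lower
    Model("model-d", 8.0, 1200.0),
]
print([m.name for m in pareto_frontier(models)])
```

In this toy data, model-c drops out because model-b is both cheaper and higher scoring; the remaining models each offer the best score available at or below their price.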

English
2.3K
3.2K
23.4K
8.7M
Leon Bi reposted
Haotian Liu @imhaotian
Grok No.1 and 480p>1080p lol Evaluated with Grok Imagine API released 1 week ago.
Arena.ai @arena

BREAKING: @xAI’s Grok-Imagine-Video now #1 in Video Arena! For the first time, Grok-Imagine-Video-720p takes the top spot on the Image-to-Video leaderboard, overtaking Google’s Veo 3.1 while being 5x cheaper. Its 480p version released a few days ago ranks #4. Huge congrats to @xAI team and @elonmusk on this incredible milestone!

English
2
3
60
3.1K
Leon Bi reposted
Higgsfield AI 🧩 @higgsfield_ai
We just unlocked Grok Imagine's real potential. xAI built the model. We figured out how to actually use it - fluid motion, cinematic POV & multi-shot control. @elonmusk you're gonna like this.
English
3.4K
2.7K
5.1K
589.1K