GS Oh

10 posts

@GS_Oh_AI

Member of Technical Staff @ xAI RL Post-training | Trained Grok 4.1, 4.2 | Previously at DeepMind (Gemini ~2.5 + Deep Research) | PhD for Generative models + RL

Joined August 2022
65 Following, 75 Followers
GS Oh reposted
X Freeze (@XFreeze)
The new Grok 4.20 Beta benchmarks are wild
🥇 #1 lowest hallucinating AI (22%)
🥇 #1 at following instructions (83%)
🥈 #2 in agentic tool use (97%)
Grok 4.20 ranks #1 in the lowest hallucination rate ever recorded across all AI models tested globally
Most models race to sound smart. Grok 4.20 was built to never lie and still dominates on instruction following and agentic tasks
This is literally a 500B model performing top-notch in the things that matter most
219
178
1K
4.4M
Design Arena (@Designarena)
BREAKING: xAI and Kling have the strongest video and video editing models, as measured by 50+ video models on Design Arena
#1 Video Generation: Grok Imagine by @xai
#1 Video Editing: Grok Imagine by @xai
#1 Image to Video Generation: Grok Imagine by @xai
#1 Multi-Input to Video Generation: O1 Edit by @Kling_ai
Congrats to @xai and @Kling_ai for defining SOTA!
7
16
152
627.6K
GS Oh reposted
Arena.ai (@arena)
Grok 4.20 Beta Reasoning has landed #7 for Text Arena & #28 for Code Arena. The model is on par with DeepSeek-v3.2-thinking and Qwen3.5-122b-a10b in Code Arena's agentic webdev tasks.
More Highlights:
- #7 in Text Arena overall tied with GPT-5.4-high
- top 10 in Math, Multi-Turn, Creative Writing, Coding & Hard Prompts
- top 15 in Expert Arena
Congrats to @xAi and @elonmusk on this new milestone.
15
27
257
19.2K
GS Oh reposted
Artificial Analysis (@ArtificialAnlys)
The Grok 4.20 Beta shows three major improvements over Grok 4:
➤ Our lowest ever hallucination rate on the AA-Omniscience evaluation. When Grok did not know the answer, it hallucinated an incorrect answer 22% of the time - this is the lowest hallucination rate of any model we have tested, topping Claude Haiku 4.5 (25%)
➤ Top scores for instruction following and prompt adherence. On IFBench, Grok 4.20 takes the #1 spot with 82.9% - a +29.2 point increase on Grok 4
➤ Leading speed for its intelligence. At 265 tokens per second output speed on xAI’s API, Grok 4.20 is significantly faster than its peer and over 2x the output speed seen from Grok 4.1 Fast
Congratulations to @xai and @elonmusk on the 4.20 Beta 0309 launch!
224
298
2.3K
5.6M
Eric Jiang (@veggie_eric)
profile pic of the best engineer at your company
[image]
208
866
22.5K
2.9M
GS Oh (@GS_Oh_AI)
@SeongsikKi5837 @xai I'll miss you a lot! it was really fun working with you last few weeks
1
0
3
299
Seongsik Kim (@SeongsikKi5837)
Friday was my last day at @xAI. It truly was a wild ride—pushing the frontier on Grok 3, Grok 4, Grok 4.1 Fast and Macrohard. Grateful to have been on this rocketship, working with the most intense, brilliant people I’ve ever met. Ad astra 🚀
38
4
421
25.9K
GS Oh (@GS_Oh_AI)
@arena Grok 4.20 🚀
0
1
1
98
Arena.ai (@arena)
In the Text Arena, Grok-4.20-Beta1 ranks #4, scoring 1492 closing the gap to Gemini 3.1 Pro
9
16
207
31.1K
Arena.ai (@arena)
Grok 4.20 beta1 (single agent) debuts #1 on Search Arena, and #4 overall in Text Arena!
Highlights:
- #1 in Search, scoring 1226, leading GPT-5.2 and Gemini-3
- #4 in Text, scoring 1492 on par with Gemini 3.1 Pro
Congrats to the @xAI team and @elonmusk on this impressive milestone!
234
239
1.8K
10.1M
GS Oh (@GS_Oh_AI)
Grok 4.20 beta1 has been out for a few days and it is an exciting one! I am personally excited and honored to deliver RL training recipes and to train Grok 4.20 to achieve #4 overall on Arena and #1 overall on Search Arena!
Arena.ai (@arena)

Grok 4.20 beta1 (single agent) debuts #1 on Search Arena, and #4 overall in Text Arena!
Highlights:
- #1 in Search, scoring 1226, leading GPT-5.2 and Gemini-3
- #4 in Text, scoring 1492 on par with Gemini 3.1 Pro
Congrats to the @xAI team and @elonmusk on this impressive milestone!

7
4
52
11.2K
GS Oh (@GS_Oh_AI)
@Yuhu_ai_ best wishes to you and your next endeavor!
0
0
2
518
Yuhuai (Tony) Wu (@Yuhu_ai_)
I resigned from xAI today. This company - and the family we became - will stay with me forever. I will deeply miss the people, the warrooms, and all those battles we have fought together. It's time for my next chapter. It is an era with full possibilities: a small team armed with AIs can move mountains and redefine what's possible. Thank you to the entire xAI family. Onward. 🚀 And to Elon @elonmusk - thank you for believing in the mission and for the ride of a lifetime.
742
365
9.3K
3.6M
GS Oh reposted
SpaceX (@SpaceX)
SpaceX has acquired xAI, forming one of the most ambitious, vertically integrated innovation engines on (and off) Earth → spacex.com/updates#xai-jo…
3.9K
7.9K
45.2K
19.3M