Joongwon Kim

313 posts

@danieljwkim

PhD student @uwcse @uwnlp | Currently at @AIatMeta | Former undergrad @Penn

Seattle, WA · Joined October 2020
323 Following · 713 Followers
Pinned Tweet
Joongwon Kim
Joongwon Kim@danieljwkim·
New work @AIatMeta: We enable test-time scaling for long-horizon coding agents by using better representations, selection and reuse of agentic trajectories, with Claude 4.5 Opus improving by +6.7% on SWE-Bench Verified and +12.1% on Terminal-Bench 2.0. 📄: arxiv.org/abs/2604.16529
9 replies · 42 retweets · 358 likes · 278.6K views
Joongwon Kim retweeted
OpenAI
OpenAI@OpenAI·
Today, we share a breakthrough on the planar unit distance problem, a famous open question first posed by Paul Erdős in 1946. For nearly 80 years, mathematicians believed the best possible solutions looked roughly like square grids. An OpenAI model has now disproved that belief, discovering an entirely new family of constructions that performs better. This marks the first time AI has autonomously solved a prominent open problem central to a field of mathematics.
1.1K replies · 3.9K retweets · 26.5K likes · 13.1M views
Joongwon Kim retweeted
Anirudh Goyal
Anirudh Goyal@anirudhg9119·
How do coding agents get better from experience? Past Attempts as Interface: Turn rollouts into structured summaries that future attempts can build on. arxiv.org/abs/2604.16529
1 reply · 14 retweets · 101 likes · 12K views
Joongwon Kim retweeted
Anirudh Goyal
Anirudh Goyal@anirudhg9119·
How do coding agents get better from experience? Past Attempts as Interface: Turn rollouts into reusable summaries that future attempts can build on. arxiv.org/abs/2604.16529
3 replies · 14 retweets · 82 likes · 36.1K views
Joongwon Kim
Joongwon Kim@danieljwkim·
Takeaway: scaling long-horizon agents isn't just about more compute – it's about how prior experience is represented, selected, and reused. Joint work with the amazing @anirudhg9119 and my collaborators across Meta, UW, NYU, CMU and Princeton. 📄: arxiv.org/abs/2604.16529 [13/N]
1 reply · 1 retweet · 12 likes · 1K views
Joongwon Kim
Joongwon Kim@danieljwkim·
Parallel aggregation analysis: we track pass@1 and pass@N across RTV rounds for both iterations. Average pass@1 rises as our policy selects successful rollouts, while pass@N drops slightly as some rollouts are eliminated – RTV still nets clear gains even after refinement. [12/N]
1 reply · 1 retweet · 4 likes · 1.1K views
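A minimal sketch of the two metrics tracked in the [12/N] tweet, to make the pass@1 versus pass@N trade-off concrete. The task names, rollout outcomes, and the "after selection" pool below are invented for illustration; they are not data from the paper, and the selection step itself is not modeled here.

```python
from typing import Dict, List

def average_pass_at_1(outcomes: Dict[str, List[bool]]) -> float:
    """Mean, over tasks, of the fraction of that task's rollouts that succeeded."""
    rates = [sum(r) / len(r) for r in outcomes.values() if r]
    return sum(rates) / len(rates)

def pass_at_n(outcomes: Dict[str, List[bool]]) -> float:
    """Fraction of tasks where at least one rollout in the pool succeeded."""
    solved = [any(r) for r in outcomes.values()]
    return sum(solved) / len(solved)

# Hypothetical pool of 4 rollouts per task before any selection round.
before = {
    "task_a": [True, False, False, True],
    "task_b": [False, False, True, False],
    "task_c": [False, False, False, False],
}

# Hypothetical pool after one selection round keeps 2 rollouts per task.
# task_a keeps its successes; task_b loses its only successful rollout.
after = {
    "task_a": [True, True],
    "task_b": [False, False],
    "task_c": [False, False],
}

for name, pool in (("before", before), ("after", after)):
    print(name, average_pass_at_1(pool), pass_at_n(pool))
```

In this toy pool, average pass@1 goes up because the surviving rollouts for task_a are all successes, while pass@N goes down because task_b's only success was eliminated. That is the same direction of movement the tweet describes, though the magnitudes here are arbitrary.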
Joongwon Kim retweeted
Claude
Claude@claudeai·
Introducing Claude Opus 4.7, our most capable Opus model yet. It handles long-running tasks with more rigor, follows instructions more precisely, and verifies its own outputs before reporting back. You can hand off your hardest work with less supervision.
4.7K replies · 10.2K retweets · 81K likes · 13.9M views
Joongwon Kim retweeted
AI at Meta
AI at Meta@AIatMeta·
To build personal superintelligence, our model’s capabilities should scale predictably and efficiently. Below, we share how we study and track Muse Spark’s scaling properties along three axes: pretraining, reinforcement learning, and test-time reasoning. 🧵👇 Let’s start with pretraining. Over the last 9 months, we rebuilt our pretraining stack with improvements to model architecture, optimization, and data curation, enabling us to increase the capability we can extract from every unit of compute. To rigorously evaluate our new recipe, we fit a scaling law to a series of small models and compare the training FLOPs required to hit a specific level of performance. The results: we can reach the same capabilities with over an order of magnitude less compute than our previous model, Llama 4 Maverick, making Muse Spark significantly more efficient than the leading base models available for comparison.
31 replies · 76 retweets · 756 likes · 70K views
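A rough sketch of the kind of comparison the pretraining tweet describes: fit a scaling law to a few small runs, then compare the training FLOPs each recipe needs to reach the same performance level. The tweet does not give the functional form or any data, so this assumes a plain power law, loss = a * FLOPs^b, and uses synthetic numbers throughout; `fit_power_law` and `flops_to_reach` are hypothetical helper names.

```python
import math
from typing import List, Tuple

def fit_power_law(flops: List[float], loss: List[float]) -> Tuple[float, float]:
    """Least-squares fit of log(loss) = log(a) + b * log(flops); returns (a, b)."""
    xs = [math.log(f) for f in flops]
    ys = [math.log(v) for v in loss]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    a = math.exp(mean_y - b * mean_x)
    return a, b

def flops_to_reach(a: float, b: float, target_loss: float) -> float:
    """Invert loss = a * flops**b to get the compute needed for target_loss."""
    return (target_loss / a) ** (1.0 / b)

# Synthetic training budgets and losses (assumed, not from the tweet).
budgets = [1e18, 1e19, 1e20, 1e21]
old_recipe_loss = [20.0 * f ** -0.05 for f in budgets]
new_recipe_loss = [17.0 * f ** -0.05 for f in budgets]

a_old, b_old = fit_power_law(budgets, old_recipe_loss)
a_new, b_new = fit_power_law(budgets, new_recipe_loss)

target = 2.0  # arbitrary loss level at which the two recipes are compared
ratio = flops_to_reach(a_old, b_old, target) / flops_to_reach(a_new, b_new, target)
print(f"old recipe needs {ratio:.1f}x the FLOPs of the new recipe")
```

Comparing FLOPs at a fixed target, rather than loss at a fixed budget, is what turns a modest gap between the curves into a large compute multiplier when the exponent is small.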
Joongwon Kim retweeted
AI at Meta
AI at Meta@AIatMeta·
Introducing Muse Spark, the first in the Muse family of models developed by Meta Superintelligence Labs. Muse Spark is a natively multimodal reasoning model with support for tool-use, visual chain of thought, and multi-agent orchestration. Muse Spark is available today at meta.ai and the Meta AI app. We’re also making it available in private preview via API to select partners, and we hope to open-source future versions of the model. Learn more: go.meta.me/43ea00
533 replies · 1.1K retweets · 9.1K likes · 3M views
Joongwon Kim retweeted
Anthropic
Anthropic@AnthropicAI·
Introducing Project Glasswing: an urgent initiative to help secure the world’s most critical software. It’s powered by our newest frontier model, Claude Mythos Preview, which can find software vulnerabilities better than all but the most skilled humans. anthropic.com/glasswing
2K replies · 6.7K retweets · 44.1K likes · 31.3M views