Joongwon Kim

313 posts

@danieljwkim

PhD student @uwcse @uwnlp | Currently at @AIatMeta | Former undergrad @Penn

Seattle, WA · Joined October 2020

323 Following · 713 Followers

Pinned Tweet

Joongwon Kim @danieljwkim · Apr 22

New work @AIatMeta: We enable test-time scaling for long-horizon coding agents by using better representations, selection and reuse of agentic trajectories, with Claude 4.5 Opus improving by +6.7% on SWE-Bench Verified and +12.1% on Terminal-Bench 2.0. 📄: arxiv.org/abs/2604.16529

Joongwon Kim retweeted

OpenAI @OpenAI · 6d

Today, we share a breakthrough on the planar unit distance problem, a famous open question first posed by Paul Erdős in 1946. For nearly 80 years, mathematicians believed the best possible solutions looked roughly like square grids. An OpenAI model has now disproved that belief, discovering an entirely new family of constructions that performs better. This marks the first time AI has autonomously solved a prominent open problem central to a field of mathematics.

Joongwon Kim retweeted

Anirudh Goyal @anirudhg9119 · Apr 22

How do coding agents get better from experience? Past Attempts as Interface: Turn rollouts into structured summaries that future attempts can build on. arxiv.org/abs/2604.16529

Joongwon Kim retweeted

Anirudh Goyal @anirudhg9119 · Apr 22

How do coding agents get better from experience? Past Attempts as Interface: Turn rollouts into reusable summaries that future attempts can build on. arxiv.org/abs/2604.16529

Joongwon Kim @danieljwkim · Apr 22

Takeaway: scaling long-horizon agents isn't just about more compute – it's about how prior experience is represented, selected, and reused. Joint work with the amazing @anirudhg9119 and my collaborators across Meta, UW, NYU, CMU and Princeton. 📄: arxiv.org/abs/2604.16529 [13/N]

Joongwon Kim @danieljwkim · Apr 22

Parallel aggregation analysis: we track pass@1 and pass@N across RTV rounds for both iterations. Average pass@1 rises as our policy selects successful rollouts, while pass@N drops slightly as some rollouts are eliminated – RTV still nets clear gains even after refinement. [12/N]
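The two metrics in this tweet can move in opposite directions, which a small sketch makes concrete. The code below is only an illustration under stated assumptions: the task names, the rollout outcomes, and the "selection round" are hypothetical toy data, not the paper's data or its selection policy. It treats average pass@1 as the mean per-task fraction of surviving rollouts that pass, and pass@N as the fraction of tasks with at least one passing survivor.

```python
from typing import Dict, List


def average_pass_at_1(results: Dict[str, List[bool]]) -> float:
    """Mean over tasks of the fraction of surviving rollouts that pass."""
    per_task = [sum(r) / len(r) for r in results.values() if r]
    return sum(per_task) / len(per_task)


def pass_at_n(results: Dict[str, List[bool]]) -> float:
    """Fraction of tasks where at least one surviving rollout passes."""
    return sum(any(r) for r in results.values()) / len(results)


# Hypothetical outcomes: True means the rollout solved the task.
before = {
    "task_a": [True, True, False, False],
    "task_b": [False, True, False, False],
    "task_c": [False, False, False, False],
}

# Hypothetical selection round keeping two rollouts per task.
# On task_b the only passing rollout happens to be eliminated.
after = {
    "task_a": [True, True],
    "task_b": [False, False],
    "task_c": [False, False],
}

for name, results in (("before", before), ("after", after)):
    print(name, round(average_pass_at_1(results), 3), round(pass_at_n(results), 3))
```

In this toy case the average pass@1 rises from 0.25 to about 0.333 because the surviving rollouts are more often correct, while pass@N falls from about 0.667 to about 0.333 because one task lost its only passing rollout. The drop is exaggerated here compared with the slight decrease the tweet describes.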

Joongwon Kim retweeted

Claude @claudeai · Apr 16

Introducing Claude Opus 4.7, our most capable Opus model yet. It handles long-running tasks with more rigor, follows instructions more precisely, and verifies its own outputs before reporting back. You can hand off your hardest work with less supervision.

Joongwon Kim retweeted

AI at Meta @AIatMeta · Apr 8

To build personal superintelligence, our model’s capabilities should scale predictably and efficiently. Below, we share how we study and track Muse Spark’s scaling properties along three axes: pretraining, reinforcement learning, and test-time reasoning. 🧵👇

Let’s start with pretraining. Over the last 9 months, we rebuilt our pretraining stack with improvements to model architecture, optimization, and data curation, enabling us to increase the capability we can extract from every unit of compute. To rigorously evaluate our new recipe, we fit a scaling law to a series of small models and compare the training FLOPs required to hit a specific level of performance. The results: we can reach the same capabilities with over an order of magnitude less compute than our previous model, Llama 4 Maverick, making Muse Spark significantly more efficient than the leading base models available for comparison.
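The comparison described above (fit a scaling law, then compare the training FLOPs needed to hit a fixed performance level) can be sketched as follows. This is a minimal illustration, not Meta's procedure: it assumes a simple power law of the form loss = a * FLOPs^(-b), and every number is synthetic, constructed so that the hypothetical "new" recipe is exactly 10x more compute-efficient. The function names `fit_power_law` and `flops_to_reach` are made up for this example.

```python
import math
from typing import List, Tuple


def fit_power_law(flops: List[float], loss: List[float]) -> Tuple[float, float]:
    """Least-squares fit of log(loss) = log(a) - b * log(flops); returns (a, b)."""
    xs = [math.log(f) for f in flops]
    ys = [math.log(v) for v in loss]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


def flops_to_reach(target_loss: float, a: float, b: float) -> float:
    """Invert loss = a * flops ** (-b) to get the compute needed for target_loss."""
    return (a / target_loss) ** (1.0 / b)


# Synthetic training budgets for a series of small models (assumption).
budgets = [1e18, 1e19, 1e20, 1e21]

# Synthetic losses: the "new" recipe at C FLOPs matches the "old" one at 10 * C.
old_loss = [20.0 * c ** -0.05 for c in budgets]
new_loss = [20.0 * (10.0 * c) ** -0.05 for c in budgets]

old_fit = fit_power_law(budgets, old_loss)
new_fit = fit_power_law(budgets, new_loss)

target = 2.0  # hypothetical performance level, expressed as a loss value
ratio = flops_to_reach(target, *old_fit) / flops_to_reach(target, *new_fit)
print(f"old recipe needs about {ratio:.2f}x the FLOPs of the new recipe")
```

Fitting in log-log space turns the power law into a straight line, so ordinary least squares is enough. The efficiency gap is then the ratio of the two inverted fits at the same target.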

Joongwon Kim retweeted

AI at Meta @AIatMeta · Apr 8

Introducing Muse Spark, the first in the Muse family of models developed by Meta Superintelligence Labs. Muse Spark is a natively multimodal reasoning model with support for tool-use, visual chain of thought, and multi-agent orchestration. Muse Spark is available today at meta.ai and the Meta AI app. We’re also making it available in private preview via API to select partners, and we hope to open-source future versions of the model. Learn more: go.meta.me/43ea00

Joongwon Kim retweeted

Anthropic @AnthropicAI · Apr 7

Introducing Project Glasswing: an urgent initiative to help secure the world’s most critical software. It’s powered by our newest frontier model, Claude Mythos Preview, which can find software vulnerabilities better than all but the most skilled humans. anthropic.com/glasswing
