Naman Jain

555 posts

@StringChaos

Research @cursor_ai | CursorBench, LiveCodeBench, DeepSWE, R2E-Gym, GSO, LMArena Coding | Past: @UCBerkeley @MetaAI @AWS @MSFTResearch @iitbombay

San Francisco, CA · Joined March 2018
1.4K Following · 2.8K Followers
Naman Jain retweeted
Cursor
Cursor@cursor_ai·
We trained Composer to self-summarize through RL instead of a prompt. This reduces the error from compaction by 50% and allows Composer to succeed on challenging coding tasks requiring hundreds of actions.
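For readers unfamiliar with compaction: once an agent's action history outgrows its context budget, older turns get replaced by a summary. A minimal sketch of that mechanic, with a hypothetical `summarize` placeholder (the tweet's point is that Cursor trains the summarizer with RL rather than prompting a model, which this sketch does not capture):

```python
def summarize(turns):
    # Hypothetical stand-in for the learned summarizer; a real system
    # would call a model (per the tweet, trained via RL) to compress turns.
    return f"SUMMARY({len(turns)} turns)"

def compact(history, budget):
    """Keep the most recent `budget` turns verbatim; fold older turns
    into a single summary entry at the front of the history."""
    if len(history) <= budget:
        return list(history)
    old, recent = history[:-budget], history[-budget:]
    return [summarize(old)] + recent

history = [f"turn-{i}" for i in range(10)]
compacted = compact(history, budget=4)  # 1 summary entry + 4 recent turns
```

The "error from compaction" the tweet cites is whatever the agent loses when `summarize` drops details it later needs, which is why the quality of that single function dominates long-horizon tasks.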
Naman Jain
Naman Jain@StringChaos·
Lots more details in the post:
1. Pareto frontier across different metrics
2. How CursorBench has shifted as agent capabilities changed
3. CursorBench vs public evals: what’s missing and future work directions
4. CursorBench vs online: how online metrics shape offline evals
Naman Jain retweeted
Manish Shetty
Manish Shetty@slimshetty_·
GSO update: gpt-5.4 (xhigh) scores 31.4% with reasoning_effort=high, slightly lower than gpt-5.2. A quick thought on why below:
Naman Jain retweeted
Cursor
Cursor@cursor_ai·
Long-running agents are now available at cursor.com/agents for Ultra, Teams, and Enterprise plans. With our new harness, agents can complete much larger tasks. cursor.com/blog/long-runn…
Naman Jain retweeted
Cursor
Cursor@cursor_ai·
Composer 1.5 is now available. We’ve found it to strike a strong balance between intelligence and speed.
Naman Jain retweeted
Michael Truell
Michael Truell@mntruell·
We built a browser with GPT-5.2 in Cursor. It ran uninterrupted for one week. It's 3M+ lines of code across thousands of files. The rendering engine is from-scratch in Rust with HTML parsing, CSS cascade, layout, text shaping, paint, and a custom JS VM. It *kind of* works! It still has issues and is of course very far from Webkit/Chromium parity, but we were astonished that simple websites render quickly and largely correctly.
Cursor@cursor_ai

GPT-5.2 Codex is now available in Cursor! We believe it's the frontier model for long-running tasks.

Naman Jain retweeted
Michael Truell
Michael Truell@mntruell·
We rebuilt how our agent uses context. Instead of stuffing everything into a prompt, Cursor dynamically discovers context via files, tools, and history, cutting token usage by 46.9% and freeing up more space for the agent to work.
Cursor@cursor_ai

Cursor's agent now uses dynamic context for all models. It's more intelligent about how context is filled while maintaining the same quality. This reduces total tokens by 46.9% when using multiple MCP servers.

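A rough sketch of the dynamic-context idea, under the assumption that it means exposing repository content through tool calls the agent issues on demand, instead of inlining everything into the prompt up front (all names and file contents here are illustrative, not Cursor's implementation):

```python
# Toy repo: two small source files plus a verbose doc.
files = {
    "main.py": "def main(): ...",
    "utils.py": "def helper(): ...",
    "README.md": "project docs " * 50,
}

def static_prompt():
    # Static approach: every file body is stuffed into the prompt.
    return "\n".join(files.values())

class DynamicContext:
    """Dynamic approach: the agent starts from a file listing and
    pulls individual file bodies into context only when needed."""
    def __init__(self, files):
        self.files = files
        self.loaded = {}
    def listing(self):
        return sorted(self.files)
    def open(self, name):
        # Tool call the agent issues when it decides it needs this file.
        self.loaded[name] = self.files[name]
        return self.loaded[name]

ctx = DynamicContext(files)
ctx.open("utils.py")  # the agent only needed one file for this task
dynamic_tokens = sum(len(v.split()) for v in ctx.loaded.values())
static_tokens = len(static_prompt().split())
```

The token savings come from the files (and MCP tool outputs) the agent never asks for; the quoted 46.9% reduction is Cursor's measured figure, not something this sketch reproduces.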
Naman Jain retweeted
Deedy
Deedy@deedydas·
The reviews are in. Cursor's new Composer-1 model is really good at coding, especially for large codebases!
— ~4x faster
— good at using its own search tool to find the right files
— much better than the base open-source model it's RL'd on
And it's free to use right now.
Naman Jain retweeted
Sasha Rush
Sasha Rush@srush_nlp·
Talk at Ray Summit on "Building Cursor Composer." Overview of the work from our research team. youtube.com/watch?v=md8D8e…
Naman Jain retweeted
Amanda Bertsch
Amanda Bertsch@abertsch72·
Can LLMs accurately aggregate information over long, information-dense texts? Not yet… We introduce Oolong, a dataset of simple-to-verify information aggregation questions over long inputs. No model achieves >50% accuracy at 128K on Oolong!
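To make "simple-to-verify information aggregation" concrete, here is a hypothetical example in the spirit of Oolong (not an actual dataset item): the answer is a count distributed across the entire input, so no single excerpt contains it, yet a grader can check it exactly.

```python
# Hypothetical aggregation question over a long, information-dense input:
# every record contributes to the answer, so the model must track state
# across the whole context rather than retrieve one passage.
records = [f"review {i}: {'positive' if i % 3 else 'negative'}" for i in range(300)]
long_input = "\n".join(records)  # one long document fed to the model

question = "How many reviews are negative?"
gold = sum(1 for r in records if "negative" in r)  # trivial for the grader
```

The asymmetry is the point: verification is a one-line count, but answering requires aggregating over every line of the 300-record input, which is where the tweet reports models breaking down at 128K.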
Naman Jain retweeted
Cursor
Cursor@cursor_ai·
Semantic search improves our agent's accuracy across all frontier models, especially in large codebases where grep alone falls short. Learn more about our results and how we trained an embedding model for retrieving code.
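To illustrate why embedding-based retrieval can surface code that grep misses, here is a toy example: a hand-rolled bag-of-words "embedding" with a tiny synonym map stands in for Cursor's trained code-embedding model, and the query shares no literal substring with the target file.

```python
import math
import re

# Stand-in for a learned embedding: bag-of-words over normalized tokens,
# with a tiny hand-made synonym map. A real system learns these
# equivalences instead of enumerating them.
SYNONYMS = {"fetch": "get", "retrieve": "get", "user": "account"}

def embed(text):
    vec = {}
    for tok in re.findall(r"[a-z]+", text.lower()):
        tok = SYNONYMS.get(tok, tok)
        vec[tok] = vec.get(tok, 0) + 1
    return vec

def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

corpus = {
    "auth.py": "def get_account(session): ...",
    "billing.py": "def charge_card(invoice): ...",
}
query = "fetch user"
grep_hits = [f for f, src in corpus.items() if query in src]  # literal match only
best = max(corpus, key=lambda f: cosine(embed(query), embed(corpus[f])))
```

Grep finds nothing because "fetch user" never appears verbatim, while the vector match ranks `auth.py` first; this gap widens with codebase size, which matches the tweet's claim that grep alone falls short in large repos.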
John Yang
John Yang@jyangballin·
New eval! Code duels for LMs ⚔️
Current evals test LMs on *tasks*: "fix this bug," "write a test."
But we code to achieve *goals*: maximize revenue, cut costs, win users.
Meet CodeClash: LMs compete via their codebases across multi-round tournaments to achieve high-level goals.
Naman Jain retweeted
will brown
will brown@willccbb·
ok composer-1 is pretty nuts, and the code it writes is quite nice. probably my new daily driver for many things not quite as galaxy-brain as codex, but it's SO fast that you can use it sync instead of async, and very quickly iterate on fixes. follows instructions very well