Jacob Jackson

681 posts

@jbfja

@cursor_ai, created https://t.co/ZF2DvBx5wF, started @SupermavenAI and @Tabnine, formerly @OpenAI

Joined June 2020

862 Following · 9.7K Followers

Pinned Tweet
Jacob Jackson @jbfja
Why we won't be replaced by AI

There’s been a lot of talk about AIs replacing programmers recently. The National Post: “AI is coming after the tech bros and their easy money”. Emad Mostaque, CEO of Stability AI: “There Will Be No Programmers in Five Years”. Some people are changing their career decisions as a result. Investors see the potential too: companies have raised hundreds of millions of dollars promising to replace human developers. Given the salaries that skilled developers can earn, there’s a lot of money in finding a way to cut us out of the picture. But it’s not going to happen. We’re not going to be replaced by the machines.

Now, I’m not going to fall into the trap of past AI detractors by predicting that deep learning is going to “hit a wall” and being proven wrong six months later. I think we’ll continue to see progress as AI systems get more intelligent and new applications are discovered, and I think AI will create a lot of economic value. (That’s why I’m working on an AI company.) And a lot of arguments for why humans will stay relevant boil down to “AI doesn’t work today, so it won’t work in the future”. That’s not reassuring, because the whole reason people are worried about this is the rapid rate of progress. But the prediction that this progress will lead to humans being replaced is incorrect, and here’s why.

An AI can only learn a task when we have a way for the AI to perform the task and then be judged on whether it did the task correctly. The two main tasks currently used to train AI are:

1. Next-word prediction: the AI is given a document of text from the internet and asked to predict the next word in the document.
2. Short response generation (RLHF): the AI is given a query or prompt (e.g., a user asking for coding help) and asked to generate a response (e.g., a solution to the user’s problem). The response is scored by humans on its helpfulness and accuracy.
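The first task above can be sketched as a toy model. Real systems train neural networks over enormous corpora; here a simple bigram count model stands in, but the shape of the objective is the same: the model sees a context and is scored on the probability it assigns to the actual next word (cross-entropy). The corpus and the model are illustrative assumptions, not anyone's actual setup.

```python
import math
from collections import Counter, defaultdict

# Toy illustration of the next-word prediction objective: a bigram
# count model stands in for a neural network. The key property is
# that the training signal is automatic -- every document labels
# itself, word by word.

corpus = "the cat sat on the mat the cat ate the fish".split()

# "Train": count how often each word follows each context word.
follows = defaultdict(Counter)
for context, nxt in zip(corpus, corpus[1:]):
    follows[context][nxt] += 1

def next_word_prob(context, word):
    counts = follows[context]
    total = sum(counts.values())
    return counts[word] / total if total else 0.0

def cross_entropy(tokens):
    # Average negative log-probability of each actual next word.
    losses = []
    for context, nxt in zip(tokens, tokens[1:]):
        p = next_word_prob(context, nxt)
        losses.append(-math.log(p) if p > 0 else float("inf"))
    return sum(losses) / len(losses)

print(next_word_prob("the", "cat"))  # "cat" follows "the" in 2 of 4 bigrams
print(cross_entropy(corpus))
```

Because every position in every document provides an instant, automatic score, this loop can run for hundreds of thousands of steps, which is exactly the property the post argues long-horizon decisions lack.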
Training AIs on these tasks has led to extremely useful products like ChatGPT, and ChatGPT is great at helping developers in day-to-day work, so it’s reasonable that people think systems like ChatGPT might soon eclipse us. But software development is not just about writing individual functions and short responses. A key part of software development is making technical decisions, like software architecture or project prioritization, that will pay off over the long term, on the scale of six months to five years.

To train AI to do this, we would need a way for it to make technical decisions and a way to judge those decisions for their long-term soundness. We don’t know how to do that, and we especially don’t know how to do it at the scale required to train a large AI model. By definition, these sorts of decisions are hard to judge, with no objective criteria for success, only difficult tradeoffs.

The performance of AI is always going to be limited by the tasks where it can have a feedback signal, and this makes long-term planning difficult for AI: there is no naturally occurring training data (unlike next-word prediction), and there is no way to quickly assess whether a decision was good or bad (unlike short response generation/RLHF). Training a good next-word prediction model requires a loop of hundreds of thousands of training steps, which is possible because next-word prediction is quick to evaluate, but running the same number of training steps on decisions where you have to wait a year to see if the decision was good or bad would take hundreds of thousands of years. We humans are good at this because our minds have been shaped by millions of years of evolution, but AI doesn’t have that time.

Some people think we can avoid difficulties like this by having the AI judge its own responses.
While this can work to bring a weaker model up to the level of a stronger model, it can’t make a strong model improve itself: if it could, a model would be able to continually self-improve in isolation, without interacting with the external world. For learning to occur, there has to be a feedback signal coming from external data.

The future of AI is continued progress on the tasks where we have a good way to train AI to solve them, and a continued lack of progress on the tasks where we don’t. Instead of worrying about AI replacing software developers, the right move is to integrate AI tools into your workflow: use AI for what it’s good at, while reserving high-level, long-term decisions for humans, not AI.
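The feedback-loop arithmetic in the post can be made concrete with a back-of-envelope sketch. The step count and evaluation times below are illustrative assumptions chosen to match the orders of magnitude in the argument, not measurements from any real training run.

```python
# Back-of-envelope version of the feedback-loop argument:
# the same number of training steps is cheap when each step is
# scored instantly, and impossible when each score takes a year.

steps = 300_000                     # assumed order of magnitude of steps
fast_feedback_s = 0.1               # next-word prediction: scored instantly
slow_feedback_s = 365 * 24 * 3600   # a decision judged after one year

fast_total_hours = steps * fast_feedback_s / 3600
slow_total_years = steps * slow_feedback_s / (365 * 24 * 3600)

print(f"fast feedback: {fast_total_hours:.1f} hours of evaluation")
print(f"slow feedback: {slow_total_years:,.0f} years of evaluation")
```

With these assumptions, the fast-feedback loop fits in a working day, while the slow-feedback loop takes hundreds of thousands of years, which is the post's point about why long-horizon decisions have no workable training signal.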
Jacob Jackson retweeted
Naman Jain @StringChaos
New post: how we do evals at @cursor_ai. Takeaways:
1. Online metrics from real Cursor requests provide construct validity
2. CursorBench: a dynamic offline suite distilled from online learnings
3. Multi-axes evals: correctness, efficiency, agent interaction behavior
Cursor@cursor_ai

We're sharing a new method for scoring models on agentic coding tasks. Here's how models in Cursor compare on intelligence and efficiency:

Jacob Jackson retweeted
Cursor @cursor_ai
Semantic search improves our agent's accuracy across all frontier models, especially in large codebases where grep alone falls short. Learn more about our results and how we trained an embedding model for retrieving code.
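The contrast with grep can be sketched in a few lines: grep matches literal strings, while embedding retrieval embeds every chunk once, embeds the query, and ranks by vector similarity, so a query can surface relevant code that shares no exact tokens with it. The `embed` function below is a toy stand-in (character-trigram counts), not Cursor's trained embedding model; only the retrieval mechanics are the point.

```python
import math
from collections import Counter

# Toy sketch of embedding-based code retrieval. `embed` is a
# stand-in using character-trigram counts; a real system would use
# a trained neural embedding model, but the shape is the same:
# embed chunks once, embed the query, rank by cosine similarity.

def embed(text):
    t = text.lower()
    return Counter(t[i:i + 3] for i in range(len(t) - 2))

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

chunks = [
    "def authenticate_user(token): ...",
    "def render_sidebar(items): ...",
    "def parse_config(path): ...",
]

def semantic_search(query, chunks):
    vectors = [(c, embed(c)) for c in chunks]
    q = embed(query)
    return max(vectors, key=lambda cv: cosine(q, cv[1]))[0]

print(semantic_search("authentication token check", chunks))
```

Note the query never literally appears in the winning chunk, which is the failure mode of grep alone that the tweet describes.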
Jacob Jackson retweeted
Ashvin Nair @ashvinair
Some exciting news to share - I joined Cursor! We just shipped a model 🐆 It's really good - try it out! cursor.com/blog/composer

I left OpenAI after 3 years there and moved to Cursor a few weeks ago. After working on RL for my whole career, it was incredible to see RL come alive and be deployed in a product that millions of people use every day in ChatGPT, and then honestly kind of surreal to see it surpass me at the types of tasks I regarded as markers of intelligence in o1/o3. To me, the next frontier is to bring the whole world of economically useful tasks in-distribution for RL.

At Cursor, we’re building a team focused on RL foundations and agentic coding - reach out if you’re interested in working with us!
Cursor@cursor_ai

Introducing Cursor 2.0. Our first coding model and the best way to code with agents.

Jacob Jackson retweeted
Charlie Snell @sea_snell
It has been a joy working on Composer with the team and watching all the pieces come together over the past few months. I hope people find the model useful.
Sasha Rush@srush_nlp

Composer is a new model we built at Cursor. We used RL to train a big MoE model to be really good at real-world coding, and also very fast. cursor.com/blog/composer Excited for the potential of building specialized models to help in critical domains.

Jacob Jackson retweeted
Sasha Rush @srush_nlp
Composer is a new model we built at Cursor. We used RL to train a big MoE model to be really good at real-world coding, and also very fast. cursor.com/blog/composer Excited for the potential of building specialized models to help in critical domains.
Jacob Jackson retweeted
Federico Cassano @ellev3n11
so excited to share Composer with the world! Composer is Cursor's own agentic coding model. it plans, edits, and builds software alongside you with precision, keeping you in flow with incredible speed.

i started this project on the side while working on a bug-finder prototype that eventually became Bugbot. before long, Composer took on a life of its own, and i soon shifted my focus completely. since then, both the project and our team have grown into something far beyond what i imagined possible.

building Composer together has been the most rewarding experience of my (very short) career. can't wait to see what you all build with it!
Cursor@cursor_ai

Composer is a frontier coding model that completes tasks in under 30 seconds.

Jacob Jackson retweeted
Cursor @cursor_ai
Composer is a frontier coding model that completes tasks in under 30 seconds.
Jacob Jackson retweeted
OpenAI @OpenAI
Sora 2 is here.
Jacob Jackson retweeted
will depue @willdepue
online RL is one of the most exciting directions for the field, and i’ve been incredibly impressed with Cursor being seemingly the first to implement it successfully at scale with a frontier capability. so cool!
Cursor@cursor_ai

We've trained a new Tab model that is now the default in Cursor. This model makes 21% fewer suggestions than the previous model while having a 28% higher accept rate for the suggestions it makes. Learn more about how we improved Tab with online RL.
