Paul Ashbourne

447 posts


@paulashbourne

agents + rl infra @openai | 💬 Opinions are my own | Made in Canada 🇨🇦

San Francisco, CA · Joined January 2011
371 Following · 396 Followers
Paul Ashbourne retweeted
roon
roon@tszzl·
roon tweet media
70
77
1.7K
71.1K
Paul Ashbourne retweeted
Seán Ó hÉigeartaigh
Seán Ó hÉigeartaigh@S_OhEigeartaigh·
Real slap in the face to every OpenAI person who spoke up in Anthropic's support. Anthropic staff should be embarrassed - not that this was leaked, but that it was sent. After weeks of calling for industry solidarity, I'm embarrassed.
18
9
233
30.1K
Paul Ashbourne retweeted
Dean W. Ball
Dean W. Ball@deanwball·
I do not share the cynicism of some with respect to OpenAI’s actions in the DoW/Ant dispute. It basically seems to me as though OpenAI was attempting to deescalate last week; whether they executed well is a separate question, but in their defense good execution in such chaos was nearly impossible.

But from where I sit it seems OpenAI tried to reduce tensions and find a productive path forward, while allowing its employees considerable latitude to speak their minds. The easy thing would have been for management to stay quiet and let this happen; they did not do that, and they also stood firm in opposition to the supply-chain risk designation.

In general, OpenAI is unjustly maligned. This is the thing that bothers me the most about Dario’s leaked memo; it spends so much time on OpenAI conspiracies and cynicism that I fear industry solidarity in the future will be harder than it needs to be. This is not the last time we will see state interference into frontier AI, and until we build formalized structures for such interference it will be important for the industry to hang tough together. I fear that will be less likely now.
39
40
519
42.2K
Paul Ashbourne
Paul Ashbourne@paulashbourne·
@m_franceschetti Sounds like a plan. Good luck to you and the team sprinting on outage mode, and thanks for leading transparently on this. 🚀
1
0
5
13.2K
Matteo Franceschetti
Matteo Franceschetti@m_franceschetti·
@paulashbourne Thanks Paul. Let me fix this first, ship an outage mode and then we will look at the next features ;)
5
0
55
188.9K
Matteo Franceschetti
Matteo Franceschetti@m_franceschetti·
The AWS outage has impacted some of our users since last night, disrupting their sleep. That is not the experience we want to provide and I want to apologize for it. We are taking two main actions:
1) We are restoring all the features as AWS comes back. All devices are currently working, with some experiencing data processing delays.
2) We are currently outage-proofing your Pod experience and we will be working tonight, 24/7, until that is done.
More updates soon.
673
195
4.8K
7.9M
Paul Ashbourne
Paul Ashbourne@paulashbourne·
sora 2 is essentially interdimensional cable, but short form
Paul Ashbourne tweet media
0
0
5
2K
Paul Ashbourne retweeted
OpenAI
OpenAI@OpenAI·
10am PT.
539
481
5.7K
2.1M
Abhishek Bhardwaj
Abhishek Bhardwaj@abshkbh·
For the past year I’ve been building Arrakis on a single thesis: with the right tools and secure environments, LLMs can reliably do complex work. This journey started two years ago when I left a stable role at Google to work on early coding agents. While still at Google, I wrote a long email to @gdb about how a systems engineer could break into AI. Arrakis opened doors and has led to a full-circle moment: I’ve joined @OpenAI to work on Agent Infrastructure in the Scaling org. It’s a privilege to help people through smarter models and agents. I’m especially excited about our coding initiatives. Thank you @gdb and @paulashbourne for the opportunity. Looking back, the biggest risk was not taking one!
Abhishek Bhardwaj tweet media
27
12
399
121.3K
roon
roon@tszzl·
@paulg I liked the old one better tbh
9
0
99
27K
Paul Graham
Paul Graham@paulg·
I finally went to visit OpenAI's new building. It's the nicest office I've ever seen. So many different shaped spaces, and such good color. Whoever was in charge of this did a really good job.
223
106
6.7K
747K
Paul Ashbourne retweeted
Artificial Analysis
Artificial Analysis@ArtificialAnlys·
OpenAI gave us early access to GPT-5: our independent benchmarks verify a new high for AI intelligence. We have tested all four GPT-5 reasoning effort levels, revealing 23x differences in token usage and cost between the ‘high’ and ‘minimal’ options, and substantial differences in intelligence. We have run our full suite of eight evaluations independently across all reasoning effort configurations of GPT-5 and are reporting benchmark results for intelligence, token usage, and end-to-end latency.

What @OpenAI released: OpenAI has released a single endpoint for GPT-5, but different reasoning efforts offer vastly different intelligence. GPT-5 with reasoning effort “High” reaches a new intelligence frontier, while “Minimal” is near GPT-4.1 level (but more token efficient).

Takeaways from our independent benchmarks:

⚙️ Reasoning effort configuration: GPT-5 offers four reasoning effort configurations: high, medium, low, and minimal. Reasoning effort options steer the model to “think” more or less hard for each query, driving large differences in intelligence, token usage, speed, and cost.

🧠 Intelligence achieved ranges from frontier to GPT-4.1 level: GPT-5 sets a new standard with a score of 68 on our Artificial Analysis Intelligence Index (MMLU-Pro, GPQA Diamond, Humanity’s Last Exam, LiveCodeBench, SciCode, AIME, IFBench & AA-LCR) at High reasoning effort. Medium (67) is close to o3, Low (64) sits between DeepSeek R1 and o3, and Minimal (44) is close to GPT-4.1. While High sets a new standard, the increase over o3 is not comparable to the jump from GPT-3 to GPT-4 or GPT-4o to o1.

💬 Token usage varies 23x between reasoning efforts: GPT-5 with High reasoning effort used more tokens than o3 (82M vs. 50M) to complete our Index, but still fewer than Gemini 2.5 Pro (98M) and DeepSeek R1 0528 (99M). However, Minimal reasoning effort used only 3.5M tokens, which is substantially less than GPT-4.1, making GPT-5 Minimal significantly more token-efficient for similar intelligence.

📖 Long Context Reasoning: We released our own Long Context Reasoning (AA-LCR) benchmark earlier this week to test the reasoning capabilities of models across long sequence lengths (sets of documents ~100k tokens in total). GPT-5 stands out for its performance on AA-LCR, with GPT-5 at both High and Medium reasoning efforts topping the benchmark.

🤖 Agentic Capabilities: OpenAI also commented on improvements across capabilities increasingly important to how AI models are used, including agents (long-horizon tool calling). We recently added IFBench to our Intelligence Index to cover instruction following and will be adding further evals to cover agentic tool calling to independently test these capabilities.

📡 Vibe checks: We’re testing the personality of the model through MicroEvals on our website, which supports running the same prompt across models and comparing results. It’s free to use; we’ll provide an update with our perspective shortly, but feel free to share your own!

See below for further analysis:
Artificial Analysis tweet media
44
125
712
105.2K
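The reasoning effort levels described in the benchmark thread map to a per-request API setting. A minimal sketch of how one might sweep those levels, assuming the Chat Completions-style `reasoning_effort` field and the `gpt-5` model name from the tweet (both are assumptions here, not verified API details); the requests are only constructed, not sent:

```python
# Sketch: building one request payload per reasoning effort level.
# The "gpt-5" model name and the `reasoning_effort` field are
# assumptions taken from the tweet, not confirmed API parameters.
EFFORT_LEVELS = ["minimal", "low", "medium", "high"]

def build_request(prompt: str, effort: str) -> dict:
    # Payload shape mirrors a Chat Completions-style request body.
    return {
        "model": "gpt-5",
        "reasoning_effort": effort,
        "messages": [{"role": "user", "content": prompt}],
    }

requests = [
    build_request("Prove that the square root of 2 is irrational.", effort)
    for effort in EFFORT_LEVELS
]
```

Sweeping all four payloads against the same prompt is what makes the 23x token-usage spread between ‘minimal’ and ‘high’ directly observable in a single run.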
Jason Lee
Jason Lee@jasondeanlee·
How do I short oai before gpt5 release?
15
0
95
10.1K
Paul Ashbourne
Paul Ashbourne@paulashbourne·
There are going to be a lot of high 5s going around the @openai office tomorrow
0
0
7
547
Paul Ashbourne retweeted
Sebastien Bubeck
Sebastien Bubeck@SebastienBubeck·
It’s hard to overstate the significance of this. It may end up looking like a “moon‑landing moment” for AI. Just to spell it out as clearly as possible: a next-word prediction machine (because that's really what it is here, no tools no nothing) just produced genuinely creative proofs for hard, novel math problems at a level reached only by an elite handful of pre‑college prodigies.
Alexander Wei@alexwei_

1/N I’m excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world’s most prestigious math competition—the International Math Olympiad (IMO).

59
156
1.4K
261K
Paul Ashbourne
Paul Ashbourne@paulashbourne·
@anupk24 @ns123abc 4th of July is one thing, but just wait until they find out that we do the same thing at Thanksgiving
0
0
14
801
Anup
Anup@anupk24·
@ns123abc This happens literally every year and has been on my calendar for months...
4
1
105
6.9K
NIK
NIK@ns123abc·
🚨NEWS: OpenAI is officially shutting down next week “to give employees time to recharge” LMAO
NIK tweet media
180
87
1.9K
284.5K