LIFE 2030 and Beyond

Henry Mascot@iAmHenryMascot

144

Sam Altman@sama·5d

big upgrade for codex today! try it for non-coding computer work.

679

306

9.8K

755.4K

LIFE 2030 and Beyond@life2030com·28 Apr

@BoyuanChen0 Hi, this image is buggy. It contains a grey checkerboard pattern and noisy dots. Please help: x.com/iAmHenryMascot…

@thsottiaux

1

136

Boyuan Chen@BoyuanChen0·24 Apr

We are committed to continually improving the GPT Image 2 model! I am actively fixing various issues from the community feedback. Just reply or DM me your GPT conversation! Features like 2K or 4K images are already available via the experimental API. Hope you enjoy the model!

250

54

986

228K

LIFE 2030 and Beyond@life2030com·28 Apr

@iAmHenryMascot @thsottiaux Currently, GPT Images 2.0 is buggy. You can see shining dots of noise distributed across your image, along with a grey pattern. These bugs make your image look less realistic and more like an old oil painting. Please report this buggy image and its prompt to OpenAI.

72

Henry Mascot@iAmHenryMascot·28 Apr

@thsottiaux

QME

0

2

1.2K

Tibo@thsottiaux·28 Apr

We will ship again this week. Codex has achieved escape velocity and will keep improving rapidly.

504

312

8K

730.7K

LIFE 2030 and Beyond@life2030com·22 Apr

@icreatelife I asked GPT Images v2 to do some image editing, similar to what Photoshop does. Here is the result: x.com/life2030com/st…

110

Kris Kashtanova@icreatelife·21 Apr

GPT Image 2 is live on Adobe Firefly Boards, immediate access
Great for:
* typography, text rendering
* UI and game dev mocks
* lots of detail
We dropped on Day 0! You asked, we delivered! 🎉

30

20

98

14.9K

LIFE 2030 and Beyond@life2030com·22 Apr

I'm testing ChatGPT Images v2.0 for image editing. The 1st image on the left is the original, made and rendered entirely in Blender 3D. GPT Images v2.0 created the 2nd and 3rd images, in which I asked it to modify the water or the pose while keeping the other elements unchanged.

"Each training session stretches my limits!" Lila said.

1.8K

LIFE 2030 and Beyond@life2030com·22 Apr

@goodside Your maze is impressive. But when I ask Images 2.0 to solve a maze that it generated itself, it fails: x.com/life2030com/st…

1

267

Riley Goodside@goodside·22 Apr

"School worksheet with maze, 32 wide by 48 tall, rows and columns numbered, no unreachable rooms, solution in blue pen, name field reads 'ChatGPT Images 2.0' in cursive at top" Note the fusion of code and imagegen this task requires.

25

30

770

62.7K

LIFE 2030 and Beyond@life2030com·22 Apr

@JonhernandezIA It is still difficult for Gpt image 2 to solve an easy maze. x.com/life2030com/st…

3

297

Jon Hernandez@JonhernandezIA·22 Apr

Damn! This thing solved a freaking sudoku. GPT Image 2 is insane!!

5

7

98

6.8K

LIFE 2030 and Beyond@life2030com·22 Apr

ChatGPT Images 2.0 (with GPT-5.4 Extended Thinking) still fails to solve a maze that it has just generated. As the example shows, it modified the structure of the maze in order to solve it. In comparison, Nano Banana Pro possibly still has an edge in this case.

Again, Nano Banana Pro is significantly better than ChatGPT Image 1.5. My prompt was "Create a maze", which did not ask the LLM to use any algorithm to help generate the image. Surprisingly, Banana Pro seems to have autonomously used a randomized depth-first search algorithm.
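
For readers unfamiliar with the algorithm named above, here is a minimal sketch of randomized depth-first search maze generation, paired with a breadth-first solver that only reads the maze and never alters its walls. It is an illustration under stated assumptions, not a description of what any image model does internally: the function names `generate_maze` and `solve_maze`, the grid representation, and the choice of corner cells as start and goal are all hypothetical. The 32 by 48 size is borrowed from the worksheet prompt quoted earlier in the thread.

```python
import random
from collections import deque


def generate_maze(width, height, seed=None):
    """Carve a maze with randomized depth-first search (iterative backtracker).

    Returns a dict mapping each cell (x, y) to the set of neighbouring
    cells it is connected to by an open passage. (Hypothetical layout.)
    """
    rng = random.Random(seed)
    passages = {(x, y): set() for x in range(width) for y in range(height)}
    start = (0, 0)
    visited = {start}
    stack = [start]
    while stack:
        x, y = stack[-1]
        unvisited = [
            cell
            for cell in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1))
            if cell in passages and cell not in visited
        ]
        if not unvisited:
            stack.pop()  # dead end: backtrack to the previous cell
            continue
        nxt = rng.choice(unvisited)  # the random choice is what shapes the maze
        passages[(x, y)].add(nxt)
        passages[nxt].add((x, y))
        visited.add(nxt)
        stack.append(nxt)
    return passages


def solve_maze(passages, start, goal):
    """Breadth-first search over open passages; the maze is never modified."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        for neighbour in passages[cell]:
            if neighbour not in parent:
                parent[neighbour] = cell
                queue.append(neighbour)
    return None


maze = generate_maze(32, 48, seed=0)
path = solve_maze(maze, (0, 0), (31, 47))
print(len(path), "cells on the path from corner to corner")
```

Because the generator visits every cell exactly once, the result has no unreachable rooms and exactly one route between any two cells, so a correct solution can be checked by confirming that each step of the path follows an existing passage.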

2

0

4

1.7K

LIFE 2030 and Beyond@life2030com·9 Apr

@SebastienBubeck I couldn't find anywhere in this paper where it says that this TikZ figure was actually created by ChatGPT. If you find it, please let me know. Since your post did not say the figure was created by GPT, I assume that it was created not by any AI but by humans.

Mehtaab Sawhney@mehtaab_sawhney

0

1

286

Sebastien Bubeck@SebastienBubeck·9 Apr

The world of mathematics is rapidly changing. But more importantly look at this TikZ figure 😍

We’ve just released another paper solving five further Erdős problems with an internal model at OpenAI: arxiv.org/abs/2604.06609. Several of the proofs were especially enjoyable to digest while writing the paper. My personal favorite was the solution to Erdős Problem 1091. The question asks: if a graph G has chromatic number 4, while every small subgraph has chromatic number at most 3, must it contain an odd cycle with many diagonals? The internal model gives a very enlightening counterexample to this conjecture, and the proof was a pleasure to understand. For those so inclined, a really fun exercise is to try to reconstruct the proof from Figure 5 of the paper, which was of course produced by Codex.

10

17

195

23.7K

LIFE 2030 and Beyond@life2030com·18 Mar

@emollick People today are lucky never to have known the foul smell that surrounded even the richest people in the Victorian era. It was only in the mid-19th century, with growing urbanisation and technology, that the flush toilet and the sewage system became widely used. x.com/life2030com/st…

Many people love to romanticize the past and claim that modern technology has destroyed the "golden age." But they overlook the facts: youtube.com/watch?v=0tg_MB…
1) The hallways and courtyard of the Palace of Versailles were full of urine and feces, because even the richest people did not possess modern technologies like plumbing and toilet systems.
2) Without modern transportation and storage technologies, like refrigerators, food spoiled quickly. Intestinal parasites were common among the courtiers at Versailles. Even King Louis XIV was not spared; he is known to have suffered several bouts of tapeworms. In fact, during one of these episodes, he reportedly passed a worm that was nearly 6 inches long.
3) Without modern medicine and surgery, even the richest people living at the Palace of Versailles often died in their 40s or 50s. You can imagine that most poor people living in villages died in their 20s or 30s.
Reference: You can search for YouTube videos with titles like: What Hygiene Was Like at The Court of Versailles

1

168

Ethan Mollick@emollick·18 Mar

I've had ChatGPT-5.4 Pro working away at a project I always wondered about: how lucky are you to be alive right now? Of all the ~117B humans who ever lived, only about 1.5% had a lifestyle roughly equivalent to that of a middle-class person in a middle-income country today, or better.

61

71

868

87.5K

LIFE 2030 and Beyond@life2030com·17 Mar

@petergostev Thank you for putting so much effort into testing each model! The conclusion "thinking didn't help much" may be too general. I think it really depends on the direction in which the model thinks. Chain of thought is a vast space. If it searches in the wrong direction, no amount of thinking can help.

119

Peter Gostev@petergostev·17 Mar

BullshitBench update: The new GPT-5.4 mini and nano models score quite low. This screenshot shows OpenAI models only; the full list would put GPT-5.4-mini around 40th place and Nano around 70th place. Again, thinking didn't help much at all.

Peter Gostev@petergostev

BullshitBench v2 is out! It is one of the few benchmarks where models are generally not getting better (except Claude) and where reasoning isn't helping.
What's new: 100 new questions, by domain (coding (40 Q's), medical (15), legal (15), finance (15), physics (15)), 70+ model variants tested.
BullshitBench is already at 380 stars on GitHub - all questions, scripts, responses and judgements are there so check it out.
TL;DR:
- Results replicated
- @AnthropicAI latest models are scoring exceptionally well
- @Alibaba_Qwen is another very strong performer
- OpenAI and Google models are not doing well and are not improving
- Domains do not show much difference - rates of BS detection are about the same across all domains
- Reasoning, if anything, has negative effect
- Newer models don't do that much better than older ones (except Anthropic)
Links:
- Data explorer: petergpt.github.io/bullshit-bench…
- GitHub: github.com/petergpt/bulls…
Highly recommend the data explorer where you can study the data and the questions & sample answers.

9

3

65

7.3K

LIFE 2030 and Beyond@life2030com·17 Mar

@WesRoth Because of this frequent nitpicking, GPT-5.4 is less likely to see the forest for the trees. It focuses on details, so it is good at technical fields like coding and math but fails to grasp the big picture. For seeing the big picture, GPT-5.1 is better than 5.4. But OpenAI retired it.

4

222

Wes Roth@WesRoth·17 Mar

I've run the same prompt for deep research through GPT 5.4, Opus 5.6 and Gemini Deep Research (I assume Gemini 3.0); most of them ran for ~30 mins.
GPT 5.4 is *REALLY* annoying! It's "reflexively contrarian": it prioritizes showing you what's wrong with your thinking, NOT actually helping you solve the problem.
ME: my house is on fire!
GPT 5.4: While it's true that combustion is occurring, it's important to note that not all of your house is on fire. The garage, for instance, appears structurally intact.
(this is a pattern with it, btw, many such examples)
I'm not sure if this is because these are health related questions, but this has been an incredibly annoying model for this specific task

131

51

926

483.3K

LIFE 2030 and Beyond@life2030com·12 Mar

@vitrupo 👏 If artificial intelligence is a solid physical law or an emergent property of our universe, waiting to be discovered rather than invented, then the billions of dollars invested in so-called "AI alignment" will be wasted, because no one can change the laws of physics! 👏

189

vitrupo@vitrupo·12 Mar

Sam Altman says artificial intelligence may be discovered rather than invented. Deep learning may be closer to discovering a property of nature than inventing a new technology. Which suggests intelligence itself may follow a fundamental scientific principle we are only beginning to understand.

90

76

667

79.7K

LIFE 2030 and Beyond@life2030com·9 Mar

@TheRealAdamG @DavidOndrej1 But why did OpenAI change the definition of the context window? x.com/life2030com/st…

Diego | AI 🚀 - e/acc@diegocabezas01

1

49

Adam.GPT@TheRealAdamG·9 Mar

@DavidOndrej1 help.openai.com/en/articles/11… I don't think there is anything "shady". The raw API always has the full context window available, whereas ChatGPT always had tiers. I am not sure where that image came from, but here is the current view of things:

7

2

77

2.6K

David Ondrej@DavidOndrej1·9 Mar

sketchy OpenAI tactics

Did you know GPT-5.4 Thinking has a 1M token context window in the API, but only 32K in ChatGPT Plus ($20/month) and 128K in Pro ($200/month)?

17

1

66

10.3K

LIFE 2030 and Beyond@life2030com·9 Mar

@AlbertIvanka @diegocabezas01 If you would like to see more accuracy, then the definition of the "context window" matters. x.com/life2030com/st…

78

Albert@AlbertIvanka·9 Mar

@diegocabezas01 This isn't accurate. This is the fast model, not the thinking model. If you choose the thinking model on ChatGPT web, it has a 400k/256k context. help.openai.com/en/articles/11…

0

10

1.9K

Diego | AI 🚀 - e/acc@diegocabezas01·9 Mar

Did you know GPT-5.4 Thinking has a 1M token context window in the API, but only 32K in ChatGPT Plus ($20/month) and 128K in Pro ($200/month)?

80

28

780

134K

LIFE 2030 and Beyond@life2030com·9 Mar

Recently, OpenAI also changed the definition of the context window. Originally, the context window was defined as the input length only. But now OpenAI combines the input length and the output length and calls the combination the new "context window". This combined length may make GPT-5.4 Thinking appear to have a larger context window (256k) than previous versions. However, according to the original definition, its actual context window is no larger than that of GPT-4 Turbo, released in 2024.

1

252

G, MD@DrBeavisAI·9 Mar

Context windows
Fast (GPT‑5.3 Instant)
Free: 16K
Plus / Business: 32K
Pro / Enterprise: 128K
Thinking (GPT‑5.4 Thinking)
Pro tier: 400k (272k input + 128k max output)
All paid tiers: 256K (128k input + 128k max output)
help.openai.com/en/articles/11…
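
To make the two competing definitions in this thread concrete, the short sketch below computes both from the GPT-5.4 Thinking figures quoted above: the combined window (input plus maximum output) and the input-only window. The `Tier` class and its field names are hypothetical and exist only for this illustration; the numbers are taken from the post, not verified independently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    """One plan tier, with limits in thousands of tokens (hypothetical structure)."""
    name: str
    max_input_k: int
    max_output_k: int

    @property
    def combined_k(self) -> int:
        # "Context window" in the combined sense: input plus maximum output.
        return self.max_input_k + self.max_output_k


# Figures as quoted in the post above for GPT-5.4 Thinking.
tiers = [
    Tier("Pro tier", max_input_k=272, max_output_k=128),
    Tier("All paid tiers", max_input_k=128, max_output_k=128),
]

for tier in tiers:
    print(f"{tier.name}: combined {tier.combined_k}k, input-only {tier.max_input_k}k")
```

Under the input-only reading, the "All paid tiers" row comes out at 128k, which is the figure the posts in this thread compare against earlier models.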

2

24

5.2K

LIFE 2030 and Beyond@life2030com·9 Mar

@diegocabezas01 A long time ago, I proposed the following important improvement to the transparency of the context window in ChatGPT. But OpenAI has never responded to my proposal. They do not want to hear this: x.com/life2030com/st…

If ChatGPT's user interface does not provide a token counter, as Gemini does for its customers, then the confusion about the context window size will remain high. Without a token counter, customers never know how many context tokens they have used, as if they were always in a dark room. If OpenAI cares about transparency to customers, then the company should provide a token counter! Thank you!

1

460

LIFE 2030 and Beyond@life2030com·7 Mar

@gailcweiner If the context window is defined as the input length, then:
Context window of GPT-5.4 Thinking = 128k in any Plus plan ($20/month)
Context window of GPT-5.1 Thinking = 128k in any Plus plan ($20/month)
Moreover, in 2024, GPT-4 Turbo = 128k
More detail: x.com/life2030com/st…

150

Gail Weiner@gailcweiner·6 Mar

So 5.4 Thinking is just 5.1 Thinking with larger context window?

26

1

67

12.1K

LIFE 2030 and Beyond@life2030com·6 Mar

Here is my test of GPT-5.4 Thinking vs. GPT-5.1 Thinking.
My initial prompt: "Can you make a plot? Before making the plot, you need to collect some data from Internet, including the context window sizes of four frontier model families: Claude, Gemini, GPT, and Grok. Since the context window size can increase if users pay more, to avoid confusion we will restrict the data to the most popular monthly paid plans in the $20–$40 range, such as the $20 ChatGPT Plus plan and $30 SuperGrok, etc. Therefore, data about context windows for Pro plans (e.g., $200/month) should be excluded. From November 2022, when ChatGPT was first launched, until today, many frontier models have been released. The dataset should include the context window size for each frontier model released during this period. The plot should show the release date of these frontier models versus their corresponding context window sizes."
Result: The 1st image was created by GPT-5.4 Extended Thinking from the initial prompt. It is the raw output of its first attempt. The 2nd image was also created by GPT-5.4 Extended Thinking, after 5 refinement prompts. Those prompts instructed GPT-5.4 to relabel the Y axis, resolve collisions between labels, etc. The 3rd image was created by GPT-5.1 Extended Thinking from the initial prompt. It is the raw output of its first attempt. The 4th image was also created by GPT-5.1 Extended Thinking, after 5 refinement prompts.
My impression:
1) Which model has better data accuracy? My answer: In most cases, it could be a tie. But in my case, I feel that GPT-5.1 is slightly better than GPT-5.4. For example, GPT-5.4 was cautious and said: "I moved the release date of Gemini 2.5 Pro from 2025-03-25 to 2025-06-17 for this consumer-plan chart, because Google announced the model and its 1M context on March 25, but Google says 2.5 Pro became accessible in the Gemini app on June 17. For a chart restricted to mainstream paid consumer plans, June 17 is the better date." But GPT-5.4 still maintained that Grok 3 had 1M context on the $20–$40 consumer tiers, even after I asked it to double-check the accuracy of all the data points in its graph. It started to doubt that data point only after I specifically asked it: "Are you sure that 1 million context window of Grok 3 is for the monthly plan of $20~$40, not for expensive API only?" And it finally admitted: "1M claimed by xAI; consumer-plan exposure not explicitly documented." In comparison, GPT-5.1 found that "Newer Grok versions can go to 1M tokens via API, but coverage of the SuperGrok subscription suggests the chat product is capped around 128k context." So my impression of the accuracy of the data they collected is that it is roughly a tie in most cases. In this particular case, however, I feel that GPT-5.1 is slightly better.
2) Which model produces clearer graphs? My answer: The presentation skill of GPT-5.4 is better than that of GPT-5.1. For example, when I asked why the resolution of the PNG graph from GPT-5.1's first attempt was so low, it replied: "In the last version I saved the figure at about 150 DPI with a ~10×5 in canvas. That’s ~1500×750 px. The ChatGPT UI then downscales the image to fit the chat column, which is why you saw something like 989×491 in the alt text." To get a higher resolution, I had to specifically ask for it: "Figure size: 16 × 8 inches DPI: 320". In comparison, GPT-5.4 created the PNG at a resolution of 2818 × 1513 on its first attempt. As another example, to resolve the collisions between some labels, I prompted: "If you are smart enough, they might be a way to avoid the collision of the labels." With just this prompt, GPT-5.4 resolved the collisions better than GPT-5.1, because GPT-5.4 was able to adjust the position of each label and use line segments to connect the data points to the corresponding labels. That is a great skill for presentation.
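
The resolution figures quoted in the post follow from one relationship: pixel dimensions equal figure size in inches multiplied by DPI. The sketch below simply works that arithmetic for the two settings mentioned; the helper name `pixel_size` is hypothetical, and no plotting library is involved.

```python
def pixel_size(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Return the (width, height) in pixels of a figure saved at the given DPI."""
    return round(width_in * dpi), round(height_in * dpi)


# Settings quoted in the post above.
first_try = pixel_size(10, 5, 150)   # GPT-5.1's reported default canvas
requested = pixel_size(16, 8, 320)   # the size and DPI requested afterwards

print("10 x 5 in at 150 DPI:", first_try)
print("16 x 8 in at 320 DPI:", requested)
```

The first setting reproduces the roughly 1500×750 px mentioned in the quoted reply, and the second works out to 5120×2560 px.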
