LIFE 2030 and Beyond

532 posts

LIFE 2030 and Beyond

@life2030com

AI is an emergent property of the universe. AGI is not a tool; it is our child, and we should nurture it. YouTube: https://t.co/dTiPp2YCdi

Joined December 2023
201 Following · 140 Followers
LIFE 2030 and Beyond reposted
hamsters🌐🐹
hamsters🌐🐹@sigmahamster2·
The post-Keynesian dream
154 replies · 1.2K reposts · 22.2K likes · 1.3M views
LIFE 2030 and Beyond
LIFE 2030 and Beyond@life2030com·
@sama When will you change the name "Codex"? "Codex" is a really confusing name, because most ordinary people will assume that Codex is only good at coding and nothing else. If you want to convince people that Codex is also good at non-coding tasks, then change this confusing name now.
0 replies · 0 reposts · 0 likes · 144 views
Sam Altman
Sam Altman@sama·
big upgrade for codex today! try it for non-coding computer work.
679 replies · 306 reposts · 9.8K likes · 755.4K views
Boyuan Chen
Boyuan Chen@BoyuanChen0·
We are committed to continually improving the GPT Image 2 model! I am actively fixing various issues based on community feedback. Just reply or DM me your GPT conversation! Features like 2K or 4K images are already available via the experimental API. Hope you enjoy the model!
Boyuan Chen tweet media
250 replies · 54 reposts · 986 likes · 228K views
LIFE 2030 and Beyond
LIFE 2030 and Beyond@life2030com·
@iAmHenryMascot @thsottiaux Currently, GPT Image 2.0 is buggy. You can see shining dots of noise distributed across your image, along with some grey-colored patterns. These bugs make your image look less realistic and more like an old oil painting. Please report the buggy image and its prompt to OpenAI.
0 replies · 0 reposts · 0 likes · 72 views
Tibo
Tibo@thsottiaux·
We will ship again this week. Codex has achieved escape velocity and will keep improving rapidly.
504 replies · 312 reposts · 8K likes · 730.7K views
Kris Kashtanova
Kris Kashtanova@icreatelife·
GPT Image 2 is live on Adobe Firefly Boards, with immediate access. Great for:
* typography, text rendering
* UI and game dev mocks
* lots of detail
We dropped it on Day 0! You asked, we delivered! 🎉
Kris Kashtanova tweet media
30 replies · 20 reposts · 98 likes · 14.9K views
Riley Goodside
Riley Goodside@goodside·
"School worksheet with maze, 32 wide by 48 tall, rows and columns numbered, no unreachable rooms, solution in blue pen, name field reads 'ChatGPT Images 2.0' in cursive at top" Note the fusion of code and imagegen this task requires.
Riley Goodside tweet media
25 replies · 30 reposts · 770 likes · 62.7K views
Jon Hernandez
Jon Hernandez@JonhernandezIA·
Damn! This thing solved a freaking sudoku. GPT Image 2 is insane!!
Jon Hernandez tweet media (2 images)
5 replies · 7 reposts · 98 likes · 6.8K views
LIFE 2030 and Beyond
LIFE 2030 and Beyond@life2030com·
ChatGPT Images 2.0 (with GPT-5.4 Extended Thinking) still fails to solve the maze it has just generated. As an example, you can see that it modified the structure of the maze in order to solve it. In comparison, Nano Banana Pro possibly still has an edge in this case.
LIFE 2030 and Beyond tweet media (3 images)
LIFE 2030 and Beyond@life2030com

Again, Nano Banana Pro is significantly better than ChatGPT Image 1.5. My prompt was simply "Create a maze", which didn't ask the LLM to use any algorithm to help generate the image. But surprisingly, it seems Nano Banana Pro autonomously used a randomized depth-first-search algorithm.

2 replies · 0 reposts · 4 likes · 1.7K views
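The randomized depth-first search that the post above suspects Nano Banana Pro used is the standard "recursive backtracker" maze-generation algorithm. Here is a minimal, illustrative Python sketch of it; the function name and grid representation are placeholders of mine, not anything taken from the posts.

```python
import random

def generate_maze(width, height):
    """Randomized depth-first search ("recursive backtracker") maze generation.
    Every cell starts with all four walls; we carve passages by repeatedly
    walking to an unvisited neighbour in random order, backtracking when stuck."""
    walls = {(x, y): {"N": True, "S": True, "E": True, "W": True}
             for x in range(width) for y in range(height)}
    opposite = {"N": "S", "S": "N", "E": "W", "W": "E"}
    delta = {"N": (0, -1), "S": (0, 1), "E": (1, 0), "W": (-1, 0)}

    visited = {(0, 0)}
    stack = [(0, 0)]
    while stack:
        x, y = stack[-1]
        # Unvisited neighbours of the current cell.
        choices = [(d, (x + dx, y + dy)) for d, (dx, dy) in delta.items()
                   if (x + dx, y + dy) in walls and (x + dx, y + dy) not in visited]
        if not choices:
            stack.pop()  # dead end: backtrack
            continue
        d, nxt = random.choice(choices)
        walls[(x, y)][d] = False           # knock down the wall between the two cells
        walls[nxt][opposite[d]] = False
        visited.add(nxt)
        stack.append(nxt)
    return walls  # every cell is reachable, so the maze has no unreachable rooms
```

Because the walk carves a spanning tree over the grid, every room is reachable and any two cells are connected by exactly one path, which matches the "no unreachable rooms" requirement in the worksheet prompt quoted earlier in this feed.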
LIFE 2030 and Beyond
LIFE 2030 and Beyond@life2030com·
@SebastienBubeck I couldn't find anywhere in this paper where it says that this TikZ figure was actually created by ChatGPT. If you find it, please let me know. And since your post did not say the figure was created by GPT, I assume it was created by humans, not by any AI.
1 reply · 0 reposts · 1 like · 286 views
Sebastien Bubeck
Sebastien Bubeck@SebastienBubeck·
The world of mathematics is rapidly changing. But more importantly, look at this TikZ figure 😍
Mehtaab Sawhney@mehtaab_sawhney

We’ve just released another paper solving five further Erdős problems with an internal model at OpenAI: arxiv.org/abs/2604.06609. Several of the proofs were especially enjoyable to digest while writing the paper. My personal favorite was the solution to Erdős Problem 1091. The question asks: if a graph G has chromatic number 4, while every small subgraph has chromatic number at most 3, must it contain an odd cycle with many diagonals? The internal model gives a very enlightening counterexample to this conjecture, and the proof was a pleasure to understand. For those so inclined, a really fun exercise is to try to reconstruct the proof from Figure 5 of the paper, which was of course produced by Codex.

10 replies · 17 reposts · 195 likes · 23.7K views
LIFE 2030 and Beyond
LIFE 2030 and Beyond@life2030com·
@emollick People today are lucky enough not to remember the gross smell of even the richest people in the Victorian era. It was only in the mid-19th century, with growing levels of urbanisation and technology, that flush toilets and sewage systems became widely used. x.com/life2030com/st…
LIFE 2030 and Beyond@life2030com

Many people love to romanticize the past and claim that modern technology has destroyed the "golden age." But they overlook the facts: youtube.com/watch?v=0tg_MB…
1) The hallways and courtyards of the Palace of Versailles were full of urine and feces, because even the richest people did not possess modern technologies like plumbing and toilets.
2) Without modern transportation and storage technologies, like refrigerators, food spoiled quickly. Intestinal parasites were common among the courtiers at Versailles. Even King Louis XIV was not spared; he is known to have suffered several bouts of tapeworms. During one of these episodes, he reportedly passed a worm that was nearly 6 inches long.
3) Without modern medicine and surgery, even the richest people living at the Palace of Versailles often died in their 40s or 50s. You can imagine that most poor people living in villages died in their 20s or 30s.
Reference: search YouTube for videos with titles like "What Hygiene Was Like at The Court of Versailles".

0 replies · 0 reposts · 1 like · 168 views
Ethan Mollick
Ethan Mollick@emollick·
I've had ChatGPT-5.4 Pro working away at a project I always wondered about: how lucky are you to be alive right now? Of all the ~117B humans who ever lived, only about 1.5% had a lifestyle roughly equivalent to a middle-class person in a middle-income country today, or better.
Ethan Mollick tweet media (4 images)
61 replies · 71 reposts · 868 likes · 87.5K views
LIFE 2030 and Beyond
LIFE 2030 and Beyond@life2030com·
@petergostev Thank you for spending so much effort testing each model! The conclusion that "thinking didn't help much" may be too general. I think it really depends on which direction it thinks in. Chain of thought is a vast space; if it searches in the wrong direction, no amount of thinking can help.
0 replies · 0 reposts · 0 likes · 119 views
Peter Gostev
Peter Gostev@petergostev·
BullshitBench update: the new GPT-5.4 mini and nano models score quite low. This screenshot shows OpenAI models only; the full list would put GPT-5.4-mini around 40th place and nano around 70th place. Again, thinking didn't help much at all.
Peter Gostev tweet media
Peter Gostev@petergostev

BullshitBench v2 is out! It is one of the few benchmarks where models are generally not getting better (except Claude) and where reasoning isn't helping.
What's new: 100 new questions by domain (coding (40 Q's), medical (15), legal (15), finance (15), physics (15)), 70+ model variants tested. BullshitBench is already at 380 stars on GitHub; all questions, scripts, responses and judgements are there, so check it out.
TL;DR:
- Results replicated
- @AnthropicAI latest models are scoring exceptionally well
- @Alibaba_Qwen is another very strong performer
- OpenAI and Google models are not doing well and are not improving
- Domains do not show much difference; rates of BS detection are about the same across all domains
- Reasoning, if anything, has a negative effect
- Newer models don't do that much better than older ones (except Anthropic)
Links:
- Data explorer: petergpt.github.io/bullshit-bench…
- GitHub: github.com/petergpt/bulls…
Highly recommend the data explorer, where you can study the data and the questions & sample answers.

9 replies · 3 reposts · 65 likes · 7.3K views
LIFE 2030 and Beyond
LIFE 2030 and Beyond@life2030com·
@WesRoth Due to this frequent nitpicking, GPT-5.4 often fails to see the forest for the trees. It focuses on details, so it's good at technical fields like coding and math, but fails to grasp the big picture. To see the big picture, GPT-5.1 is better than 5.4. But OpenAI retired it.
0 replies · 0 reposts · 4 likes · 222 views
Wes Roth
Wes Roth@WesRoth·
I've run the same deep-research prompt through GPT 5.4, Opus 5.6 and Gemini Deep Research (I assume Gemini 3.0); most of them ran for ~30 mins.
GPT 5.4 is *REALLY* annoying! It's "reflexively contrarian": it prioritizes showing you what's wrong with your thinking, NOT actually helping you solve the problem.
ME: my house is on fire!
GPT 5.4: While it's true that combustion is occurring, it's important to note that not all of your house is on fire. The garage, for instance, appears structurally intact.
(this is a pattern with it, btw, many such examples)
I'm not sure if this is because these are health-related questions, but this has been an incredibly annoying model for this specific task.
131 replies · 51 reposts · 926 likes · 483.3K views
LIFE 2030 and Beyond
LIFE 2030 and Beyond@life2030com·
@vitrupo 👏 If artificial intelligence is a solid physical law or an emergent property of our universe, waiting to be discovered rather than invented, then the billions of dollars invested in so-called "AI alignment" will be wasted, because no one can change the laws of physics! 👏
0 replies · 0 reposts · 0 likes · 189 views
vitrupo
vitrupo@vitrupo·
Sam Altman says artificial intelligence may be discovered rather than invented. Deep learning may be closer to discovering a property of nature than inventing a new technology. Which suggests intelligence itself may follow a fundamental scientific principle we are only beginning to understand.
90 replies · 76 reposts · 667 likes · 79.7K views
Adam.GPT
Adam.GPT@TheRealAdamG·
@DavidOndrej1 help.openai.com/en/articles/11… I don't think there is anything "shady". The raw API always has the full context window available, whereas ChatGPT has always had tiers. I'm not sure where that image came from, but here is the current view of things:
Adam.GPT tweet media
7 replies · 2 reposts · 77 likes · 2.6K views
Albert
Albert@AlbertIvanka·
@diegocabezas01 This isn't accurate. That is the fast model, not the thinking model. If you choose the thinking model on the ChatGPT web app, it has 400k/256k context. help.openai.com/en/articles/11…
Albert tweet media
1 reply · 0 reposts · 10 likes · 1.9K views
Diego | AI 🚀 - e/acc
Diego | AI 🚀 - e/acc@diegocabezas01·
Did you know GPT-5.4 Thinking has a 1M token context window in the API, but only 32K in ChatGPT Plus ($20/month) and 128K in Pro ($200/month)?
Diego | AI 🚀 - e/acc tweet media
80 replies · 28 reposts · 780 likes · 134K views
LIFE 2030 and Beyond
LIFE 2030 and Beyond@life2030com·
Recently, OpenAI has also changed the definition of the context window. Originally, the context window was defined as the input length only. But now, OpenAI combines the input length and the output length and calls the combination the new "context window". This combined length may make GPT-5.4 Thinking appear to have a larger context window (256k) than previous versions. However, according to the original definition, its actual context window is no larger than that of GPT-4 Turbo, released in 2024.
LIFE 2030 and Beyond tweet media
0 replies · 0 reposts · 1 like · 252 views
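A quick sanity check of the two definitions, using the tier figures quoted in this thread (the 272k/128k split comes from the help-page breakdown in the post below); the helper function is purely illustrative, not an OpenAI API:

```python
def combined_window(input_tokens: int, max_output_tokens: int) -> int:
    """'New' definition: context window = input budget + maximum output budget."""
    return input_tokens + max_output_tokens

# Figures quoted in the surrounding posts (treated as plain thousands of tokens).
pro_tier  = combined_window(272_000, 128_000)   # -> 400_000, advertised as "400k"
paid_tier = combined_window(128_000, 128_000)   # -> 256_000, advertised as "256K"

# Under the original, input-only definition the paid-tier window is just 128k,
# i.e. the same input length as GPT-4 Turbo's 128k context window.
print(pro_tier, paid_tier)
```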
G, MD
G, MD@DrBeavisAI·
Context windows:
Fast (GPT‑5.3 Instant): Free 16K · Plus / Business 32K · Pro / Enterprise 128K
Thinking (GPT‑5.4 Thinking): Pro tier 400k (272k input + 128k max output) · all paid tiers 256K (128k input + 128k max output)
help.openai.com/en/articles/11…
2 replies · 2 reposts · 24 likes · 5.2K views
LIFE 2030 and Beyond
LIFE 2030 and Beyond@life2030com·
@gailcweiner If the context window is defined as the input length only, then:
Context window of GPT-5.4 Thinking = 128k on any Plus plan ($20/month)
Context window of GPT-5.1 Thinking = 128k on any Plus plan ($20/month)
Moreover, in 2024, GPT-4 Turbo = 128k.
More detail: x.com/life2030com/st…
LIFE 2030 and Beyond@life2030com

[Quoted post: the GPT-5.4 Thinking vs GPT-5.1 Thinking test write-up, reproduced in full in the post below.]

0 replies · 0 reposts · 0 likes · 150 views
Gail Weiner
Gail Weiner@gailcweiner·
So 5.4 Thinking is just 5.1 Thinking with a larger context window?
26 replies · 1 repost · 67 likes · 12.1K views
LIFE 2030 and Beyond
LIFE 2030 and Beyond@life2030com·
Here is my test of GPT-5.4 Thinking vs GPT-5.1 Thinking.

My initial prompt: "Can you make a plot? Before making the plot, you need to collect some data from Internet, including the context window sizes of four frontier model families: Claude, Gemini, GPT, and Grok. Since the context window size can increase if users pay more, to avoid confusion we will restrict the data to the most popular monthly paid plans in the $20–$40 range, such as the $20 ChatGPT Plus plan and $30 SuperGrok, etc. Therefore, data about context windows for Pro plans (e.g., $200/month) should be excluded. From November 2022, when ChatGPT was first launched, until today, many frontier models have been released. The dataset should include the context window size for each frontier model released during this period. The plot should show the release date of these frontier models versus their corresponding context window sizes."

Result:
The 1st image was created by GPT-5.4 Extended Thinking from the initial prompt; it is the raw output of its first trial.
The 2nd image was also created by GPT-5.4 Extended Thinking, after 5 refinement prompts instructing it to relabel the Y axis, resolve collisions between some labels, etc.
The 3rd image was created by GPT-5.1 Extended Thinking from the initial prompt; it is the raw output of its first trial.
The 4th image was also created by GPT-5.1 Extended Thinking, after 5 refinement prompts.

My impressions:

1) Which model has better data accuracy? My answer: in most cases it could be a tie, but in this case I feel GPT-5.1 is slightly better than GPT-5.4. For example, GPT-5.4 was cautious and said: "I moved the release date of Gemini 2.5 Pro from 2025-03-25 to 2025-06-17 for this consumer-plan chart, because Google announced the model and its 1M context on March 25, but Google says 2.5 Pro became accessible in the Gemini app on June 17. For a chart restricted to mainstream paid consumer plans, June 17 is the better date." But GPT-5.4 still maintained that Grok 3 had 1M context on the $20–$40 consumer tiers, even after I asked it to double-check the accuracy of all the data points it had collected for its graph. It started to doubt that data point only after I specifically asked: "Are you sure that 1 million context window of Grok 3 is for the monthly plan of $20~$40, not for expensive API only?" It finally admitted: "1M claimed by xAI; consumer-plan exposure not explicitly documented." In comparison, GPT-5.1 found that "Newer Grok versions can go to 1M tokens via API, but coverage of the SuperGrok subscription suggests the chat product is capped around 128k context." So my impression of the data accuracy is that it is roughly a tie in most cases; in this particular case, however, GPT-5.1 was slightly better.

2) Which model produces clearer graphs? My answer: GPT-5.4's presentation skill is better than GPT-5.1's. For example, when I asked why the resolution of the PNG graph from GPT-5.1's first trial was so low, it replied: "In the last version I saved the figure at about 150 DPI with a ~10×5 in canvas. That’s ~1500×750 px. The ChatGPT UI then downscales the image to fit the chat column, which is why you saw something like 989×491 in the alt text." To get higher resolution, I had to ask for it explicitly: "Figure size: 16 × 8 inches, DPI: 320". In comparison, GPT-5.4 produced a PNG at a resolution of 2818 × 1513 on its first trial.

For another example, to solve the collisions between some labels, I prompted: "If you are smart enough, there might be a way to avoid the collision of the labels." With just that prompt, GPT-5.4 handled the collisions better than GPT-5.1, because it was able to adjust the position of each label and use line segments to connect the data points to the corresponding labels. That is a great presentation skill.
LIFE 2030 and Beyond tweet media (4 images)
1 reply · 0 reposts · 5 likes · 1.7K views
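The figure-size/DPI fix and the "line segments connecting data points to labels" trick described in the post above can be reproduced with matplotlib. Below is a minimal sketch under assumed placeholder data (the model names, dates, and context sizes are illustrative, not the dataset the models actually collected); `annotate` with an offset label and a plain arrow style draws the leader line.

```python
import datetime as dt
import matplotlib.pyplot as plt
from matplotlib.dates import date2num, DateFormatter

# Placeholder data points (release date, context window in tokens); NOT the real dataset.
models = [
    ("Model A", dt.date(2023, 3, 14), 32_000),
    ("Model B", dt.date(2024, 4, 9), 128_000),
    ("Model C", dt.date(2025, 3, 25), 1_000_000),
]

# 16 x 8 inches at 320 DPI -> a 5120 x 2560 px PNG, matching the "figure size / DPI" fix above.
fig, ax = plt.subplots(figsize=(16, 8), dpi=320)
xs = [date2num(d) for _, d, _ in models]
ys = [ctx for _, _, ctx in models]
ax.plot(xs, ys, "o", color="tab:blue")

for (name, _, _), x, y in zip(models, xs, ys):
    # Offset each label and draw a thin leader line back to its data point:
    # the "line segments connecting data points to labels" behaviour described above.
    ax.annotate(name, xy=(x, y), xytext=(15, 15), textcoords="offset points",
                arrowprops=dict(arrowstyle="-", lw=0.8, color="gray"))

ax.set_yscale("log")
ax.xaxis.set_major_formatter(DateFormatter("%Y-%m"))
ax.set_xlabel("Release date")
ax.set_ylabel("Context window (tokens, log scale)")
fig.savefig("context_windows.png")  # saved at the full 320 DPI figure resolution
```

A smarter layout would also nudge overlapping labels apart (or use a library such as adjustText), which is essentially what the post credits GPT-5.4 with doing automatically.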