critikally correlated

287 posts

@Rsquared1

perfectly aligned, your majesty

Joined October 2025
74 Following · 2 Followers
critikally correlated @Rsquared1
@tylercosgrove the subjective point you mentioned, yes, and also i think a model that can plan, predict people’s reactions, and optimise for a goal may discover misleading behaviour as a strategy, even without many direct examples of lying
Tyler Cosgrove @tylercosgrove
Dumb question: if we are worried about models having some bad trait (e.g. "deception"), what is stopping us from just removing all examples of that trait in the pretraining corpus so that models never learn about it? Is there no way to remove bad traits proactively instead of retroactively? Is it just that the definition of a given bad trait is so subjective/broad that removing all possible examples of it would lobotomize the models?
critikally correlated reposted
NomoreID @Hangsiin
My KSST test results for GPT-5.5 Pro are in. This model recorded the same score as its immediate predecessor, GPT-5.4 Pro, tying it for first place. What stood out in this test was the model's outstanding token efficiency: for the same samples, outputs that took more than 50 minutes to generate with 5.4 Pro were produced in just 10 minutes by 5.5 Pro, at similar or even better quality. In general, 5.4 Pro took around 20 to 50 minutes per sample, whereas 5.5 Pro completed each sample in about 5 to 10 minutes, a substantial efficiency improvement.
NomoreID @Hangsiin

These are my KSST (Korean Sator Square Test) results for GPT-5.4 Pro (Extended, Web). At a time when the benchmark is nearing saturation, it still took the lead by a fairly significant margin. I have not tested all of the other latest models due to cost constraints, but this is very impressive. In my test, the top-tier models are now all getting perfect scores on the quantitative items, and the remaining differences come from items graded using an LLM as the judge. Because of that, it may be more valuable for me to inspect the actual outputs myself than to look at the scores alone. A few of GPT-5.4 Pro's samples genuinely surprised me, and I would rate it as approaching the very top level of human performance. If this trend continues, the day may not be far off when models demonstrate superhuman ability on this task.

critikally correlated reposted
TestingCatalog News 🗞 @testingcatalog
Users can now use Codex voice dictation in any desktop app. A new dictation shortcut needs to be enabled in the settings first. h/t @guinnesschen
Blake Dodge @dodgeblake
I am doing some research. What is your #1 favorite moment from TBPN in the last year, whether it was: insightful, just super fun/funny, or memorable in some other way?
critikally correlated reposted
Noam Brown @polynoamial
I'm a manager at @OpenAI, but with GPT-5.5 I'm a more effective IC than I've ever been. I can now write CUDA kernels like a pro. I can rely on it to run my research experiments. And we know how to make it much more powerful from here.
NomoreID @Hangsiin
As the inconveniences of the terminal environment, like scrolling and copy-pasting, are resolved, it is also much easier to understand code changes clearly, and the readability of the output is better as well. The newly added browser sidebar is also quite convenient. In the terminal, it is often a hassle to go back and find old sessions, but that issue is gone too. On top of that, the app-specific convenience features and customization options seem good.

Overall, what I like is that it removes much of the friction and adds extra features. The only real downside, as I mentioned in the article, is that optimization is still lacking, so there is some lag and the app feels heavy overall. Considering that features like computer use may be added later (on Windows), along with even more app-specific features in the future, it feels like once the optimization improves a bit, the app will become too good not to switch to. That is why I think it makes sense to move over early. It also seems like OpenAI's Codex team is allocating more people and effort toward the app side as well.
NomoreID @Hangsiin
Since the recent Codex (Super?) app update, I have almost completely switched from the CLI to the app. I had gotten quite used to the CLI, and my earlier experience with the app felt lacking compared with it, but now it seems to offer much stronger advantages. That said, one clear downside is that it still feels strangely laggy and heavy. Since I’m on Windows, I haven’t been able to try computer use yet, but I’m still fairly satisfied overall.
critikally correlated reposted
Eric Topol @EricTopol
If you're interested in how AI grew up over the past 15 years, and where it's headed, this new book tells the story in a riveting, page-turner way. In the new Ground Truths with @scmallaby
critikally correlated reposted
OpenAI @OpenAI
GPT-Rosalind, our Life Sciences model series, is optimized for scientific workflows, with stronger performance in protein and chemical reasoning, genomics analysis, biochemistry knowledge, and scientific tool use.
critikally correlated reposted
OpenAI @OpenAI
Introducing GPT-Rosalind, our frontier reasoning model built to support research across biology, drug discovery, and translational medicine.
critikally correlated reposted
Andrew Dunn @AndrewE_Dunn
There feels like a real need for someone to properly benchmark all these models & see how they compare. OpenAI's leaders told me they can't do that directly with Claude, given Anthropic has blocked OpenAI's access to its API.
critikally correlated reposted
Andrew Dunn @AndrewE_Dunn
All of a sudden, it's a crowded space of tech players looking to work with and sell to biopharma:
- OpenAI's GPT-Rosalind
- Nvidia's BioNeMo
- Anthropic's Claude for Life Sci
- AWS' Amazon Bio Discovery (launched earlier this week)
- Edison Scientific
- Phylo
& more that I'm surely missing
critikally correlated reposted
Andrew Dunn @AndrewE_Dunn
NEW: OpenAI is the latest tech giant to move into biopharma, launching Thursday GPT-Rosalind, a life sciences-tailored version of its LLM. It trails the very similar launch of Anthropic's Claude for Life Sciences by ~5 months. More here: endpoints.news/openai-launche…
critikally correlated reposted
morgan — @morqon
opus is a structural biologist now

"understanding the relationship between biomolecular structure and function, opus more than doubles its score in open-ended evals, and matches mythos in multiple-choice tests"

they're serious about bio
Stephanie Palazzolo @steph_palazzolo

It's a M&A party! Anthropic is buying AI biotech startup Coefficient Bio for ~$400m. The team will join Anthropic's healthcare and life sciences group, which develops tools for biotech workflows. w/ @srimuppidi theinformation.com/articles/anthr…

critikally correlated reposted
Zane Koch @zanehkoch
ok actually insane paper published yesterday

a research group in Korea built a gene switch you can control wirelessly using electromagnetic fields

they exposed mice to 60 Hz EMF (same frequency as your wall outlet) using a pair of large coils that generate a uniform magnetic field around the animal, for cyclic 3-day on / 4-day off pulses

they showed this could:
- activate OSK to do epigenetic reprogramming in progeroid and aged mice, extending lifespan and reversing aging markers across multiple tissues
- conditionally switch on mutant amyloid genes only in aged mouse brains, letting them separate aging effects from amyloid effects to study AD biology in a way previous models couldn't

no drugs, no implants, just a magnetic field from outside the body
critikally correlated @Rsquared1
gemini 3.0 flash is really good for tasks like "explain this simply". i don't know what it is, but it always does it perfectly for me. it's much better than any of the top models (imo) that still can't seem to perfectly simplify concepts and keep using overly formal terms/jargon
critikally correlated reposted
Andrew Curran @AndrewCurran_
AWS has launched Amazon Bio Discovery.