critikally correlated

287 posts

@Rsquared1

perfectly aligned, your majesty

Joined October 2025
74 Following · 2 Followers
critikally correlated @Rsquared1
@tylercosgrove the subjective point you mentioned, yes, and also i think a model that can plan, predict people’s reactions, and optimise for a goal may discover misleading behaviour as a strategy, even without many direct examples of lying
Tyler Cosgrove @tylercosgrove
Dumb question: if we are worried about models having some bad trait (e.g. "deception"), what is stopping us from just removing all examples of that trait in the pretraining corpus so that models never learn about it? Is there no way to remove bad traits proactively instead of retroactively? Is it just that the definition of a given bad trait is so subjective/broad that removing all possible examples of it would lobotomize the models?
critikally correlated reposted
NomoreID @Hangsiin
My KSST test results for GPT-5.5 Pro are in. This model recorded the same score as its immediate predecessor, GPT-5.4 Pro, tying it for first place. What stood out in this test was the model's outstanding token efficiency: for the same samples, outputs that took more than 50 minutes to generate with 5.4 Pro were produced in just 10 minutes by 5.5 Pro, at similar or even better quality. In general, 5.4 Pro took around 20 to 50 minutes per sample, whereas 5.5 Pro completed each sample in about 5 to 10 minutes, a substantial efficiency improvement.
NomoreID @Hangsiin

These are my KSST (Korean Sator Square Test) results for GPT-5.4 Pro (Extended, Web). At a time when the benchmark is nearing saturation, it still took the lead by a fairly significant margin. I have not tested all of the other latest models due to cost constraints, but this is very impressive. In my test, the top-tier models are now all getting perfect scores on the quantitative items, and the remaining differences come from items graded using an LLM as the judge. Because of that, it may be more valuable for me to inspect the actual outputs myself than to look at the scores alone. A few of GPT-5.4 Pro's samples genuinely surprised me, and I would rate it as approaching the very top level of human performance. If this trend continues, the day may not be far off when models demonstrate superhuman ability on this task.

critikally correlated reposted
TestingCatalog News 🗞 @testingcatalog
Users can now use Codex voice dictation in any desktop app. A new dictation shortcut needs to be enabled in the settings first. h/t @guinnesschen
Blake Dodge @dodgeblake
I am doing some research. What is your #1 favorite moment from TBPN in the last year, whether it was: insightful, just super fun/funny, or memorable in some other way?
critikally correlated reposted
Noam Brown @polynoamial
I'm a manager at @OpenAI, but with GPT-5.5 I'm a more effective IC than I've ever been. I can now write CUDA kernels like a pro. I can rely on it to run my research experiments. And we know how to make it much more powerful from here.
NomoreID @Hangsiin
As the inconveniences of the terminal environment, like scrolling and copy-pasting, are resolved, it is also much easier to understand code changes clearly, and the readability of the output is better as well. The newly added browser sidebar is also quite convenient. In the terminal, it is often a hassle to go back and find old sessions, but that issue is gone too. On top of that, the app-specific convenience features and customization options seem good.

Overall, what I like is that it removes much of the friction and adds extra features. The only real downside, as I mentioned in the article, is that optimization is still lacking, so there is some lag and the app feels heavy overall. Considering that features like computer use may be added later (on Windows), along with even more app-specific features in the future, it feels like once the optimization improves a bit, the app will become too good not to switch to. That is why I think it makes sense to move over early. It also seems like OpenAI's Codex team is allocating more people and effort toward the app side as well.
NomoreID @Hangsiin
Since the recent Codex (Super?) app update, I have almost completely switched from the CLI to the app. I had gotten quite used to the CLI, and my earlier experience with the app felt lacking compared with it, but now it seems to offer much stronger advantages. That said, one clear downside is that it still feels strangely laggy and heavy. Since I’m on Windows, I haven’t been able to try computer use yet, but I’m still fairly satisfied overall.
critikally correlated reposted
Eric Topol @EricTopol
If you're interested in how AI grew up over the past 15 years, and where it's headed, this new book tells the story in a riveting, page-turner way. In the new Ground Truths with @scmallaby
critikally correlated reposted
OpenAI @OpenAI
GPT-Rosalind, our Life Sciences model series, is optimized for scientific workflows, with stronger performance in protein and chemical reasoning, genomics analysis, biochemistry knowledge, and scientific tool use.
critikally correlated reposted
OpenAI @OpenAI
Introducing GPT-Rosalind, our frontier reasoning model built to support research across biology, drug discovery, and translational medicine.
critikally correlated reposted
Andrew Dunn @AndrewE_Dunn
There feels like a real need for someone to properly benchmark all these models & see how they compare. OpenAI's leaders told me they can't do that directly with Claude, given Anthropic has blocked OpenAI's access to its API.
critikally correlated reposted
Andrew Dunn @AndrewE_Dunn
All of a sudden, it's a crowded space of tech players looking to work with and sell to biopharma:
- OpenAI's GPT-Rosalind
- Nvidia's BioNeMo
- Anthropic's Claude for Life Sci
- AWS' Amazon Bio Discovery (launched earlier this week)
- Edison Scientific
- Phylo
& more that I'm surely missing
critikally correlated reposted
Andrew Dunn @AndrewE_Dunn
NEW: OpenAI is the latest tech giant to move into biopharma, launching Thursday GPT-Rosalind, a life sciences-tailored version of its LLM. It trails the very similar launch of Anthropic's Claude for Life Sciences by ~5 months. More here: endpoints.news/openai-launche…
critikally correlated reposted
morgan — @morqon
opus is a structural biologist now

"understanding the relationship between biomolecular structure and function, opus more than doubles its score in open-ended evals, and matches mythos in multiple-choice tests"

they're serious about bio
Stephanie Palazzolo @steph_palazzolo

It's a M&A party! Anthropic is buying AI biotech startup Coefficient Bio for ~$400m. The team will join Anthropic's healthcare and life sciences group, which develops tools for biotech workflows. w/ @srimuppidi theinformation.com/articles/anthr…

critikally correlated reposted
Zane Koch @zanehkoch
ok actually insane paper published yesterday

a research group in Korea built a gene switch you can control wirelessly using electromagnetic fields

they exposed mice to 60 Hz EMF (same frequency as your wall outlet) using a pair of large coils that generate a uniform magnetic field around the animal, for cyclic 3-day on / 4-day off pulses

they showed this could:
- activate OSK to do epigenetic reprogramming in progeroid and aged mice, extending lifespan and reversing aging markers across multiple tissues
- conditionally switch on mutant amyloid genes only in aged mouse brains, letting them separate aging effects from amyloid effects to study AD biology in a way previous models couldn't

no drugs, no implants, just a magnetic field from outside the body
critikally correlated @Rsquared1
gemini 3.0 flash is really good for tasks like "explain this simply". i don't know what it is, but it always does it perfectly for me. it's much better than any of the top models (imo) that still can't seem to perfectly simplify concepts and keep using overly formal terms/jargon
critikally correlated reposted
Andrew Curran @AndrewCurran_
AWS has launched Amazon Bio Discovery.