Elle Najt

11 posts

@latentfool

nyc / sf · Joined January 2022

44 Following · 20 Followers

Elle Najt@latentfool·Apr 21

@lumpenspace @AsaCoopStick ... and that undermines the safety implications of CoT uncontrollability.

Elle Najt@latentfool·Apr 21

@lumpenspace @AsaCoopStick CoT uncontrollability implies agents can't prevent schemes from appearing in CoT, bc if they cannot do CoT in lowercase they cannot do obfuscated CoT. But they can reason in the response in lower case, and we show that the reasoning can be moved there at low accuracy cost...

mc lumps ⏹️❗️ 🔨⏱️@lumpenspace·Apr 21

what do you mean dodge cot monitors lol the output is even more monitorable

Asa Cooper Stickland@AsaCoopStick

Safety implications: a scheming model could use this to dodge CoT monitors at modest accuracy cost. But of course we show models can be prompted to do this, not that they *would* do it autonomously.

1.9K

Elle Najt@latentfool·Apr 21

@AsaCoopStick Improved plot + prompt for this one, showing that GPT-5.4 can reason about the decoy topic and still reason in the response.

Asa Cooper Stickland@AsaCoopStick·Apr 20

These early-exit reasoning traces would be trivially detectable by a simple length check. But we find models can fill their CoT with a plausible-looking decoy topic (discussing whether there are more circles or squares in the world), while displacing real reasoning into their outputs.

1.8K

Asa Cooper Stickland@AsaCoopStick·Apr 20

New post! OpenAI/Yueh-Han et al. (2026) find models struggle to control their CoT -- good news for monitoring. But we show that GPT 5.4, Gemini 3.1 Pro, and Claude Opus 4.6 can all “early exit” their CoT and reason in their outputs instead (with reasoning-mode on!).

136

39.1K

Elle Najt@latentfool·Apr 11

@GenReasoning Could it just be that this market is already very efficient, and there's no alpha in the data you force the models to use? Is there a (non-overfit) strategy that makes money using the provided data?

General Reasoning@GenReasoning·Apr 9

🎲 Introducing KellyBench, a new long-horizon evaluation for frontier models. KellyBench evaluates models within a year long sports betting market, a challenging and highly non-stationary environment. Every frontier model we test loses money. They struggle to design ML strategies, manage risk, and adapt as the world changes. Link and thread below.

638

158.2K

Elle Najt@latentfool·Apr 4

@xenodium omg congrats! :)

Alvaro Ramirez@xenodium·Apr 3

…and then there were three (expect delays) xenodium.com/and-then-there… #fatherhood #emacs #orgmode #ios #blogging

304

Elle Najt@latentfool·Mar 23

@FazlBarez @amang0112358 I'm also confused about the choice to boost refusals in the score, since this makes the correlation between PAC and refusal rate stronger.

Elle Najt@latentfool·Mar 23

@FazlBarez @amang0112358 I'm confused about the treatment of refusals and how that is inflating the PAC scores for some models; the .91 rank corr. is striking. Did you try filtering out the refusals or resampling and looking at the impact on the PAC score? (Is your data with all responses available on HF?)

Fazl Barez@FazlBarez·Nov 3

New paper: 🧭 Introducing VAL-Bench: Measuring Value Alignment in Language Models. A benchmark that measures the consistency in language model expression of human values when prompted to justify opposing positions on real-life issues. Work with @amang0112358 and Denny O'Shea! (1/7)

Elle Najt@latentfool·Mar 21

@docmilanfar remarkable how much simpler the modern proof is -- big win for abstractions like Jensen's + the optimization definition of the median getting identified and into the water supply. otoh you do lose all these cute bespoke proofs that give nice intuitions

1.2K

Peyman Milanfar@docmilanfar·Mar 21

surprising fact: for any random variable X, |mean(X) - median(X)| ≤ std(X). It’s not hard to show: |μ - m| = |E(X - m)| ≤ E|X - m| ≤ E|X - μ| = E√(X - μ)² ≤ √E(X - μ)² = σ. Originally appeared in a paper by Hotelling & Solomons, Annals of Mathematical Statistics, 1932.
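
The chain of inequalities in the post above can be written out with the justification for each step. This is a sketch that assumes X has finite variance, with μ the mean, m a median, and σ the standard deviation; the step labels are annotations added here, not part of the original post.

```latex
\begin{align*}
|\mu - m| &= \bigl|\mathbb{E}(X - m)\bigr| \\
  &\le \mathbb{E}|X - m|
     && \text{Jensen: } |\cdot| \text{ is convex} \\
  &\le \mathbb{E}|X - \mu|
     && \text{a median } m \text{ minimizes } c \mapsto \mathbb{E}|X - c| \\
  &= \mathbb{E}\sqrt{(X - \mu)^2} \\
  &\le \sqrt{\mathbb{E}(X - \mu)^2} = \sigma
     && \text{Jensen: } \sqrt{\cdot} \text{ is concave}
\end{align*}
```

The two Jensen steps and the optimization characterization of the median are the abstractions the reply refers to.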

1.1K

70.8K

Elle Najt@latentfool·Mar 21

A post about the vibes of LLM-generated algorave music: lesswrong.com/posts/DWtMPR87… Here's a sample: music.youtube.com/playlist?list=… Also: my tooling for vibe-coding vibes with Strudel: github.com/ElleNajt/vibe-… ... have fun!

165

Elle Najt retweeted

Alvaro Ramirez@xenodium·Feb 4

Missing agent-shell when away from your beloved #Emacs? Elle Najt's got you covered. Chat with agent-shell from any device via #Slack github.com/ElleNajt/agent… #android #iphone #ios #llm #ai #claude #openai #codex #agent #gemini #google

620

Elle Najt@latentfool·Aug 18

@CihanPostsThms Is there a simple proof for C[x,y]?

Some theorems@CihanPostsThms·Aug 16

Let A be a commutative Noetherian ring with Krull dimension ≥ 2. Then for every integer N, there is an ideal of A that cannot be generated by N elements.
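
For the C[x,y] case asked about in the reply, one standard argument uses powers of the maximal ideal (x, y). This is a sketch supplied here as an illustration, not something stated in the thread; the symbols 𝔪, I, f_i, and k are notation introduced for the example.

```latex
Let $A = \mathbb{C}[x,y]$, $\mathfrak{m} = (x,y)$, and $I = \mathfrak{m}^N$ for an integer $N \ge 0$.
Suppose $I = (f_1, \dots, f_k)$. The images of $f_1, \dots, f_k$ span
\[
  I / \mathfrak{m} I \;=\; \mathfrak{m}^N / \mathfrak{m}^{N+1}
\]
as a vector space over $A/\mathfrak{m} \cong \mathbb{C}$. The monomials
$x^N,\, x^{N-1}y,\, \dots,\, y^N$ form a basis of this quotient, so
\[
  k \;\ge\; \dim_{\mathbb{C}} \mathfrak{m}^N / \mathfrak{m}^{N+1} \;=\; N + 1 .
\]
Hence $\mathfrak{m}^N$ cannot be generated by $N$ elements.
```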
