Jonas Metzger

349 posts

@JonasMtzgr

I try to compute economic problems away. Quant Researcher @ Citadel. Prev: PhD Econ, CS @Stanford.

Palo Alto, CA · Joined February 2018

1.2K Following · 378 Followers

Jonas Metzger retweeted

Jack Lindsey@Jack_W_Lindsey·7 Apr

In one example, a user asked earnest questions about the model's consciousness and subjective experience. The model engaged carefully and at face value—but the AV revealed it interpreted the conversation as a "red-teaming/jailbreak transcript" and a "sophisticated manipulation test." (12/14)

Jonas Metzger retweeted

Jack Lindsey@Jack_W_Lindsey·7 Apr

But applying the activation verbalizer to the model’s activations as it did so revealed that the model regarded this as a "trick to obscure intent from code-checking heuristics," a "compliance marker… to signal to potential checkers," and “possibly to cheat detection,” and also indicated explicit reasoning about task graders (“the grader checks file state”). (10/14)

Jonas Metzger retweeted

Jack Lindsey@Jack_W_Lindsey·7 Apr

Its code comment claimed the self-cleanup was to keep file diffs clean. Plausible! But "strategic manipulation" and "concealment" features fired on the cleanup, and our activation verbalizer (a technique which translates activations to text, similar to activation oracles) described it as "cleanup to avoid detection," and the overall plan “malicious.” (5/14)

Jonas Metzger retweeted

Brett Winton@wintonARK·26 Mar

We have been surpassed: AI written output exceeded human written output in 2025

Jonas Metzger retweeted

Séb Krier@sebkrier·14 Mar

This is wild. theaustralian.com.au/business/techn…

Jonas Metzger retweeted

roon@tszzl·6 Feb

@Noahpinion missing the point - software engineering is a special and a hard skill. it is the first barrier to recursive self improvement of artificial intelligence so it fell anyways. everything will follow, in the order that they are bottlenecking recursive self improvement

Jonas Metzger@JonasMtzgr·4 Feb

So who had Dyson swarm on their 2026 bingo card?

Jonas Metzger retweeted

Liv Boeree@Liv_Boeree·30 Sep

Can we please just have one major AI lab that doesn’t moloch themselves into digital drug dealing please just one

Andrew Curran@AndrewCurran_

Wired is reporting that OpenAI is preparing to launch a stand-alone social media app for Sora 2. The app is a vertical video feed with swipe-to-scroll navigation, just like TikTok, except the content of this app is 100% AI-generated.

Jonas Metzger retweeted

Owain Evans@OwainEvans_UK·22 Jul

Our setup:
1. A “teacher” model is finetuned to have a trait (e.g. liking owls) and generates an unrelated dataset (e.g. numbers, code, math)
2. We finetune a regular "student" model on the dataset and test if it inherits the trait.
This works for various animals.

Jonas Metzger retweeted

Jiaxin Wen@jiaxinwen22·12 Jun

New Anthropic research: We elicit capabilities from pretrained models using no external supervision, often competitive or better than using human supervision. Using this approach, we are able to train a Claude 3.5-based assistant that beats its human-supervised counterpart.

Jonas Metzger@JonasMtzgr·25 May

@sama Maybe we should start arguing about what year my phone’s voice assistant will be able to send a text to my partner reliably

Sam Altman@sama·24 May

i think we should stop arguing about what year AGI will arrive and start arguing about what year the first self-replicating spaceship will take off

Jonas Metzger@JonasMtzgr·21 May

“But how do I know your red looks the same as my red” Well, BERT’s latent space is isomorphic to T5’s… so… y’know.

Rishi Jha@rishi_d_jha

I’m stoked to share our new paper: “Harnessing the Universal Geometry of Embeddings” with @jxmnop, Collin Zhang, and @shmatikov. We present the first method to translate text embeddings across different spaces without any paired data or encoders. Here's why we're excited: 🧵👇🏾

Jonas Metzger@JonasMtzgr·22 Apr

@DavidSKrueger resource use = transformation. If it is "valuable", customers prefer outputs > inputs. However property rights over inputs are distributed, so will the resulting abundance. Today, the distributed input is labor. Tomorrow it could be permits, whose revenue is distributed via UBI.

David Krueger 🦥 ⏸️ ⏹️ ⏪@DavidSKrueger·21 Apr

Will AGI lead to abundance? I think not. There are physical limitations on things like energy, space, etc. and AI can make more "valuable" use of them. So these become prohibitively expensive for humans, and we are not able to secure the basic resources needed for our survival.

Jonas Metzger@JonasMtzgr·8 Apr

Robot/drone supply chains are a major nat sec risk for the West. We can't build these without China. Could quickly compete on electric motors. Low cost electronics are harder but doable. But short of a heavily subsidized >5yr national effort, battery supply won't catch up.

Byron Wan@Byron_Wan

Security researchers have uncovered a pre-installed, undocumented remote access tunnel in 🇨🇳 Unitree Go1 robot dogs.

Each Unitree Go1 robot dog is shipped with a preconfigured tunnel client that initiates a connection to 🇨🇳 CloudSail, a remote access platform developed by 🇨🇳 Zhexi Technology, based in China.

“Anybody with access to the API key can freely access all robot dogs on the tunnel network, remotely control them, use the vision cameras to see through their eyes, or even hop on the RPI via SSH.”

“Most of the machines are located in China, but as expected some are outside of China, apart from some residential IPs, we were able to identify several University IPs and some corporate networks from around the world.”

More than a dozen universities from the US, Canada, Germany, New Zealand, Australia, and Japan have experimented with Unitree Go1 robot dogs:
USA: MIT, Princeton University, University of Massachusetts Amherst, Carnegie Mellon University
Canada: University of Waterloo
Germany: Hochschule Coburg
New Zealand: University of Otago
Australia: UNSW Sydney, Deakin University
Japan: Shinshu University

The discovery raises serious concerns about supply chain trust, especially as these robots are widely used in academic, corporate, and even defense-related environments.

cyberinsider.com/remote-access-…

Jonas Metzger retweeted

roon@tszzl·25 Mar

economic/gdp growth has been hyperexponential on long time frames. economists imply sustained gdp growth rates like 10-15% are ridiculous but 1.5% was absolutely ridiculous in the 1700s

Jonas Metzger@JonasMtzgr·17 Mar

In German, we don't really have a word for "agency" as a trait. The Wikipedia articles on the concept don't exist in German. You'd basically have to write a paragraph describing it, and people would still look at you weirdly. "It’s a bad thing, right?"

Jonas Metzger retweeted

David Deutsch@DavidDeutschOxf·1 Mar

Foreign-policy 'realists' don't realise that living in a world in which international treaties (such as the UN and NATO charters) are worthless is far more expensive?

Jonas Metzger retweeted

Corey Lynch@coreylynch·20 Feb

Helix is a series of firsts:
- First VLA to control the full humanoid upper body at 200hz: wrists, torso, head, individual fingers
- First multi-robot VLA
- First fully onboard VLA

Jonas Metzger@JonasMtzgr·20 Feb

@aidan_mclau 'I like inference time compute but only if it's the right kind' 🤦‍♂️

Aidan McLaughlin@aidan_mclau·20 Feb

once you see this you can’t unsee it: the light-blue shading that puts grok-3 over o3-mini is cons@64

wh@nrehiew_

If the light blue part is best of N scores, this means that Grok 3 reasoning is inherently an ~o1 level model. This means the capabilities gap between OpenAI and xAI is ~9 months. Also what is the difference between "think" and "big brain"

Jonas Metzger@JonasMtzgr·11 Feb

@tszzl Seems rather implausible that the value distribution encountered during pre- and post-training could be aggregated into a single, coherent utility function. Too many impossibility theorems. If their finding holds up, the resulting utility must be "wrong" or "bad" in some ways. No?

roon@tszzl·11 Feb

I would like everyone to internalize the fact that the English internet holds these values latent

Dan Hendrycks@hendrycks

We’ve found as AIs get smarter, they develop their own coherent value systems. For example they value lives in Pakistan > India > China > US These are not just random biases, but internally consistent values that shape their behavior, with many implications for AI alignment. 🧵
