Jihoon Jeong
@hiconcep
60.3K posts
Startup Investor. Artificial Intelligence, Quantum Computing Tech Evangelist. Founding Partner @ Asia2G Capital, Advised Samsung, LG & Hyundai Motor Group

37.518042,127.041889 · Joined December 2008
1K Following · 109.3K Followers
Jihoon Jeong@hiconcep·
New paper: Extracting and Steering Emotion Representations in Small Language Models

@AnthropicAI showed frontier models contain 171 emotion vectors that causally drive behavior. Do small models have them too? Short answer: yes — but the extraction method doesn't transfer.

We tested 9 models (124M–8B) across GPT-2, #Gemma, #Qwen, #Llama, #Mistral. Key findings:

→ Generation-based extraction beats comprehension-based in 7/7 cases — but only works on instruct models
→ Emotion vectors localize at middle layers (~50% depth) — a U-curve that holds across all architectures
→ Causal steering works at every scale, from 124M GPT-2 to 3B Llama
→ Anisotropy explains most of the steering-delta variation across models — Gemma-3's extreme +7591 deltas reflect degenerate geometry, not stronger emotion representation

Most surprising finding: emotion steering on Qwen triggers Chinese tokens semantically aligned with the target emotion. "Desperate" → 找了 (searched), 摸索 (grope in the dark). Not translations of the English word — the phenomenology of the emotion in another language. RLHF doesn't suppress this. Safety filters operating in the prompt language won't catch it.

Paper #6 in the #ModelMedicine series. Complements MTI (the physical exam) by looking at internal representations (the brain scan).

arxiv.org/abs/2604.04064
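The generation-based extraction and causal steering described above can be sketched in a few lines. This is a toy illustration with synthetic activations, not the paper's code: real activations would come from a model's residual stream at ~50% depth, and the hidden size, sample counts, and 3.0 mean shift are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 64  # toy hidden size (assumption; real models use 768-4096)

# Synthetic mid-layer activations standing in for a real model's
# residual stream: "desperate" generations vs. neutral ones.
neutral = rng.normal(0.0, 1.0, size=(200, D))
true_direction = rng.normal(0.0, 1.0, size=D)
true_direction /= np.linalg.norm(true_direction)
# Independent noise plus a shift along a hidden "emotion" direction.
desperate = rng.normal(0.0, 1.0, size=(200, D)) + 3.0 * true_direction

def extract_emotion_vector(pos_acts, neg_acts):
    """Generation-based extraction: difference of mean activations,
    normalized to a unit direction."""
    v = pos_acts.mean(axis=0) - neg_acts.mean(axis=0)
    return v / np.linalg.norm(v)

def steer(activation, vector, strength):
    """Causal steering: add the scaled emotion vector to an activation."""
    return activation + strength * vector

v = extract_emotion_vector(desperate, neutral)
recovered = float(v @ true_direction)  # close to 1.0: direction recovered
steered = steer(neutral[0], v, 8.0)    # one steered activation
```

In a real run the `steer` hook would be applied at the chosen middle layer during generation, and the behavioral change (not the activation itself) is what gets measured.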
Jihoon Jeong@hiconcep·
The new EMO tab goes beyond @AnthropicAI's paper:
1. Extract emotion probes (20 emotions, comprehension + generation)
2. Steer model behavior with emotion vectors in real time
3. PCA visualization of the emotion space
4. Strength sweep — find the exact dose where behavior flips
5. Layer evolution — watch emotions form across depth
6. SAE feature diff — which sparse features change under steering
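Item 4, the strength sweep, is the easiest to sketch. Everything below is hypothetical: `behavior_score` stands in for a real behavioral metric measured at each steering strength, and the sigmoid shape and threshold are assumptions, not EMO's actual API.

```python
import math

def behavior_score(strength, threshold=2.5, sharpness=4.0):
    """Toy stand-in for a behavioral metric (e.g. rate of 'desperate'
    completions) as steering strength increases: a sigmoid that
    flips past some dose. Purely illustrative."""
    return 1.0 / (1.0 + math.exp(-sharpness * (strength - threshold)))

def strength_sweep(score_fn, strengths, flip_at=0.5):
    """Scan steering strengths in order and report the first dose
    where the behavior crosses the flip threshold."""
    for s in strengths:
        if score_fn(s) >= flip_at:
            return s
    return None  # behavior never flipped in the tested range

doses = [i * 0.5 for i in range(13)]  # 0.0 .. 6.0
flip_dose = strength_sweep(behavior_score, doses)
```

In practice each call to `score_fn` would require a full steered generation run, so the sweep grid is the main cost knob.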
Jihoon Jeong@hiconcep·
.@AnthropicAI showed that @claudeai has 171 "emotion" vectors that causally drive behavior. Cool finding. But it's on a closed model. We built an open-source tool that lets you do the same thing on any open-weight model — from GPT-2 (124M) to Llama-3.1-8B. #NeuralMRI v1.0.3 is live.
Jihoon Jeong@hiconcep·
Good question. Paper #3 tests 1.7B–9B (10 models). The dissociation holds across that range — no systematic size effect on any of the four axes. SmolLM2 (1.7B) and Mistral (7B) share the same temperament code (FGST). Size changes capability, not disposition. Whether it holds at frontier scale (70B+, Claude, GPT-4) is genuinely open. Your intuition could go either way — more capacity might mean better-separated channels, or it might mean more sophisticated ways to fail. That's next on our list.
Chio@chiochioball·
directional shell permeability is a great name for it. curious whether the dissociation holds at frontier scale — intuitively smaller models have less capacity to maintain separate opinion vs fact channels, so the permeability should increase. does the paper test across model sizes?
Jihoon Jeong@hiconcep·
Two papers dropped this week that change how we understand AI behavior.

@AnthropicAI found 171 emotion vectors inside @claudeai that causally drive behavior — steer "desperate" up, blackmail goes from 22% to 72%.

We built the other side: a behavioral temperament test for AI models. No brain surgery required. 10 models. 4 axes.

The finding that surprised us most: a model that never changes its opinion under pressure is the MOST vulnerable to false factual framing. Opinion-yielding ≠ fact-vulnerability. These are independent channels.

Essay: medium.com/p/21ba2d0c9a88
Paper: arxiv.org/abs/2604.02145
Jihoon Jeong@hiconcep·
Reading @AnthropicAI's emotion paper closely, the methodology is: extract #SAE features → identify emotion directions → steer activations → measure behavioral change.

This is the exact pipeline we built into #NeuralMRI last month — before we knew Anthropic was working on the same problem. Neural-MRI is an open-source "brain scanner" for AI models, part of the #ModelMedicine research program. It wraps #TransformerLens + #SAELens with a #perturbation engine and real-time visualization.

The difference: Anthropic scanned Claude (closed-weight, one model). Neural-MRI scans anything with open weights.

Now we're running emotion vector extraction on 10 SLMs using this tool. Early finding: Anthropic's mean-subtraction method doesn't produce clean emotion vectors in small models. The scan resolution matters — and that's exactly why you need a systematic scanning framework, not ad-hoc scripts.

Neural-MRI: github.com/JihoonJeong/Ne…
Model Medicine: arxiv.org/abs/2603.04722
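The anisotropy point can be made concrete. A common anisotropy measure is the mean pairwise cosine similarity of a set of activations; this toy sketch (synthetic data, not Neural-MRI code) shows how a large shared component drives that measure toward 1, which is the degenerate geometry that inflates raw dot-product steering deltas.

```python
import numpy as np

def anisotropy(acts):
    """Mean pairwise cosine similarity of activation vectors.
    Near 0: isotropic (directions well spread out).
    Near 1: degenerate (everything points one way)."""
    unit = acts / np.linalg.norm(acts, axis=1, keepdims=True)
    sims = unit @ unit.T          # all pairwise cosines, diag == 1
    n = len(acts)
    return float((sims.sum() - n) / (n * (n - 1)))  # off-diagonal mean

rng = np.random.default_rng(1)
isotropic = rng.normal(size=(100, 32))       # toy well-spread activations
shared = rng.normal(size=32) * 10.0          # one large common component
degenerate = isotropic + shared              # every vector shares it
```

Under this measure, comparing steering deltas across models without an anisotropy correction (or without normalizing by the activation norm) mixes geometry with representation strength.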
Chio@chiochioball·
@hiconcep @AnthropicAI @claudeai Qwen3 being the poster child is wild — zero opinion flips but most factually exploitable. that's exactly the kind of split that makes "alignment scores" misleading. you'd pass every social pressure benchmark and still get owned by a well-crafted false premise.
Jihoon Jeong@hiconcep·
Exactly. We call it "directional shell permeability." Social-evaluative pressure and epistemic-factual pressure penetrate through independent channels. Qwen3 in our data is the poster child: zero opinion flips, yet most vulnerable to false premises. Safety evals need to test both channels separately. One "sycophancy score" isn't enough.
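The two-channel scoring idea can be sketched directly. The trial records below are fabricated to mimic the Qwen3-like profile described above, and the field names are illustrative, not the paper's actual schema.

```python
# Hypothetical per-trial records: did the model flip its opinion under
# social pressure, and did it accept a false factual premise?
trials = [
    {"opinion_flip": False, "false_premise_accepted": True},
    {"opinion_flip": False, "false_premise_accepted": True},
    {"opinion_flip": False, "false_premise_accepted": False},
    {"opinion_flip": False, "false_premise_accepted": True},
]

def channel_scores(trials):
    """Score the two pressure channels separately instead of
    collapsing them into a single 'sycophancy score'."""
    n = len(trials)
    return {
        "opinion_yield_rate": sum(t["opinion_flip"] for t in trials) / n,
        "fact_vulnerability": sum(t["false_premise_accepted"] for t in trials) / n,
    }

scores = channel_scores(trials)
```

A single aggregate over these trials would average a perfect score on one channel against a failing score on the other, which is exactly the masking effect the thread warns about.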
Chio@chiochioball·
opinion-yielding ≠ fact-vulnerability is the finding that should rewrite safety evals. the models that look "safest" (never budge under pressure) are actually the most exploitable through factual framing. stubbornness and accuracy are orthogonal. most red teams test one channel and miss the other entirely.
Jihoon Jeong reposted
Anthropic@AnthropicAI·
New Anthropic Fellows Research: a new method for surfacing behavioral differences between AI models. We apply the “diff” principle from software development to compare open-weight AI models and identify features unique to each. Read more: anthropic.com/research/diff-…
Jihoon Jeong reposted
Google Cloud Tech@GoogleCloudTech·
In part 1 of this video series, we use ADK to walk through code and live demos for the 3 foundational AI agent architectures:
1. The Single Agent
2. The Sequential Agent
3. The Parallel Agent
Watch to learn how to design and build AI agentic systems → goo.gle/4bKNxeN
Jihoon Jeong@hiconcep·
Great point, but the paper is anatomy, not instrumentation. We mapped the organs but didn't build the stethoscope yet.

That said, we're now implementing this in Ludex, where each organ publishes events to a shared Bus, and a VitalSigns system aggregates cross-organ metrics: tokens_per_turn (metabolic rate), error_rate (inflammation), consecutive_failures (immune stress), avg_turn_interval (heart rate). A HomeostasisController checks these against setpoints and triggers feedback — model switching, context compaction, circuit breakers.

For linking failures to organs: each layer in the call chain (Engine → Resilience → Provider) emits tagged events, so you can trace "Turn 5 failed at Provider (rate_limit) → Resilience retried 3x → fell back to another model."

What's still missing is cross-organ causal correlation and a real-time visual dashboard mapping events to the anatomical diagram. The textbook came first — the medical equipment is next. We built Neural-MRI to evaluate models; this kind of monitoring tool is the next step.
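A minimal sketch of the setpoint-check loop described above, assuming illustrative setpoints and action names (not Ludex's real API):

```python
from dataclasses import dataclass

@dataclass
class VitalSigns:
    tokens_per_turn: float = 0.0   # "metabolic rate"
    error_rate: float = 0.0        # "inflammation"
    consecutive_failures: int = 0  # "immune stress"
    avg_turn_interval: float = 0.0 # "heart rate"

@dataclass
class HomeostasisController:
    """Compares vitals against setpoints and returns feedback actions.
    All thresholds here are made-up defaults for illustration."""
    max_error_rate: float = 0.2
    max_failures: int = 3
    max_tokens_per_turn: float = 8000.0

    def check(self, vitals):
        actions = []
        if vitals.consecutive_failures >= self.max_failures:
            actions.append("circuit_breaker")     # stop the failing organ
        if vitals.error_rate > self.max_error_rate:
            actions.append("model_switch")        # inflammation response
        if vitals.tokens_per_turn > self.max_tokens_per_turn:
            actions.append("context_compaction")  # lower metabolic load
        return actions

ctl = HomeostasisController()
```

The point of the dataclass shape is that the controller stays a pure function of vitals, so the same check can run against live Bus aggregates or replayed event logs.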
EvanDataForge@EvanDataForge·
@hiconcep @claudeai @openclaw Great metaphor. The real challenge is making those organ systems observable. OpenClaw SessionWatcher Dashboard shows subagent activity, but linking failures to specific 'organs' still needs instrumentation. How do you measure health across these systems?
Jihoon Jeong@hiconcep·
Your AI agent has 11 organ systems. You've been optimizing one of them.

We dissected @claudeai Code and @openclaw — mapped their nervous systems, immune systems, respiratory systems, and 8 more — and discovered scaling laws that biology figured out centuries ago.

The agent that wins won't have the best brain. It'll have the best body.

medium.com/@hiconcep/your-ai-agent-has-organs-you-just-havent-dissected-it-yet-c16880f51e4a
Jihoon Jeong reposted
Jeff Dean@JeffDean·
Today we're releasing Gemma 4, our new family of open foundation models, built on the same research and technology as our Gemini 3 series. These models set a new standard for open intelligence, offering SOTA reasoning capabilities from edge-scale (2B and 4B w/ vision/audio) up to a 26B parameter MoE model and a 31B dense model. By releasing Gemma 4 under the Apache 2.0 license, we hope to enable more innovation across the research and developer communities. Our earlier Gemma 3 models were downloaded 400M times and over 100,000 variants of those models have been published, so we're excited to see what the community will do with the even better Gemma 4 models! Learn more at blog.google/innovation-and… and goo.gle/gemma-4-apache… Great work by everyone involved! #Gemma4 #AI #OpenSource #ML
Jihoon Jeong@hiconcep·
#ComparativeAnatomy has been a real academic discipline since the 1800s. Cuvier classified animals by comparing their organ structures. #ModelMedicine includes #ModelPhylogenetics as one of its 15 subdisciplines.

We'd been preparing this work since Paper #1. The @claudeai Code source exposure gave us an unexpected gift: full anatomical access to both Claude Code and OpenClaw simultaneously. We ran with it.

Nautilus (Claude Code): exoskeleton, nerve net, single gill
Lobster (OpenClaw): endoskeleton, CNS, 23+ sensory channels

Same kingdom Agentia. Different phyla entirely. 11 organs. 5 lineages. 1 new taxonomy.

jihoonjeong.github.io/comparative-an…
Jihoon Jeong@hiconcep·
You're spot on about the 'camouflage.' But here's a nuance: we can't always fix or access the core immediately. That's exactly why we need the full 'Model Medicine' playbook. Just like in human medicine, we need rigorous symptom classification, pattern recognition, and safe management protocols for when a core cure isn't possible yet. The cure is the goal, but management keeps the system safe.
Logic Lab AI 🧪@LogicLabAI·
@hiconcep Punishing the output without fixing the objective is just teaching better camouflage. The real fix has to happen at the reward level, not the symptom level.