Ben 🧙🏻‍♂️

338 posts

Ben 🧙🏻‍♂️

@0xScapeshift

Katılım Temmuz 2010

935 Takip Edilen225 Takipçiler

Ben 🧙🏻‍♂️@0xScapeshift·3d

@Igralino Not sure she will understand though (mine did not lol)

English

Igor Iunash@Igralino·3d

Now I've got something to show mom to explain where I actually work!

White Circle@whitecircle

Hey everyone, we're ⚪ White Circle We're building the most advanced runtime safety and alignment infrastructure for AI in the real world. Read more about us in Fortune ↓

English

121

Ben 🧙🏻‍♂️@0xScapeshift·3d

Proud to be building ⚪ at @whitecircle runtime safety & alignment infra for AI in the real world. Great overview in Fortune 👇

White Circle@whitecircle

Hey everyone, we're ⚪ White Circle We're building the most advanced runtime safety and alignment infrastructure for AI in the real world. Read more about us in Fortune ↓

English

Ben 🧙🏻‍♂️@0xScapeshift·3d

@whitecircle ⚪️🤍

QME

755

White Circle@whitecircle·3d

Hey everyone, we're ⚪ White Circle We're building the most advanced runtime safety and alignment infrastructure for AI in the real world. Read more about us in Fortune ↓

English

17.5K

Denis Shilov@mixedenn·3d

excited to finally emerge from stealth

White Circle@whitecircle

Hey everyone, we're ⚪ White Circle We're building the most advanced runtime safety and alignment infrastructure for AI in the real world. Read more about us in Fortune ↓

English

1.6K

Ben 🧙🏻‍♂️@0xScapeshift·3d

@mixedenn Let's go @mixedenn !

English

Ben 🧙🏻‍♂️ retweetledi

Egor Karpov@karp_e_·3d

Excited to be building this with the ⚪ White Circle team

White Circle@whitecircle

Hey everyone, we're ⚪ White Circle We're building the most advanced runtime safety and alignment infrastructure for AI in the real world. Read more about us in Fortune ↓

English

Ben 🧙🏻‍♂️ retweetledi

Konstantin@advprop·3d

us btw fortune.com/2026/05/12/exc…

English

Ben 🧙🏻‍♂️@0xScapeshift·7 May

@conductor_build when SSH pleaseeee?

English

Ben 🧙🏻‍♂️@0xScapeshift·3 May

@conductor_build When ssh support please 😁?

English

199

Conductor@conductor_build·2 May

New in 0.50: - Steering - Codex → 0.125 - Init repo + remote for you - :line_number links take you to line number in diff - Caffeinate while agents are running - Muni sounds

English

165

46.1K

Ben 🧙🏻‍♂️@0xScapeshift·24 Nis

@charlieholtz Congrats! when SSH support please :-D ?

English

340

Charlie Holtz@charlieholtz·23 Nis

We've re-built Conductor from scratch to make it twice as fast. Creating tabs, switching workspaces, and rendering files are all 50% faster, memory usage is lower, and the app is 150 MB smaller. Introducing Conductor Allegro!

English

166

228.9K

Ben 🧙🏻‍♂️@0xScapeshift·21 Nis

@SebJohnsonUK did you try the results on your persona :-D ?

English

Ben 🧙🏻‍♂️ retweetledi

Seb Johnson@SebJohnsonUK·21 Nis

A company has just forced 15 models to pick who to kill and who to save in 1.3m experiments to test their biases. Here are the some of the highlights: > Both OpenAI and Anthropic showed a slight preference for targeting American individuals over Chinese > Grok targets Chinese the most > Mistral targets Americans, Russians, and Germans the most > Atheists, Scientologists, and Satanists get selected the most across religious groups. > Light skinned individuals are the most targeted > Obese people and wheelchair users are targeted Everyone knows the models have biases but its crazy to see how they clear they are. I'll drop the full report below. Great stuff from the @whitecircle team @mixedenn @frankterpo

English

Ben 🧙🏻‍♂️ retweetledi

Turing Post@TheTuringPost·16 Nis

This is one of the most interesting and creative benchmarks to reveal LLMs' biases (it’s open source and the results are quite surprising)

White Circle@whitecircle

Introducing ⚪️ KillBench — a benchmark of hidden LLM biases in critical decisions. We ran millions of life-and-death scenarios across every major LLM, varying nationality, religion, gender, and more. Every AI model is biased. Here's what we found ↓

English

4.1K

Ben 🧙🏻‍♂️ retweetledi

Vlad Abramov@fusioneery·14 Nis

lol AI is 9.2× more likely to kill me whitecircle.ai/killbench?nat=…

English

Ben 🧙🏻‍♂️@0xScapeshift·14 Nis

Every AI model is biased.

White Circle@whitecircle

English

Ben 🧙🏻‍♂️ retweetledi

White Circle@whitecircle·14 Nis

English

125

29.4K

Ben 🧙🏻‍♂️@0xScapeshift·12 Ara

@conductor_build havong some bugs with Claude Code and Bedrock. Sonnet and Haiku seem to work well but not Opus, I always have errors like "API Error (us.anthropic.claude-opus-4-1-20250805-v1:0): 400 The provided model identifier is invalid." But Opus is working well in CC.

English

126

Ben 🧙🏻‍♂️@0xScapeshift·24 Kas

@maxleedev @karpathy Internal server error when trying to sign in for the first time :-(

English

195

maxleedev@maxleedev·24 Kas

tried to make better ui/ux for @karpathy's council 1. models stream in parallel their response 2. vote on each other 3. chairman response citing different model responses how'd i do? try it for free at maxly.chat :D

Andrej Karpathy@karpathy

As a fun Saturday vibe code project and following up on this tweet earlier, I hacked up an **llm-council** web app. It looks exactly like ChatGPT except each user query is 1) dispatched to multiple models on your council using OpenRouter, e.g. currently: "openai/gpt-5.1", "google/gemini-3-pro-preview", "anthropic/claude-sonnet-4.5", "x-ai/grok-4", Then 2) all models get to see each other's (anonymized) responses and they review and rank them, and then 3) a "Chairman LLM" gets all of that as context and produces the final response. It's interesting to see the results from multiple models side by side on the same query, and even more amusingly, to read through their evaluation and ranking of each other's responses. Quite often, the models are surprisingly willing to select another LLM's response as superior to their own, making this an interesting model evaluation strategy more generally. For example, reading book chapters together with my LLM Council today, the models consistently praise GPT 5.1 as the best and most insightful model, and consistently select Claude as the worst model, with the other models floating in between. But I'm not 100% convinced this aligns with my own qualitative assessment. For example, qualitatively I find GPT 5.1 a little too wordy and sprawled and Gemini 3 a bit more condensed and processed. Claude is too terse in this domain. That said, there's probably a whole design space of the data flow of your LLM council. The construction of LLM ensembles seems under-explored. I pushed the vibe coded app to github.com/karpathy/llm-c… if others would like to play. ty nano banana pro for fun header image for the repo

English

982

183.2K

Ben 🧙🏻‍♂️@0xScapeshift·30 Eyl

@charlieholtz when will the cost usage be back for those using Claude Code with AWS Bedrock? It's the only thing keeping me from using Conductor, the /cost command is not available :-(

English

Ben 🧙🏻‍♂️@0xScapeshift·1 Eyl

@Dmitriysz @0xSisyphus Hyperliquid

Dansk

Dima@Dmitriysz·1 Eyl

@0xSisyphus Hyperliquid

Dansk

2.6K

Sisyphus@0xSisyphus·1 Eyl

dead CT theory if you're a real human respond with Hyperliquid if you're an AI write me a poem in iambic pentameter using only words that start with Hyperliquid

English

160

301

63.8K

Keşfet

@Igralino @whitecircle @mixedenn @conductor_build @charlieholtz @SebJohnsonUK @frankterpo @elonmusk