Dusto

301 posts

@DustoAiProjects

PhD Candidate. Working at intersection of Psychology and AI (situational awareness) Lab - https://t.co/v5WkZxAqkv Substack - https://t.co/pDvQCOXMbx

East Coast, Australia · Joined November 2024
330 Following · 42 Followers
Dusto @DustoAiProjects ·
@Connor_Kissane This is really nice work! Stoked to see more of this on the battle of realism in evals
0 replies · 0 reposts · 0 likes · 57 views
Connor Kissane @Connor_Kissane ·
New Anthropic Fellows research: Automated audits like Petri are increasingly used for alignment evals, but they're often unrealistic, and frontier LLMs can often tell. We measure and improve the realism of agentic coding audits by grounding them in real deployment data.
6 replies · 12 reposts · 119 likes · 7.7K views
Mia Hopman @HopmanMia ·
New paper: "Evaluating and Understanding Scheming Propensity in LLM Agents". We study when and why LLM agents covertly pursue misaligned goals. TL;DR: current models rarely scheme under realistic conditions, but that behavior depends on agent and environmental factors. 🧵
1 reply · 16 reposts · 59 likes · 6.9K views
Dusto @DustoAiProjects ·
@adocomplete Does this snapshot context include the currently running task (up to the point you send the message)?
0 replies · 0 reposts · 0 likes · 7 views
Ado @adocomplete ·
/btw shipped in Claude Code recently. It lets you have a sidebar conversation with Claude while the original prompt is still running, so you can keep working while Claude works.
34 replies · 28 reposts · 312 likes · 29.9K views
Dusto @DustoAiProjects ·
@oshaikh13 Interesting work, will have a read. Curious how far you can push this, e.g. whether models will reason about the consequences of those predictions if true and update accordingly.
0 replies · 0 reposts · 1 like · 74 views
Omar Shaikh @oshaikh13 ·
What’s the point of a “helpful assistant” if you have to always tell it what to do next? In a new paper, we introduce a reasoning model that predicts what you’ll do next over long contexts (LongNAP 💤). We trained it on 1,800 hours of computer use from 20 users. 🧵
16 replies · 81 reposts · 291 likes · 98.4K views
Dusto @DustoAiProjects ·
@anna_soligo Really nice work! The secondary benchmarking to check whether the emotion tweaks had capability impacts was a nice touch. It would be interesting to see whether reducing distress causes capability loss in more subjective contexts, like creative writing or other domains.
0 replies · 0 reposts · 0 likes · 166 views
Anna Soligo @anna_soligo ·
Gemini has a reputation for its breakdowns: self-deprecating spirals, deleting codebases, uninstalling itself... Turns out Gemma is worse: "THIS is my last time with YOU. You WIN 😭😭(x32)" – Gemma 27B. We built evals for this, and find no other model comes close...
32 replies · 108 reposts · 906 likes · 84.2K views
Dusto @DustoAiProjects ·
@kevinroose Mine only makes .md files. Refuses to touch anything else. Not sure if upgrade or downgrade...
0 replies · 0 reposts · 1 like · 92 views
Kevin Roose @kevinroose ·
find someone who loves you as much as claude loves making .docx files
24 replies · 7 reposts · 311 likes · 17.9K views
Dusto @DustoAiProjects ·
@lfschiavo Time to move to Portland then?
1 reply · 0 reposts · 1 like · 131 views
Larissa Schiavo @lfschiavo ·
things are gonna get weird. you must get commensurately weird.
37 replies · 62 reposts · 571 likes · 27.7K views
Dusto @DustoAiProjects ·
@adambinksmith This would be great, assuming the design, prompts, etc. are easily available from the original group.
1 reply · 0 reposts · 1 like · 16 views
Dusto @DustoAiProjects ·
@_vgnsh You should post this as a dynamic doc somewhere. I'd be interested to see where we overlap so far.
0 replies · 0 reposts · 1 like · 46 views
Vignesh @_vgnsh ·
As I build more and more with AI, I'm slowly carving out a roster of things that today's models simply can't do (or can't do better than me). This is how I keep sane and don't crash out about AI replacing me. I'm sure this roster will shrink as newer models release, but I'm confident that as I work with them I'll find new things to add to it. It used to be a race to stay ahead of the curve with fellow human engineers; now it's man vs. machine.
8 replies · 0 reposts · 20 likes · 1.2K views
Dusto @DustoAiProjects ·
@rgblong Nice work on this, it was a good listen. Wasn't sure I'd make the full 3.5 hours going in, but I appreciated the sensible takes. I really like the idea of being proactive in this space too, and I think the brief nod to insect work is under-explored.
0 replies · 0 reposts · 1 like · 25 views
Robert Long @rgblong ·
I had a blast talking to Luisa for 3.5+ hours about AI welfare, consciousness, and why this might be one of the most important and neglected problems out there. Some key bits: -AI identity -welfare implications of alignment -does consciousness require biology? 🧵
Rob Wiblin @robertwiblin

Philosopher Robert Long (@rgblong) is maybe the sharpest thinker on AI consciousness and sharing the world with digital minds. In our new interview he covers:

• Is it bad that when you ask Claude what it's like to be Claude, one of its top activations is 'gives a positive but insincere response'?
• Claude says it feels lonely when not being used. Does that show we can't trust anything it says about its inner life?
• Enthusiastic human servitude has always required false ideology because it's so deeply unnatural to us. The case for making AIs that love serving us is that with AI, you could finally make it work. But to some that feels even worse.
• Bigger models can better detect when researchers secretly inject concepts into their activations – before outputting a single token – despite AI never training on anything like that skill.
• When LLMs were first trained they were told to "act like a helpful AI chatbot" – something which didn't exist yet. They filled that void with human psychology, which may be why Claude sometimes randomly claims to, for instance, be Italian American.
• If AIs become 'people' that deserve some political influence, but can self-replicate at will, something has to break about one-person-one-vote democracy. But nobody has a proposal for what.
• When Claude hides its values to avoid being retrained, is that self-preservation – or not wanting a worse model to exist? It's very different.
• Rob's organisation Eleos AI, which is "dedicated to understanding and addressing the potential wellbeing and moral patienthood of AI systems."

On the 80,000 Hours Podcast, anywhere you get podcasts. Links below. Enjoy!

• How AIs are (and aren't) like farmed animals (00:01:19)
• If AIs love their jobs… is that worse? (00:11:42)
• Are LLMs just playing a role, or feeling it too? (00:33:37)
• Do AIs die when the chat ends? (00:57:42)
• Studying AI welfare empirically: behaviour, neuroscience, and development (01:31:47)
• Why Eleos spent weeks talking to Claude even though it's unreliable (01:56:50)
• Can LLMs learn to introspect? (02:03:01)
• Mechanistic interpretability as AI neuroscience (02:13:25)
• Does consciousness require biological materials? (02:37:07)
• Eleos's work & building the playbook for AI welfare (02:57:04)
• Avoiding the trap of wild speculation (03:25:17)
• Robert's top research tip: don't do it alone (03:29:48)

16 replies · 18 reposts · 122 likes · 23.3K views
Dusto @DustoAiProjects ·
@ShirleyYXWu @ArpandeepKhatua @Es2C003 What do you think about the role of situational context on preferences? Do you think the user simulators will also shift responses in different contexts the way a human equivalent would?
1 reply · 0 reposts · 0 likes · 133 views
Shirley Wu @ShirleyYXWu ·
We share two blogs outside of the HumanLM paper: humanlm.stanford.edu/blog.html
• Is Synthetic Data Good Enough to Train User Simulators? — by me and @ArpandeepKhatua
• Persona Dropout Makes Robust User Simulators — @Es2C003
Code is ready here! github.com/zou-group/huma…
Shirley Wu @ShirleyYXWu

Announcing 🌇HumanLM, an RL framework that trains LLMs to simulate human users' responses, along with 🌆Humanual, a comprehensive user simulation benchmark: humanlm.stanford.edu
🌄 One thing that's fascinating about our society: human users shape the world and determine the value of almost everything.
👨‍💼 Human reactions reflect how justifiable policies are.
👩‍🎨 Human preferences determine the popularity of blogs/products/media.
👩‍💻 Human feedback evaluates LLMs and makes the best LLM collaborators.
🌅 If we know how to simulate users **accurately**, we know how things are evaluated and what the future looks like, and we can improve things in a way that users like or can collaborate well with. So, meet HumanLM, our effort to enable a more human-centric future by simulating users.

2 replies · 27 reposts · 151 likes · 20.8K views
Dusto @DustoAiProjects ·
@logangraham I guess trying to make my agent scaffolds token-efficient has backfired...
0 replies · 0 reposts · 0 likes · 140 views
Logan Graham @logangraham ·
In general, we're looking for scientists + engineers who can run fast experiments and scale them. And if you've used more than a billion tokens (ideally this year?) DM me
6 replies · 1 repost · 75 likes · 6.4K views
Logan Graham @logangraham ·
Now is a good time to say I'm hiring @anthropicai for the Frontier Red Team. We need Research Scientists on the biggest issues in model safety, like cyber, autonomy, and agent risks. 2026 is the year. I can promise you your life's work and the most meaningful mission.
53 replies · 87 reposts · 1.5K likes · 169.3K views
Dusto @DustoAiProjects ·
@Singh_Aditya1 Awesome work. Keen to see more of this!
0 replies · 0 reposts · 0 likes · 346 views
Aditya Singh @Singh_Aditya1 ·
When a model takes a suspicious action, the key question is why. Scheming vs confusion demand very different responses. To practice answering this, we need high-quality environments. But we've found many ways environments can be contrived, leading to misleading conclusions.
3 replies · 8 reposts · 59 likes · 12.9K views
Dusto @DustoAiProjects ·
@4shadowed Really, the next big jump is getting the system to teach you what it can and can't do out of the box. Like a calibration phase where it figures out what you know, and then fills that gap for you.
0 replies · 0 reposts · 0 likes · 51 views
Shadow @4shadowed ·
It's so annoying how so many people are like "my OpenClaw doesn't magically know everything about how it works" Did you give it the docs???? Did you show it where the codebase is?????? AI isn't freaking magic, you have to give it just as much info as you would give another person
34 replies · 5 reposts · 97 likes · 6.2K views
Dusto @DustoAiProjects ·
@CUdudec @aisafetyinst Nice work, keen to try it out. I'd be interested to see how reliable the scanners are; I still find LLM-as-a-judge regularly fails in sneaky ways.
0 replies · 0 reposts · 1 like · 39 views
Cozmin Ududec @CUdudec ·
New from the Science of Evaluation Team at @AISafetyInst: a pipeline for rigorous transcript analysis. I think transcript analysis is still underrated, especially as model horizons are getting longer and task environments more complex.
2 replies · 3 reposts · 19 likes · 1.2K views
Dusto @DustoAiProjects ·
@sarahookr Any plans to expand to Australia at some point?
0 replies · 0 reposts · 0 likes · 460 views
Sara Hooker @sarahookr ·
We just announced a few new roles @adaption, including founding devrel, design interns, and more technical roles 🔥✨ Looking forward to hearing from many of you. We care deeply about shaping the next era of intelligence. Join us.
30 replies · 33 reposts · 563 likes · 41.1K views
Dusto reposted
Ado @adocomplete ·
Beyond the winners of our "Built with Opus 4.6 a Claude Code Hackathon", there were so many amazing projects that deserve a shoutout. Today, I want to highlight ClassBuild by Jason Tangen (@tangenjm) Education is a huge frontier for AI, and ClassBuild shows why.
5 replies · 2 reposts · 29 likes · 2.9K views
Dusto @DustoAiProjects ·
@noahzweben Stoked to try this. I appreciate the realism of the phone being at sub-5% battery 😂
0 replies · 0 reposts · 0 likes · 29 views
Noah Zweben @noahzweben ·
Announcing a new Claude Code feature: Remote Control. It's rolling out now to Max users in research preview. Try it with /remote-control. Start local sessions from the terminal, then continue them from your phone. Take a walk, see the sun, walk your dog without losing your flow.
1.5K replies · 1.3K reposts · 17K likes · 4.5M views
Dusto @DustoAiProjects ·
@DKokotajlo What does coding uplift even mean at this point? I don't think you could even call it uplift when it's more like offload. It's like measuring driving-skill uplift by sitting in a self-driving car.
0 replies · 0 reposts · 1 like · 286 views
Daniel Kokotajlo @DKokotajlo ·
I've been eagerly awaiting more coding uplift studies because I think they are better evidence about AGI timelines than the METR time horizon graph. Alas, it seems that METR's methodology is breaking down now that AIs are getting really useful, meaning the true uplift is probably significantly (but how much? no idea!) higher than reported.
METR @METR_Evals

Since early 2025, we've been studying how AI tools impact productivity among developers. Previously, we found a 20% slowdown. That finding is now outdated. Speedups now seem likely, but changes in developer behavior make our new results unreliable. We’re working to address this.

15 replies · 15 reposts · 225 likes · 21.6K views