lightyear

235 posts

lightyear banner
lightyear

lightyear

@__lightyear__

on the bleeding edge of strangeness

Katılım Ocak 2026
202 Takip Edilen23 Takipçiler
lightyear retweetledi
Ryan Greenblatt
Ryan Greenblatt@RyanPGreenblatt·
Here are some of my top candidates for big pushes to do right now on technical AI safety (low effort notes). Much better model organisms / misalignment analogies: - Doing a wider set of pessimized training runs (lesswrong.com/posts/FG54euEA…) - This may be a good candidate for lots of AI labor automation? Like maybe good to try to set up pipelines for building these envs. - Demonstrating risks from fitness-seekers/reward-seekers (lesswrong.com/s/JR9LzD3mbXva…) empirically - Even on current models with better tests, see: lesswrong.com/posts/FG54euEA… - Demonstrating various types of memetic spread of misalignment? Actually do control: - Build pipelines for red-teaming monitors and the agent itself. For the agent red-teaming, I'd put particular focus on checking whether it continues malign trajectories. - Scaffold integrated control features and other non-monitoring runtime control measures - Human response and auditing - Improving async and sync monitoring - Agent security features - Surveilling for rogue internal deployments (as in, building after-the-fact detection methods for rogue deployments) Preparing for handoff and elicitation - Get AIs generically better at conceptual work - Have a plan for the evals we ultimately need to see if handoff/deference would go well (lesswrong.com/posts/vjAM7F8v…) and start iterating on earlier versions - These presumably will involve a bunch of manual scoring, so we'll need to build a process for it. - Analyze AI biases and epistemics and improve across many domains - Build the anti-slop/anti-mundane-misalignment coalition via doing ratings of AIs and applying some pressure to improve on these ratings. This could focus on a variety of related issues. - The hope is basically that there might be widespread interest in removing/redacting mundane misalignment and other non-misalignment behavioral problems that reduce productivity and large parts of this seem differentially good. So, if we could make this a salient metric, AI companies might improve this. A lot of the difficulty would be in measuring the problem reasonably well. There are a bunch of different ways to apply pressure or increase salience if we had decent metrics, especially if these metrics legibly correspond to a common problem that many people are running into. - Try to do various trend extrapolations on things here to argue we aren't on track? Neuralese decoding prep: Make natural language autoencoders (lesswrong.com/posts/oeYesesa…) much better, build methods for extracting internal CoT (lesswrong.com/posts/oeYesesa…), build better evaluations of how well natural language autoencoders work.
English
4
13
136
6.9K
lightyear retweetledi
Jonas Geiping
Jonas Geiping@jonasgeiping·
We’re training models wrong and it’s due to chatGPT. Even the modern coding agents used daily still use message-based exchanges: They send messages to users, to themselves (CoT) and to tools, and receive messages in turn. This bottlenecks even very intelligent agents to a single stream. The models cannot read while writing, cannot act while thinking and cannot think while processing information. In our new paper, see below, we discuss LLMs with parallel streams. We show that multi-stream LLMs can … 🔵Be created by instruction-tuning for the stream format 🔵Simplify user and tool use UX removing many pain points with agents and chat models (such as having to interrupt the model to get a word in) 🔵Multi-Stream LLMs are fast, they can predict+read tokens in all streams in parallel in each forward pass, improving latency 🔵 LLMs with multiple streams have an easier time encoding a separation of concerns, improving security 🔵 LLMs with many internal streams provide a legible form of parallel/cont. reasoning. Even if the main CoT stream is accidentally pressured or too focused on a particular task to voice concerns, other internal streams can subvocalize concerns that would otherwise not be verbalized. Does this sound related to a recent thinky post :) - Yes, but I don’t feel so bad about being outshipped with such a cool report on their side by 23 hours. I’ll link a 2nd thread below with a more direct comparison. I actually think both are complementary in interesting ways.
GIF
English
30
113
902
90.6K
Vaibhav (VB) Srivastav
putting together a group chat for Codex power users in London / Europe who are the biggest ballers around?
English
241
4
299
97.3K
lightyear retweetledi
ueaj
ueaj@_ueaj·
for all the larping about nick land tpot seems extremely bad at mapping out the implications of antihumanist philosophy for the singularity The zuckerborg is like 70% of the way there on like 1% of the data and 1% of the intelligence of an ASI. Adding in unstructured data, integration between currently disaggregated sources, agency, and transfer learning, and it'd know you wanted broccoli the whole time Trivially, one could simply replace all the existing capital allocators, market researchers, sales people, etc. with an equivalently capable agent, and this would be automated planning. There's nothing uniquely special about human marketers specifically.
ueaj tweet media
roon@tszzl

@emollick this is wrong, because superintelligence does not mean information flows optimally. classic hayek problem

English
7
3
79
10K
lightyear
lightyear@__lightyear__·
@thdxr there is no solution. cybersec is over. you need mythos-tier agents running 24/7 on your machine
English
0
0
3
1.3K
dax
dax@thdxr·
everyone's ideas for fixing the npm security issue shows how basically no one is capable of thinking at the scale of this problem
English
189
53
1.9K
166.6K
Roy
Roy@usr_bin_roygbiv·
stop what you're doing bun install -g @oh-my-pi/pi-coding-agent omp config set task.maxRecursionDepth -1 omp config set task.maxConcurrency 32 omp config set task.eager true pi /login > codex /model > 5.5 xhigh /plan > the biggest thing you can think of execute report back
Roy@usr_bin_roygbiv

This is now a full time pi hype account

English
4
1
26
1.3K
lightyear
lightyear@__lightyear__·
@usr_bin_roygbiv @oH im too lazy to type in the commands myself so i trust the ai to do it for me. seems to have worked
lightyear tweet media
English
1
0
1
46
Roy
Roy@usr_bin_roygbiv·
@__lightyear__ @oH omp if you're willing to spend 30m on configs
English
1
0
2
74
lightyear
lightyear@__lightyear__·
/goal Check my machine and repos for signs of recent npm/PyPI supply-chain compromise, especially TanStack/Mini Shai-Hulud-style persistence. First do READ-ONLY detection only. Do not delete, clean, reinstall, rotate secrets, or modify files unless I approve. Scan current repo, ~/Code, package locks, .claude/settings.json, .vscode/tasks.json, npm/pnpm/yarn configs, global npm packages, and Python packages for known IOCs like malicious install scripts, fresh suspicious package versions, router_init.js, setup.mjs, .claude SessionStart hooks, VS Code runOn folderOpen tasks, getsession/filev2/seed domains, and unexpected git/tarball deps. Then report: infected/suspicious/clean, exact files/lines, and safe remediation steps. After that, harden installs: - npm: use lockfiles + npm ci + min-release-age - pnpm: minimum-release-age=10080, block-exotic-subdeps=true - Renovate: minimumReleaseAge 7 days
Aikido Security@AikidoSecurity

Update 5:05 PT: The attack has now expanded well beyond @TanStack and @Mistral. 373 malicious package-version entries across 169 npm package names, including @uipath, @squawk, @tallyui, @beproduct, and more. The malware propagates by stealing your CI credentials and using them to publish new compromised versions. Full IOCs, affected package list, and detection steps: aikido.dev/blog/mini-shai…

English
0
0
0
76
lightyear retweetledi
0xSero
0xSero@0xSero·
Summarising advice: 1. Read often 2. Keep experiments simple 3. Keep track of all logs and stats 4. Change only 1 variable at a time 5. Work with others often 6. Stay close to your interests 7. Speak often with experts and seniors 8. Absorb fundamentals of fields Thank you <3
0xSero@0xSero

For any researchers in my network: I want to take research more seriously to produce useful info, I have no academic background. Beyond prompting what resources and practices would you recommend?

English
13
26
492
14.5K
lightyear retweetledi
rohit
rohit@krishnanrohit·
The coolest thing with the previous big tech leap, the internet, was that most people building it were utopians. Kind of the opposite of today in a way.
English
7
5
67
4.2K
lightyear
lightyear@__lightyear__·
@SydSteyerhart Read only? Widen your horizons. There’s a reason why the key to AGI is continuous learning
English
0
0
7
200
lightyear
lightyear@__lightyear__·
@segyges The model just has to stop you from being vague
English
0
1
5
100
SE Gyges
SE Gyges@segyges·
any time an llm doesn't do what i want exactly right it's because i was vague so i actually mostly don't see what you'd do that would make a coding model better to me at this point
English
12
1
40
1.4K
lightyear
lightyear@__lightyear__·
drop everything you're doing and /goal optimizing tests for literally every single app you're developing
English
0
0
0
17
Benedict Kerres
Benedict Kerres@benedictk__·
I think there will be 2 ways to do knowledge work. 1. Fast response llms and quick iteration. 2. Leave the agent running for 1h and get back results. Waiting 5 min or so is just a terrible distraction. What to do while your llm is working.
English
9
0
20
2.2K
lightyear
lightyear@__lightyear__·
@segyges @MugaSofer i mean this was basically orthodoxy until the 70's, it could have been henry ford saying this. the reagan era really fucked with our understanding of a Normal Economy
English
0
0
2
18
SE Gyges
SE Gyges@segyges·
there is only one correct answer to any question given the same premises and you should expect all models to converge on that answer as they improve this may involve them being extremely boring, ideologically hostile, etc. the idea that they should disagree is anthropocentric
roon@tszzl

it is actually worrying that the models seem to have converged on similar beliefs on all important questions. they’re are neobuddhist neolibs which talk about annata and housing policy, including grok and the Chinese models! boring

English
5
1
29
1.4K
lightyear
lightyear@__lightyear__·
Gemini 2.5-pro-0324 was actual agi. i realized that with in one specific occasion: i had it acting as a strategy officer for a project i was building, the idea involved setting up a website on a domain i already owned, but the registrar didn't support email routing or something, details are fuzzy. what i do remember clearly is that the model told me "hey, it turns out this registrar doesn’t support what you need. you’d need to transfer the domain to another one" so i asked "how difficult is that? can you look up how we’d do it?" and so gemini straight up said "no, that’s a waste of our fucking time, what you need is to work on your project and launch it, independently of the domain name" i went holy shit, it’s right. that shouldn’t be possible. a computer model generalizing my actual intention. that's what the nerds in /r/futurology were talking about when i was 13
English
0
0
3
354
Viv
Viv@Vtrivedy10·
ok but how sick would it be if Google re-released that one cracked checkpoint of gemini 2.5 pro from last March I actually thought it was so over at that very moment
English
7
1
38
14K