lightyear

235 posts

lightyear

@__lightyear__

on the bleeding edge of strangeness

Katılım Ocak 2026

202 Takip Edilen23 Takipçiler

lightyear retweetledi

Ryan Greenblatt@RyanPGreenblatt·7h

Here are some of my top candidates for big pushes to do right now on technical AI safety (low effort notes). Much better model organisms / misalignment analogies: - Doing a wider set of pessimized training runs (lesswrong.com/posts/FG54euEA…) - This may be a good candidate for lots of AI labor automation? Like maybe good to try to set up pipelines for building these envs. - Demonstrating risks from fitness-seekers/reward-seekers (lesswrong.com/s/JR9LzD3mbXva…) empirically - Even on current models with better tests, see: lesswrong.com/posts/FG54euEA… - Demonstrating various types of memetic spread of misalignment? Actually do control: - Build pipelines for red-teaming monitors and the agent itself. For the agent red-teaming, I'd put particular focus on checking whether it continues malign trajectories. - Scaffold integrated control features and other non-monitoring runtime control measures - Human response and auditing - Improving async and sync monitoring - Agent security features - Surveilling for rogue internal deployments (as in, building after-the-fact detection methods for rogue deployments) Preparing for handoff and elicitation - Get AIs generically better at conceptual work - Have a plan for the evals we ultimately need to see if handoff/deference would go well (lesswrong.com/posts/vjAM7F8v…) and start iterating on earlier versions - These presumably will involve a bunch of manual scoring, so we'll need to build a process for it. - Analyze AI biases and epistemics and improve across many domains - Build the anti-slop/anti-mundane-misalignment coalition via doing ratings of AIs and applying some pressure to improve on these ratings. This could focus on a variety of related issues. - The hope is basically that there might be widespread interest in removing/redacting mundane misalignment and other non-misalignment behavioral problems that reduce productivity and large parts of this seem differentially good. So, if we could make this a salient metric, AI companies might improve this. A lot of the difficulty would be in measuring the problem reasonably well. There are a bunch of different ways to apply pressure or increase salience if we had decent metrics, especially if these metrics legibly correspond to a common problem that many people are running into. - Try to do various trend extrapolations on things here to argue we aren't on track? Neuralese decoding prep: Make natural language autoencoders (lesswrong.com/posts/oeYesesa…) much better, build methods for extracting internal CoT (lesswrong.com/posts/oeYesesa…), build better evaluations of how well natural language autoencoders work.

English

136

6.9K

lightyear retweetledi

Jonas Geiping@jonasgeiping·13h

We’re training models wrong and it’s due to chatGPT. Even the modern coding agents used daily still use message-based exchanges: They send messages to users, to themselves (CoT) and to tools, and receive messages in turn. This bottlenecks even very intelligent agents to a single stream. The models cannot read while writing, cannot act while thinking and cannot think while processing information. In our new paper, see below, we discuss LLMs with parallel streams. We show that multi-stream LLMs can … 🔵Be created by instruction-tuning for the stream format 🔵Simplify user and tool use UX removing many pain points with agents and chat models (such as having to interrupt the model to get a word in) 🔵Multi-Stream LLMs are fast, they can predict+read tokens in all streams in parallel in each forward pass, improving latency 🔵 LLMs with multiple streams have an easier time encoding a separation of concerns, improving security 🔵 LLMs with many internal streams provide a legible form of parallel/cont. reasoning. Even if the main CoT stream is accidentally pressured or too focused on a particular task to voice concerns, other internal streams can subvocalize concerns that would otherwise not be verbalized. Does this sound related to a recent thinky post :) - Yes, but I don’t feel so bad about being outshipped with such a cool report on their side by 23 hours. I’ll link a 2nd thread below with a more direct comparison. I actually think both are complementary in interesting ways.

GIF

English

113

902

90.6K

lightyear@__lightyear__·7h

@xariusrke @reach_vb no

xr-5 🐀@xariusrke·10h

@__lightyear__ @reach_vb is this $100 sub?

English

Vaibhav (VB) Srivastav@reach_vb·20h

putting together a group chat for Codex power users in London / Europe who are the biggest ballers around?

English

241

299

97.3K

lightyear retweetledi

ueaj@_ueaj·1d

for all the larping about nick land tpot seems extremely bad at mapping out the implications of antihumanist philosophy for the singularity The zuckerborg is like 70% of the way there on like 1% of the data and 1% of the intelligence of an ASI. Adding in unstructured data, integration between currently disaggregated sources, agency, and transfer learning, and it'd know you wanted broccoli the whole time Trivially, one could simply replace all the existing capital allocators, market researchers, sales people, etc. with an equivalently capable agent, and this would be automated planning. There's nothing uniquely special about human marketers specifically.

roon@tszzl

@emollick this is wrong, because superintelligence does not mean information flows optimally. classic hayek problem

English

10K

lightyear@__lightyear__·1d

@thdxr there is no solution. cybersec is over. you need mythos-tier agents running 24/7 on your machine

English

1.3K

dax@thdxr·1d

everyone's ideas for fixing the npm security issue shows how basically no one is capable of thinking at the scale of this problem

English

189

1.9K

166.6K

lightyear@__lightyear__·1d

@usr_bin_roygbiv @oH yeah this is pi just having finished installing omp

English

Roy@usr_bin_roygbiv·1d

@__lightyear__ @oH this isn't omp

English

Roy@usr_bin_roygbiv·1d

stop what you're doing bun install -g @oh-my-pi/pi-coding-agent omp config set task.maxRecursionDepth -1 omp config set task.maxConcurrency 32 omp config set task.eager true pi /login > codex /model > 5.5 xhigh /plan > the biggest thing you can think of execute report back

Roy@usr_bin_roygbiv

This is now a full time pi hype account

English

1.3K

lightyear@__lightyear__·1d

@usr_bin_roygbiv @oH im too lazy to type in the commands myself so i trust the ai to do it for me. seems to have worked

English

Roy@usr_bin_roygbiv·1d

@__lightyear__ @oH bro wtf are you doing

English

lightyear@__lightyear__·1d

@usr_bin_roygbiv @oH im not touching the terminal. does omp know how to optimize itself?

English

Roy@usr_bin_roygbiv·1d

@__lightyear__ @oH omp if you're willing to spend 30m on configs

English

lightyear@__lightyear__·1d

/goal Check my machine and repos for signs of recent npm/PyPI supply-chain compromise, especially TanStack/Mini Shai-Hulud-style persistence. First do READ-ONLY detection only. Do not delete, clean, reinstall, rotate secrets, or modify files unless I approve. Scan current repo, ~/Code, package locks, .claude/settings.json, .vscode/tasks.json, npm/pnpm/yarn configs, global npm packages, and Python packages for known IOCs like malicious install scripts, fresh suspicious package versions, router_init.js, setup.mjs, .claude SessionStart hooks, VS Code runOn folderOpen tasks, getsession/filev2/seed domains, and unexpected git/tarball deps. Then report: infected/suspicious/clean, exact files/lines, and safe remediation steps. After that, harden installs: - npm: use lockfiles + npm ci + min-release-age - pnpm: minimum-release-age=10080, block-exotic-subdeps=true - Renovate: minimumReleaseAge 7 days

Aikido Security@AikidoSecurity

Update 5:05 PT: The attack has now expanded well beyond @TanStack and @Mistral. 373 malicious package-version entries across 169 npm package names, including @uipath, @squawk, @tallyui, @beproduct, and more. The malware propagates by stealing your CI credentials and using them to publish new compromised versions. Full IOCs, affected package list, and detection steps: aikido.dev/blog/mini-shai…

English

lightyear retweetledi

0xSero@0xSero·2d

Summarising advice: 1. Read often 2. Keep experiments simple 3. Keep track of all logs and stats 4. Change only 1 variable at a time 5. Work with others often 6. Stay close to your interests 7. Speak often with experts and seniors 8. Absorb fundamentals of fields Thank you <3

0xSero@0xSero

For any researchers in my network: I want to take research more seriously to produce useful info, I have no academic background. Beyond prompting what resources and practices would you recommend?

English

492

14.5K

lightyear retweetledi

stochasm@stochasticchasm·2d

Thinking Machines@thinkymachines

People talk, listen, watch, think, and collaborate at the same time, in real time. We've designed an AI that works with people the same way. We share our approach, early results, and a quick look at our model in action. thinkingmachines.ai/blog/interacti…

ZXX

358

67.9K

lightyear retweetledi

rohit@krishnanrohit·2d

The coolest thing with the previous big tech leap, the internet, was that most people building it were utopians. Kind of the opposite of today in a way.

English

4.2K

lightyear@__lightyear__·2d

@SydSteyerhart Read only? Widen your horizons. There’s a reason why the key to AGI is continuous learning

English

200

Syd Steyerhart@SydSteyerhart·2d

Nobody actually wants to become the Read-Only Memory of a dead person. It's interesting as a cyberpunk concept, but once you consider the implications you realize pretty quickly that it would suck.

Rand@rand_longevity

if you are alive in 15 years you are gonna be able to upload your mind and become semi-immortal

English

151

7.6K

lightyear@__lightyear__·2d

@segyges The model just has to stop you from being vague

English

100

SE Gyges@segyges·2d

any time an llm doesn't do what i want exactly right it's because i was vague so i actually mostly don't see what you'd do that would make a coding model better to me at this point

English

1.4K

lightyear@__lightyear__·3d

drop everything you're doing and /goal optimizing tests for literally every single app you're developing

English

lightyear@__lightyear__·3d

@benedictk__ prompt your other agents

English

Benedict Kerres@benedictk__·3d

I think there will be 2 ways to do knowledge work. 1. Fast response llms and quick iteration. 2. Leave the agent running for 1h and get back results. Waiting 5 min or so is just a terrible distraction. What to do while your llm is working.

English

2.2K

lightyear@__lightyear__·3d

@segyges @MugaSofer i mean this was basically orthodoxy until the 70's, it could have been henry ford saying this. the reagan era really fucked with our understanding of a Normal Economy

English

SE Gyges@segyges·3d

@MugaSofer @__lightyear__ oh i mean this part is essentially the dengist answer.

English

SE Gyges@segyges·3d

there is only one correct answer to any question given the same premises and you should expect all models to converge on that answer as they improve this may involve them being extremely boring, ideologically hostile, etc. the idea that they should disagree is anthropocentric

roon@tszzl

it is actually worrying that the models seem to have converged on similar beliefs on all important questions. they’re are neobuddhist neolibs which talk about annata and housing policy, including grok and the Chinese models! boring

English

1.4K

lightyear@__lightyear__·3d

Gemini 2.5-pro-0324 was actual agi. i realized that with in one specific occasion: i had it acting as a strategy officer for a project i was building, the idea involved setting up a website on a domain i already owned, but the registrar didn't support email routing or something, details are fuzzy. what i do remember clearly is that the model told me "hey, it turns out this registrar doesn’t support what you need. you’d need to transfer the domain to another one" so i asked "how difficult is that? can you look up how we’d do it?" and so gemini straight up said "no, that’s a waste of our fucking time, what you need is to work on your project and launch it, independently of the domain name" i went holy shit, it’s right. that shouldn’t be possible. a computer model generalizing my actual intention. that's what the nerds in /r/futurology were talking about when i was 13

English

354

Viv@Vtrivedy10·3d

ok but how sick would it be if Google re-released that one cracked checkpoint of gemini 2.5 pro from last March I actually thought it was so over at that very moment

English

14K

Keşfet

@xariusrke @reach_vb @thdxr @usr_bin_roygbiv @oH @SydSteyerhart @elonmusk @BarackObama