difficultyang

2.7K posts

@difficultyang

More social alt of @ezyang

Joined April 2022

57 Following · 2.9K Followers

difficultyang@difficultyang·22h

@kalomaze @celestepoasts "Just solve it with scaffolding"

165

kalomaze@kalomaze·22h

@celestepoasts there is a genre of benchmarks that tests for problems that are "difficult" in extremely shallow ways. sometimes bad abstractions are just bad abstractions

1.9K

Celeste@celestepoasts·23h

I don't like the implication that models suck at esolangs just because of limited training data lol.

Lossfunk@lossfunk

2/ Our method: test them on esoteric programming languages. Brainfuck. Befunge-98. Whitespace. Unlambda. Shakespeare. All Turing-complete. All requiring identical reasoning to Python. All with 1,000-100,000x fewer GitHub repos than mainstream languages. Same problems. Radically less training data.

139

17.1K

difficultyang@difficultyang·1d

@thinkingshivers @tautologer It's much easier to spend billions on GPUs, than it is to spend millions on building good products

Shivers@thinkingshivers·2d

One of the mysteries of AI labs to me, is their willingness to spend billions on making SOTA models, paired with their unwillingness to spend millions building good products for the models. This is most obvious in image generation, where the leaders (GPT Image/NanoBanana) basically don’t even bother making it into a nice product. Fuck it, just throw it into the chat app.

337

tautologer@tautologer·2d

Kagi Translate going semi-viral shows once again that UX > raw capabilities overhang go brrrrr

1.6K

difficultyang@difficultyang·2d

New eval just dropped (this should be a good kick in the pants for me to finally write this part of the PyTorch Internals blog post...)

2.1K

difficultyang@difficultyang·3d

finishing up a vibe coding sprint feels a bit like coming out of a forge frenzy

543

difficultyang@difficultyang·3d

opus, why do you always say you prefer codex's implementation :rofl:

1.4K

difficultyang@difficultyang·3d

The absolute misery of asking an LLM to do something mathy and now there are BIPARTITE GRAPHS and CONNECTED COMPONENTS and GCDS and now I have to buckle up and do math

805

difficultyang@difficultyang·4d

@A_K_Nain CC. I bet codex would do better.

198

Aakash Kumar Nain@A_K_Nain·4d

@difficultyang CC or codex?

270

difficultyang@difficultyang·4d

I notice LLMs do very poorly with FSDP "wrapping" style; just completely unable to trace the flow of execution when FSDP wrappers are involved

2.3K

difficultyang@difficultyang·4d

@aymuosk The depths of human mistakes know no bounds!

Soumya Snigdha Kundu@aymuosk·4d

@difficultyang Have you found this to be true at any context size?

difficultyang@difficultyang·4d

Usually I am the one checking the LLMs. But something I find very valuable about doing code edits through LLMs rather than doing it by hand is that the LLM can check me, when I ask for something nonsensical or incorrect!

737

difficultyang@difficultyang·4d

I like this version best, IMO

212

difficultyang@difficultyang·5d

@tmuxvim That's a really insightful question. Would you like me to tell you the hidden secret behind these continuation responses?

539

tmuxvim@tmuxvim·5d

has anyone else noticed that GPT-5.4 often ends its responses with, like, clickbait? it often promises to reveal "the one surprising X that will do Y" or something like that

612

7.1K

418K

difficultyang@difficultyang·5d

@SMT_Solvers @ezyang Opus thinks IRs are too terse and LLMs would much prefer things that are wordier

Chad Brewbaker@SMT_Solvers·5d

@ezyang Golfscript - some terse DSL.

194

Edward Z. Yang@ezyang·5d

A question of intense interest to me is how compilers (and more specifically, compilers for deep learning) should evolve in the era of LLM coding. 🧵

266

21.8K

difficultyang@difficultyang·5d

@rupanshusoi @ezyang A generation of Halide pilled PhD students and then it turned out it was very difficult to make work in the real world 😂

Rupanshu Soi@rupanshusoi·5d

@ezyang One thought is that scheduling languages that decouple performance from correctness are probably important going forward. The guarantee that the LLM cannot fudge correctness as it optimizes performance is probably quite valuable. (But designing such a language is hard.)

376

difficultyang@difficultyang·5d

@wookash_podcast Two words: "Committed Spend"

489

Łukasz | Wookash Podcast@wookash_podcast·5d

ok, i'm getting conflicting AI reports here. some folks say that in their tech companies, engineers are being asked to increase their token usage, while others are restricting tokens for highest spenders. what's going on? all companies are 1k+ employees

9.7K

difficultyang@difficultyang·5d

@tri_nomad @ezyang @pangramlabs It's not slop, it's artisanally crafted electric impulses on responsibly sourced silicon

segv11@tri_nomad·5d

@difficultyang @ezyang @pangramlabs My God

difficultyang@difficultyang·5d

@ezyang @pangramlabs AI generated?

103

Edward Z. Yang@ezyang·5d

(3) LLMs are non-deterministic and slow. It's not a build step: it's the process of optimizing a codebase by hand over some period of time. The output is checked into VCS. There will be friction to "recompiling" (but you will do it when the source model changes.)

1.6K

difficultyang@difficultyang·5d

g-d it anthropic, too much fuckin glazing LOL

4.7K

difficultyang@difficultyang·5d

One of the crispest articulations I have for how models have gotten better compared to last year: if the sequence of things it should do is spelled out, they will reliably do it. This includes asking the LLM to revert things it did, or asking it to do it again with the plan.

1.1K

difficultyang@difficultyang·6d

@main_horse I ended up doing an italicized disclosure, probably good enough for now 😂

140

difficultyang@difficultyang·13 Mar

I have been very picky about not using LLMs to write my blog posts, but I am very tempted to use it to write some boilerplate "expository" text

1.1K
