PME

1.4K posts


@itsyourcode

Pro-grammer building the data agent for truth seekers @probablydatabot

San Francisco, CA · Joined June 2023
2.2K Following · 470 Followers
PME @itsyourcode
@vansickn @MattRogish Here's a draft teaser. Will publish when our blog goes live in general, which should be any day now.
[media attached]
0 replies · 0 reposts · 0 likes · 4 views
PME @itsyourcode
Under-discussed problem right now with most frontier coding models: a leading contributor to slop, incidental complexity, and daily pain. Great read, @vansickn!
[media attached]
6 replies · 3 reposts · 20 likes · 1.6K views
kache @yacineMTB
I fear not the man who has written a thousand codebases. But I fear the man who has written the same codebase a thousand times.
[media attached]
27 replies · 10 reposts · 340 likes · 7.3K views
PME @itsyourcode
@rodinrooh legendary run though
1 reply · 0 reposts · 8 likes · 1.9K views
PME @itsyourcode
@kitlangton The best part: it's always achievable
0 replies · 0 reposts · 0 likes · 86 views
Kit Langton @kitlangton
An obvious tip for software design, with or without AI, that's nonetheless easy to get wrong: purge all foreknowledge of implementation and edge cases from your mind and imagine the platonic user-land API. Think first in terms of high-level intentions. Then see if it's achievable.
10 replies · 4 reposts · 238 likes · 9.6K views
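Kit's tip can be sketched in code. Here's a minimal illustration, assuming a hypothetical `with_backoff` retry helper (the name and parameters are invented for this sketch): the ideal call site is written first as the high-level intention, and only then is the mechanism filled in.

```python
import time
from functools import wraps

# Step 1: imagine the platonic user-land API, with no implementation
# foreknowledge. The intention, stated as the ideal call site:
#
#     fetch = with_backoff(fetch, retries=3, base_delay=0.1)
#
# Step 2: only then check whether that intention is achievable.

def with_backoff(fn, retries=3, base_delay=0.1):
    """Retry fn with exponential backoff; shaped by the call site above."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        for attempt in range(retries):
            try:
                return fn(*args, **kwargs)
            except Exception:
                if attempt == retries - 1:
                    raise  # out of retries: surface the real error
                time.sleep(base_delay * (2 ** attempt))
    return wrapper
```

The point is the ordering, not the helper itself: the wrapper's signature was dictated by the imagined call site rather than by implementation details.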
PME @itsyourcode
@MattRogish @vansickn Oh man, yes. I have so much to say about this. Planning a blog post on it soon, actually.
1 reply · 0 reposts · 2 likes · 18 views
PME @itsyourcode
This is actually such a big deal
[media attached]
1 reply · 1 repost · 2 likes · 95 views
PME @itsyourcode
The best part about giving your code robot strict operating procedures is that when it deviates (unavoidable at this point), you can just say "I don't like the looks of <slop sighting>", and it goes "I should fix <slop> with <correct thing> because <rule I ignored>".
0 replies · 0 reposts · 1 like · 25 views
PME @itsyourcode
@MattRogish @vansickn Totally, and that's the dead giveaway: RL envs want strong verifiers, which are actually pretty hard to construct without overfitting to said pathologies. The less verifiable factors suffer in return.
1 reply · 0 reposts · 1 like · 28 views
Matt “Friend of the pod” Rogish 🇺🇸
Ha! Yes, it must've been trained over and over on "don't break things" so that it has a pathological over-cautiousness. I see it at commit time, too:
* LLM writes code
* runs tests, they pass
* "Hey human! I wrote the code, please review!"
* LGTM, commit and push
* "Lemme run the tests a few more times, just in case. Committed the code. Let me run the tests to be sure before I push. I'll run them one last time."
It wants to run the full suite ALL THE TIME. "I made a docs change. Lemme run the tests to make sure it didn't break them" - WTF, who has their markdown tested?!
1 reply · 0 reposts · 2 likes · 25 views
PME @itsyourcode
inb4 subagents
0 replies · 0 reposts · 1 like · 17 views
PME @itsyourcode
Just hang in there, guys. The slop rate tops out at the max output tok/s. You just need to review exponentially faster. wagmi
[media attached]
1 reply · 1 repost · 4 likes · 65 views
PME reposted
Mira Murati @miramurati
Today we're sharing our work on interaction models. A new class of model trained from scratch to handle real-time interaction natively, instead of gluing it onto a turn-based one. youtu.be/A12AVongNN4
[YouTube video]
308 replies · 912 reposts · 8.6K likes · 1.1M views
PME @itsyourcode
@MattRogish @vansickn Dead on. It's comical the extent to which you resort to conventionally bad advice to get them to comply. Imagine being a senior eng in 2017 telling your juniors: "Never worry about backwards compatibility." "Break interfaces aggressively and update all callers."
1 reply · 0 reposts · 3 likes · 62 views
Matt “Friend of the pod” Rogish 🇺🇸
YES. I have to spread that across all my prompts, garbage like: "implementation work must replace vestigial object-shape assumptions outright. Do not preserve compatibility in code APIs. Write database migrations. No `TODO`/`pending`/`xit`/`skip` markers, no "implementation deferred" stubs, no dead buttons or unreachable routes. Do not defer something until some "later phase". Do it now." yada yada yada
2 replies · 0 reposts · 2 likes · 70 views
PME @itsyourcode
@henrytdowling It's not a reason to stop using them but it is a reason to use them much more carefully and far less "automatically"
0 replies · 0 reposts · 1 like · 14 views
PME @itsyourcode
@henrytdowling My straw moment was last year, mainly the constant lies. These days I just assume every action they take is >50% wrong and I focus on:
* Minimizing per-pass error (prompting, AGENTS.md, skills, live review/steering)
* Maximizing post-pass verification (strong E2E blackbox tests)
1 reply · 0 reposts · 1 like · 75 views
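A minimal sketch of the "post-pass verification" half of that approach, using a hypothetical `slugify` function as a stand-in for agent-written code (the function and its spec are invented for illustration): the test exercises only the public entry point and asserts on end-to-end behavior, so it keeps catching regressions no matter how the agent rewrites the internals on a later pass.

```python
def slugify(title: str) -> str:
    # Agent-written implementation: assumed >50% likely to be wrong,
    # and free to be completely rewritten on any subsequent pass.
    return "-".join(title.lower().split())

def test_slugify_blackbox():
    # Blackbox E2E check: assert only on observable input/output pairs,
    # never on internal helpers the agent may rename, split, or delete.
    assert slugify("Hello World") == "hello-world"
    assert slugify("  Agents  Write   Slop ") == "agents-write-slop"

test_slugify_blackbox()
```

Because the test says nothing about *how* the result is produced, it survives refactors while still failing on genuine behavior changes, which is exactly what makes it a useful verifier for agent output.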
PME @itsyourcode
Wow, suddenly the entire TL is talking about agent code quality. Guess this weekend was the final straw for everyone.
1 reply · 0 reposts · 3 likes · 83 views
signüll @signulll
the more thought you put into a post, the less it will resonate on the timeline. what’s a good name for this law?
409 replies · 18 reposts · 994 likes · 90.2K views
PME @itsyourcode
If you do not understand what I am saying, here is a simple example: the model constantly null-checks values in local contexts that were passed in from higher-order contexts that guarantee those values cannot be null. Simple illustrative example only.
0 replies · 0 reposts · 0 likes · 15 views
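That null-check pattern can be made concrete with a small Python sketch (all names here are invented for illustration): validation happens once at the boundary, so `User.email` is guaranteed non-empty everywhere downstream, yet the sloppy version re-checks it locally anyway.

```python
from dataclasses import dataclass

@dataclass
class User:
    email: str  # invariant: the loader guarantees a non-empty string

def load_user(raw: dict) -> User:
    # Higher-order context: validation happens once, at the boundary.
    email = raw.get("email")
    if not email:
        raise ValueError("email is required")
    return User(email=email)

def email_domain(user: User) -> str:
    # Correct: trusts the invariant established upstream.
    return user.email.split("@", 1)[1]

def email_domain_sloppy(user: User) -> str:
    # Slop: redundant local re-check of a value the type and the loader
    # already guarantee; adds noise and silently masks real upstream bugs.
    if user is None or not user.email:
        return ""
    return user.email.split("@", 1)[1]
```

The sloppy version isn't just verbose: by returning `""` instead of failing loudly, it converts a violated invariant (a bug) into quietly wrong output.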
PME @itsyourcode
Would be better if they just executed that process inherently as part of their chain of thought / reasoning process. Some models are more steerable than others and will adhere to a process like this via AGENTS.md or harness system prompt overrides.
1 reply · 0 reposts · 0 likes · 14 views
PME @itsyourcode
One elusive property of good abstraction is effective use of indirection. LLMs lack the inherent context and attention to consider indirected invariants in non-trivial systems. This is why they are so prone to applying _locally plausible_ but _globally incorrect_ edits.
1 reply · 0 reposts · 1 like · 47 views
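A small sketch of an indirected invariant, with a hypothetical event dispatcher (the registry, handler, and event names are all invented): the guarantee that every dispatched event type has a handler is established at registration and ingress, not at the call site, so an edit to `dispatch` that looks locally plausible would be globally incorrect.

```python
# Registry: the invariant "every dispatched type has a handler" is
# established here, by construction, at registration time.
HANDLERS = {}

def handler(event_type):
    def register(fn):
        HANDLERS[event_type] = fn
        return fn
    return register

@handler("user.created")
def on_created(event):
    return f"welcome sent to {event['email']}"

def ingress(raw):
    # Boundary check: unknown types are rejected before dispatch,
    # so dispatch never sees an unregistered type.
    if raw["type"] not in HANDLERS:
        raise ValueError(f"unknown event type: {raw['type']}")
    return raw

def dispatch(event):
    # Direct lookup is correct *because of* the indirected invariant.
    # A locally plausible edit, e.g. falling back to HANDLERS.get(...)
    # with a do-nothing default, would silently swallow the very bugs
    # the ingress check exists to surface.
    return HANDLERS[event["type"]](event)
```

Nothing at the `dispatch` call site says the key must exist; the invariant lives two indirections away, which is exactly the kind of non-local context a model editing only this function tends to miss.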