Norin

66 posts

@norlava

Building things I think are fun |📍SF | AI/ML, Prev Cisco, Microsoft, start-up | Pear VC PFC

San Francisco · Joined May 2025
28 Following · 33 Followers
Pinned Tweet
Norin @norlava
Hi X, I'm Norin: engineer turned PM, now back to engineer. I left big tech (iykyk - vibes have been kinda off) to build better reliability for coding agents, search, and other work I'm excited about. I'll talk honestly about what I'm actually doing inside larger codebases (not the news junket) and about my failures. I hope to learn from you as well, and maybe even meaningfully connect. I want this to be a space that's more free. Based in SF. Follow along if you're in the same boat and like trying out new things.
Norin @norlava
I think part of the challenge is also the people and sentiment side of things. There are still a lot of teams under constant pressure of layoffs, or where half the team was laid off and the rest are expected to double their output while leadership pushes them to adopt an "AI-first mindset" or show ROI. Then there's the narrative AI-native companies push, and people sharing online how they've "cracked" some magical code, which can make you feel like even more of an outsider who's behind. So if that's your day-to-day experience, the motivation to get past the cold start isn't there. There are solutions here, but it's tough: we need more empathy and realistic examples of what peers are doing, as you say, rather than being told to increase our productivity.
Kevin S Lin @kevins8
there’s currently a cold start problem for folks getting started with agents, especially in organizations
everyone needs to discover best practices and the “right way” to do things. build skills. context. loops
we need better ways to discover what peers are doing and what the top 10% do better than everyone else
like @steipete's suggestion of everyone sharing sanitized codex sessions
Norin @norlava
@kunchenguid This is a great example of why, for long-running tasks, you need (and should have) the developer in the loop
Kun Chen @kunchenguid
ok be careful with your fable 5. i just ran into a new problem that never happened before - it's now doing things i didn't ask

i just told it there is a bug in a repo. without checking with me, it did the fix AND raised a PR using my gh cli, claiming it's following CONTRIBUTING.md

the PR was not bad, but it's a big surprise as it
- assumed the credentials in my gh cli is the one i wanted to use
- assumed i would be happy with the change as is
- assumed i was ready to publish the work

that's a lot of assumptions from just me telling it to fix a bug. i now feel the need to explicitly tell it NOT to do extra things which increased cognitive load for me and it's not a good feeling
Kun Chen @kunchenguid

ok Claude Fable 5 (Mythos) is finally here
if you are on a subscription, go use it NOW because it may be removed from the subscription in a few days

Norin @norlava
Learnings + code from our side quest building reliable long-running agents. With improving model capabilities + the need for better verification, this topic feels like a good one to discuss and improve upon together.
Norin @norlava

x.com/i/article/2064…

Norin @norlava
The asterisk on Anthropic's benchmark table notes that the starred scores are from Mythos 5, but the model your agent actually calls (Fable 5) falls back toward Opus 4.8 on those benchmarks because safety classifiers block answers (e.g. Terminal-Bench 2.1). Interesting... make of that what you will; real-world testing is still important.
Daniel San @dani_avila7
I partly agree… loops are the goal, but you can’t skip the prompting fundamentals

It took me years to nail the right execution flow for routines that now run automatically. Not because the models weren’t capable, but because the surrounding software wasn’t ready for LLMs

Today it is, and if you want reliable loops, you need a solid harness and proper observability first

Those aren’t nice-to-haves, they’re the baseline
Peter Steinberger 🦞 @steipete

Here’s your monthly reminder that you shouldn’t be prompting coding agents anymore. You should be designing loops that prompt your agents.

Norin @norlava
This is everything 🙌. There's a push to offload entirely to the model/agent and cut the developer out, and that doesn't work. We've been building loops/workflows with the opposite framing: the engineer as the one steering, with determinism. Sharing the code in case folks are looking for an actual example of what this could look like in production with HIL (human-in-the-loop)/manual review/control/visibility. x.com/norlava/status…
Norin @norlava

A lot of buzz around loops/workflows. Many are saying they’ve been doing this but not sharing their code. We’re sharing our code. This was not trivial to construct: it’s taken hundreds of hours of studying coding agent implementations, feedback from developers using it in repos that are 10M+ LoC, and re-architecting 3x from scratch. This is a production loop, as in you can (and are supposed to) use it to gain actual ROI with your agent. We built in public; you can find it on GitHub under Atomic (bastani-inc). Real production code needs management across dependencies and teams, and you don't have infinite tokens to burn. We realized you need a way for the developer to define their ‘loop’ explicitly with good design, no provider lock-in, review gates, verbatim compaction (not what you see today in coding agents), HIL, and the ability to observe and steer agents mid-run. Why share it? Because we think everyone should benefit from knowing how to use these, and we can get better together, faster. Less hype, just the code. A full write-up on our learnings is coming soon.

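The review-gate idea that runs through this thread (Kun Chen's agent opening a PR nobody asked for, and the Atomic post's "review gates" and "HIL") can be sketched as a small loop. This is a hypothetical Python sketch, not Atomic's code or API: `run_loop`, `agent`, `verify`, and `approve` are made-up names, and the agent is any callable you supply. The loop retries the agent under a fixed budget until a deterministic check passes, keeps a full unsummarized log of every attempt, and never publishes without explicit developer approval.

```python
from typing import Callable, List, Tuple


def run_loop(
    agent: Callable[[str], str],     # hypothetical: any callable returning a proposed patch
    verify: Callable[[str], bool],   # deterministic, developer-owned check (e.g. a test run)
    approve: Callable[[str], bool],  # human-in-the-loop gate before anything is published
    task: str,
    max_iters: int = 3,              # hard budget so the loop cannot burn tokens forever
) -> Tuple[str, List[str]]:
    history: List[str] = [task]      # full log of every step, never summarized
    for i in range(1, max_iters + 1):
        patch = agent("\n".join(history))
        history.append(f"attempt {i}: {patch}")
        if verify(patch):
            break
        history.append(f"attempt {i}: verification failed")
    else:
        # Budget spent without a passing patch: hand control back to the developer.
        return "budget exhausted", history
    # A verified patch is still only published if the developer explicitly says so.
    if approve(patch):
        return "approved", history
    return "held for developer", history


if __name__ == "__main__":
    patches = iter(["patch-1", "patch-2"])  # stand-in agent outputs
    status, log = run_loop(
        agent=lambda context: next(patches),
        verify=lambda patch: patch == "patch-2",
        approve=lambda patch: False,  # the developer declines to open a PR
        task="fix the reported bug",
    )
    print(status)  # held for developer
```

The design choice is that verification and publishing are separate steps: the deterministic check decides whether a patch is good, and only the developer decides whether it ships, so "fix a bug" cannot silently turn into "open a PR".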
Norin @norlava
I'm also thinking it's likely he's heavily optimized the OpenClaw ecosystem, and that the approach would require serious reworking to work on all codebase shapes in general.
Norin @norlava
Okay, so Peter’s right about the method; it does work, although he’s a bit vague. Execution matters though: we've seen that if you want this to scale, you need a legit ‘loop’ engine designed with the developer in the loop (pun not intended) to avoid slop and costs. Idk if the labs think about that or just offload it to the model; that doesn’t work (yet) and is too expensive. There’s a DX where humans don’t ‘slow down’ AI in development, but I guess it’s not hype, so it doesn’t sell as nicely. We’ve been building like this for a couple of months.
Peter Steinberger 🦞 @steipete

Here’s your monthly reminder that you shouldn’t be prompting coding agents anymore. You should be designing loops that prompt your agents.

Norin @norlava
@thdxr I feel this way about, like, most CLI tools
dax @thdxr
a whole bunch of companies that had good primitives but never figured out DX just got saved by AI

i'm using all these things that were too rough to use before