Beshr

232 posts

@beshr

System Engineer / Architect. https://t.co/d9F61nupoD

P2X-3YZ Joined March 2008
498 Following 1.5K Followers
Indie Game Joe@IndieGameJoe·
This indie dev is making a game where you kick a soccer ball on your way to school through beautiful yet challenging levels - Unlock new levels - Expand your room - Don't be late for school It's called Kick. Would you play this?
Beshr@beshr·
What if you could read RSS feeds like an old RPG / visual novel?
Beshr@beshr·
@dhh Yet I request my own data from @foodora_se and they completely ignore my request, not even an answer, and I have to pursue whatever agency that’s supposed to help me bring claims against them.
DHH@dhh·
From a school notice: "Due to GDPR unfortunately, we cannot share the names of the other students in the class." 🤯 When the GDPR prevents you from knowing the names of next year's classmates, something has gone horribly, Europeanly wrong.
Beshr@beshr·
I released Ossature v0.1.0 yesterday. Biggest changes: AMD components have behavioral contracts now, planning is language-aware, and build error handling is better overall. Changelog: github.com/ossature/ossat… ossature.dev
Beshr@beshr·
@mitchellh I think the biggest push for this will come from enterprise (when they wise up, at least). Not just re availability: Claude now censors and decides what _they_ deem OK to use their models for. Enterprise can't be subject to third-party morals and judgments.
Mitchell Hashimoto@mitchellh·
We've gone really quickly from "local models are dogshit" to "local models are good actually" (like, a 12 month window from A to B). I don't think they're actually good ENOUGH yet. We need an Opus 4.5 quality local model. When that happens, I think the world will spill over. Opus 4.5 is/was amazing, and is more than good enough for almost all tasks still as long as you pair with a frontier-level planner/judge. It'll still require a hugely expensive machine to run it, I'm sure, like a $5K or more laptop or mac studio. But, that's going to be pennies compared to the API costs plus all the benefits of guaranteed privacy and so on.
Beshr@beshr·
@yuvadm Yeah, though lots of money in play can pull some strings I guess.
Yuval Adam@yuvadm·
@beshr I don't know if they can fake such a directive for marketing purposes but 100% the hype plays right into their hands.
Beshr@beshr·
@thorstenball How much of the cost is Amp's and how much is Fable?
Thorsten Ball@thorstenball·
Day 3 with Fable. Gave a huge prompt to implement a feature across CLI, web server, and another server to both Fable and deep^2 in Amp. deep^2 was done before I went to the gym. It stopped short. Sent another prompt. $20. Fable ran for 1hr40min and cost $350. Results: They both understood the assignment and built the same thing. Maybe that's due to my prompt. Fable's worked on first try. Well done. Deep's looks correct but didn't work on first try. $20 vs. $350. I'm sure I could get deep^2 to make it work and we'd end up at, what, $40? While Fable is now at $457 after I asked some follow-up questions.
Beshr@beshr·
I get the AI slop problem, but refusing all public PRs won’t solve it and feels like a dick move. A vouch system like Ghostty's would’ve been a more understandable path.
Beshr@beshr·
I think this is a mistake. Ladybird would’ve been an obscure niche hobby project if it hadn’t been framed as an opportunity to contribute to a real new browser and a way to learn from Andreas. Now that the project is on its feet, they decide to ditch their roots.
Ladybird@ladybirdbrowser

Ladybird is moving into a new phase as we work toward our first alpha release. We are tightening how code enters the project: going forward, code changes will only be introduced by project maintainers, and we will no longer accept public pull requests. ladybird.org/posts/changing…

Beshr@beshr·
@pauliusztin_ @pydantic If you're into composable harnesses, check out Ossature, though calling it a "harness" might be underselling it, it's basically a full build system for spec-driven code gen ;) Also built on @Pydantic AI under the hood (absolute banger of a lib btw) github.com/ossature/ossat…
Paul Iusztin@pauliusztin_·
Building production-grade AI agents with @Pydantic? Here's one of the most interesting projects I've seen: → github.com/pydantic/pydan…

Most agent demos stop at:
Prompt + tools
Maybe MCP
A simple loop

But production agents need much more than that. So the Pydantic AI team built Pydantic AI Harness.

The idea is simple: Pydantic AI gives you the agent. The harness gives you the system around the agent. And the repo treats capabilities as modular building blocks. You compose only what you need.

For example:
CodeMode → sandboxed Python execution
MCP → connect to external MCP servers
Skills → progressive capability loading
Memory → persistence across sessions
Sub-agents → specialized child agents
Verification loops → run tests + auto-fix failures
Guardrails → approvals, budgets, secret masking

The capability matrix is probably the most valuable part. Because it maps what matters for production agents:
Execution
Context management
Memory
Orchestration
Reliability
Safety

And it openly tracks what’s solved vs. what’s still missing.

The industry is moving away from giant monolithic agent frameworks. We're now seeing more composable harness layers around smaller agents.

Here's the gist: The model is no longer the product... The harness is.

Check out the full repo here: github.com/pydantic/pydan…
Beshr@beshr·
@dbreunig I think it’s a sign of early optimization, or at least the wrong place to start? The more we anthropomorphize, the fewer use cases we’ll find. The LLMs we have now are quite powerful; we need better ideas for deterministic tooling and systems to build around them, to make the most of them.
Drew Breunig@dbreunig·
Been thinking about how heavier, more use case driven post training potentially puts the model “on rails”.
Valerio Capraro@ValerioCapraro

This is the most interesting paper I have read this week. The authors test a wide range of LLMs on a massive dataset of behavioural experiments, with more than 200,000 participants and nearly 26 million human responses. Importantly, they compare base LLMs with post-trained versions. This allows them to test whether post-training makes LLMs more or less human-like. The result is impressive: post-training makes models LESS human-like. I think this speaks to a broader problem. Current post-training methods are designed to optimize specific objectives. But optimizing one objective can shift the model in ways that are not localized to that objective. We have now seen several versions of this problem. A Nature paper showed that narrow fine-tuning on coding can induce misalignment in unrelated domains, including claims that humans should be enslaved by artificial intelligence. In our Computers in Human Behavior Reports paper, we showed that GPT treated torturing a woman to prevent a nuclear apocalypse as more acceptable than harassing her for the same purpose. And now this new paper. The emerging picture is that when AI developers optimize a model on one metric, they may be shifting the whole system in uncontrollable ways and producing catastrophic results in other metrics. * Main paper and other references in the first reply

Beshr@beshr·
@DavidKPiano I think the point can be made whether you start with a prototype or spec. You do need to know what you’re doing/using to know how to write the spec well, but I don’t think you can feed that to an LLM. Code is one interpretation of a spec: ossature.dev/blog/what-make…
David K 🎹@DavidKPiano·
This is what spec-driven development tools/products get wrong IMO: the spec should fall out of the prototype, not the other way around One prototype is worth 100 spec drafts
Matt Pocock@mattpocockuk

The more I replace plans with prototypes, the better the outputs Who'd have thought that low fidelity prototypes were better than walls of spec Oh yeah, the entire industry for 20 years Stop going against decades of knowledge because someone in SF shipped it as a 'mode'

Beshr@beshr·
SDD done right
[Image attached]
Beshr@beshr·
@dok2001 @momito I’ve been working on Ossature for a few months and I think a lot of the ideas are similar, but the cornerstone for me is the spec to structure the intent: ossature.dev