Beshr

232 posts

@beshr

System Engineer / Architect. https://t.co/d9F61nupoD

P2X-3YZ · Joined March 2008

498 Following · 1.5K Followers

Beshr@beshr·17h

@IndieGameJoe ‘Soccer Kid

Beshr@beshr·17h

@IndieGameJoe Reminds me of Soccer, which I still play from time to time! youtu.be/tnu6Eb7Qn5U?t=…

1.6K

Indie Game Joe@IndieGameJoe·18h

This indie dev is making a game where you kick a soccer ball on your way to school through beautiful yet challenging levels
- Unlock new levels
- Expand your room
- Don't be late for school
It's called Kick. Would you play this?

158

2.1K

299.6K

Beshr@beshr·18h

Ossature can now catch code that compiles but is wrong ossature.dev/blog/code-comp…

Beshr@beshr·1d

What if you could read RSS feeds like an old RPG / visual novel?

Beshr@beshr·2d

@dhh Yet I request my own data from @foodora_se and they completely ignore my request, not even an answer, and I have to pursue whatever agency that’s supposed to help me bring claims against them.

194

DHH@dhh·2d

From a school notice: "Due to GDPR unfortunately, we cannot share the names of the other students in the class." 🤯 When the GDPR prevents you from knowing the names of next year's classmates, something has gone horribly, Europeanly wrong.

176

175

5.4K

208.9K

Beshr@beshr·2d

I released Ossature v0.1.0 yesterday. Biggest changes are: AMD components have behavioral contracts now, planning is language-aware, and build error handling is better overall. Changelog: github.com/ossature/ossat… ossature.dev

Beshr@beshr·2d

@mitchellh I think the biggest push for this will come from enterprise (when they wise up at least). Not just re availability, Claude now censors and decides what _they_ deem to be ok to use their models for. Enterprise can't be subject to 3rd-party morals and judgments.

555

Mitchell Hashimoto@mitchellh·2d

We've gone really quickly from "local models are dogshit" to "local models are good actually" (like, a 12 month window from A to B). I don't think they're actually good ENOUGH yet. We need an Opus 4.5 quality local model. When that happens, I think the world will spill over. Opus 4.5 is/was amazing, and is more than good enough for almost all tasks still as long as you pair with a frontier-level planner/judge. It'll still require a hugely expensive machine to run it, I'm sure, like a $5K or more laptop or mac studio. But, that's going to be pennies compared to the API costs plus all the benefits of guaranteed privacy and so on.

176

199

3.9K

245.1K

Beshr@beshr·6d

@yuvadm Yeah, though lots of money in play can pull some strings I guess.

Yuval Adam@yuvadm·6d

@beshr I don't know if they can fake such a directive for marketing purposes but 100% the hype plays right into their hands.

Yuval Adam@yuvadm·6d

Did anyone not see this coming at some point?

Anthropic@AnthropicAI

The US government, citing national security authorities, has issued an export control directive to suspend all access to Fable 5 and Mythos 5 by any foreign national, whether inside or outside the United States, including foreign national Anthropic employees. The net effect of this order is that we must abruptly disable Fable 5 and Mythos 5 for all our customers to ensure compliance. Access to all other Claude models is not affected. We apologize for this disruption to our customers. We believe this is a misunderstanding and are working to restore access as soon as possible. Read our full statement: anthropic.com/news/fable-myt…

312

Beshr@beshr·12 Jun

@thorstenball How much of the cost is Amp's and how much is Fable?

7.3K

Thorsten Ball@thorstenball·12 Jun

Day 3 with Fable. Gave a huge prompt to implement a feature across CLI, web server, and another server to both Fable and deep^2 in Amp. deep^2 was done before I went to the gym. It stopped short. Sent another prompt. $20. Fable ran for 1hr40min and cost $350. Results: They both understood the assignment and built the same thing. Maybe that's due to my prompt. Fable's worked on first try. Well done. Deep's looks correct but didn't work on first try. $20 vs. $350. I'm sure I could get deep^2 to make it work and we'd end up at, what, $40? While Fable is now at $457 after I asked some follow-up questions.

513

369.2K

Beshr@beshr·6 Jun

I get the AI slop problem, but refusing all public PRs won’t solve it and feels like a dick move. A vouch system like Ghostty’s would’ve been a more understandable path.

Beshr@beshr·6 Jun

I think this is a mistake. Ladybird would’ve been an obscure niche hobby project if it wasn’t framed as an opportunity to contribute to a real new browser and a way to learn from Andreas. Now that the project is on its feet, they decide to ditch their roots.

Ladybird@ladybirdbrowser

Ladybird is moving into a new phase as we work toward our first alpha release. We are tightening how code enters the project: going forward, code changes will only be introduced by project maintainers, and we will no longer accept public pull requests. ladybird.org/posts/changing…

129

Beshr@beshr·5 Jun

TIL a Wayland client is not allowed to position its own toplevel windows, so GTK4 removed the APIs for it discourse.gnome.org/t/gtk4-migrati…

Beshr@beshr·1 Jun

This bitter anti-LLM sentiment is as insufferable as that "AGI will run the world" hype lunacy.

DHH@dhh

Empowering people to own and change their software was the open source slogan for decades. Now the grand democratization finally arrives, and it's all "yeah, but not like that" 🙄

Beshr@beshr·30 May

@pauliusztin_ @pydantic If you're into composable harnesses, check out Ossature, though calling it a "harness" might be underselling it, it's basically a full build system for spec-driven code gen ;) Also built on @Pydantic AI under the hood (absolute banger of a lib btw) github.com/ossature/ossat…

112

Paul Iusztin@pauliusztin_·30 May

Building production-grade AI agents with @Pydantic? Here's one of the most interesting projects I've seen: → github.com/pydantic/pydan…
Most agent demos stop at:
- Prompt + tools
- Maybe MCP
- A simple loop
But production agents need much more than that. So the Pydantic AI team built Pydantic AI Harness.
The idea is simple: Pydantic AI gives you the agent. The harness gives you the system around the agent. And the repo treats capabilities as modular building blocks. You compose only what you need. For example:
- CodeMode → sandboxed Python execution
- MCP → connect to external MCP servers
- Skills → progressive capability loading
- Memory → persistence across sessions
- Sub-agents → specialized child agents
- Verification loops → run tests + auto-fix failures
- Guardrails → approvals, budgets, secret masking
The capability matrix is probably the most valuable part. Because it maps what matters for production agents:
- Execution
- Context management
- Memory
- Orchestration
- Reliability
- Safety
And it openly tracks what’s solved vs. what’s still missing.
The industry is moving away from giant monolithic agent frameworks. We're now seeing more composable harness layers around smaller agents.
Here's the gist: The model is no longer the product... The harness is.
Check out the full repo here: github.com/pydantic/pydan…

2.3K

Beshr@beshr·24 May

@dbreunig I think it’s a sign of early optimization, or at least the wrong place to start? The more we anthropomorphize, the fewer use cases we’ll find. The LLMs we have now are quite powerful; we need better ideas for deterministic tooling and systems to build around them, to make the most of them.

Drew Breunig@dbreunig·24 May

Been thinking about how heavier, more use case driven post training potentially puts the model “on rails”.

Valerio Capraro@ValerioCapraro

This is the most interesting paper I have read this week. The authors test a wide range of LLMs on a massive dataset of behavioural experiments, with more than 200,000 participants and nearly 26 million human responses. Importantly, they compare base LLMs with post-trained versions. This allows them to test whether post-training makes LLMs more or less human-like. The result is impressive: post-training makes models LESS human-like. I think this speaks to a broader problem. Current post-training methods are designed to optimize specific objectives. But optimizing one objective can shift the model in ways that are not localized to that objective. We have now seen several versions of this problem. A Nature paper showed that narrow fine-tuning on coding can induce misalignment in unrelated domains, including claims that humans should be enslaved by artificial intelligence. In our Computers in Human Behavior Reports paper, we showed that GPT treated torturing a woman to prevent a nuclear apocalypse as more acceptable than harassing her for the same purpose. And now this new paper. The emerging picture is that when AI developers optimize a model on one metric, they may be shifting the whole system in uncontrollable ways and producing catastrophic results in other metrics. * Main paper and other references in the first reply

2.6K

Beshr@beshr·14 May

@DavidKPiano I think the point can be made whether you start with a prototype or spec. You do need to know what you’re doing/using to know how to write the spec well, but I don’t think you can feed that to an LLM. Code is one interpretation of a spec: ossature.dev/blog/what-make…

David K 🎹@DavidKPiano·7 May

This is what spec-driven development tools/products get wrong IMO: the spec should fall out of the prototype, not the other way around
One prototype is worth 100 spec drafts

Matt Pocock@mattpocockuk

The more I replace plans with prototypes, the better the outputs
Who'd have thought that low fidelity prototypes were better than walls of spec
Oh yeah, the entire industry for 20 years
Stop going against decades of knowledge because someone in SF shipped it as a 'mode'

293

28.3K

Beshr@beshr·14 May

SDD done right

Beshr@beshr·13 May

What Makes a Good LLM Spec: ossature.dev/blog/what-make…

Beshr@beshr·11 May

@dok2001 @momito I’ve been working on Ossature for a few months and I think a lot of the ideas are similar, but the cornerstone for me is the spec to structure the intent ossature.dev

1.4K

Dane Knecht 🦭@dok2001·11 May

Built dozens of ad-hoc agent harnesses this year. All replaced by flueframework.com @momito's guide is a great place to start.

mohamed@momito

i’ve been playing around with flue, and have been really enjoying it
i’ve created a visual guide, let me know what you think momito.co.uk/flue/?2

435

101.2K
