software archæologist

553 posts


@archeologistdev

building & scaling mission-critical systems @Canva as simple as possible, but no simpler

loving life in Melbourne ☕️🐾🌄
Joined April 2009
787 Following · 88 Followers
Pinned Tweet
software archæologist
software archæologist@archeologistdev·
I have this old Intel NUC I got back in 2019 running Fedora. It never seemed that useful, but I kept it around anyway. It survived a break-in where almost everything else was stolen. Fast forward to 2026 and it's now my vibecoding hub (no Mac mini needed!). I can run coding agents like pi or OpenCode on it and prompt them no matter where I am in the world, for FREE, thanks to @Tailscale. It even sends me a daily briefing with news from Twitter, HN and Lobsters that's relevant to me (thanks @ashebytes for the inspiration). It really is the little NUC that could. Can't wait to use it to build more awesome stuff!
0 replies · 1 repost · 2 likes · 448 views
dax
dax@thdxr·
@stolinski this is here because whenever we support a model that has its own harness we try to emulate the same environment so this is coming from codex cli and we haven't deviated from it yet but as with everything in opencode you can change it
2 replies · 0 reposts · 142 likes · 8.8K views
Scott Tolinski - Syntax.fm
I love OpenCode but "Don't rely on flat, single-color backgrounds; use gradients, shapes, or subtle patterns to build atmosphere." should not be a choice it makes for me.
Scott Tolinski - Syntax.fm tweet media
17 replies · 1 repost · 248 likes · 34.7K views
software archæologist
software archæologist@archeologistdev·
@alanscodelog @stolinski @thdxr You can obviously add even more tokens to your agent prompt to override this, but that wastes context and further confuses the model, burning additional tokens as it tries to reconcile the conflicting instructions.
0 replies · 0 reposts · 0 likes · 9 views
software archæologist
software archæologist@archeologistdev·
I have to agree. I tried to get my agent to commit frequently as it goes along, but it kept ignoring my instructions. Took me ages to realise it was due to the OpenCode system prompt explicitly telling the model NEVER to commit unless specifically asked. @thdxr says “like everything in OpenCode, you can just change it” but I’m unclear how.
2 replies · 0 reposts · 0 likes · 16 views
software archæologist reposted
Mario Zechner
Mario Zechner@badlogicgames·
i can't speak for david. what i see is this: if you let agents build or extend a codebase with only minor or no supervision, you get unmaintainable garbage, because the agent makes terrible decisions that compound, both big and small. those decisions make it hard for both you and the agent to keep modifying the code base, until eventually it's unrecoverable.

why does the agent make bad decisions? i can't tell for sure, but my gut tells me that training data currently cannot capture the holistic thinking needed to design and evolve complex systems. that's one part of the problem. related to that, and oversimplified: agents output the "mean quality" of the code they saw during training. most of that code is very bad. specifically tests, which humans are terrible at writing.

another part of the problem is that specification via prompt is not precise enough, so the agent has to fill in the blanks, giving it enough rope to hang itself. the more detailed your spec gets, the more the agent is constrained and the less likely it is to produce crap, but also the closer you are to handwriting the code yourself, as that's the most detailed version of the spec that can exist. so then you gain nothing. back to prompt specs it is, which means the agent fills in blanks, which means we get suboptimal or truly bad results.

using agents can still be a net productivity boost (see other posts in my thread), but it is not easy to come up with consistent workflows that produce production quality maintainable code while retaining the speed advantages agents give you.
18 replies · 34 reposts · 286 likes · 14.4K views
Agus 🔸
Agus 🔸@austinc3301·
Turns out they just committed straight-up fraud? Their package prebundles solutions for the benchmark and then gives them to the agent when it starts
Agus 🔸 tweet media
24 replies · 34 reposts · 796 likes · 93.8K views
kristina v. saint
kristina v. saint@kristinatastic·
I've been working on this important list for a couple of years now. What am I missing?
kristina v. saint tweet media
1.1K replies · 569 reposts · 20.1K likes · 922K views
software archæologist
software archæologist@archeologistdev·
@samlambert What amazes me even more is how much of the response time is Ruby, suggesting there’s a lot of room for improvement
0 replies · 0 reposts · 0 likes · 214 views
Sam Lambert
Sam Lambert@samlambert·
GitHub's performance in 2013. The web used to be so fast.
Sam Lambert tweet media
41 replies · 24 reposts · 1.1K likes · 250.3K views
Glauber Costa
Glauber Costa@glcst·
on the one hand, it is true that the team matters more than the language and I am sure you'd have found a way, Joran. But on the other hand, it's also too dismissive to say that the language doesn't matter at all. When we built Scylla, it was very clear that without C++11-style captures, it would have been a herculean task.
1 reply · 0 reposts · 6 likes · 289 views
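Glauber's point about C++11 captures is concrete: continuation-passing frameworks like Seastar (which Scylla is built on) thread state through lambda captures. A minimal sketch of what captures buy you; all names here are illustrative toys, not Scylla's or Seastar's actual API:

```cpp
#include <cassert>
#include <functional>
#include <string>

// Toy connection state that several async steps need to share.
struct Connection {
    std::string host;
    int bytes_sent = 0;
};

// A callback-taking operation. Real frameworks defer the callback across
// an event loop; here it runs immediately, which is enough to show the shape.
int send_request(Connection& conn, const std::string& payload,
                 std::function<void(int)> on_done) {
    conn.bytes_sent += static_cast<int>(payload.size());
    on_done(conn.bytes_sent);
    return conn.bytes_sent;
}

int demo() {
    Connection conn{"db.local"};
    int observed = 0;
    // The capture [&observed, &conn] carries local state into the
    // continuation. Pre-C++11, this would require a hand-written functor
    // struct per continuation to hold that state.
    send_request(conn, "SELECT 1", [&observed, &conn](int total) {
        observed = total + static_cast<int>(conn.host.size());
    });
    return observed;
}
```

At the scale of a database engine, writing one named functor struct per continuation instead of a capture is plausibly the "herculean task" Glauber means.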
_mm_pause()
_mm_pause()@_mm_pause·
I dislike language wars. Language is only 1% of what matters. In systems programming, it's the team & engineering that count. Look at TigerBeetle & Bun: great engineers drive huge impact. Those projects could've easily been built in C, C++, or Rust.
'(Robert Smith)@stylewarning

I'm overhearing a FAANG tech meeting about how this 10?-year product written in C is being transitioned ("modernized") to C++. Started by changing to a C++ compiler, and slowly rewriting to use classes/exceptions, &c. It's been 3 months, and the C++ service keeps failing in prod.

1 reply · 0 reposts · 9 likes · 5K views
software archæologist
software archæologist@archeologistdev·
@opencode No longer as impressed since GLM5 started randomly going off the rails and speaking Chinese to me, or just spouting nonsensical token streams once it gets close to 50% context usage
0 replies · 0 reposts · 0 likes · 42 views
software archæologist
software archæologist@archeologistdev·
Kimi K2.5 felt like an approximation of Opus: friendly, a bit over-enthusiastic, needs a lot of steering or it goes haywire. In contrast, GLM 5 is giving strong Codex vibes. Comes up with sensible plans and executes them meticulously, happy to keep cooking for hours. I like it! 🫡
software archæologist tweet media
1 reply · 0 reposts · 2 likes · 103 views
software archæologist
software archæologist@archeologistdev·
@nateberkopec Agree that optimising a web app is much more open ended though, for two reasons:
- hard to define a single success metric: many possible different yet good outcomes
- many inputs
0 replies · 0 reposts · 0 likes · 67 views
software archæologist
software archæologist@archeologistdev·
@nateberkopec I guess it is in the sense that it’s optimising a single metric. But not in the sense that it’s just tweaking a single input
1 reply · 0 reposts · 1 like · 68 views
Nate Berkopec
Nate Berkopec@nateberkopec·
While this is cool, optimizing a single variable for a nondeterministic process is the simplest possible thing you could optimize. Optimizing e.g. a web-app is 100x the dimensions here, along with a far more stringent correctness requirement. We have a long way to go.
Andrej Karpathy@karpathy

Three days ago I left autoresearch tuning nanochat for ~2 days on a depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement); this will be the new leaderboard entry. So yes, these are real improvements and they make an actual difference.

I am mildly surprised that my very first naive attempt already worked this well on top of what I thought was already a fairly manually well-tuned project. This is a first for me because I am very used to doing the iterative optimization of neural network training manually. You come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, etc etc. This is the bread and butter of what I do daily for 2 decades. Seeing the agent do this entire workflow end-to-end and all by itself as it worked through approx. 700 changes autonomously is wild. It really looked at the sequence of results of experiments and used that to plan the next ones. It's not novel, ground-breaking "research" (yet), but all the adjustments are "real", I didn't find them manually previously, and they stack up and actually improved nanochat. Among the bigger things e.g.:
- It noticed an oversight that my parameterless QKnorm didn't have a scaler multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work.
- It found that the Value Embeddings really like regularization and I wasn't applying any (oops).
- It found that my banded attention was too conservative (I forgot to tune it).
- It found that AdamW betas were all messed up.
- It tuned the weight decay schedule.
- It tuned the network initialization.

This is on top of all the tuning I've already done over a good amount of time. The exact commit is here, from this "round 1" of autoresearch. I am going to kick off "round 2", and in parallel I am looking at how multiple agents can collaborate to unlock parallelism. github.com/karpathy/nanoc…

All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale of course - you don't just have a single train.py file to tune. But doing it is "just engineering" and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and humans (optionally) contribute on the edges. And more generally, *any* metric you care about that is reasonably efficient to evaluate (or that has more efficient proxy metrics such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.

10 replies · 0 reposts · 23 likes · 9.2K views
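The outer loop Karpathy describes (propose a change, evaluate the metric, keep it only if it improves) can be sketched without any LLM in the loop. A toy accept-if-better search over one "hyperparameter"; everything here is illustrative, since the real loop trains a model per candidate and an agent proposes the edits:

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Toy stand-in for "validation loss as a function of one hyperparameter".
// A real autoresearch run would train a small model per candidate instead.
double toy_loss(double lr) {
    return (lr - 0.3) * (lr - 0.3) + 1.0;  // minimised at lr = 0.3
}

// Accept-if-better search: propose a perturbation, keep it only when the
// metric improves. Random proposals stand in for agent-suggested changes.
double autosearch(double start, int iters, unsigned seed) {
    std::mt19937 rng(seed);
    std::normal_distribution<double> step(0.0, 0.05);
    double best = start;
    double best_loss = toy_loss(best);
    for (int i = 0; i < iters; ++i) {
        double cand = best + step(rng);
        double cand_loss = toy_loss(cand);
        if (cand_loss < cand_loss) {}  // no-op; see condition below
        if (cand_loss < best_loss) {
            best = cand;
            best_loss = cand_loss;
        }
    }
    return best;
}
```

Because only improvements are kept, the measured metric can never regress, which is why the ~20 surviving changes out of ~700 attempts all stacked. The hard part at scale is making each evaluation cheap enough (small proxy models) that the loop stays affordable.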
software archæologist
software archæologist@archeologistdev·
@stainless_code I do agree that simpler is better. But do you use libraries? An operating system? ISA? Those are all interfaces we take for granted. Consider device drivers. For maximal efficiency and simplicity they should all be inlined into the kernel. Why doesn’t Linux (etc) work that way?
0 replies · 1 repost · 1 like · 35 views
software archæologist
software archæologist@archeologistdev·
@AgileJebrim Those details are linked in the first paragraph 🙃 but that’s completely beside the point. How do you handle time in the software you build and why is it better than this “irrelevant slop”?
software archæologist tweet media
1 reply · 0 reposts · 0 likes · 42 views
Jebrim
Jebrim@AgileJebrim·
@archeologistdev I was hoping this would be more technical about low level details of how system clocks work and the errors that can occur when sampling them, but this is just irrelevant slop built on top. Completely pointless.
1 reply · 0 reposts · 0 likes · 27 views
software archæologist
software archæologist@archeologistdev·
What separates (most) software from rocket engines is that it needs to change over time to accommodate new requirements. Well-designed interfaces and modules lower the cost of change (typically at the expense of execution time and increased LOC). Beware of overfitting to today’s requirements or you’ll pay the price tomorrow.
Jebrim@AgileJebrim

The way you achieve Raptor 3-style software is by eliminating interfaces and modules. Inline your code to streamline it. This is effectively what SpaceX did, going against the high cohesion, low coupling advice that dominates the industry. Raptor 4 will have fewer interfaces.

3 replies · 0 reposts · 2 likes · 845 views
software archæologist
software archæologist@archeologistdev·
@AgileJebrim Their design need only change to achieve greater efficiency in serving a well known and stable set of requirements. The design space for most software is vastly less constrained.
1 reply · 0 reposts · 0 likes · 23 views
Jebrim
Jebrim@AgileJebrim·
@archeologistdev Also SpaceX’s rocket engines clearly need to change their design a bunch as well.
1 reply · 0 reposts · 3 likes · 63 views
Yuriy Stets
Yuriy Stets@stainless_code·
Even though abstractions can sometimes be useful and simplify things, most of the time they only make the system harder to understand and maintain. Code that is readable from top to bottom, without jumping around or having to read between the lines to understand the meta structure, is much simpler and cheaper to modify, extend or rewrite.
1 reply · 1 repost · 2 likes · 31 views
software archæologist
software archæologist@archeologistdev·
@AgileJebrim Not necessarily. Well designed interfaces have the opposite effect. Accessing the current wall clock time is a good example. By depending on an interface rather than directly referencing the system clock, we gain determinism which helps with testing.
1 reply · 0 reposts · 0 likes · 46 views