software archæologist

553 posts


@archeologistdev

building & scaling mission-critical systems @Canva as simple as possible, but no simpler

loving life in Melbourne ☕️🐾🌄
Joined April 2009
787 Following · 88 Followers
Pinned Tweet
software archæologist
software archæologist@archeologistdev·
I have this old Intel NUC I got back in 2019 running Fedora. It never seemed that useful, but I kept it around anyway. It survived a break-in where almost everything else was stolen. Fast forward to 2026 and it's now my vibecoding hub (no Mac mini needed!). I can run coding agents like pi or OpenCode on it and prompt them no matter where I am in the world, for FREE, thanks to @Tailscale. It even sends me a daily briefing with news from Twitter, HN and Lobsters that's relevant to me (thanks @ashebytes for the inspiration). It really is the little NUC that could. Can't wait to use it to build more awesome stuff!
0 replies · 1 repost · 2 likes · 448 views
dax
dax@thdxr·
@stolinski this is here because whenever we support a model that has its own harness we try to emulate the same environment so this is coming from codex cli and we haven't deviated from it yet but as with everything in opencode you can change it
2 replies · 0 reposts · 142 likes · 8.8K views
Scott Tolinski - Syntax.fm
I love OpenCode but "Don't rely on flat, single-color backgrounds; use gradients, shapes, or subtle patterns to build atmosphere." should not be a choice it makes for me.
Scott Tolinski - Syntax.fm tweet media
17 replies · 1 repost · 248 likes · 34.7K views
software archæologist
software archæologist@archeologistdev·
@alanscodelog @stolinski @thdxr You can obviously add even more tokens to your agent prompt to override this, but that wastes context and further confuses the model, burning additional tokens as it tries to reconcile the conflicting instructions.
0 replies · 0 reposts · 0 likes · 9 views
software archæologist
software archæologist@archeologistdev·
I have to agree. I tried to get my agent to commit frequently as it goes along, but it kept ignoring my instructions. Took me ages to realise it was due to the OpenCode system prompt explicitly telling the model NEVER to commit unless specifically asked. @thdxr says “like everything in OpenCode, you can just change it” but I’m unclear how.
2 replies · 0 reposts · 0 likes · 16 views
software archæologist reposted
Mario Zechner
Mario Zechner@badlogicgames·
i can't speak for david. what i see is this: if you let agents build or extend a codebase with only minor or no supervision, you get unmaintainable garbage, because the agent makes terrible decisions that compound, both big and small. those decisions make it hard for both you and the agent to keep modifying the code base, until eventually it's unrecoverable.

why does the agent make bad decisions? i can't tell for sure, but my gut tells me that training data currently cannot capture the holistic thinking needed to design and evolve complex systems. that's one part of the problem. related to that, and oversimplified: agents output the "mean quality" of the code they saw during training. most of that code is very bad. specifically tests, which humans are terrible at writing.

another part of the problem is that specification via prompt is not precise enough, so the agent has to fill in the blanks, giving it enough rope to hang itself. the more detailed your spec gets, the more the agent is constrained and the less likely it is to produce crap, but also the closer you are to handwriting the code yourself, as that's the most detailed version of the spec that can exist. so then you gain nothing. back to prompt specs it is, which means the agent fills in blanks, which means we get suboptimal or truly bad results.

using agents can still be a net productivity boost (see other posts in my thread), but it is not easy to come up with consistent workflows that produce production quality maintainable code while retaining the speed advantages agents give you.
18 replies · 34 reposts · 286 likes · 14.4K views
Agus 🔸
Agus 🔸@austinc3301·
Turns out they just committed straight-up fraud? Their package prebundles solutions for the benchmark and then gives them to the agent when it starts
Agus 🔸 tweet media
24 replies · 34 reposts · 796 likes · 93.8K views
kristina v. saint
kristina v. saint@kristinatastic·
I've been working on this important list for a couple of years now. What am I missing?
kristina v. saint tweet media
1.1K replies · 569 reposts · 20.1K likes · 922K views
software archæologist
software archæologist@archeologistdev·
@samlambert What amazes me even more is how much of the response time is Ruby, suggesting there’s a lot of room for improvement
0 replies · 0 reposts · 0 likes · 214 views
Sam Lambert
Sam Lambert@samlambert·
GitHub's performance in 2013. The web used to be so fast.
Sam Lambert tweet media
41 replies · 24 reposts · 1.1K likes · 250.3K views
Glauber Costa
Glauber Costa@glcst·
on the one hand, it is true that the team matters more than the language and I am sure you'd have found a way, Joran. But on the other hand, it's also too dismissive to say that the language doesn't matter at all. When we built Scylla, it was very clear that without C++11-style captures, it would have been a herculean task.
1 reply · 0 reposts · 6 likes · 289 views
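Glauber's point about C++11 captures is concrete: continuation-passing frameworks like Seastar (which Scylla is built on) thread state through lambda captures. A minimal sketch of what captures buy you; all names here are illustrative toys, not Scylla's or Seastar's actual API:

```cpp
#include <cassert>
#include <functional>
#include <string>

// Toy connection state that several async steps need to share.
struct Connection {
    std::string host;
    int bytes_sent = 0;
};

// A callback-taking operation. Real frameworks defer the callback across
// an event loop; here it runs immediately, which is enough to show the shape.
int send_request(Connection& conn, const std::string& payload,
                 std::function<void(int)> on_done) {
    conn.bytes_sent += static_cast<int>(payload.size());
    on_done(conn.bytes_sent);
    return conn.bytes_sent;
}

int demo() {
    Connection conn{"db.local"};
    int observed = 0;
    // The capture [&observed, &conn] carries local state into the
    // continuation. Pre-C++11, this would require a hand-written functor
    // struct per continuation to hold that state.
    send_request(conn, "SELECT 1", [&observed, &conn](int total) {
        observed = total + static_cast<int>(conn.host.size());
    });
    return observed;
}
```

At the scale of a database engine, writing one named functor struct per continuation instead of a capture is plausibly the "herculean task" Glauber means.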
_mm_pause()
_mm_pause()@_mm_pause·
I dislike language wars. Language is only 1% of what matters. In systems programming, it's the team & engineering that count. Look at TigerBeetle & Bun: great engineers drive huge impact. Those projects could've easily been built in C, C++, or Rust.
'(Robert Smith)@stylewarning

I'm overhearing a FAANG tech meeting about how this 10?-year product written in C is being transitioned ("modernized") to C++. Started by changing to a C++ compiler, and slowly rewriting to use classes/exceptions, &c. It's been 3 months, and the C++ service keeps failing in prod.

1 reply · 0 reposts · 9 likes · 5K views
software archæologist
software archæologist@archeologistdev·
@opencode No longer as impressed since GLM5 started randomly going off the rails and speaking Chinese to me, or just spouting nonsensical token streams once it gets close to 50% context usage
0 replies · 0 reposts · 0 likes · 42 views
software archæologist
software archæologist@archeologistdev·
Kimi K2.5 felt like an approximation of Opus: friendly, a bit over-enthusiastic, needs a lot of steering or it goes haywire. In contrast, GLM 5 is giving strong Codex vibes. Comes up with sensible plans and executes them meticulously, happy to keep cooking for hours. I like it! 🫡
software archæologist tweet media
1 reply · 0 reposts · 2 likes · 103 views
software archæologist
software archæologist@archeologistdev·
@nateberkopec Agree that optimising a web app is much more open ended though, for two reasons:
- hard to define a single success metric: many possible different yet good outcomes
- many inputs
0 replies · 0 reposts · 0 likes · 67 views
software archæologist
software archæologist@archeologistdev·
@nateberkopec I guess it is in the sense that it’s optimising a single metric. But not in the sense that it’s just tweaking a single input
1 reply · 0 reposts · 1 like · 68 views
Nate Berkopec
Nate Berkopec@nateberkopec·
While this is cool, optimizing a single variable for a nondeterministic process is the simplest possible thing you could optimize. Optimizing e.g. a web-app is 100x the dimensions here, along with a far more stringent correctness requirement. We have a long way to go.
Andrej Karpathy@karpathy

Three days ago I left autoresearch tuning nanochat for ~2 days on a depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement); this will be the new leaderboard entry. So yes, these are real improvements and they make an actual difference.

I am mildly surprised that my very first naive attempt already worked this well on top of what I thought was already a fairly manually well-tuned project. This is a first for me because I am very used to doing the iterative optimization of neural network training manually. You come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, etc etc. This is the bread and butter of what I do daily for 2 decades. Seeing the agent do this entire workflow end-to-end and all by itself as it worked through approx. 700 changes autonomously is wild. It really looked at the sequence of results of experiments and used that to plan the next ones. It's not novel, ground-breaking "research" (yet), but all the adjustments are "real", I didn't find them manually previously, and they stack up and actually improved nanochat. Among the bigger things e.g.:
- It noticed an oversight that my parameterless QKnorm didn't have a scaler multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work.
- It found that the Value Embeddings really like regularization and I wasn't applying any (oops).
- It found that my banded attention was too conservative (I forgot to tune it).
- It found that AdamW betas were all messed up.
- It tuned the weight decay schedule.
- It tuned the network initialization.

This is on top of all the tuning I've already done over a good amount of time. The exact commit is here, from this "round 1" of autoresearch. I am going to kick off "round 2", and in parallel I am looking at how multiple agents can collaborate to unlock parallelism. github.com/karpathy/nanoc…

All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale of course - you don't just have a single train.py file to tune. But doing it is "just engineering" and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and humans (optionally) contribute on the edges. And more generally, *any* metric you care about that is reasonably efficient to evaluate (or that has more efficient proxy metrics such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.

10 replies · 0 reposts · 23 likes · 9.2K views
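The outer loop Karpathy describes (propose a change, evaluate the metric, keep it only if it improves) can be sketched without any LLM in the loop. A toy accept-if-better search over one "hyperparameter"; everything here is illustrative, since the real loop trains a model per candidate and an agent proposes the edits:

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Toy stand-in for "validation loss as a function of one hyperparameter".
// A real autoresearch run would train a small model per candidate instead.
double toy_loss(double lr) {
    return (lr - 0.3) * (lr - 0.3) + 1.0;  // minimised at lr = 0.3
}

// Accept-if-better search: propose a perturbation, keep it only when the
// metric improves. Random proposals stand in for agent-suggested changes.
double autosearch(double start, int iters, unsigned seed) {
    std::mt19937 rng(seed);
    std::normal_distribution<double> step(0.0, 0.05);
    double best = start;
    double best_loss = toy_loss(best);
    for (int i = 0; i < iters; ++i) {
        double cand = best + step(rng);
        double cand_loss = toy_loss(cand);
        if (cand_loss < cand_loss) {}  // no-op; see condition below
        if (cand_loss < best_loss) {
            best = cand;
            best_loss = cand_loss;
        }
    }
    return best;
}
```

Because only improvements are kept, the measured metric can never regress, which is why the ~20 surviving changes out of ~700 attempts all stacked. The hard part at scale is making each evaluation cheap enough (small proxy models) that the loop stays affordable.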
software archæologist
software archæologist@archeologistdev·
@stainless_code I do agree that simpler is better. But do you use libraries? An operating system? ISA? Those are all interfaces we take for granted. Consider device drivers. For maximal efficiency and simplicity they should all be inlined into the kernel. Why doesn’t Linux (etc) work that way?
0 replies · 1 repost · 1 like · 35 views
software archæologist
software archæologist@archeologistdev·
@AgileJebrim Those details are linked in the first paragraph 🙃 but that’s completely beside the point. How do you handle time in the software you build and why is it better than this “irrelevant slop”?
software archæologist tweet media
1 reply · 0 reposts · 0 likes · 42 views
Jebrim
Jebrim@AgileJebrim·
@archeologistdev I was hoping this would be more technical about low level details of how system clocks work and the errors that can occur when sampling them, but this is just irrelevant slop built on top. Completely pointless.
1 reply · 0 reposts · 0 likes · 27 views
software archæologist
software archæologist@archeologistdev·
What separates (most) software from rocket engines is that it needs to change over time to accommodate new requirements. Well-designed interfaces and modules lower the cost of change (typically at the expense of execution time and increased LOC). Beware of overfitting to today’s requirements or you’ll pay the price tomorrow.
Jebrim@AgileJebrim

The way you achieve Raptor 3-style software is by eliminating interfaces and modules. Inline your code to streamline it. This is effectively what SpaceX did, going against the high cohesion, low coupling advice that dominates the industry. Raptor 4 will have fewer interfaces.

3 replies · 0 reposts · 2 likes · 845 views
software archæologist
software archæologist@archeologistdev·
@AgileJebrim Their design need only change to achieve greater efficiency in serving a well known and stable set of requirements. The design space for most software is vastly less constrained.
1 reply · 0 reposts · 0 likes · 23 views
Jebrim
Jebrim@AgileJebrim·
@archeologistdev Also SpaceX’s rocket engines clearly need to change their design a bunch as well.
1 reply · 0 reposts · 3 likes · 63 views
Yuriy Stets
Yuriy Stets@stainless_code·
Even though abstractions can sometimes be useful and simplify things, most of the time they only make the system harder to understand and maintain. Code that is readable from top to bottom, without jumping around or having to read between the lines to understand the meta structure, is much simpler and cheaper to modify, extend or rewrite.
1 reply · 1 repost · 2 likes · 31 views
software archæologist
software archæologist@archeologistdev·
@AgileJebrim Not necessarily. Well designed interfaces have the opposite effect. Accessing the current wall clock time is a good example. By depending on an interface rather than directly referencing the system clock, we gain determinism which helps with testing.
1 reply · 0 reposts · 0 likes · 46 views