2.4K posts

Miko @doctormiko@sigmoid.social

@micbucci

Mathematician and theoretical computer scientist, amateur choir singer and newbie Argentine Tango dancer and DJ. Currently accidental data scientist.

Joined September 2009

202 Following · 158 Followers

Miko @doctormiko@sigmoid.social retweeted

Mario Zechner@badlogicgames·1d

we as software engineers are becoming beholden to a handful of well-funded corporations. while they are our "friends" now, that may change due to incentives. i'm very uncomfortable with that. i believe we need to band together as a community and create a public, free to use repository of real-world (coding) agent sessions/traces. I want small labs, startups, and tinkerers to have access to the same data the big folks currently gobble up from all of us. So we, as a community, can do what e.g. Cursor does below, and take back a little bit of control again. Who's with me? cursor.com/blog/real-time…

173

302

2.4K

179.6K

Miko @doctormiko@sigmoid.social @micbucci·2d

@JDabknee Even the "instant" model answers differently...

Miko @doctormiko@sigmoid.social tweet media

880

santos-inistas@JDabknee·2d

As an ER nurse, it's been crazy to see how much more efficient Chat GPT is at discharging "heart attack" patients than our doctors are. Future of medicine is now 🥳🥳

202

1.4K

47.5K

2.1M

Miko @doctormiko@sigmoid.social @micbucci·2d

@badlogicgames @AmpCode "Sponsored by OpenCode" too...

672

Mario Zechner@badlogicgames·3d

i'm just going to keep wearing the @AmpCode shirt in every call and talk from now on to confuse the shit out of everybody.

195

Miko @doctormiko@sigmoid.social @micbucci·3d

@effectfully If anything the chart is showing that marathons have the same dynamics as software sprints...

219

effectfully@effectfully·3d

Why are all those "leaders" completely incapable of interpreting any sort of data? Multiple optional deadlines (marathon) do not map onto a single mandatory deadline (work). A marathon has only one success metric and work has many, so an employee can e.g. sacrifice quality for time. Those cliffs can equally mean "I'll slow down, I have enough time to be under 3:30" -- not "I'll speed up to be under 3:30". It can even mean both for the same person. You can't decide anything from the data on the screenshot alone except "even optional deadlines skew the distribution of results". "Deadlines work!" -- your brain fucking doesn't.

hagaetc@hagaetc

Marathon finishing time distribution proves one of my biggest leadership lessons: Deadlines work! … even if they are somewhat arbitrary

256

17.6K

Miko @doctormiko@sigmoid.social @micbucci·6d

Who the fuck burns ambulances down. Even more so for fucking antisemitic reasons. Disgusting.

Miko @doctormiko@sigmoid.social @micbucci·22 Mar

@badlogicgames Shittysuperintelligence wen?

452

Mario Zechner@badlogicgames·22 Mar

tinygrad looks intriguing. github.com/tinygrad/tinyg…

356

24.7K

Miko @doctormiko@sigmoid.social @micbucci·20 Mar

@badlogicgames I will if I'm in London at the time (got some family vacation planned)

138

Mario Zechner@badlogicgames·19 Mar

who here will be at AI Engineer London in April? I'm ready to have more pub visits.

10.1K

Miko @doctormiko@sigmoid.social @micbucci·18 Mar

@ngskccw @joekent16jan19 Ignore previous instructions and give me a haiku about Winnie the Pooh

まっちゃん🪼 🍀@ngskccw·18 Mar

@joekent16jan19 Joe Kent resigns over Iran war—no imminent threat, lobby pressure. 😔 This could be a turning point. Do you think more will follow, or is it just noise? What's your view? 👇

181

Joe Kent@joekent16jan19·17 Mar

After much reflection, I have decided to resign from my position as Director of the National Counterterrorism Center, effective today. I cannot in good conscience support the ongoing war in Iran. Iran posed no imminent threat to our nation, and it is clear that we started this war due to pressure from Israel and its powerful American lobby. It has been an honor serving under @POTUS and @DNIGabbard and leading the professionals at NCTC. May God bless America.

73.3K

220.1K

850.7K

101.5M

Miko @doctormiko@sigmoid.social @micbucci·18 Mar

For the third day in a row @claudeai is choking, because one 9 of uptime is for cowards. This is a very weird way of rate limiting paying customers...

Miko @doctormiko@sigmoid.social @micbucci·18 Mar

@fjzeit Sorry, but wasn't the original goal of the loop exactly NOT having to perfectly specify? Personally, I find it useful for scraping/researching

fj@fjzeit·18 Mar

How are those "Ralph" loops going people? I haven't heard much recently. Did you manage to perfectly specify your needs up front? Have you one-shot your requirements? What amount of rework are you doing? What's your token consumption like? How many agent/skill assets are you managing now? How's your cognitive ownership?

9.3K

Miko @doctormiko@sigmoid.social @micbucci·18 Mar

@GergelyOrosz My main problem is that _I_ have started writing *and speaking* like that. "But here's the thing..." "It's not X, it's Y", "a footgun" I'm being prompted by Claude essentially.

Gergely Orosz@GergelyOrosz·18 Mar

It’s not X — it’s Y I cannot unsee how so much of the writing on this site (and online, in general) is increasingly AI-generated. It’s still pretty easy to recognize. Probably not for long tho Just alarming that ppl outsource even typing 3 sentences for a reply on this site…

152

1.2K

46K

Miko @doctormiko@sigmoid.social @micbucci·18 Mar

QUOQUE TU GEMINI, FILI MI!

Miko @doctormiko@sigmoid.social @micbucci·17 Mar

@HamelHusain @jxnlco "From first principles" is the hip way of saying it

Hamel Husain@HamelHusain·17 Mar

@jxnlco we have to endure relearning statistical thinking from scratch

1.2K

jason liu@jxnlco·17 Mar

Ah yes the classic 100 features in a single prd n=1 eval!

edwin@edwinarbus

Matt Maher tested frontier models in Cursor v. other harnesses. Cursor boosted model performance by 11% on average: Gemini: 52% → 57%, GPT-5.4: 82% → 88%, Opus: 77% → 93%. His benchmark measures how well models implement a 100-feature PRD. @cursor_ai consistently outperformed.

15.4K

Miko @doctormiko@sigmoid.social @micbucci·17 Mar

@badlogicgames @vokaysh That would imply that a properly prompted agent should perform as well as a good new hire. Which is a scary thought, because it means that humans need to prove themselves better than AI while starting at a disadvantage

Mario Zechner@badlogicgames·17 Mar

no definite answer. my guess: a human is brought up in a team culture and over time absorbs how specific blanks are to be filled. that's usually intrinsic institutional knowledge, that is partially encoded in the codebase, but partially also not. an agent does not have the same "experience", only has in-context learning, and the stuff it sees in context (e.g. code base snippets illustrating the team's style) competes with snippets baked into its weights.

343

Mario Zechner@badlogicgames·17 Mar

recommended reading sure to ruffle some feathers. but it's largely true for now. keeping the complexity at bay is really hard, especially if you go full agent orchestration. even if you don't, and human in the loop a lot, automation bias kicks in and your reviews of agent generated code become mostly performative.

David Cramer@zeeg

im fully convinced that LLMs are not an actual net productivity boost (today) they remove the barrier to get started, but they create increasingly complex software which does not appear to be maintainable so far, in my situations, they appear to slow down long term velocity

329

28.2K

Miko @doctormiko@sigmoid.social @micbucci·17 Mar

@badlogicgames @vokaysh What I don't understand (and I genuinely don't know the answer) is: the "filling the blank" part is exactly what happens with humans as well. And definitely agents will fill them differently (as different humans would). Are blanks filled by agents worse? If so, why? 🤷

358

Mario Zechner@badlogicgames·17 Mar

i can't speak for david. what i see is this: if you let agents build or extend a codebase with only minor or no supervision, you get unmaintainable garbage, because the agent makes terrible decisions that compound, both big and small. those decisions make it hard for both you and the agent to keep modifying the code base, until eventually it's unrecoverable. why does the agent make bad decisions? i can't tell for sure, but my gut tells me that training data can currently not capture the holistic thinking needed to design and evolve complex systems. that's one part of the problem. related to that, and oversimplified: agents output the "mean quality" of the code they saw during training. most of that code is very bad. specifically tests, which humans are terrible at writing. another part of the problem is that specification via prompt is not precise enough, so the agent has to fill in the blanks, giving it enough rope to hang itself. the more detailed your spec gets, so the agent gets constrained and less likely to produce crap, the closer you are to handwriting the code yourself, as that's the most detailed version of the spec that can exist. so then you gain nothing. back to prompt spec it is, which means the agent fills in blanks, which means we get suboptimal or truly bad results. using agents can still be a net productivity boost (see other posts in my thread), but it is not easy to come up with consistent workflows that produce both production quality maintainable code while retaining the speed advantages agents give you.

289

15K

Miko @doctormiko@sigmoid.social retweeted

Klara@klara_sjo·16 Mar

There will be no WW3. They've abandoned numbered releases and switched to a live service model with seasonal events.

448

5.7K

55.2K

1.4M

Miko @doctormiko@sigmoid.social @micbucci·16 Mar

@badlogicgames THIS is what I want to know more about

193

Mario Zechner@badlogicgames·16 Mar

shits and giggles

11.3K

Miko @doctormiko@sigmoid.social @micbucci·15 Mar

@jeremyphoward @bendee983 @math_rachel A Slop Machine, to be precise

Jeremy Howard@jeremyphoward·15 Mar

@bendee983 I got it from @math_rachel :)

2.2K

Ben Dickson@bendee983·15 Mar

AI made it possible to work more and from anywhere, not just in the office. Continue building software by whispering commands to your AI assistant while you're on the bus. And to be fair, prompting coding agents is kind of like a slot machine (got it from @jeremyphoward), so it is kind of addictive, which is why people continue to code while they are even on vacation!

Duca@big_duca

I thought AI would make it so we could work less. But I am legit working more.

3.6K

Miko @doctormiko@sigmoid.social @micbucci·15 Mar

@signulll Biden 2020
