akira

4.8K posts

@realmcore_

Making an autonomous swe • @0xrandomlabs Incepto ne desistam Pax aeternum Memento Mori

elysium · Joined November 2021

770 Following · 9.6K Followers

Pinned Tweet

akira@realmcore_·12 Mar

x.com/i/article/2031…

197

650.3K

akira@realmcore_·10h

@badlogicgames “Just use TDD”

140

Mario Zechner@badlogicgames·16h

similarly great signals:
- managing agents is like managing a team of humans
- but i have review agents
- spec is all you need

285

Mario Zechner@badlogicgames·16h

i actually don't want this "but you don't review compiler output either" meme to die. it's the perfect signal for being immediately able to ignore someone in this space.

solst/ICE of Astarte@IceSolst

Interesting article on treating agent output like compiler output (and why) skiplabs.io/blog/codegen_a…

1.3K

76.7K

akira@realmcore_·1d

@JoshPurtell what did he mean by this

130

Josh@JoshPurtell·1d

The story for harness opt in multi-agent settings is likely conclusive imo
Long horizon, tbd

justin@justinsunyt

@JoshPurtell especially for multi-agent yes!! here raw intelligence !== output quality at all

1.4K

akira retweeted

Tejas Bhakta@tejasybhakta·2d

probably still nothing

Tejas Bhakta@tejasybhakta

morph is 2 people
we spend 10x more on gpus than salary
we’re hiring for the first sub-10-person billion-dollar company. join us

9.2K

akira@realmcore_·2d

@arb8020 Not necessarily true, but there's a trick to it

arb8020@arb8020·3d

however if you're using gpt 5.5 it will automatically take at least 1.5x as much time

arb8020@arb8020

you can refactor anything in two weeks

2.2K

akira@realmcore_·2d

@arb8020 Yes

arb8020@arb8020·3d

you can refactor anything in two weeks

2.7K

akira@realmcore_·2d

@tejasybhakta It’s quite stable in general in how it handles problem solving. Its code is still too dense and abstracted, but mostly pretty good

Tejas Bhakta@tejasybhakta·3d

@realmcore_ It’s the first openai model I’ve thought is good. Concerningly good at writing kernels

akira@realmcore_·3d

FYI 5.5 kind of fixes this. Lots of theories on intermediate stuff, but yeah, fixes it mostly

akira@realmcore_

5.3 codex xhigh is significantly better than 5.4 xhigh for coding. It's genuinely nuts

869

akira@realmcore_·2d

@arb8020 Actually

arb8020@arb8020·3d

@realmcore_ cursed +/-

akira@realmcore_·3d

Been a bit quiet around here. We were in fact cooking. Learned a ton about the models and automating the tools as well. Will share in due time. Needless to say, GPT 5.5 is a very, very interesting model, and the shape over which autonomy works is jagged

952

akira@realmcore_·3d

What a great, great day to fight the goblins in the coding dungeon

332

akira retweeted

Pranjali Awasthi@raidingAI·4d

Announcing @slashyai iMessage bot
The first email client with a blue bubble
You can do anything on Slashy via text.
→ Draft & send emails in seconds
→ Get pinged the second a customer emails you
→ Schedule or reschedule anything instantly
→ Build automations that actually work
→ Send voice memos
→ Literally anything else you could want
And yes we still have a web/mobile/desktop app :)
And yes we are still just $30 per month to get started

4.6K

akira@realmcore_·24 Apr

@arb8020 man.

arb8020@arb8020·24 Apr

@realmcore_ twitter

arb8020@arb8020·23 Apr

if i had a nickel for every time there was a large twitter presence who went to go work there to work on performance and then left within the year, i'd have two nickels. which isn't a lot. but it's weird that it happened twice.

1.2K

akira@realmcore_·18 Apr

I mean end to end dev setup actually! Certain dev setups lend themselves particularly well to specific models and, more generally, full autonomy.
Ex: I notice 5.4 Xhigh has a strong bias towards intermediate type validation/transformation. If your codebase is written in a way where this is expected, then the resulting code will be much more acceptable than in a codebase where there is no validation or where it is centralized. Same for error handling patterns.
Anything in particular you find 5.4 to be better for than 5.3?

Ryan Brewer@ryanbrewer·17 Apr

@realmcore_ We have free will for model choices and aren’t constrained to the newest one if that’s what you mean. We haven’t changed much in terms of prompting / setup

akira@realmcore_·15 Apr

5.3 codex xhigh is significantly better than 5.4 xhigh for coding. It's genuinely nuts

akira@realmcore_

gpt 5.4
How do you guys get this model to not do random algorithmic garbage and just write straightforward procedural code? I do not see why it should be this hard for a model to write code like a first year college student.
probably skill issue tbh

399

88.1K

akira@realmcore_·17 Apr

@ryanbrewer Presumably you also have everything set up for this to be the case?

Ryan Brewer@ryanbrewer·16 Apr

@realmcore_ Internally my team is all on 5.4 xhigh fwiw

126

akira@realmcore_·16 Apr

@VJain47 In jest!

Garry Fan@VJain47·16 Apr

@realmcore_ If the labs had me they would’ve solved everything

akira@realmcore_·16 Apr

The labs in fact have not yet solved everything. In light of recent events we might be on the proliferation timeline

Shannon Sands@max_paperclips

none of this is true btw. claude code still sucks and crashes often, google is notorious for dropping projects, openai has a trail of abandoned ideas, and then you've got vendor lock-in to worry about on top of that.
there's plenty of room, same as always

1.7K

akira retweeted

Shannon Sands@max_paperclips·16 Apr

alex fazio@alxfazio

the window for experimenting with llms has basically closed now. the megacorps have fully hit escape velocity and are shipping new products and new features daily. the shift is that they’re not just shipping llms anymore, they’re using llms to build products and improve existing ones at scale. the wild west era of llms isn’t really the wild west anymore. a year ago, this could’ve been an indie dev side project, maybe even a monetizable product. it was literally so easy that the only real bottleneck was your free time. now, whatever idea you have, you should basically assume google/anthropic/oai will build some version of it within a week and wipe out most of the startup surface area around it

503

25.1K

akira@realmcore_·15 Apr

@Gana_L_ So that you dear reader can share in the black hole sun of economically viable super intelligence (The alternative was all caps)

Gana@Gana_L_·15 Apr

@realmcore_ Why are u talking like gpt 5.4?

akira@realmcore_·10 Apr

116

115.6K

akira@realmcore_·15 Apr

@merlindru Yeah opus is great at matching intent

English

merlin@merlindru·15 Apr

@realmcore_ main reason i use Opus for anything beyond very focused changes
i feel like if there's exactly one good way to do something, GPT-5.4 does beautifully
as soon as something is even slightly open ended it goes haywire
really hoping Spud fixes this (tmrw?)

163

akira@realmcore_·15 Apr

5.4 is so incredibly prone to overabstraction

5.4K
