echantech

3.7K posts

@echantech1

code stonks and snark. opinions are my own. account not for sale.

Bay Area, CA · Joined August 2020

441 Following · 544 Followers

Pinned Tweet

echantech@echantech1·Apr 24

We do not need more AI agents. We need more control over them. That is why I created Invoker, an open-core execution engine for AI-driven engineering workflows. The bottleneck is no longer just code generation. It is execution control. AI work needs isolation, replay, auditability, recovery, and human decision points. The model resembles build systems and workflow engines more than it resembles a theoretically “AI-first” chat interface. medium.com/@edbertchantech/invoker-more-control-not-more-agents-ab3fa8190c8c

877

echantech@echantech1·21m

I still get those moments when I review code, actually. I’ll get the “what the hell is this surrounding stuff?” feeling, start quizzing the AI about what was done and why, and go on a long detour through the code history. But those moments are becoming less frequent, for better or worse.

Gergely Orosz@GergelyOrosz·10h

That feeling of: "I'm in the middle of the code... oh, this is such a nasty hack. OK, let me clean it up as I go. [2 hours pass] OK, it's done, now let me get back to where I was." It just never happens as organically as I use AI agents. I no longer spot stuff as I don't "live in" the code...

881

56K

echantech@echantech1·34m

@mahyarm8 It’s not listed through Codebuddy. I should try it, though.

Mahyar McDonald@mahyarm8·5h

@echantech1 why not v4

echantech@echantech1·12h

I am in a very Chinese phase of my coding

echantech retweeted

Wayen@wayen_ai·17h

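Shocking! A miHoYo employee burned through 2 million yuan in one night playing with AI. A colleague spun up dozens of agents over a weekend and left without shutting them down, then found that 2 million RMB worth of tokens had been spent in a single night. In the end, miHoYo tearfully footed the bill.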

279

756

446.6K

echantech@echantech1·10h

You got me curious so I went down the rabbit hole. Here are my top-level findings.
inputTokens: 269,904,445
cachedInputTokens: 5,557,094,912
totalTokens: 5,842,819,202
costUSD: 4357.60
cache-hit pct: 95.37%
What about you? I'm really curious how other people's hit rate vs cost vs code velocity compares. Personally I don't know what the right metric is, so I'm hoping to start a conversation.
1. I have very small prompts relative to the overhead I am sending. Ironically, the overhead just seems to be the skills/instructions. When I look at this, upon execution, a lot of these should actually be removed. I wonder what would happen if I made the agents more specialized? My cache hit rate would probably go down, but I'd also be sending less down the wire. Something I should think about...
1a. I think this is actually the correct way of doing things: a very specific, customized agent that is highly opinionated on a very narrow task. This prevents a lot of drift. I think I could do a better job at this though.
2. What is my velocity? I took a screenshot and I have 77 commits per day on average. That might sound like slop cannons blindly firing, but I think it's more that the PRs themselves are narrow and small. TBH I could do a better job of reviewing them myself.
3. Overall, the economics seem to mean that I pay about $1/commit. That's not too bad.
4. However, if I look at my planning loop and the sessions where I submit my plan, my stats drop A LOT: 44 sessions, ~5.94% of effective input load, at about $5.98/session (dedup basis). There was a pretty big skew with the top 3: #1 cost a lot because I was making an investor demo video for the project; #2 and #3 were just loops I spawned to auto-fix CI.
Anyway, sorry for the brain dump, I should really do a more structured report!
Project: github.com/Neko-Catpital-…
Sheets/data: docs.google.com/spreadsheets/d…
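The quoted cache-hit percentage can be reproduced from the raw counts in the post. This is a minimal sketch, not the author's actual tooling. It assumes the hit rate is cached input divided by all input (uncached plus cached), which is the definition that matches the 95.37% figure, and it assumes the remainder of `totalTokens` is output or other non-input tokens, which the post does not state.

```python
# Figures quoted in the post.
input_tokens = 269_904_445
cached_input_tokens = 5_557_094_912
total_tokens = 5_842_819_202

# Assumption: hit rate = cached input / (uncached input + cached input).
cache_hit_pct = 100 * cached_input_tokens / (input_tokens + cached_input_tokens)

# Assumption: whatever is left of totalTokens is output or other non-input tokens.
other_tokens = total_tokens - input_tokens - cached_input_tokens

# Planning loop: 44 sessions at about $5.98 each, as quoted in point 4.
planning_cost_usd = 44 * 5.98

print(f"cache-hit pct: {cache_hit_pct:.2f}%")
print(f"non-input tokens: {other_tokens:,}")
print(f"planning sessions total: ${planning_cost_usd:.2f}")
```

Dividing cached tokens by `totalTokens` instead gives a lower figure (about 95.11%), so the input-only denominator appears to be the one used in the post.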

Zengineering@Samhanknr·12h

@echantech1 Can you explain why your cache hit rate is high ?

Zengineering@Samhanknr·1d

lots of people saying software architecture matters and models need to get better at it but interviews barely get past toy system design. real architecture is hard to make legible and the more legible the test, the easier to hill-climb with RL probably the only real signal is long-running work in public

1.2K

echantech@echantech1·11h

@zuess05 High-velocity slop cannons are only as good as how they’re wielded

Suhas@zuess05·15h

@echantech1 Yeah, same. It’s definitely possible, and it’s not all slop like people advertise.

207

Suhas@zuess05·1d

Senior devs are currently sleeping peacefully, thinking their jobs are safe because Claude generates "unoptimized, messy code." But what happens in 6 months when the model refactors and optimizes perfectly on the first try… What exactly are y'all going to do for a living then?

102

132

25.2K

echantech@echantech1·11h

You can, but it’s unclear how effective it is. I think the business problem you’re trying to handle is how to balance workloads and not overwhelm compute during peak hours. That probably means letting people deprioritize their workload, presumably in exchange for a discount on their execution. I would model this problem as “landing to CD during high traffic”, except with the possibility of a financial incentive. It’s a pretty well studied problem at Uber and Google (probably Meta too, but I haven’t talked to them about it). The problem is that every developer thinks all their executions are important, and developers are impatient. Having them think about execution price and speed tradeoffs is sensible, but without setting some expectation for land or execution time, it leads to a bad devXP. Inbox open if you want to chat more.

Tibo@thsottiaux·2d

Should we bring batch compute to codex? Aka /slow mode

1.1K

4.8K

232.2K

echantech@echantech1·12h

@vontean0802 @llmDestructor @thsottiaux I actually built this. It’s really difficult! It’s like building Bazel + scalable remote execution from scratch.

Vontean0802@vontean0802·1d

@llmDestructor @thsottiaux I love this idea! Imagine we could design pipelines or workflows between multiple agents in Codex.

echantech@echantech1·12h

@gdb Impressive. You use Bazel to do remote execution with BuildBuddy. How’s the experience?

Greg Brockman@gdb·1d

under appreciated that codex is open source

Ahmed@ah20im

Lots of people get surprised when I tell them that Codex is open source

208

184

759.8K

echantech@echantech1·12h

Argue with it for about 20-40 minutes about its proposed design, research, and tradeoffs, making it prove things spike-and-validate style with repro/bash scripts to ensure it’s not hallucinating, asking how it didn’t catch this issue before, etc. Then I make it generate the plan through Cursor + Codex. Then I use my skill to split the plan into parallelizable units of work across multiple SSH machines. The result is that I have like a 95% cache hit rate on 11bn tokens a month. My estimate is that this costs $3-5k of compute, but I’ve NEVER hit my weekly limits with my 20x Codex.
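The "split the plan into parallelizable units of work across multiple SSH machines" step can be illustrated with a dependency graph. This is only a sketch under stated assumptions, not how Invoker or the author's skill works: the task names, the dependency map, and the host names are all hypothetical, and the round-robin placement is a stand-in for whatever scheduling is really used. It groups tasks into waves whose members have no unmet dependencies, using Python's standard `graphlib` module.

```python
from graphlib import TopologicalSorter


def plan_waves(deps):
    """Group tasks into waves; every task in a wave has all dependencies done."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves


def assign_hosts(waves, hosts):
    """Round-robin the tasks of each wave across the available hosts."""
    return [
        {task: hosts[i % len(hosts)] for i, task in enumerate(wave)}
        for wave in waves
    ]


# Hypothetical plan: each task maps to the set of tasks it depends on.
deps = {
    "api": {"schema"},
    "ui": {"schema"},
    "tests": {"api", "ui"},
}

waves = plan_waves(deps)
assignment = assign_hosts(waves, ["ssh-host-a", "ssh-host-b"])  # hypothetical hosts
print(waves)
print(assignment)
```

Tasks in the same wave can run concurrently on different machines, and the next wave starts only after the current one finishes.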

Zengineering@Samhanknr·12h

@echantech1 What’s your workflow ?

echantech@echantech1·16h

@adamshuaib “Winners win”

Adam Shuaib@adamshuaib·1d

After 15 years of investing, we realised that truly exceptional founders have something impossible to fake: deeply unconventional lives. We analysed 15,000 founders using five binary signals to measure this: odd hobbies, early signs of exceptionalism, extreme life choices, unusual geographies, non-linear careers. These sum to give a 0-5 score per founder. Whether someone started coding at 10, speaks five languages, climbed Everest or quit a safe job to live in Chile, the signal was deviation from the mean. Rather than focusing on IQ or EQ, we call this metric the Outlier Quotient, or “OQ”. When forecasting founder success, it turns out that OQ was the single most predictive variable in our entire classification model, trained on ~70 different factors. Our OQ score had zero correlation with having worked at a top-tier company or attending an elite university. The signals most VCs rely on aren’t just noisy, they’re blinding. The best founders don’t signal like everyone else, they don’t think like everyone else, and they certainly don’t build like everyone else. If you want to spot breakout talent before the rest of the market, stop screening for conformity. Back the founders the system was built to filter out.

133

1.2K

135.5K

echantech@echantech1·17h

@ChadNauseam Have you tried simply doing SSH+claude sessions instead?

354

Nauseam (in sf!)@ChadNauseam·1d

x.com/i/article/2058…

672

203.6K

echantech@echantech1·19h

Honestly... I don't really even know how I blast through this many commits. I think I spend 90% of my day arguing with Claude and Codex about how the code they produce and their research/ideas are wrong.

echantech@echantech1·20h

@copyconstruct I repeatedly run into this issue. I had to write a lot of harness and skill logic just to get AI to break things up into understandable pieces before turning them into stacked pull requests. Compressing human judgment is not easy….

Cindy Sridharan@copyconstruct·1d

If you believe “humans should deeply understand what they’re building with AI”, it follows that AI generated code should be optimized for “human cognition” Yet, often AI written code is actively hostile to this goal, and worse, a ton of “best practices” encourage this hostility.

5.1K

echantech@echantech1·1d

Invoker v0.0.2 is out. I had Codex produce a video of its capabilities this time! github.com/Neko-Catpital-…
Added: embedded terminals, keyboard-first navigation, task graph controls.
Improved: closed-review PR handling, CI autofix plumbing, external-review dispatch.
Also lots of stability and rearchitecture.

echantech@echantech1·1d

They’re forced to run it because a good chunk of the C-suite is populated not by innovators or creators but by people who cosplay as innovators. They’re grifters who made their way by latching onto trends without having any engineering acumen. They’re forcing AI onto people before it’s ready for scale and production because it’s not about innovation. It’s optimizing for career survival. It’s a bet they take because their peers are taking it and it’s aligned with LinkedIn narratives. Heads, it pays off and they look like geniuses. Tails, they were just doing what the rest of the industry is doing. If you look at leadership at these companies, they’re not run by technical people. They’re run by ex-McKinsey and business heads. Sundar Pichai is a good example. The most important paradigm shift in recent years is being led by people who only know how to make money line go up, not by people who understand technology. Case in point? Google DeepMind made the transformer. But everyone who worked on it left Google by 2021. So how did Google, which sat on the T in GPT, not kick off the AI race? Lots of business tensions between DeepMind and Google, which wanted to use DeepMind for ads, not for innovation or pushing a frontier. If Jeff Dean hadn’t invested in TPUs, Google’s market cap would be half of what it is today.

207

Tom Goodwin@tomfgoodwin·1d

One thing I don't get about the current impossible maths of the AI boom is that nobody is forcing companies to do it. From Amazon to Microsoft, to Meta to Oracle (perhaps not Google), there's absolutely no threat to being a little slow to invest. Disruption rarely is

9.5K

echantech@echantech1·1d

@avrldotdev True. But I think people who like coding and engineering will be fine. It’s more of a “finally I don’t have to waste my day tracking down stupid bugs” while still being fully capable of debugging, finding them, and designing.

avrl ☘@avrldotdev·1d

@echantech1 Well that's something only few elites can avoid, most of us just follow on their made tracks.

avrl ☘@avrldotdev·1d

Software engineers, what's your plan when AI develops better taste & architectural/systems knowledge than you in next 3-4 years?

185

551

121K

echantech@echantech1·1d

Not really. It will allow them to run down that lane faster and harder with impunity. I did freelance development like 12 years ago. Someone came to us and asked us to build Instagram but for food, at a time when Instagram was already used for food. It was absolutely insane and stupid, and we told him that, and he kept paying us to build more. I was a broke college kid at the time so whatever, but still….

Wise@trikcode

Vibe coding means the idea guys can finally find out they actually have terrible ideas.

129
