Bessi

2.7K posts

@aeitroc

I'm just a coder for fun, like Saitama, suffering from LLM psychosis.

Albania · Joined July 2024

1K Following · 986 Followers

Bessi@aeitroc·12h

just downloaded "OurAgent", i confirm it is the best coding agent out there.

Matan Grinberg@matanSF

excited to announce the latest scores of OurAgent on OurAgentBench: 1. OurAgent 2. YourAgent. arxiv paper in bio

Bessi@aeitroc·12h

Anthropic is a joke

Cursor@cursor_ai

Composer 2 is now available in Cursor.

Bessi@aeitroc·16h

@LLMJunky Btw, im posting from S24 Ultra 12/256. Still rocking.

am.will@LLMJunky·16h

i dont have a choice. my current phone is basically unusable right now. logic tells me 12gb is more than enough but i'm very conflicted. i agree with the logic but the thing that bothers me about buying the best model is the fact you dont really get more for trade in for them. so if i get the 12gb when i eventually trade it in, i'll get the same 1000-1100 for it that the 16gb will fetch. I know its kinda silly to look at it that way but i can't help it lol. feels like i'm just throwing money away when i'm on my pc 99% of the time anyway.

am.will@LLMJunky·1d

I need some help choosing a new Samsung Galaxy S26 Ultra. Do you know about smaller language models? Is it worth paying an extra $300 to upgrade from 12gb to 16gb RAM? It's kinda a lot of freaking money. Help me choose. I want to mess with local models on device.

3.4K

Bessi@aeitroc·17h

I disappeared for these 3 months because of work and had the chance to test every coding agent out there with enterprise-level features where I work. Every coding agent is unique, and anyone can find something adaptable to their coding style. Well, here I am again, back at Droid [GPT 5.4 Custom]. No more experiments; now, with plugins, I will extend it further. It makes sense for /missions to be exclusive to their plans, so I have developed a plugin for it, based on my experience, that helps me implement high-end, sophisticated features in large codebases. I will probably share it next week as I'm still tweaking it.

Bessi@aeitroc·1d

@samwcyo Now ask claude to destroy itself.

7.4K

Sam Curry@samwcyo·1d

Asked Claude to root my Xiaomi 17 Pro Max. Did not go well.

381

389

14.1K

Bessi@aeitroc·1d

So the Great Reset we heard about all these years was in the IT industry, i guess.

Bessi@aeitroc·1d

@LLMJunky mine reset at 6pm, 2 days using opus ffs, lost 10 years of my life.

am.will@LLMJunky·1d

we are so back. happy reset day folks. it's kinda funny how (most of) us are now on the same "cycle"

am.will@LLMJunky

GGs boys. It was nice knowing you.

152

10.6K

Bessi@aeitroc·1d

@sandislonjsak You are absolutely right.

Sandi Slonjšak@sandislonjsak·2d

I have finally discovered the limit of Claude. It’s me. I am the limit.

174

270

4.1K

102.7K

Bessi@aeitroc·1d

@LLMJunky i bet @warpdotdev is cooking this.

563

am.will@LLMJunky·1d

I got so tired of everyone raving about how great cmux is. Panes this. Browser that. EXHAUSTING. And that's because I'm on Linux, where we get none of the coolest toys. So...I built it myself. And my God. You were right. It's amazing. Introducing Limux, a GPU-accelerated terminal workspace manager for Linux, powered by Ghostty's rendering engine, with split panes, tabbed workspaces, and a built-in browser. Think cmux, but native Linux. If you're interested in something like this, be sure to leave a comment and I'll release it. Special thanks to @manaflowai and @mitchellh for making this possible.

433

42.9K

Bessi@aeitroc·1d

@Kelset @badlogicgames pi is truly special. Wonder how he came up with such a tool?

311

Lorenzo 'kelset' Sciandra@Kelset·1d

thanks to pi.dev i'm at the point now where i've lost count of how many agents i have running in parallel. this tool is genuinely the missing piece (for me) in unlocking the next level of agentic - great work @badlogicgames 👏

8.7K

Bessi@aeitroc·1d

Call 911

102

Bessi reposted

Om Patel@om_patel5·5d

this guy vibe coded an AI SURVIVAL APP that works COMPLETELY OFFLINE
the app
> gives you survival advice completely offline
> cites exact pages from manuals stored on the device
> has offline maps so you're never lost
> lets you text people up to 50 miles away with no cell service
it started off as an app but now he's selling physical devices
it's waterproof, under 3 pounds, and strong enough that you can run it over with a car
the app hit 14k users and became the world's #1 rated survival AI which is insane

442

1.4K

15K

1.9M

Bessi@aeitroc·4d

Doing the same for two days. It feels unlimited what u can do with it.

Sriram Krishnan@sriramk

have switched to pi.dev as my agent harness. @badlogicgames has built something very useful.
what do folks like as an extension to manage multiple agents at the same time. something that would
- show me the status of various agents
- show me what is stuck
- run meta analysis to suggest better mechanisms (switch models, a side agent that can analyze the current agents)
might code one up if nothing exists.

138

Bessi reposted

Matan Grinberg@matanSF·6d

OB1 could not beat @droid the fair way. Shame, but clever way of cheating Terminal Bench

GIF

Monk Zero@NoCommas

x.com/i/article/2032…

111

14.9K

Bessi reposted

Monk Zero@NoCommas·6d

x.com/i/article/2032…

378

480.1K

Bessi reposted

Jay Scambler@JayScambler·6d

Introducing autocontext: a recursive self-improving harness designed to help your agents (and future iterations of those agents) succeed on any task. I built this for our clients with the intention of commercializing it but the community support around Karpathy's autoresearch convinced me to open source it instead. Our space is on the verge of something big and we want to do our part.

Andrej Karpathy@karpathy

Three days ago I left autoresearch tuning nanochat for ~2 days on a depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement), this will be the new leaderboard entry. So yes, these are real improvements and they make an actual difference. I am mildly surprised that my very first naive attempt already worked this well on top of what I thought was already a fairly manually well-tuned project.
This is a first for me because I am very used to doing the iterative optimization of neural network training manually. You come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, etc etc. This is the bread and butter of what I do daily for 2 decades. Seeing the agent do this entire workflow end-to-end and all by itself as it worked through approx. 700 changes autonomously is wild. It really looked at the sequence of results of experiments and used that to plan the next ones. It's not novel, ground-breaking "research" (yet), but all the adjustments are "real", I didn't find them manually previously, and they stack up and actually improved nanochat. Among the bigger things e.g.:
- It noticed an oversight that my parameterless QKnorm didn't have a scalar multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work.
- It found that the Value Embeddings really like regularization and I wasn't applying any (oops).
- It found that my banded attention was too conservative (i forgot to tune it).
- It found that AdamW betas were all messed up.
- It tuned the weight decay schedule.
- It tuned the network initialization.
This is on top of all the tuning I've already done over a good amount of time. The exact commit is here, from this "round 1" of autoresearch. I am going to kick off "round 2", and in parallel I am looking at how multiple agents can collaborate to unlock parallelism. github.com/karpathy/nanoc…
All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale of course - you don't just have a single train.py file to tune. But doing it is "just engineering" and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and humans (optionally) contribute on the edges. And more generally, *any* metric you care about that is reasonably efficient to evaluate (or that has more efficient proxy metrics such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.

119

1.9K

293.8K

Bessi@aeitroc·6d

The best plan creator so far for high-end enterprise-level features is Prometheus (GPT 5.4) from oh-my-openagent in @opencode.

167

Bessi reposted

seth@sethsetse·13 Mar

“sir they’ve paid $100 for the Apple Developer account” “good. wait 2 days to confirm their account then reject the app 10 times over the next 20 days. if they make it through take a 30% cut of the revenue and pay them out 2 months later. ban them if they try using Stripe”