Bessi

2.7K posts

@aeitroc

I'm just a coder for fun, like Saitama, suffering from LLM psychosis.

Albania · Joined July 2024
1K Following · 986 Followers
Bessi @aeitroc
@LLMJunky Btw, I'm posting from an S24 Ultra (12 GB RAM / 256 GB storage). Still rocking.
1 reply · 0 reposts · 1 like · 10 views
am.will @LLMJunky
I don't have a choice. My current phone is basically unusable right now. Logic tells me 12GB is more than enough, but I'm very conflicted. I agree with the logic, but what bothers me about buying the top model is that you don't really get more for it on trade-in. So if I get the 12GB, when I eventually trade it in I'll get the same $1,000-1,100 for it that the 16GB would fetch. I know it's kind of silly to look at it that way, but I can't help it lol. Feels like I'm just throwing money away when I'm on my PC 99% of the time anyway.
1 reply · 0 reposts · 0 likes · 16 views
am.will @LLMJunky
I need some help choosing a new Samsung Galaxy S26 Ultra. Do you know about smaller language models? Is it worth paying an extra $300 to upgrade from 12GB to 16GB of RAM? It's kind of a lot of freaking money. Help me choose. I want to mess with local models on device.
[image]
13 replies · 0 reposts · 10 likes · 3.4K views
Bessi @aeitroc
I disappeared these past 3 months because of work, and where I work I had the chance to test every coding agent out there with enterprise-level features. Every coding agent is unique, and anyone can find one that suits their coding style. Well, here I am again, back on Droid [GPT 5.4 Custom]. No more experiments; now I'll extend it further with plugins. It makes sense for /missions to be exclusive to their plans, so I developed a plugin for it, based on my experience, that helps me implement high-end, sophisticated features in large codebases. I'll probably share it next week, as I'm still tweaking it.
0 replies · 0 reposts · 0 likes · 46 views
Bessi @aeitroc
@samwcyo Now ask Claude to destroy itself.
0 replies · 0 reposts · 22 likes · 7.4K views
Sam Curry @samwcyo
Asked Claude to root my Xiaomi 17 Pro Max. Did not go well.
[image]
381 replies · 389 reposts · 14.1K likes · 2M views
Bessi @aeitroc
So the Great Reset we've heard about all these years was in the IT industry, I guess.
0 replies · 0 reposts · 0 likes · 26 views
Bessi @aeitroc
@LLMJunky Mine reset at 6 pm. Two days of using Opus, ffs. Lost 10 years of my life.
0 replies · 0 reposts · 2 likes · 48 views
Sandi Slonjšak @sandislonjsak
I have finally discovered the limit of Claude. It’s me. I am the limit.
174 replies · 270 reposts · 4.1K likes · 102.7K views
am.will @LLMJunky
I got so tired of everyone raving about how great cmux is. Panes this. Browser that. EXHAUSTING. And that's because I'm on Linux, where we get none of the coolest toys. So... I built it myself. And my God. You were right. It's amazing. Introducing Limux, a GPU-accelerated terminal workspace manager for Linux, powered by Ghostty's rendering engine, with split panes, tabbed workspaces, and a built-in browser. Think cmux, but native Linux. If you're interested in something like this, be sure to leave a comment and I'll release it. Special thanks to @manaflowai and @mitchellh for making this possible.
70 replies · 20 reposts · 433 likes · 42.9K views
Lorenzo 'kelset' Sciandra
Thanks to pi.dev, I'm at the point now where I've lost count of how many agents I have running in parallel. This tool is genuinely the missing piece (for me) in unlocking the next level of agentic. Great work, @badlogicgames 👏
6 replies · 2 reposts · 89 likes · 8.7K views
Bessi @aeitroc
Call 911
[image]
0 replies · 0 reposts · 3 likes · 102 views
Bessi reposted
Om Patel @om_patel5
this guy vibe coded an AI SURVIVAL APP that works COMPLETELY OFFLINE

the app
> gives you survival advice completely offline
> cites exact pages from manuals stored on the device
> has offline maps so you're never lost
> lets you text people up to 50 miles away with no cell service

it started off as an app, but now he's selling physical devices

it's waterproof, under 3 pounds, and strong enough that you can run it over with a car

the app hit 14k users and became the world's #1 rated survival AI, which is insane
442 replies · 1.4K reposts · 15K likes · 1.9M views
Bessi @aeitroc
Been doing the same for two days. It feels like there's no limit to what you can do with it.
Sriram Krishnan @sriramk

have switched to pi.dev as my agent harness. @badlogicgames has built something very useful.

what do folks like as an extension to manage multiple agents at the same time? something that would:
- show me the status of various agents
- show me what is stuck
- run meta-analysis to suggest better mechanisms (switch models, a side agent that can analyze the current agents)

might code one up if nothing exists.

0 replies · 0 reposts · 0 likes · 138 views
Bessi reposted
Jay Scambler @JayScambler
Introducing autocontext: a recursive self-improving harness designed to help your agents (and future iterations of those agents) succeed on any task. I built this for our clients with the intention of commercializing it but the community support around Karpathy's autoresearch convinced me to open source it instead. Our space is on the verge of something big and we want to do our part.
Andrej Karpathy @karpathy

Three days ago I left autoresearch tuning nanochat for ~2 days on a depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday, and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement); this will be the new leaderboard entry. So yes, these are real improvements, and they make an actual difference. I am mildly surprised that my very first naive attempt already worked this well on top of what I thought was already a fairly well manually tuned project.

This is a first for me because I am very used to doing the iterative optimization of neural network training manually. You come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, etc. This is the bread and butter of what I have done daily for 2 decades. Seeing the agent do this entire workflow end-to-end and all by itself, as it worked through approx. 700 changes autonomously, is wild. It really looked at the sequence of results of experiments and used that to plan the next ones. It's not novel, ground-breaking "research" (yet), but all the adjustments are "real", I didn't find them manually previously, and they stack up and actually improved nanochat. Among the bigger things:
- It noticed an oversight: my parameterless QKnorm didn't have a scalar multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work.
- It found that the Value Embeddings really like regularization and I wasn't applying any (oops).
- It found that my banded attention was too conservative (I forgot to tune it).
- It found that AdamW betas were all messed up.
- It tuned the weight decay schedule.
- It tuned the network initialization.

This is on top of all the tuning I've already done over a good amount of time. The exact commit is here, from this "round 1" of autoresearch. I am going to kick off "round 2", and in parallel I am looking at how multiple agents can collaborate to unlock parallelism. github.com/karpathy/nanoc…

All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale, of course: you don't just have a single train.py file to tune. But doing it is "just engineering", and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and humans (optionally) contribute on the edges. And more generally, *any* metric you care about that is reasonably efficient to evaluate (or that has more efficient proxy metrics, such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.

62 replies · 119 reposts · 1.9K likes · 293.8K views
Bessi @aeitroc
The best plan creator so far for high-end, enterprise-level features is Prometheus (GPT 5.4) from oh-my-openagent in @opencode.
0 replies · 0 reposts · 0 likes · 167 views
Bessi reposted
seth @sethsetse
“sir they’ve paid $100 for the Apple Developer account” “good. wait 2 days to confirm their account then reject the app 10 times over the next 20 days. if they make it through take a 30% cut of the revenue and pay them out 2 months later. ban them if they try using Stripe”
[image]
152 replies · 416 reposts · 9.2K likes · 438K views
Bessi @aeitroc
OpenClaw is generic; Hermes is something else.
1 reply · 0 reposts · 1 like · 243 views