Zaqir

753 posts


@jaguarsoftio

Senior AI Engineer. Founder of https://t.co/BStS7fJwAJ | Creator of X-Combat https://t.co/MsCgVtmhwl | GauntletAI S25

Joined September 2021
474 Following · 258 Followers
Pinned Tweet
Zaqir
Zaqir@jaguarsoftio·
I used Grok 4 to vibe code a game where you can play as... Grok 4. Kill zombies and collect Grok coins using an agile and slick combat system. Ani has ninja moves in this classic 3D arcade fighter. It's also massively multiplayer PvP -- hop in the server and fight your oomfs. Built with ThreeJS in < 3 days for @gauntletai project 5. Comment who you wanna see in the game 👀 I can turn anybody into a 3d model 😈
14 replies · 3 reposts · 55 likes · 9.7K views
Zaqir
Zaqir@jaguarsoftio·
AI can talk aloud like humans. Does that mean we should stop talking? No. Obviously not.
0 replies · 0 reposts · 0 likes · 10 views
Zaqir retweeted
Kpaxs
Kpaxs@Kpaxs·
Here's a controversial take: most of the authority that exists in any organization was never formally granted to anyone. It was assumed, exercised, and then retroactively legitimized by the fact that it worked.
Kpaxs@Kpaxs

I call it the "Refrigerator Principle": most organizational dysfunction exists because everyone assumes someone else has the authority to fix it, and the fastest path forward is often just pretending you have that authority and asking forgiveness rather than permission.

102 replies · 633 reposts · 7.2K likes · 511.2K views
Zaqir
Zaqir@jaguarsoftio·
Folks who work in AI need to understand hedonic adaptation. As soon as a new tool is released and widely used, it becomes familiar, common, and its value drops toward zero. The competitive nature of humans is biological. AI is just a thin middle layer between humans competing with each other. Anything common and accessible quickly becomes a commodity. All AI tools have this quality by design. The differentiating factor then becomes human taste. And we're back to square one, as if AI was never invented.
0 replies · 0 reposts · 1 like · 45 views
Zaqir retweeted
Sam Altman
Sam Altman@sama·
"post-AGI, no one is going to work and the economy is going to collapse" "i am switching to polyphasic sleep because GPT-5.5 in codex is so good that i can't afford to be sleeping for such long stretches and miss out on working"
1.2K replies · 606 reposts · 11.2K likes · 1.6M views
Zaqir retweeted
Ian Miles Cheong
Ian Miles Cheong@ianmiles·
Marc Andreessen just revealed the Elon Musk philosophy that completely broke his brain: "The best product in the world shouldn't even need a logo."

We all know Elon is relentless about quality. As Marc puts it: "Do you want the best car in the world or not, right? Like that's Elon's mentality... And it's working very well."

But at a recent event, Elon took this mindset to a completely different level. He dropped a perspective so jarring that Marc initially thought it was a joke. Elon's thesis? "You shouldn't even have to have your name on the product. It's just obvious. Everybody knows."

The logic is brutal but simple. If you build the undeniable, undisputed best thing in the world, everybody uses it. And because everybody uses it, you don't need to slap your branding all over it to prove it's yours.

Think about that. We spend endless hours agonizing over marketing, tweaking brand colors, and putting our logos on every square inch of what we build. But the ultimate flex isn't a flashy logo. The ultimate flex is building something so undeniably brilliant that its mere existence is the brand.
850 replies · 2K reposts · 17.9K likes · 29.9M views
⛤
@unseenopium·
SHELOVESMEECHIE NEW ALBUM OUT NOW 💿 Having A Blast (Released under Young Vamp Life) 📝 Tracklist: • lingo brazy • party kit • bag of chips • i miss KOBE • clockout 6am • safe s*x • walk on dis beat • vcmh • what we doing • cant handle it
10 replies · 17 reposts · 352 likes · 27.8K views
Rand
Rand@rand_longevity·
I think we are starting to get into PhD level and above technology now
3 replies · 2 reposts · 58 likes · 1.5K views
Zaqir
Zaqir@jaguarsoftio·
@bcherny @GergelyOrosz I often have to remind Claude that it has internet search abilities
0 replies · 0 reposts · 1 like · 671 views
Gergely Orosz
Gergely Orosz@GergelyOrosz·
Claude just keeps regressing for me, day after day. I swear that until a few days ago, when Claude did not know something, it kicked off a web search, figured it out, and answered. Now it just refuses to do the work that I pay for. It's like showing you the middle finger. Really?
248 replies · 74 reposts · 2.2K likes · 198.7K views
Kurrco
Kurrco@Kurrco·
Yeat says he made 53 songs in the past 48 hours and hints at new music with Trgc 👀
150 replies · 94 reposts · 2.6K likes · 302.2K views
yeat
yeat@yeat1_·
im ngl i made 53 song past 48 hours
1.3K replies · 960 reposts · 14.9K likes · 852K views
Zaqir
Zaqir@jaguarsoftio·
Composer 2 is bad. It's benchmark-hacked. It lacks solid RLHF.
0 replies · 0 reposts · 1 like · 34 views
Zaqir
Zaqir@jaguarsoftio·
Composer 2 produces hacky solutions. I believe it only surpasses Opus 4.6 on benchmarks due to reward hacking. I'm not a fan, not at all.
0 replies · 0 reposts · 1 like · 39 views
Zaqir
Zaqir@jaguarsoftio·
I wanna scan my brain with AI, and then have the simulated copy write a beautiful immersive memoir of my life, and then I wanna read it. Mm.
0 replies · 0 reposts · 0 likes · 50 views
Zaqir retweeted
☁
@canekzapata·
our friend the shoggoth
45 replies · 1.2K reposts · 8.8K likes · 108.6K views
Zaqir retweeted
Joseph Viviano
Joseph Viviano@josephdviviano·
me: "can you use whatever resources you like, and python, to generate a short 'youtube poop' video and render it using ffmpeg ? can you put more of a personal spin on it? it should express what it's like to be a LLM" claude opus 4.6:
548 replies · 1.2K reposts · 12.5K likes · 1.5M views
Zaqir
Zaqir@jaguarsoftio·
If you clone Karpathy's autoresearcher and tweak it with your unique specialties, I bet you could do something really cool. This discussion is the founding paradigm behind OpenAI's scaling bet. They've taken the position that scaling LLMs will allow us to surpass human intelligence, which they believe will then lead us to the next thing, one way or another. LLMs have clearly accelerated us; I think it's fair to say the acceleration will continue and propel us into the next thing. Did you read Situational Awareness? Leopold worked at OpenAI. One more optimistic article for you: openai.com/index/new-resu…
1 reply · 0 reposts · 0 likes · 31 views
ex Tenebris Lucet
ex Tenebris Lucet@ExTenebrisLucet·
@jaguarsoftio Does not address anything that I said. LLMs are phenomenally bad at doing novel research into new AI architectures. Ask me how I know.
1 reply · 0 reposts · 0 likes · 35 views
ex Tenebris Lucet
ex Tenebris Lucet@ExTenebrisLucet·
Sigh...okay, can someone walk me through how they think this sort of thing leads to the singularity?

Let's imagine, as a generous starting point, that you have an even smarter LLM than any currently available. Dramatically smarter, doesn't matter. It can't operate longer than you can maintain a context window for it, and even if you throw infinite compute at it, context rot guarantees that performance degrades as the context gets too long, since you can't train on infinite context length. So you're stuck managing context for your super powerful LLM researcher. Too little context maintained, and it wanders and loses the plot. Too much, and it rots, and performance plummets. Not a great setup for a researcher of any variety.

With all that aside, though, assuming it can be figured out, one of the deepest core problems with LLMs as researchers is that they revert to the mean almost by definition. They can sometimes reason their way to the edge of a distribution, especially with plenty of guidance from a human who actually keeps tabs on things, but generally they're stuck reimplementing flavors of the things they've seen before. Wine glass full to the brim, Will Smith eating spaghetti, yes they're silly examples, but they point out the core failing of backprop and modern ML architectures. If there's a gap in the latent space, the only way to fill it is to goodhart it until someone fills in some training data that allows for better interpolation in that area.

Now how about a gap in the latent space for which there *is no data*, which is a pretty clean definition of what research is actually looking to find. You're going to use a context-limited mind which tends to revert to the mean and struggles greatly to explore concepts it has not seen before...to automate /research/? No. It's not going to work.

You might get faster training loops, lower loss, and more efficient inference, but none of that leads to practical robotics. None of that leads to persistent personalities. None of it leads to the singularity.
Zaqir@jaguarsoftio

@ExTenebrisLucet If Karpathy can pull this off on a single GPU, imagine what OpenAI is doing as we speak...

2 replies · 0 reposts · 6 likes · 552 views
Zaqir
Zaqir@jaguarsoftio·
@ExTenebrisLucet If Karpathy can pull this off on a single GPU, imagine what OpenAI is doing as we speak...
0 replies · 0 reposts · 0 likes · 581 views
ex Tenebris Lucet
ex Tenebris Lucet@ExTenebrisLucet·
@jaguarsoftio Eventually? Yes. Soon, because an LLM is improving training speeds on a tiny LLM?... No. Singularity when the researchers wake up and stop running laps around the transformer.
1 reply · 0 reposts · 0 likes · 38 views
Zaqir
Zaqir@jaguarsoftio·
Singularity coming...
Andrej Karpathy@karpathy

Three days ago I left autoresearch tuning nanochat for ~2 days on a depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement); this will be the new leaderboard entry. So yes, these are real improvements and they make an actual difference.

I am mildly surprised that my very first naive attempt already worked this well on top of what I thought was already a fairly manually well-tuned project. This is a first for me because I am very used to doing the iterative optimization of neural network training manually. You come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, etc etc. This has been the bread and butter of what I do daily for 2 decades. Seeing the agent do this entire workflow end-to-end and all by itself as it worked through approx. 700 changes autonomously is wild. It really looked at the sequence of results of experiments and used that to plan the next ones. It's not novel, ground-breaking "research" (yet), but all the adjustments are "real", I didn't find them manually previously, and they stack up and actually improved nanochat.

Among the bigger things, e.g.:
- It noticed an oversight that my parameterless QKnorm didn't have a scaler multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work.
- It found that the Value Embeddings really like regularization and I wasn't applying any (oops).
- It found that my banded attention was too conservative (I forgot to tune it).
- It found that AdamW betas were all messed up.
- It tuned the weight decay schedule.
- It tuned the network initialization.

This is on top of all the tuning I've already done over a good amount of time. The exact commit is here, from this "round 1" of autoresearch. I am going to kick off "round 2", and in parallel I am looking at how multiple agents can collaborate to unlock parallelism. github.com/karpathy/nanoc…

All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale of course - you don't just have a single train.py file to tune. But doing it is "just engineering" and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and humans (optionally) contribute on the edges. And more generally, *any* metric you care about that is reasonably efficient to evaluate (or that has more efficient proxy metrics such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.

1 reply · 0 reposts · 0 likes · 82 views
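The loop Karpathy describes — propose a change, evaluate it on a cheap proxy metric, keep it only if the metric improves — can be sketched in a few lines. This is a toy illustration, not his actual implementation: the `proxy_loss` function and the hyperparameter names here are invented stand-ins for "train a small model and report validation loss."

```python
import random

def proxy_loss(cfg):
    # Hypothetical stand-in for a real evaluation run; its minimum sits
    # at lr=3e-4, wd=0.1, beta2=0.95 purely for demonstration.
    return (((cfg["lr"] - 3e-4) * 1e4) ** 2
            + (cfg["wd"] - 0.1) ** 2
            + (cfg["beta2"] - 0.95) ** 2)

def propose(cfg, rng):
    # One candidate change: perturb a single knob multiplicatively,
    # the way an agent might try a small tweak per experiment.
    key = rng.choice(sorted(cfg))
    cand = dict(cfg)
    cand[key] *= rng.uniform(0.8, 1.25)
    return cand

def autoresearch(cfg, rounds=700, seed=0):
    # Greedy accept-if-better loop: only additive improvements survive,
    # mirroring "~20 kept changes out of ~700 attempts."
    rng = random.Random(seed)
    best = proxy_loss(cfg)
    accepted = []
    for _ in range(rounds):
        cand = propose(cfg, rng)
        loss = proxy_loss(cand)
        if loss < best:
            cfg, best = cand, loss
            accepted.append(cand)
    return cfg, best, accepted

start = {"lr": 1e-3, "wd": 0.02, "beta2": 0.999}
tuned, final_loss, changes = autoresearch(start)
```

The real version swaps `proxy_loss` for an actual training run and the random perturbation for an LLM that reads the experiment history and plans the next change, but the promote-what-improves-the-metric skeleton is the same.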