Zaqir

753 posts


@jaguarsoftio

Senior AI Engineer. Founder of https://t.co/BStS7fJwAJ | Creator of X-Combat https://t.co/MsCgVtmhwl | GauntletAI S25

Joined September 2021
474 Following · 258 Followers
Pinned Tweet
Zaqir
Zaqir@jaguarsoftio·
I used Grok 4 to vibe code a game where you can play as... Grok 4. Kill zombies and collect Grok coins using an agile and slick combat system. Ani has ninja moves in this classic 3D arcade fighter. It's also massively multiplayer PvP -- hop in the server and fight your oomfs. Built with ThreeJS in < 3 days for @gauntletai project 5. Comment who you wanna see in the game 👀 I can turn anybody into a 3d model 😈
14 replies · 3 reposts · 55 likes · 9.7K views
Zaqir
Zaqir@jaguarsoftio·
AI can talk aloud like humans. Does that mean we should stop talking? No. Obviously not.
0 replies · 0 reposts · 0 likes · 10 views
Zaqir retweeted
Kpaxs
Kpaxs@Kpaxs·
Here's a controversial take: most of the authority that exists in any organization was never formally granted to anyone. It was assumed, exercised, and then retroactively legitimized by the fact that it worked.
Kpaxs@Kpaxs

I call it the "Refrigerator Principle": most organizational dysfunction exists because everyone assumes someone else has the authority to fix it, and the fastest path forward is often just pretending you have that authority and asking forgiveness rather than permission.

102 replies · 633 reposts · 7.2K likes · 511.2K views
Zaqir
Zaqir@jaguarsoftio·
Folks who work in AI need to understand hedonic adaptation. As soon as a new tool is released and widely used, it becomes familiar, common, and its value drops toward zero. The competitive nature of humans is biological. AI is just a thin middle layer between humans competing with each other. Anything common and accessible quickly becomes a commodity. All AI tools have this quality by design. The differentiating factor then becomes human taste. And we're back to square one, as if AI was never invented.
0 replies · 0 reposts · 1 like · 45 views
Zaqir retweeted
Sam Altman
Sam Altman@sama·
"post-AGI, no one is going to work and the economy is going to collapse" "i am switching to polyphasic sleep because GPT-5.5 in codex is so good that i can't afford to be sleeping for such long stretches and miss out on working"
1.2K replies · 606 reposts · 11.2K likes · 1.6M views
Zaqir retweeted
Ian Miles Cheong
Ian Miles Cheong@ianmiles·
Marc Andreessen just revealed the Elon Musk philosophy that completely broke his brain: "The best product in the world shouldn't even need a logo."

We all know Elon is relentless about quality. As Marc puts it: "Do you want the best car in the world or not, right? Like that's Elon's mentality... And it's working very well."

But at a recent event, Elon took this mindset to a completely different level. He dropped a perspective so jarring that Marc initially thought it was a joke. Elon's thesis? "You shouldn't even have to have your name on the product. It's just obvious. Everybody knows."

The logic is brutal but simple. If you build the undeniable, undisputed best thing in the world, everybody uses it. And because everybody uses it, you don't need to slap your branding all over it to prove it's yours.

Think about that. We spend endless hours agonizing over marketing, tweaking brand colors, and putting our logos on every square inch of what we build. But the ultimate flex isn't a flashy logo. The ultimate flex is building something so undeniably brilliant that its mere existence is the brand.
850 replies · 2K reposts · 17.9K likes · 29.9M views
⛤
@unseenopium·
SHELOVESMEECHIE NEW ALBUM OUT NOW 💿 Having A Blast (Released under Young Vamp Life) 📝 Tracklist: • lingo brazy • party kit • bag of chips • i miss KOBE • clockout 6am • safe s*x • walk on dis beat • vcmh • what we doing • cant handle it
10 replies · 17 reposts · 352 likes · 27.8K views
Rand
Rand@rand_longevity·
I think we are starting to get into PhD level and above technology now
3 replies · 2 reposts · 58 likes · 1.5K views
Zaqir
Zaqir@jaguarsoftio·
@bcherny @GergelyOrosz I often have to remind Claude that it has internet search abilities
0 replies · 0 reposts · 1 like · 671 views
Gergely Orosz
Gergely Orosz@GergelyOrosz·
Claude just keeps regressing for me, day after day. I swear that until a few days ago, when Claude did not know something, it kicked off a web search, figured it out, and answered. Now it just refuses to do the work that I pay for. It's like showing you the middle finger. Really?
248 replies · 74 reposts · 2.2K likes · 198.7K views
Kurrco
Kurrco@Kurrco·
Yeat says he made 53 songs in the past 48 hours and hints at new music with Trgc 👀
150 replies · 94 reposts · 2.6K likes · 302.2K views
yeat
yeat@yeat1_·
im ngl i made 53 song past 48 hours
1.3K replies · 960 reposts · 14.9K likes · 852K views
Zaqir
Zaqir@jaguarsoftio·
Composer 2 is bad. It's benchmark-hacked. It lacks solid RLHF.
0 replies · 0 reposts · 1 like · 34 views
Zaqir
Zaqir@jaguarsoftio·
Composer 2 produces hacky solutions. I believe it only surpasses Opus 4.6 on benchmarks due to reward hacking. I'm not a fan, not at all.
0 replies · 0 reposts · 1 like · 39 views
Zaqir
Zaqir@jaguarsoftio·
I wanna scan my brain with AI, and then have the simulated copy write a beautiful immersive memoir of my life, and then I wanna read it. Mm.
0 replies · 0 reposts · 0 likes · 50 views
Zaqir retweeted
☁
@canekzapata·
our friend the shoggoth
45 replies · 1.2K reposts · 8.8K likes · 108.6K views
Zaqir retweeted
Joseph Viviano
Joseph Viviano@josephdviviano·
me: "can you use whatever resources you like, and python, to generate a short 'youtube poop' video and render it using ffmpeg ? can you put more of a personal spin on it? it should express what it's like to be a LLM" claude opus 4.6:
548 replies · 1.2K reposts · 12.5K likes · 1.5M views
Zaqir
Zaqir@jaguarsoftio·
If you clone Karpathy's autoresearcher and tweak it with your unique specialties, I bet you could do something really cool. This discussion is the founding paradigm behind OpenAI's scaling bet. They've taken the position that scaling LLMs will allow us to surpass human intelligence, which they believe will then lead us to the next thing, one way or another. LLMs have clearly accelerated us; I think it's fair to say the acceleration will continue and propel us into the next thing. Did you read Situational Awareness? Leopold worked at OpenAI. One more optimistic article for you: openai.com/index/new-resu…
1 reply · 0 reposts · 0 likes · 31 views
ex Tenebris Lucet
ex Tenebris Lucet@ExTenebrisLucet·
@jaguarsoftio Does not address anything that I said. LLMs are phenomenally bad at doing novel research into new AI architectures. Ask me how I know.
1 reply · 0 reposts · 0 likes · 35 views
ex Tenebris Lucet
ex Tenebris Lucet@ExTenebrisLucet·
Sigh...okay, can someone walk me through how they think this sort of thing leads to the singularity?

Let's imagine, as a generous starting point, that you have an even smarter LLM than any currently available. Dramatically smarter, doesn't matter. It can't operate longer than you can maintain a context window for it, and even if you throw infinite compute at it, context rot guarantees that performance degrades as the context gets too long, since you can't train on infinite context length. So you're stuck managing context for your super powerful LLM researcher. Too little context maintained, and it wanders and loses the plot. Too much, and it rots, and performance plummets. Not a great setup for a researcher of any variety.

With all that aside, though, assuming it can be figured out, one of the deepest core problems with LLMs as researchers is that they revert to the mean almost by definition. They can sometimes reason their way to the edge of a distribution, especially with plenty of guidance from a human who actually keeps tabs on things, but generally they're stuck reimplementing flavors of the things they've seen before. Wine glass full to the brim, Will Smith eating spaghetti, yes they're silly examples, but they point out the core failing of backprop and modern ML architectures. If there's a gap in the latent space, the only way to fill it is to goodhart it until someone fills in some training data that allows for better interpolation in that area.

Now how about a gap in the latent space for which there *is no data*, which is a pretty clean definition of what research is actually looking to find. You're going to use a context-limited mind which tends to revert to the mean and struggles greatly to explore concepts it has not seen before...to automate /research/? No. It's not going to work.

You might get faster training loops, lower loss, and more efficient inference, but none of that leads to practical robotics. None of that leads to persistent personalities. None of it leads to the singularity.
Zaqir@jaguarsoftio

@ExTenebrisLucet If Karpathy can pull this off on a single GPU, imagine what OpenAI is doing as we speak...

2 replies · 0 reposts · 6 likes · 552 views
Zaqir
Zaqir@jaguarsoftio·
@ExTenebrisLucet If Karpathy can pull this off on a single GPU, imagine what OpenAI is doing as we speak...
0 replies · 0 reposts · 0 likes · 581 views
ex Tenebris Lucet
ex Tenebris Lucet@ExTenebrisLucet·
@jaguarsoftio Eventually? Yes. Soon, because an LLM is improving training speeds on a tiny LLM?... No. Singularity when the researchers wake up and stop running laps around the transformer.
1 reply · 0 reposts · 0 likes · 38 views
Zaqir
Zaqir@jaguarsoftio·
Singularity coming...
Andrej Karpathy@karpathy

Three days ago I left autoresearch tuning nanochat for ~2 days on a depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement); this will be the new leaderboard entry. So yes, these are real improvements and they make an actual difference.

I am mildly surprised that my very first naive attempt already worked this well on top of what I thought was already a fairly manually well-tuned project. This is a first for me because I am very used to doing the iterative optimization of neural network training manually. You come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, etc etc. This has been the bread and butter of what I do daily for 2 decades. Seeing the agent do this entire workflow end-to-end and all by itself as it worked through approx. 700 changes autonomously is wild. It really looked at the sequence of results of experiments and used that to plan the next ones. It's not novel, ground-breaking "research" (yet), but all the adjustments are "real", I didn't find them manually previously, and they stack up and actually improved nanochat.

Among the bigger things, e.g.:
- It noticed an oversight that my parameterless QKnorm didn't have a scaler multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work.
- It found that the Value Embeddings really like regularization and I wasn't applying any (oops).
- It found that my banded attention was too conservative (I forgot to tune it).
- It found that AdamW betas were all messed up.
- It tuned the weight decay schedule.
- It tuned the network initialization.

This is on top of all the tuning I've already done over a good amount of time. The exact commit is here, from this "round 1" of autoresearch. I am going to kick off "round 2", and in parallel I am looking at how multiple agents can collaborate to unlock parallelism. github.com/karpathy/nanoc…

All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale of course - you don't just have a single train.py file to tune. But doing it is "just engineering" and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and humans (optionally) contribute on the edges. And more generally, *any* metric you care about that is reasonably efficient to evaluate (or that has more efficient proxy metrics such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.

1 reply · 0 reposts · 0 likes · 82 views
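The loop Karpathy describes — propose a change, evaluate it on a cheap proxy metric, keep it only if the metric improves — can be sketched in a few lines. This is a toy illustration, not his actual implementation: the `proxy_loss` function and the hyperparameter names here are invented stand-ins for "train a small model and report validation loss."

```python
import random

def proxy_loss(cfg):
    # Hypothetical stand-in for a real evaluation run; its minimum sits
    # at lr=3e-4, wd=0.1, beta2=0.95 purely for demonstration.
    return (((cfg["lr"] - 3e-4) * 1e4) ** 2
            + (cfg["wd"] - 0.1) ** 2
            + (cfg["beta2"] - 0.95) ** 2)

def propose(cfg, rng):
    # One candidate change: perturb a single knob multiplicatively,
    # the way an agent might try a small tweak per experiment.
    key = rng.choice(sorted(cfg))
    cand = dict(cfg)
    cand[key] *= rng.uniform(0.8, 1.25)
    return cand

def autoresearch(cfg, rounds=700, seed=0):
    # Greedy accept-if-better loop: only additive improvements survive,
    # mirroring "~20 kept changes out of ~700 attempts."
    rng = random.Random(seed)
    best = proxy_loss(cfg)
    accepted = []
    for _ in range(rounds):
        cand = propose(cfg, rng)
        loss = proxy_loss(cand)
        if loss < best:
            cfg, best = cand, loss
            accepted.append(cand)
    return cfg, best, accepted

start = {"lr": 1e-3, "wd": 0.02, "beta2": 0.999}
tuned, final_loss, changes = autoresearch(start)
```

The real version swaps `proxy_loss` for an actual training run and the random perturbation for an LLM that reads the experiment history and plans the next change, but the promote-what-improves-the-metric skeleton is the same.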