AIGuys
@RealAIGuys

✌I like to deflate AI hype and talk about real research. ✍AI Book: https://t.co/U4mvOd0cZf ☛ Editor & Blogger @ AIGuys

821 posts · Joined October 2021 · 122 Following · 306 Followers
AIGuys @RealAIGuys·
AI is easy. Hold my maths ;)
AIGuys @RealAIGuys·
Agentic AI faces no consequences for its actions, and that's what separates us from machines.
AIGuys reposted
Rohan Paul @rohanpaul_ai·
Anthropic's own study proves Vibe-Coding and AI coding assistants harm skill building: "AI use impairs conceptual understanding, code reading, and debugging abilities, without delivering significant efficiency gains on average."

- Developers learning 1 new Python library scored 17% lower on tests when using AI.
- Delegating code generation to AI stops you from actually understanding the software.
- Using AI did not make the programmers statistically faster at completing tasks.
- Participants wasted time writing prompts instead of actually coding.
- Scores crashed below 40% when developers let AI write everything.
- Developers who only asked AI for simple concepts scored above 65%.
- Managers should not pressure engineers to use AI for endless productivity.
- Forcing top speed means workers lose the ability to debug systems later.

Paper: "How AI Impacts Skill Formation" – arxiv.org/abs/2601.20245
AIGuys @RealAIGuys·
@karpathy Isn't this like Neural Architecture Search, but with LLMs?
Andrej Karpathy @karpathy·
Three days ago I left autoresearch tuning nanochat for ~2 days on a depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement); this will be the new leaderboard entry. So yes, these are real improvements and they make an actual difference. I am mildly surprised that my very first naive attempt already worked this well on top of what I thought was already a fairly manually well-tuned project.

This is a first for me because I am very used to doing the iterative optimization of neural network training manually. You come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, etc. This has been the bread and butter of my daily work for 2 decades. Seeing the agent do this entire workflow end-to-end, all by itself, as it worked through approx. 700 changes autonomously is wild. It really looked at the sequence of results of experiments and used that to plan the next ones. It's not novel, ground-breaking "research" (yet), but all the adjustments are "real", I didn't find them manually previously, and they stack up and actually improved nanochat. Among the bigger things, e.g.:

- It noticed an oversight that my parameterless QKnorm didn't have a scaler multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work.
- It found that the Value Embeddings really like regularization and I wasn't applying any (oops).
- It found that my banded attention was too conservative (I forgot to tune it).
- It found that AdamW betas were all messed up.
- It tuned the weight decay schedule.
- It tuned the network initialization.

This is on top of all the tuning I've already done over a good amount of time. The exact commit is here, from this "round 1" of autoresearch. I am going to kick off "round 2", and in parallel I am looking at how multiple agents can collaborate to unlock parallelism. github.com/karpathy/nanoc…

All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale of course: you don't just have a single train.py file to tune. But doing it is "just engineering" and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and humans (optionally) contribute on the edges. And more generally, *any* metric you care about that is reasonably efficient to evaluate (or that has a more efficient proxy metric, such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.
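The loop Karpathy describes (propose a change, run the experiment, keep it only if the proxy metric improves, use the history to plan the next attempt) is essentially greedy hill climbing over a training config. A minimal sketch follows; every name here (`evaluate`, `propose_change`, the toy loss surface) is a hypothetical stand-in, not nanochat's actual autoresearch code, and a real agent would propose code edits rather than numeric perturbations.

```python
import random

def evaluate(config):
    # Stand-in proxy metric. In practice this would train a small model
    # (e.g. depth=12) and return its validation loss.
    return (config["lr"] - 0.01) ** 2 + (config["wd"] - 0.1) ** 2

def propose_change(config, rng):
    # The "agent" step: perturb one hyperparameter of the current config.
    key = rng.choice(sorted(config))
    candidate = dict(config)
    candidate[key] = config[key] * rng.uniform(0.5, 1.5)
    return candidate

def autoresearch(config, steps=700, seed=0):
    # Greedy outer loop: evaluate each proposal, keep only additive
    # improvements, and continue from the best config found so far.
    rng = random.Random(seed)
    best_loss = evaluate(config)
    accepted = []
    for _ in range(steps):
        candidate = propose_change(config, rng)
        loss = evaluate(candidate)
        if loss < best_loss:
            config, best_loss = candidate, loss
            accepted.append(candidate)
    return config, best_loss, accepted

if __name__ == "__main__":
    cfg, loss, kept = autoresearch({"lr": 0.02, "wd": 0.05})
    print(f"final loss {loss:.6f} after {len(kept)} accepted changes")
```

The "promote to larger scales" step he mentions would then re-run only the accepted changes at depth=24 to check they transfer, which is much cheaper than searching at full scale.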
AIGuys @RealAIGuys·
The world needs a better @LinkedIn. Quantity shouldn't be prioritized over quality. With LLMs, it's child's play to say the most generic things, and these days the feed has become just generic LLM content all over the place.
AIGuys @RealAIGuys·
@GaryMarcus And here's one of them in a nutshell.
Gary Marcus @GaryMarcus·
My time here has been a failure. I tried to get the Twitterverse to wake up before things got bad. Now we are here. Things are bad. And about to get worse. Most people still don't realize how bad. It's not that AI is inherently impossible or immoral. It's that most of the people pushing it don't give a damn.
AIGuys @RealAIGuys·
@sama in a nutshell. When do we have AGI? When the AI god says so. @rao2z #AI #AGI
AIGuys reposted
Tech with Mak @techNmak·
The person who built Claude Code just mass-leaked the thinking behind it. 45 minutes of design decisions, mistakes, and where it's all going. This is rare. Creators at this level don't usually talk this openly.
AIGuys @RealAIGuys·
@GaryMarcus We missed you. I know this place can get toxic very quickly, but there are always people who value your opinions and takes, and I'm one of them.
AIGuys @RealAIGuys·
@WesRoth And somehow no one really answers what the hell people will actually do without work. Not everyone is an artist or a social worker, and most people will descend into chaos without work. No one can justify why we need so much automation, especially for creative tasks.
Wes Roth @WesRoth·
Anthropic CEO on the transition from human-AI collaboration (Centaur) to full AI dominance. Amodei warns that while demand for coders might spike briefly, the "Centaur" window could be very short before full automation.
AIGuys @RealAIGuys·
The current state of AI safety is, frankly, a bit of a joke. We spend billions on RLHF and DPO, but all we’ve really built is a more polite mask for the same underlying danger. medium.com/aiguys/why-ai-…
AIGuys @RealAIGuys·
Why does @GeminiApp struggle to write big code in its UI? After 800-1000 lines it invariably misses something. I know it's capable of doing that, just not in the UI, and I don't know why!
AIGuys @RealAIGuys·
Not all tasks benefit equally from vibe coding. For big, complex codebases, vibe coding might actually hurt performance and speed if not used properly. medium.com/aiguys/no-vibe…