AIGuys
@RealAIGuys

✌I like to deflate AI hype and talk about real research. ✍AI Book: https://t.co/U4mvOd0cZf ☛ Editor & Blogger @ AIGuys

821 posts · Joined October 2021 · 122 Following · 306 Followers
AIGuys @RealAIGuys·
AI is easy. Hold my maths ;)
AIGuys @RealAIGuys·
Agentic AI faces no consequences for its actions, and that's what separates us from machines.
AIGuys reposted
Rohan Paul @rohanpaul_ai·
Anthropic's own study proves Vibe-Coding and AI coding assistants harm skill building: "AI use impairs conceptual understanding, code reading, and debugging abilities, without delivering significant efficiency gains on average."

- Developers learning 1 new Python library scored 17% lower on tests when using AI.
- Delegating code generation to AI stops you from actually understanding the software.
- Using AI did not make the programmers statistically faster at completing tasks.
- Participants wasted time writing prompts instead of actually coding.
- Scores crashed below 40% when developers let AI write everything.
- Developers who only asked AI for simple concepts scored above 65%.
- Managers should not pressure engineers to use AI for endless productivity.
- Forcing top speed means workers lose the ability to debug systems later.

Paper: "How AI Impacts Skill Formation" – arxiv.org/abs/2601.20245
AIGuys @RealAIGuys·
@karpathy Isn't this like Neural Architecture Search, but with LLMs?
Andrej Karpathy @karpathy·
Three days ago I left autoresearch tuning nanochat for ~2 days on a depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement); this will be the new leaderboard entry. So yes, these are real improvements and they make an actual difference. I am mildly surprised that my very first naive attempt already worked this well on top of what I thought was already a fairly manually well-tuned project.

This is a first for me because I am very used to doing the iterative optimization of neural network training manually. You come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, etc. This has been the bread and butter of my daily work for 2 decades. Seeing the agent do this entire workflow end-to-end, all by itself, as it worked through approx. 700 changes autonomously is wild. It really looked at the sequence of results of experiments and used that to plan the next ones. It's not novel, ground-breaking "research" (yet), but all the adjustments are "real", I didn't find them manually previously, and they stack up and actually improved nanochat. Among the bigger things, e.g.:

- It noticed an oversight that my parameterless QKnorm didn't have a scaler multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work.
- It found that the Value Embeddings really like regularization and I wasn't applying any (oops).
- It found that my banded attention was too conservative (I forgot to tune it).
- It found that AdamW betas were all messed up.
- It tuned the weight decay schedule.
- It tuned the network initialization.

This is on top of all the tuning I've already done over a good amount of time. The exact commit is here, from this "round 1" of autoresearch. I am going to kick off "round 2", and in parallel I am looking at how multiple agents can collaborate to unlock parallelism. github.com/karpathy/nanoc…

All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale of course: you don't just have a single train.py file to tune. But doing it is "just engineering" and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and humans (optionally) contribute on the edges. And more generally, *any* metric you care about that is reasonably efficient to evaluate (or that has a more efficient proxy metric, such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.
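The loop Karpathy describes (propose a change, run the experiment, keep it only if the proxy metric improves, use the history to plan the next attempt) is essentially greedy hill climbing over a training config. A minimal sketch follows; every name here (`evaluate`, `propose_change`, the toy loss surface) is a hypothetical stand-in, not nanochat's actual autoresearch code, and a real agent would propose code edits rather than numeric perturbations.

```python
import random

def evaluate(config):
    # Stand-in proxy metric. In practice this would train a small model
    # (e.g. depth=12) and return its validation loss.
    return (config["lr"] - 0.01) ** 2 + (config["wd"] - 0.1) ** 2

def propose_change(config, rng):
    # The "agent" step: perturb one hyperparameter of the current config.
    key = rng.choice(sorted(config))
    candidate = dict(config)
    candidate[key] = config[key] * rng.uniform(0.5, 1.5)
    return candidate

def autoresearch(config, steps=700, seed=0):
    # Greedy outer loop: evaluate each proposal, keep only additive
    # improvements, and continue from the best config found so far.
    rng = random.Random(seed)
    best_loss = evaluate(config)
    accepted = []
    for _ in range(steps):
        candidate = propose_change(config, rng)
        loss = evaluate(candidate)
        if loss < best_loss:
            config, best_loss = candidate, loss
            accepted.append(candidate)
    return config, best_loss, accepted

if __name__ == "__main__":
    cfg, loss, kept = autoresearch({"lr": 0.02, "wd": 0.05})
    print(f"final loss {loss:.6f} after {len(kept)} accepted changes")
```

The "promote to larger scales" step he mentions would then re-run only the accepted changes at depth=24 to check they transfer, which is much cheaper than searching at full scale.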
AIGuys @RealAIGuys·
The world needs a better @LinkedIn. Quantity shouldn't be prioritized over quality. With LLMs, it's child's play to say the most generic things, and these days the feed has become just generic LLM content all over the place.
AIGuys @RealAIGuys·
@GaryMarcus And here's one of them in a nutshell.
Gary Marcus @GaryMarcus·
My time here has been a failure. I tried to get the Twitterverse to wake up before things got bad. Now we are here. Things are bad. And about to get worse. Most people still don't realize how bad. It's not that AI is inherently impossible or immoral. It's that most of the people pushing it don't give a damn.
AIGuys @RealAIGuys·
@sama in a nutshell. When do we have AGI? When the AI god says so. @rao2z #AI #AGI
AIGuys reposted
Tech with Mak @techNmak·
The person who built Claude Code just mass-leaked the thinking behind it. 45 minutes of design decisions, mistakes, and where it's all going. This is rare. Creators at this level don't usually talk this openly.
AIGuys @RealAIGuys·
@GaryMarcus We missed you. I know this place can get toxic very quickly, but there are always people who value your opinions and takes, and I'm one of them.
AIGuys @RealAIGuys·
@WesRoth And somehow no one really answers what the hell people will actually do without work. Not everyone is an artist or a social worker, and most people will descend into chaos without work. No one can justify why we need so much automation, especially for creative tasks.
Wes Roth @WesRoth·
Anthropic CEO on the transition from human-AI collaboration (Centaur) to full AI dominance. Amodei warns that while demand for coders might spike briefly, the "Centaur" window could be very short before full automation.
AIGuys @RealAIGuys·
The current state of AI safety is, frankly, a bit of a joke. We spend billions on RLHF and DPO, but all we’ve really built is a more polite mask for the same underlying danger. medium.com/aiguys/why-ai-…
AIGuys @RealAIGuys·
Why does @GeminiApp struggle to write big code in its UI? After 800-1000 lines it invariably misses something. I know it's capable of doing that, just not in the UI, and I don't know why!
AIGuys @RealAIGuys·
Not all tasks benefit equally from vibe coding. For big, complex codebases, vibe coding might actually hurt performance and speed if not used properly. medium.com/aiguys/no-vibe…