hkey

332 posts


@hkeydesign

dev @EclipseFND

Joined May 2021
445 Following · 1.4K Followers
Christoffer Bjelke@chribjel·
bro just had to add a 67 joke in the settings of t3 code lmao
[image]
11 replies · 1 repost · 251 likes · 10.3K views
Ayush@_ayushbhatia·
@sethsetse They surely have the payout insured. Even though it's statistically impossible, I'd be surprised if they didn't
2 replies · 0 reposts · 22 likes · 15.1K views
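For scale on "statistically impossible": a 64-team bracket has 63 games, so even a coin-flipper hits perfection with probability 2^-63. A back-of-envelope check (pure arithmetic; no Kalshi-specific rules assumed):

```python
# Odds of a perfect 64-team bracket (63 games), assuming every game
# is a fair coin flip. Real win probabilities vary by seed, so the
# true odds are somewhat better, but still astronomical.
games = 63
total_brackets = 2 ** games
print(f"possible coin-flip brackets: {total_brackets:,}")  # 9,223,372,036,854,775,808
print(f"odds of perfection: 1 in {total_brackets:.2e}")    # 1 in 9.22e+18
```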
seth@sethsetse·
POV: The Kalshi legal team pulling up to court after someone wins $1 Billion but forgot to read Article G Subsection 41 on page 379 of the contest rules
[GIF]
Kalshi@Kalshi

The $1 Billion Kalshi Perfect Bracket Challenge:
$1 Billion for a perfect bracket
$1 Million guaranteed to the top-scoring bracket
$1 Million to charity and scholarships
See the full rules and submit your bracket: kalshi.com/billion-dollar…
No purchase or deposit required. SIG Parametrics, LLC, a member of the Susquehanna International Group of Companies, is financially backing this promotion.

34 replies · 81 reposts · 5.4K likes · 543.2K views
sydney@0xSydney·
sneak peek
[image]
6 replies · 3 reposts · 29 likes · 3.9K views
hkey@hkeydesign·
@karpathy This feels like AutoML
0 replies · 0 reposts · 5 likes · 181 views
Andrej Karpathy@karpathy·
Three days ago I left autoresearch tuning nanochat for ~2 days on a depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement); this will be the new leaderboard entry. So yes, these are real improvements and they make an actual difference.

I am mildly surprised that my very first naive attempt already worked this well on top of what I thought was already a fairly manually well-tuned project. This is a first for me because I am very used to doing the iterative optimization of neural network training manually. You come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, etc. This is the bread and butter of what I do daily, for two decades. Seeing the agent do this entire workflow end-to-end, all by itself, as it worked through approx. 700 changes autonomously is wild. It really looked at the sequence of results of experiments and used that to plan the next ones. It's not novel, ground-breaking "research" (yet), but all the adjustments are "real": I didn't find them manually previously, and they stack up and actually improved nanochat. Among the bigger things:

- It noticed an oversight that my parameterless QKnorm didn't have a scaler multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work.
- It found that the Value Embeddings really like regularization and I wasn't applying any (oops).
- It found that my banded attention was too conservative (I forgot to tune it).
- It found that the AdamW betas were all messed up.
- It tuned the weight decay schedule.
- It tuned the network initialization.

This is on top of all the tuning I've already done over a good amount of time. The exact commit is here, from this "round 1" of autoresearch. I am going to kick off "round 2", and in parallel I am looking at how multiple agents can collaborate to unlock parallelism. github.com/karpathy/nanoc…

All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale of course; you don't just have a single train.py file to tune. But doing it is "just engineering" and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and humans (optionally) contribute on the edges. And more generally, *any* metric you care about that is reasonably efficient to evaluate (or that has more efficient proxy metrics, such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.
[image]
959 replies · 2.1K reposts · 19.3K likes · 3.4M views
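The tweet doesn't include the agent's code, but the loop it describes (propose a change, run a short training experiment, keep it only if validation loss improves, plan the next run from the history) is easy to sketch. Everything below is a hypothetical stand-in: `propose_change`, `run_training`, and the `qk_scale` knob are illustrations, not nanochat's actual tuner.

```python
# Minimal sketch of an "autoresearch" loop: greedy hill-climbing on
# validation loss, with the experiment history feeding each proposal.
# In the real setup an LLM agent plays propose_change() and a full
# nanochat training run plays run_training(); both are faked here.
import random

def run_training(config: dict) -> float:
    """Stand-in for a training run; returns a fake validation loss."""
    # Pretend a sharper QK-norm scale helps slightly, plus noise.
    return 3.0 - 0.1 * config["qk_scale"] + random.uniform(-0.02, 0.02)

def propose_change(history: list) -> dict:
    """Stand-in for the agent planning the next experiment."""
    best = min(history, key=lambda h: h["loss"])
    new = dict(best["config"])
    new["qk_scale"] = round(new["qk_scale"] + random.choice([-0.1, 0.1]), 2)
    return new

history = [{"config": {"qk_scale": 1.0}, "loss": run_training({"qk_scale": 1.0})}]
for _ in range(20):                       # ~700 changes in the tweet's run
    config = propose_change(history)
    history.append({"config": config, "loss": run_training(config)})

best = min(history, key=lambda h: h["loss"])
print(f"best config: {best['config']}, val loss: {best['loss']:.3f}")
```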
Teknium (e/λ)@Teknium·
What's the Human API twitter again? Need to get those humans integrated as a tool in the agent...
8 replies · 0 reposts · 59 likes · 4.8K views
celeste@vmfunc·
following the whole @discord and persona drama, decided to make our own chat app: fully encrypted, faster than signal, element, etc, custom crypto algorithm. fully free and open source. stay tuned!!
[image]
84 replies · 38 reposts · 583 likes · 70.9K views
hkey@hkeydesign·
created these 3b1b-style videos using @NousResearch's Hermes agent. I used @elevenlabs for TTS and manim for the animations. Hermes successfully compiled everything and produced the videos!
2 replies · 0 reposts · 6 likes · 221 views
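hkey doesn't share the generated code, but the animation half of that pipeline looks roughly like the sketch below, using the manim community edition (the scene name and formula are made up; the @elevenlabs narration would be layered over the rendered video afterwards, e.g. with ffmpeg).

```python
# Hypothetical example of the kind of manim scene an agent might emit
# for a SHA-256 explainer. Render with:  manim -qm scene.py HashIntro
from manim import Scene, MathTex, Write, FadeOut

class HashIntro(Scene):
    def construct(self):
        # SHA-256 as a function from arbitrary bitstrings to 256 bits
        title = MathTex(r"\mathrm{SHA\text{-}256}: \{0,1\}^* \to \{0,1\}^{256}")
        self.play(Write(title))  # animate the formula stroke by stroke
        self.wait(2)             # hold the frame for the narration
        self.play(FadeOut(title))
```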
hkey@hkeydesign·
SHA-256 explanation. cost per video is ~$4 with opus 4.6
0 replies · 0 reposts · 2 likes · 84 views
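The core property such a video explains fits in a few lines of stdlib Python: SHA-256 maps any input to a fixed 256-bit digest, and changing a single character scrambles the entire output (the avalanche effect).

```python
import hashlib

a = hashlib.sha256(b"hello").hexdigest()
b = hashlib.sha256(b"hellp").hexdigest()  # one character changed
print(a)  # 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
print(b)  # a completely unrelated digest
```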
hkey@hkeydesign·
@shawmakesmagic this is called HNDL: harvest now, decrypt later
0 replies · 0 reposts · 2 likes · 77 views
Shaw (spirit/acc)@shawmakesmagic·
If you can exfiltrate encrypted data now, you will be able to decrypt it in 20 years
15 replies · 2 reposts · 63 likes · 6.4K views
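The threat model in one small sketch, using the `cryptography` package's Fernet as a stand-in cipher: at harvest time the attacker needs no key, only storage, and the decryption step can simply wait for a future key compromise (e.g. a quantum attack on the key exchange that protected the traffic).

```python
# Harvest now, decrypt later: ciphertext is archived today and read
# whenever the key eventually falls. Requires `pip install cryptography`.
from cryptography.fernet import Fernet

key = Fernet.generate_key()                 # secret today
harvested = Fernet(key).encrypt(b"records worth reading in 2045")

# ... 20 years pass; the ciphertext sits in the attacker's archive ...
recovered_key = key                         # stand-in for a future key break
print(Fernet(recovered_key).decrypt(harvested))
```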
lowbie@archivepilled·
Introducing: Number Research Inc. At Number Research Inc., we are attempting to find and document all* available numbers. This is a volunteer-led research position, where anyone is able to contribute. Simply type a number in, and we'll check if we've got it. If we have, no worries, just try another. If it is a new number, then thank you for your hard work!
[image]
312 replies · 260 reposts · 4.1K likes · 398.1K views
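Read as a systems problem, the joke reduces to a set with one check-and-insert operation; a sketch under that reading (names invented):

```python
# The entire Number Research Inc. backend, more or less: O(1) per
# submission, with an infinite research backlog guaranteed.
documented: set[int] = set()

def submit(n: int) -> str:
    if n in documented:
        return "no worries, just try another"
    documented.add(n)
    return "a new number! thank you for your hard work"

print(submit(67))  # a new number! thank you for your hard work
print(submit(67))  # no worries, just try another
```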
TheStandupPod@thestanduppod·
Why OpenClaw users buy Mac minis
265 replies · 334 reposts · 6.4K likes · 364.5K views
hkey@hkeydesign·
does @eigencloud really offer temperature=0 as a service?
[image]
0 replies · 0 reposts · 6 likes · 198 views
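For context on the joke: temperature divides the logits before the softmax, and as it approaches 0 the distribution collapses onto the argmax, i.e. greedy, repeatable decoding; inference APIs typically special-case temperature=0 rather than divide by zero. A small numpy illustration:

```python
import numpy as np

def sample_probs(logits, temperature):
    """Token probabilities under softmax(logits / T); T=0 -> argmax."""
    logits = np.asarray(logits, dtype=float)
    if temperature == 0:                # the special case: greedy decoding
        probs = np.zeros_like(logits)
        probs[np.argmax(logits)] = 1.0
        return probs
    z = logits / temperature
    z -= z.max()                        # for numerical stability
    e = np.exp(z)
    return e / e.sum()

for t in (1.0, 0.5, 0.0):
    print(t, sample_probs([2.0, 1.0, 0.5], t).round(3))
```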
hkey@hkeydesign·
@fikunmi_ap yeah claude code/hermes can run commands on your computer, but I was sick of waiting 5 seconds for a simple command. chatgpt has always been my first choice for quick debugging tasks. I can do things like that too :3
[image]
0 replies · 0 reposts · 2 likes · 82 views
fikunmi@fikunmi_ap·
@hkeydesign Isn't that what agent TUIs like Claude Code are for?
1 reply · 0 reposts · 0 likes · 70 views
hkey@hkeydesign·
jumping between chatgpt and the terminal was slowing me down. didn't want to spin up cursor just to run a command, so I built a small extension instead.
1 reply · 0 reposts · 9 likes · 340 views
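hkey doesn't say how the extension is built; one plausible shape for the local half is a localhost-only HTTP bridge that runs a command and returns its output. The sketch below is entirely hypothetical (handler, port, and protocol are invented), and it binds to 127.0.0.1 on purpose, since executing browser-supplied commands is dangerous.

```python
# Hypothetical local command bridge for a browser extension: POST a
# shell command to 127.0.0.1:8377 and get stdout/stderr back.
# Never expose something like this beyond localhost.
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer

class RunHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers["Content-Length"])
        cmd = self.rfile.read(length).decode()
        result = subprocess.run(cmd, shell=True, capture_output=True, timeout=10)
        self.send_response(200)
        self.end_headers()
        self.wfile.write(result.stdout + result.stderr)

HTTPServer(("127.0.0.1", 8377), RunHandler).serve_forever()
```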
hkey@hkeydesign·
@thdxr only one country has nukes = bad
everyone has nukes = good
0 replies · 0 reposts · 39 likes · 2.2K views
dax@thdxr·
i respect everyone worried about AI safety and believe their concerns are genuine, but we work to make open source AI more of a thing so that it's not owned by a small group of people. this also makes it accessible to bad actors. the two goals are totally incompatible
62 replies · 21 reposts · 939 likes · 93.7K views
hkey@hkeydesign·
@thekitze it's discord. it shows the actual time the message was sent on the left
0 replies · 0 reposts · 2 likes · 337 views
kitze 🛠️ tinkerer.club@thekitze·
i asked openclaw to do a cron job every 1 min since last night and it completely lost its mind 🤣 i wish cron jobs were stable...
39 replies · 6 reposts · 166 likes · 30K views
Yatharth@yatharthmaan·
@protosphinx I don't think these people understand how many resources it takes to train an LLM from scratch.
4 replies · 0 reposts · 57 likes · 5K views
sphinx@protosphinx·
No, he didn’t train his own LLM. He fine-tuned Qwen2.5-Coder-32B and essentially bench-maxxed it. That’s a hard engineering problem - not taking that away from him - but it’s nowhere close to training a model from scratch.
alex fazio@alxfazio

pewdiepie just trained his own llm, and it beats gpt-4o on coding benchmarks. an apocalyptic, civilization-ending catastrophe of laughably, cosmically disproportionate magnitude for the entire ml research job category

48 replies · 105 reposts · 2.8K likes · 110K views
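The gap sphinx is pointing at shows up directly in code: LoRA-style fine-tuning loads an existing checkpoint and trains a small adapter on top, while pretraining from scratch means randomly initializing and updating every weight over trillions of tokens. A sketch with transformers + peft, using the 0.5B Qwen2.5-Coder sibling as a laptop-sized stand-in for the 32B (model id and hyperparameters are illustrative):

```python
# Fine-tuning touches a sliver of an existing model; pretraining
# touches all of it. LoRA here trains well under 1% of the weights.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-Coder-0.5B")
lora = LoraConfig(r=16, lora_alpha=32,
                  target_modules=["q_proj", "v_proj"],
                  task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # adapter params vs. frozen base
# Training from scratch would instead update 100% of the parameters
# over trillions of tokens, which is the resource gap being described.
```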