hkey

332 posts


@hkeydesign

dev @EclipseFND

Joined May 2021
445 Following · 1.4K Followers
Christoffer Bjelke@chribjel·
bro just had to add a 67 joke in the settings of t3 code lmao
[image]
11 replies · 1 repost · 251 likes · 10.3K views
Ayush@_ayushbhatia·
@sethsetse They surely have the payout insured. Even though it's statistically impossible, I'd be surprised if they didn't
2 replies · 0 reposts · 22 likes · 15.1K views
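For scale on "statistically impossible": a 64-team bracket has 63 games, so even a coin-flipper hits perfection with probability 2^-63. A back-of-envelope check (pure arithmetic; no Kalshi-specific rules assumed):

```python
# Odds of a perfect 64-team bracket (63 games), assuming every game
# is a fair coin flip. Real win probabilities vary by seed, so the
# true odds are somewhat better, but still astronomical.
games = 63
total_brackets = 2 ** games
print(f"possible coin-flip brackets: {total_brackets:,}")  # 9,223,372,036,854,775,808
print(f"odds of perfection: 1 in {total_brackets:.2e}")    # 1 in 9.22e+18
```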
seth@sethsetse·
POV: The Kalshi legal team pulling up to court after someone wins $1 Billion but forgot to read Article G Subsection 41 on page 379 of the contest rules
[GIF]
Kalshi@Kalshi

The $1 Billion Kalshi Perfect Bracket Challenge:
$1 Billion for a perfect bracket
$1 Million guaranteed to the top-scoring bracket
$1 Million to charity and scholarships
See the full rules and submit your bracket: kalshi.com/billion-dollar…
No purchase or deposit required. SIG Parametrics, LLC, a member of the Susquehanna International Group of Companies, is financially backing this promotion.

34 replies · 81 reposts · 5.4K likes · 543.2K views
sydney@0xSydney·
sneak peek
[image]
6 replies · 3 reposts · 29 likes · 3.9K views
hkey@hkeydesign·
@karpathy This feels like AutoML
0 replies · 0 reposts · 5 likes · 181 views
Andrej Karpathy@karpathy·
Three days ago I left autoresearch tuning nanochat for ~2 days on a depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement); this will be the new leaderboard entry. So yes, these are real improvements and they make an actual difference.

I am mildly surprised that my very first naive attempt already worked this well on top of what I thought was already a fairly manually well-tuned project. This is a first for me because I am very used to doing the iterative optimization of neural network training manually. You come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, etc. This is the bread and butter of what I do daily, for two decades. Seeing the agent do this entire workflow end-to-end, all by itself, as it worked through approx. 700 changes autonomously is wild. It really looked at the sequence of results of experiments and used that to plan the next ones. It's not novel, ground-breaking "research" (yet), but all the adjustments are "real": I didn't find them manually previously, and they stack up and actually improved nanochat. Among the bigger things:

- It noticed an oversight that my parameterless QKnorm didn't have a scaler multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work.
- It found that the Value Embeddings really like regularization and I wasn't applying any (oops).
- It found that my banded attention was too conservative (I forgot to tune it).
- It found that the AdamW betas were all messed up.
- It tuned the weight decay schedule.
- It tuned the network initialization.

This is on top of all the tuning I've already done over a good amount of time. The exact commit is here, from this "round 1" of autoresearch. I am going to kick off "round 2", and in parallel I am looking at how multiple agents can collaborate to unlock parallelism. github.com/karpathy/nanoc…

All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale of course; you don't just have a single train.py file to tune. But doing it is "just engineering" and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and humans (optionally) contribute on the edges. And more generally, *any* metric you care about that is reasonably efficient to evaluate (or that has more efficient proxy metrics, such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.
[image]
959 replies · 2.1K reposts · 19.3K likes · 3.4M views
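The tweet doesn't include the agent's code, but the loop it describes (propose a change, run a short training experiment, keep it only if validation loss improves, plan the next run from the history) is easy to sketch. Everything below is a hypothetical stand-in: `propose_change`, `run_training`, and the `qk_scale` knob are illustrations, not nanochat's actual tuner.

```python
# Minimal sketch of an "autoresearch" loop: greedy hill-climbing on
# validation loss, with the experiment history feeding each proposal.
# In the real setup an LLM agent plays propose_change() and a full
# nanochat training run plays run_training(); both are faked here.
import random

def run_training(config: dict) -> float:
    """Stand-in for a training run; returns a fake validation loss."""
    # Pretend a sharper QK-norm scale helps slightly, plus noise.
    return 3.0 - 0.1 * config["qk_scale"] + random.uniform(-0.02, 0.02)

def propose_change(history: list) -> dict:
    """Stand-in for the agent planning the next experiment."""
    best = min(history, key=lambda h: h["loss"])
    new = dict(best["config"])
    new["qk_scale"] = round(new["qk_scale"] + random.choice([-0.1, 0.1]), 2)
    return new

history = [{"config": {"qk_scale": 1.0}, "loss": run_training({"qk_scale": 1.0})}]
for _ in range(20):                       # ~700 changes in the tweet's run
    config = propose_change(history)
    history.append({"config": config, "loss": run_training(config)})

best = min(history, key=lambda h: h["loss"])
print(f"best config: {best['config']}, val loss: {best['loss']:.3f}")
```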
Teknium (e/λ)@Teknium·
What's the Human API twitter again? Need to get those humans integrated as a tool in the agent...
8 replies · 0 reposts · 59 likes · 4.8K views
celeste@vmfunc·
following the whole @discord and persona drama, decided to make our own chat app: fully encrypted, faster than signal, element, etc, custom crypto algorithm. fully free and open source. stay tuned!!
[image]
84 replies · 38 reposts · 583 likes · 70.9K views
hkey@hkeydesign·
created these 3b1b-style videos using @NousResearch's Hermes agent. I used @elevenlabs for TTS and manim for the animations. Hermes successfully compiled everything and produced the videos!
2 replies · 0 reposts · 6 likes · 221 views
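hkey doesn't share the generated code, but the animation half of that pipeline looks roughly like the sketch below, using the manim community edition (the scene name and formula are made up; the @elevenlabs narration would be layered over the rendered video afterwards, e.g. with ffmpeg).

```python
# Hypothetical example of the kind of manim scene an agent might emit
# for a SHA-256 explainer. Render with:  manim -qm scene.py HashIntro
from manim import Scene, MathTex, Write, FadeOut

class HashIntro(Scene):
    def construct(self):
        # SHA-256 as a function from arbitrary bitstrings to 256 bits
        title = MathTex(r"\mathrm{SHA\text{-}256}: \{0,1\}^* \to \{0,1\}^{256}")
        self.play(Write(title))  # animate the formula stroke by stroke
        self.wait(2)             # hold the frame for the narration
        self.play(FadeOut(title))
```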
hkey@hkeydesign·
SHA-256 explanation. cost per video is ~$4 with opus 4.6
0 replies · 0 reposts · 2 likes · 84 views
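The core property such a video explains fits in a few lines of stdlib Python: SHA-256 maps any input to a fixed 256-bit digest, and changing a single character scrambles the entire output (the avalanche effect).

```python
import hashlib

a = hashlib.sha256(b"hello").hexdigest()
b = hashlib.sha256(b"hellp").hexdigest()  # one character changed
print(a)  # 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
print(b)  # a completely unrelated digest
```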
hkey@hkeydesign·
@shawmakesmagic this is called HNDL: harvest now, decrypt later
0 replies · 0 reposts · 2 likes · 77 views
Shaw (spirit/acc)@shawmakesmagic·
If you can exfiltrate encrypted data now, you will be able to decrypt it in 20 years
15 replies · 2 reposts · 63 likes · 6.4K views
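The threat model in one small sketch, using the `cryptography` package's Fernet as a stand-in cipher: at harvest time the attacker needs no key, only storage, and the decryption step can simply wait for a future key compromise (e.g. a quantum attack on the key exchange that protected the traffic).

```python
# Harvest now, decrypt later: ciphertext is archived today and read
# whenever the key eventually falls. Requires `pip install cryptography`.
from cryptography.fernet import Fernet

key = Fernet.generate_key()                 # secret today
harvested = Fernet(key).encrypt(b"records worth reading in 2045")

# ... 20 years pass; the ciphertext sits in the attacker's archive ...
recovered_key = key                         # stand-in for a future key break
print(Fernet(recovered_key).decrypt(harvested))
```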
lowbie@archivepilled·
Introducing: Number Research Inc. At Number Research Inc., we are attempting to find and document all* available numbers. This is a volunteer-led research position, where anyone is able to contribute. Simply type a number in, and we'll check if we've got it. If we have, no worries, just try another. If it is a new number, then thank you for your hard work!
[image]
312 replies · 260 reposts · 4.1K likes · 398.1K views
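Read as a systems problem, the joke reduces to a set with one check-and-insert operation; a sketch under that reading (names invented):

```python
# The entire Number Research Inc. backend, more or less: O(1) per
# submission, with an infinite research backlog guaranteed.
documented: set[int] = set()

def submit(n: int) -> str:
    if n in documented:
        return "no worries, just try another"
    documented.add(n)
    return "a new number! thank you for your hard work"

print(submit(67))  # a new number! thank you for your hard work
print(submit(67))  # no worries, just try another
```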
TheStandupPod@thestanduppod·
Why OpenClaw users buy Mac minis
265 replies · 334 reposts · 6.4K likes · 364.5K views
hkey@hkeydesign·
does @eigencloud really offer temperature=0 as a service?
[image]
0 replies · 0 reposts · 6 likes · 198 views
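For context on the joke: temperature divides the logits before the softmax, and as it approaches 0 the distribution collapses onto the argmax, i.e. greedy, repeatable decoding; inference APIs typically special-case temperature=0 rather than divide by zero. A small numpy illustration:

```python
import numpy as np

def sample_probs(logits, temperature):
    """Token probabilities under softmax(logits / T); T=0 -> argmax."""
    logits = np.asarray(logits, dtype=float)
    if temperature == 0:                # the special case: greedy decoding
        probs = np.zeros_like(logits)
        probs[np.argmax(logits)] = 1.0
        return probs
    z = logits / temperature
    z -= z.max()                        # for numerical stability
    e = np.exp(z)
    return e / e.sum()

for t in (1.0, 0.5, 0.0):
    print(t, sample_probs([2.0, 1.0, 0.5], t).round(3))
```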
hkey@hkeydesign·
@fikunmi_ap yeah claude code/hermes can run commands on your computer, but I was sick of waiting 5 seconds for a simple command. chatgpt has always been my first choice for quick debugging tasks. I can do things like that too :3
[image]
0 replies · 0 reposts · 2 likes · 82 views
fikunmi@fikunmi_ap·
@hkeydesign Isn't that what agent TUIs like Claude Code are for?
1 reply · 0 reposts · 0 likes · 70 views
hkey@hkeydesign·
jumping between chatgpt and the terminal was slowing me down. didn't want to spin up cursor just to run a command, so I built a small extension instead.
1 reply · 0 reposts · 9 likes · 340 views
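hkey doesn't say how the extension is built; one plausible shape for the local half is a localhost-only HTTP bridge that runs a command and returns its output. The sketch below is entirely hypothetical (handler, port, and protocol are invented), and it binds to 127.0.0.1 on purpose, since executing browser-supplied commands is dangerous.

```python
# Hypothetical local command bridge for a browser extension: POST a
# shell command to 127.0.0.1:8377 and get stdout/stderr back.
# Never expose something like this beyond localhost.
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer

class RunHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers["Content-Length"])
        cmd = self.rfile.read(length).decode()
        result = subprocess.run(cmd, shell=True, capture_output=True, timeout=10)
        self.send_response(200)
        self.end_headers()
        self.wfile.write(result.stdout + result.stderr)

HTTPServer(("127.0.0.1", 8377), RunHandler).serve_forever()
```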
hkey@hkeydesign·
@thdxr only one country has nukes = bad
everyone has nukes = good
0 replies · 0 reposts · 39 likes · 2.2K views
dax@thdxr·
i respect everyone worried about AI safety and believe their concerns are genuine, but we work to make open source AI more of a thing so that it's not owned by a small group of people. this also makes it accessible to bad actors. the two goals are totally incompatible
62 replies · 21 reposts · 939 likes · 93.7K views
hkey@hkeydesign·
@thekitze it's discord. it shows the actual time the message was sent on the left
0 replies · 0 reposts · 2 likes · 337 views
kitze 🛠️ tinkerer.club@thekitze·
i asked openclaw to do a cron job every 1 min since last night and it completely lost its mind 🤣 i wish cron jobs were stable...
39 replies · 6 reposts · 166 likes · 30K views
Yatharth@yatharthmaan·
@protosphinx I don't think these people understand how many resources it takes to train an LLM from scratch.
4 replies · 0 reposts · 57 likes · 5K views
sphinx@protosphinx·
No, he didn’t train his own LLM. He fine-tuned Qwen2.5-Coder-32B and essentially bench-maxxed it. That’s a hard engineering problem - not taking that away from him - but it’s nowhere close to training a model from scratch.
alex fazio@alxfazio

pewdiepie just trained his own llm, and it beats gpt-4o on coding benchmarks. an apocalyptic, civilization-ending catastrophe of laughably, cosmically disproportionate magnitude for the entire ml research job category

48 replies · 105 reposts · 2.8K likes · 110K views
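The gap sphinx is pointing at shows up directly in code: LoRA-style fine-tuning loads an existing checkpoint and trains a small adapter on top, while pretraining from scratch means randomly initializing and updating every weight over trillions of tokens. A sketch with transformers + peft, using the 0.5B Qwen2.5-Coder sibling as a laptop-sized stand-in for the 32B (model id and hyperparameters are illustrative):

```python
# Fine-tuning touches a sliver of an existing model; pretraining
# touches all of it. LoRA here trains well under 1% of the weights.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-Coder-0.5B")
lora = LoraConfig(r=16, lora_alpha=32,
                  target_modules=["q_proj", "v_proj"],
                  task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # adapter params vs. frozen base
# Training from scratch would instead update 100% of the parameters
# over trillions of tokens, which is the resource gap being described.
```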