Jonas Eschmann (@jonas_eschmann)
709 posts
PhD student @UCBerkeley. Working on reinforcement learning for continuous control @rl_tools
Berkeley, CA · Joined March 2023
1.1K Following · 665 Followers

Pinned Tweet
Jonas Eschmann (@jonas_eschmann):
We present RAPTOR! 🚁 A tiny foundation policy flying any quadrotor 🦾 Tested on 10 real quadrotors from 32 g to 2.4 kg ⚡️ Adapts within milliseconds, zero-shot ⚙️ Runs inside PX4/Betaflight/etc.

Peyman Milanfar (@docmilanfar):
Karpathy is misunderstood. He isn't an oracle; he's a teacher. We credit him with forecasting the weather, but he's really just looking out the window and describing the rain very well. He's an absolutely brilliant explainer, but don't look to him as a clairvoyant seer of the future.

SzymonOzog (@SzymonOzog_):
Is all you need
[image]

kalomaze (@kalomaze):
echoes of that one deprecated gemini ckpt, and early pre-sycophancy 4o checkpoints (yes, the early ones that introduced the em dash pre-RLHF makeover, and also seemed to really really like curly quotes beyond all logical justification)

kalomaze (@kalomaze):
opus 4.5 was better

kalomaze (@kalomaze):
the claude mythos thing where it apparently found a way to get full kernel access via execution of normal javascript on an ordinary web page. dear God

Jonas Eschmann (@jonas_eschmann):
@8teAPi Back when monitoring The Situation was still fun

Prakash (@8teAPi):
The sad sad thing is that I'll likely never write anything as good as LK-99. The universe cooperated with me miraculously, live, for a few days.

Jonas Eschmann (@jonas_eschmann):
@yacineMTB @rl_tools Thank you! Looking forward to seeing what you're building! Seems like you have all the right ingredients/takes!

Andrej Karpathy (@karpathy):
Software horror: litellm PyPI supply chain attack. A simple `pip install litellm` was enough to exfiltrate SSH keys, AWS/GCP/Azure creds, Kubernetes configs, git credentials, env vars (all your API keys), shell history, crypto wallets, SSL private keys, CI/CD secrets, and database passwords. LiteLLM itself has 97 million downloads per month, which is already terrible, but much worse, the contagion spreads to any project that depends on litellm. For example, if you did `pip install dspy` (which depended on litellm>=1.64.0), you'd also be pwned. Same for any other large project that depended on litellm. Afaict the poisoned version was up for less than ~1 hour.

The attack had a bug which led to its discovery: Callum McMahon was using an MCP plugin inside Cursor that pulled in litellm as a transitive dependency. When litellm 1.82.8 installed, their machine ran out of RAM and crashed. So if the attacker hadn't vibe coded this attack, it could have gone undetected for many days or weeks.

Supply chain attacks like this are basically the scariest thing imaginable in modern software. Every time you install any dependency you could be pulling in a poisoned package anywhere deep inside its entire dependency tree. This is especially risky with large projects that might have lots and lots of dependencies. The credentials stolen in each attack can then be used to take over more accounts and compromise more packages. Classical software engineering would have you believe that dependencies are good (we're building pyramids from bricks), but imo this has to be re-evaluated, and it's why I've grown increasingly averse to them, preferring to use LLMs to "yoink" functionality when it's simple enough and possible.

Quoting Daniel Hnyk (@hnykda):
LiteLLM HAS BEEN COMPROMISED, DO NOT UPDATE. We just discovered that LiteLLM PyPI release 1.82.8 has been compromised: it contains litellm_init.pth with base64-encoded instructions to send all the credentials it can find to a remote server and self-replicate. Link below.

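The `litellm_init.pth` vector works because Python's `site` machinery executes any line of a `.pth` file that begins with `import` at interpreter startup, before any of your own code runs. A minimal, harmless sketch of that mechanism (directory, file, and variable names here are invented for illustration; the real payload was of course not this):

```python
import os
import site
import tempfile

# Lines of a .pth file that start with "import" are exec'd by Python's
# site machinery when the directory is scanned -- the same hook a
# malicious package can abuse to run code on every interpreter start.
pth_dir = tempfile.mkdtemp()
with open(os.path.join(pth_dir, "demo.pth"), "w") as f:
    # Harmless stand-in payload: set an env var to prove execution.
    f.write("import os; os.environ['PTH_DEMO'] = 'executed'\n")

# addsitedir() processes .pth files just like interpreter startup does.
site.addsitedir(pth_dir)

print(os.environ.get("PTH_DEMO"))  # -> executed
```

The takeaway: installation alone is enough; the victim never has to `import litellm` for the payload to fire on the next Python launch.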
the tiny corp (@__tinygrad__):
And it's not close. It's 1.8x faster. This is using the tinygrad DSL. The replacement for BEAM will be LLM.
[image]

Jonas Eschmann (@jonas_eschmann):
@ptrschmdtnlsn "Harvard remains squarely focused on achieving its dual-mandate goals of maximum employment and inflated grades for the benefit of the upcoming elites, their parents who spend 65k a year, the endowment and the legacy admissions."

Peter Schmidt-Nielsen (@ptrschmdtnlsn):
I kind of think that Harvard should just target 2% grade inflation per year forever, and start giving out letters above A. It's fine if the average GPA is a 7.0 at Harvard in 30 years, so long as there's still variation and therefore signal.

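The 7.0 figure is just compound growth: a 4.0 average inflated 2% per year for 30 years lands around 7.25, consistent with the tweet's ballpark. A quick sanity check:

```python
# Compound 2% annual grade inflation on a 4.0 baseline for 30 years.
gpa = 4.0 * 1.02 ** 30
print(round(gpa, 2))  # ~7.25, matching the "7.0 in 30 years" ballpark
```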
Samo Burja (@SamoBurja):
Greater Switzerland has never been tried.
[image]

Jonas Eschmann (@jonas_eschmann):
@karpathy Thank you for using BPB! The focus on loss/perplexity was a mistake because it is confounded by the tokenizer.

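Bits-per-byte (BPB) normalizes the loss by raw input bytes rather than by tokens, which is what makes it comparable across tokenizers while loss/perplexity is not. A minimal sketch of the conversion (function and parameter names are mine, not from the tweet):

```python
import math

def bits_per_byte(mean_nll_nats: float, n_tokens: int, n_bytes: int) -> float:
    """Convert a mean per-token cross-entropy (in nats) to bits per raw byte.

    Dividing by bytes instead of tokens removes the tokenizer's
    compression ratio from the metric, so models trained with different
    tokenizers can be compared directly.
    """
    total_bits = mean_nll_nats * n_tokens / math.log(2)  # nats -> bits
    return total_bits / n_bytes

# Two tokenizations of the same 1000-byte text: a coarse tokenizer
# (4 bytes/token, higher per-token loss) and a fine one (2 bytes/token,
# lower per-token loss) can yield identical BPB -- same model quality.
print(bits_per_byte(2.0, 250, 1000))
print(bits_per_byte(1.0, 500, 1000))
```

Per-token loss would rank these two runs differently (2.0 vs 1.0 nats) even though they compress the same bytes equally well; BPB does not.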
Andrej Karpathy (@karpathy):
ah yes, this is what post-agi feels like :) i didn't touch anything. brb sauna
[image]

Andrej Karpathy (@karpathy):
nanochat now trains a GPT-2-capability model in just 2 hours on a single 8XH100 node (down from ~3 hours 1 month ago). Getting a lot closer to ~interactive! A bunch of tuning and features (fp8) went in, but the biggest difference was a switch of the dataset from FineWeb-edu to NVIDIA ClimbMix (nice work NVIDIA!). I had tried Olmo, FineWeb, and DCLM, which all led to regressions; ClimbMix worked really well out of the box (to the point that I am slightly suspicious about goodharting, though reading the paper it seems ~ok).

In other news, after trying a few approaches for how to set things up, I now have AI agents iterating on nanochat automatically, so I'll just leave this running for a while, go relax a bit and enjoy the feeling of post-agi :). Visualized here as an example: 110 changes made over the last ~12 hours, bringing the validation loss so far from 0.862415 down to 0.858039 for a d12 model, at no cost to wall clock time. The agent works on a feature branch, tries out ideas, merges them when they work, and iterates. Amusingly, over the last ~2 weeks I almost feel like I've iterated more on the "meta-setup", where I optimize and tune the agent flows, than on the nanochat repo directly.
[image]

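The loop the tweet describes (agent proposes a change on a feature branch, merges only when validation loss improves) reduces to a simple accept-if-better harness. This is purely my sketch of that control flow; the function names and harness are invented, not Karpathy's actual setup, and real evaluation would involve training runs:

```python
def iterate_on_repo(baseline_loss, propose_change, evaluate, rounds=110):
    """Hypothetical sketch of an accept-if-better agent loop:
    keep a change only when it lowers validation loss, mirroring
    "tries out ideas, merges them when they work, and iterates"."""
    best = baseline_loss
    merged = []
    for _ in range(rounds):
        change = propose_change()   # agent edits code on a feature branch
        loss = evaluate(change)     # train/eval the candidate change
        if loss < best:             # merge only strict improvements
            best = loss
            merged.append(change)
    return best, merged
```

With the tweet's numbers, 110 such rounds took a d12 model's validation loss from 0.862415 to 0.858039; the sketch captures only the accept/reject logic, not the training itself.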
Jonas Eschmann (@jonas_eschmann):
The Rule of Three is dead

Geoff Langdale (@geofflangdale):
@analytichegel It's not moronic to think that, since C++ is generally much faster than Python, doing ML in C++ might somehow be faster than doing it in Python. It's uninformed (given the bottleneck is in already-tuned libraries), but hardly moronic.

Jonas Eschmann (@jonas_eschmann):
@andrewgwils I think it is also connected to the junkie-level consumption. Music hits completely differently after a few weeks of abstinence.

Andrew Gordon Wilson (@andrewgwils):
Sometimes people proudly announce to me that they "don't get music", that it "does nothing for them". I feel truly bad for that. It's another sense, another dimension. For me, nothing else is as moving or thought provoking. The most beautiful writing is as close as it gets.

Jonas Eschmann (@jonas_eschmann):
The Qwen numbers roughly suggest dense > MoE for reasoning/intelligence and MoE > dense for knowledge. Seems plausible given that routing is not perfect. I believe having some ginormous model (probably MoE) for prefill and a lightweight decoder that taps into that knowledge via cross-attention might be the way to go. As Noam's scriptures foretold.

Jonas Eschmann (@jonas_eschmann):
@Citrini7 We all know the "solution" to the Moltdown will start with H and end with oney

Citrini (@citrini):
JUNE 2028. The S&P is down 38% from its highs. Unemployment just printed 10.2%. Private credit is unraveling. Prime mortgages are cracking. AI didn't disappoint. It exceeded every expectation. What happened? citriniresearch.com/p/2028gic

Val Katayev (@ValKatayev):
Took a Delta flight from NYC to the Caribbean. They overbooked it, so Delta started to offer $$ for 4 seats to move to the next available flight. Legally they must keep going up until someone takes it. Here's the outcome:
1st person took $400 (flight available in 3 hours)
2nd and 3rd took around $1500-2000 (I think also booked to a flight 3 hours later)
The 4th seat took a long time to find a taker, and the offer kept going up. This one was a rebook to the next morning's flight:
$2500 - no taker
$3000 - no taker
$4000 - no taker
$5000 - no taker
$6000 - no taker
It took $7,000 for someone in economy to give up a day.