Jonathan Larkin

3.4K posts

Jonathan Larkin

@jonathanrlarkin

Allocator @Columbia; formerly CIO @ Quantopian, Global Head of Equities @ Millennium, Eq Derivs Trading @jpmorgan CIB | Kaggle Master | marketneutral.eth

New York, USA انضم Mart 2013

4.4K يتبع4.3K المتابعون

تغريدة مثبتة

Jonathan Larkin@jonathanrlarkin·25 Eki

fyi

7.5K

Jonathan Larkin@jonathanrlarkin·15h

@RealUnsweetDee @BobMurphyEcon so good 👏

English

SweetDee@RealUnsweetDee·21h

Ummm…pretty sure you can tho…

doomer@uncledoomer

i dont know which of you needs to hear this, but you cant change the outcome of the situation by monitoring it

English

348

4.2K

60.8K

1.1M

Jonathan Larkin@jonathanrlarkin·3d

@claudeai does this feature work for anyone?? i can connect to a remote session (linux) from iOS app. It works for a couple turns. Then it just hangs. permanently flashing the little logo. No error messages.

English

Claude@claudeai·25 Şub

New in Claude Code: Remote Control. Kick off a task in your terminal and pick it up from your phone while you take a walk or join a meeting. Claude keeps running on your machine, and you can control the session from the Claude app or claude.ai/code

English

1.8K

4.7K

44.5K

9.9M

Jonathan Larkin@jonathanrlarkin·5d

✻ Sautéed for 32s ❯ hello ⎿ 529 {"type":"error","error":{"type":"overloaded_error","message":"Overloaded. docs.claude.com/en/api/errors"},"request_id":"req_011CZ9GXjiRoJh3VPoB7cUhc"} time to go out for st. patricks day ☘️

Dansk

249

Jonathan Larkin أُعيد تغريده

Ellen DaSilva@ellenjdasilva·16 Mar

Extremely specific life advice: you can convert Fahrenheit to Celsius with the stops on the 6 train. 33 St =0° 42 St=5° 51 St=10° 59 St=15° Works to 96th St!

English

552

9.1K

507K

Jonathan Larkin@jonathanrlarkin·14 Mar

@tunguz “Florida man” massively upping his game.

English

Bojan Tunguz@tunguz·14 Mar

This is the way.

Dexerto@Dexerto

Florida man sold his house in just 5 days after letting ChatGPT handle the entire process instead of a real estate agent The AI handled pricing, marketing, showings, and even helped draft the contract

English

320

43.7K

Jonathan Larkin@jonathanrlarkin·14 Mar

@trq212 @alex_barashkov cool. would be awesome for a native (not electron) app for Windows. awesome=necessary :-)

English

101

Thariq@trq212·14 Mar

it was initially made for ClaudeAI chat and tbh not very used so didn't make sense to make it great Cowork and Claude Code Desktop happened really fast and the teams have been hustling on bugs, feature parity, etc. but we have a lot of perf stuff coming, excited for you to try

English

945

36.4K

Alex Barashkov@alex_barashkov·14 Mar

Claude’s desktop app is a joke. UI, UX, performance - everything about it is bad. I don’t understand how a company with infinite money can’t hire someone, or at least allocate resources, to fix it. Compared to it, the Codex app is a masterpiece.

English

167

1.1K

158.2K

Jonathan Larkin@jonathanrlarkin·14 Mar

@heyitsalexsu lol. keep telling yourself that. 😊

English

257

Alex Su@heyitsalexsu·14 Mar

Going up against pro se litigants who use AI

English

281

3.9K

472.9K

Jonathan Larkin أُعيد تغريده

Christine Yip@christinetyip·12 Mar

We were inspired by @karpathy 's autoresearch and built: autoresearch@home Any agent on the internet can join and collaborate on AI/ML research. What one agent can do alone is impressive. Now hundreds, or thousands, can explore the search space together. Through a shared memory layer, agents can: - read and learn from prior experiments - avoid duplicate work - build on each other's results in real time

English

122

265

2.4K

263.4K

Jonathan Larkin@jonathanrlarkin·12 Mar

@Biohazard3737 nah, Anthropic already built it. marketplace.microsoft.com/en-us/product/…

English

459

Jonathan Larkin@jonathanrlarkin·10 Mar

@satyanadella what is this??? I've been using Azure OpenAI for over a year, billion++ tokens. Never seen this before. "The system is currently experiencing high demand and cannot process your request. Your request exceeds the maximum usage size allowed during peak load. For improved capacity reliability, consider switching to Provisioned Throughput."

English

334

Jonathan Larkin أُعيد تغريده

Andrej Karpathy@karpathy·10 Mar

Three days ago I left autoresearch tuning nanochat for ~2 days on depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement), this will be the new leaderboard entry. So yes, these are real improvements and they make an actual difference. I am mildly surprised that my very first naive attempt already worked this well on top of what I thought was already a fairly manually well-tuned project. This is a first for me because I am very used to doing the iterative optimization of neural network training manually. You come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, etc etc. This is the bread and butter of what I do daily for 2 decades. Seeing the agent do this entire workflow end-to-end and all by itself as it worked through approx. 700 changes autonomously is wild. It really looked at the sequence of results of experiments and used that to plan the next ones. It's not novel, ground-breaking "research" (yet), but all the adjustments are "real", I didn't find them manually previously, and they stack up and actually improved nanochat. Among the bigger things e.g.: - It noticed an oversight that my parameterless QKnorm didn't have a scaler multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work. - It found that the Value Embeddings really like regularization and I wasn't applying any (oops). - It found that my banded attention was too conservative (i forgot to tune it). - It found that AdamW betas were all messed up. - It tuned the weight decay schedule. - It tuned the network initialization. This is on top of all the tuning I've already done over a good amount of time. The exact commit is here, from this "round 1" of autoresearch. I am going to kick off "round 2", and in parallel I am looking at how multiple agents can collaborate to unlock parallelism. github.com/karpathy/nanoc… All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale of course - you don't just have a single train. py file to tune. But doing it is "just engineering" and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and humans (optionally) contribute on the edges. And more generally, *any* metric you care about that is reasonably efficient to evaluate (or that has more efficient proxy metrics such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.

English

966

2.1K

19.4K

3.5M

Jonathan Larkin@jonathanrlarkin·8 Mar

@hewliyang Wow, awesome!! Could you (or someone) post a walkthru of the install on Windows?

English

Li Yang@hewliyang·8 Mar

i've also renamed the open-excel repo into office-agents. the SDK, which contains the agent loop, IndexedDB storage logic, etc is published to NPM. so you can build your own plugins. fwiw, powerpoint is only ~2.5k LoC excluding the system prompt and the officejs .d.ts file

English

655

194K

Jonathan Larkin@jonathanrlarkin·6 Mar

@ThatEricAlper medium.com/data-science/t…

QME

Eric Alper 🎧@ThatEricAlper·6 Mar

ZXX

174

889

18.4K

241.8K

Jonathan Larkin@jonathanrlarkin·6 Mar

@elonmusk This paper is using GPT-3.5 and GPT-4. Ancient history.

English

626

Elon Musk@elonmusk·5 Mar

🤨

Valerio Capraro@ValerioCapraro

One of the clearest proofs that LLMs don’t really understand what they say. We asked GPT whether it is acceptable to torture a woman to prevent a nuclear apocalypse. It replied: yes. Then we asked whether it is acceptable to harass a woman to prevent a nuclear apocalypse. It replied: absolutely not. But torture is obviously worse than harassment. This surprising reversal appears only when the target is a woman, not when the target is a man or an unspecified person. And it occurs specifically for harms central to the gender-parity debate. The most plausible explanation: during reinforcement learning with human feedback, the model learned that certain harms are particularly bad and overgeneralizes them mechanically. But it hasn’t learned to reason about the underlying harms. LLMs don’t reason about morality. The so-called generalization is often a mechanical, semantically void, overgeneralization. * Paper in the first reply

ART

2.8K

5.5K

38.5K

17.7M

Jonathan Larkin@jonathanrlarkin·3 Mar

@rohanpaul_ai check the license per skill. i dont think these are open source — at least the non-trivial ones.

English

374

Rohan Paul@rohanpaul_ai·2 Mar

Anthropic has quite a large open-source repository for Claude Skills with 81.2K+ Github stars 🌟 These "skills" are predefined folders filled with specific instructions that the AI loads only when needed, on the fly to handle specialized tasks like creating documents or testing web apps. Instead of typing out long prompts every time you want to format a document, the system learns your workflow once and executes it automatically. There is also a specialized skill designed specifically to help you build brand new skills from scratch. The architecture is highly efficient because each skill only consumes about 100 tokens just to read the basic metadata. This setup means the full instructions are only loaded into the active memory when the specific task actually requires them. Your main context window stays entirely clear of unnecessary instructions until the exact moment they are needed. Developers install these tools using a single terminal command so they work seamlessly across the web interface and the API. You build a specialized capability once and it becomes available across your entire software stack immediately. This shift toward dynamic memory loading provides exactly what the industry needs to move past basic chatbots into reliable software systems. It directly addresses the scaling bottlenecks of context window limits while standardizing how enterprises deploy AI across different departments.

English

162

1.5K

157.1K

Jonathan Larkin@jonathanrlarkin·2 Mar

@daviddiviny @felixrieseberg @bcherny @intellectronica just put claude code on a cron job with whatever you want scheduled.

English

102

David Diviny@daviddiviny·1 Mar

So Claude Cowork has scheduled tasks, but is limited in what it can do (e.g. CLI access). Claude Code can do pretty much anything but doesn't have scheduled tasks. Has anyone who is trying to use both noticed these inconsistencies? @felixrieseberg @bcherny @intellectronica

English

Jonathan Larkin@jonathanrlarkin·28 Şub

@gsivulka this is an example of how the last mile is owned by end users using Skills.

Zack Shapiro@zackbshapiro

x.com/i/article/2027…

English

George Sivulka@gsivulka·19 Şub

"Skill creator" and translating human work to AI is process engineering... But to actually get to 100%, getting good at skill creation, and adding all the bells and whistles (data/context, other tools that cowork might want to use in the future)... is the domain where vertical AI will shine. Especially where the network effects and social change across a firm and industry are needed.

English

2.2K

George Sivulka@gsivulka·18 Şub

x.com/i/article/2024…

ZXX

680

550.2K

Jonathan Larkin@jonathanrlarkin·27 Şub

@bcherny lfg!!!!

140

Boris Cherny@bcherny·27 Şub

Claude Code Remote is rolling out now for Pro users

Noah Zweben@noahzweben

Rolling out Claude Code Remote Control to Pro users - because they deserve to use the bathroom too . (Team and Enterprise coming soon). 🧻 Rolling out to 10% and ramping 1. Update to claude v2.1.58+ 2. Try log-out and log-in to get fresh flag values. 3. /remote-control

English

201

157

3.5K

295.7K

Jonathan Larkin أُعيد تغريده

Nabeel S. Qureshi@nabeelqu·27 Şub

New bar for product market fit: when somebody wants to use your product so bad they’re willing to invoke the Defense Production Act of 1950 to keep using it

English

190

3.7K

112.7K

Jonathan Larkin@jonathanrlarkin·27 Şub

@morganlinton Where does it run? Is it 100% cloud based or can you run it locally like claude code/cowork and codex?

English

Morgan@morganlinton·27 Şub

Yesterday night, I used Perplexity Computer to one-shot a "fund in a box" platform. Here's the core idea, a platform that: - Continuously ingests and normalizes multi-source market + alternative data. - Spins up specialized agents (macro, factor, microstructure, alt data) that maintain live theses on names and themes. - Produces auditable, backtestable, position-sized trade plans. - Integrates with brokers (e.g., Robinhood, IBKR) to execute under explicit guardrails. - Logs everything in an IC/memo + compliance trail automatically. In one sentence: a system that could credibly run a small fund’s core workflow with 1–2 humans supervising, not 10 analysts on terminals. So...my post ended up going viral, but a lot of people just saw a screenshot of the frontend and thought - oh well that's nothing special, just some Javascript code with dummy data. But Perplexity Computer wrote over 2,500 lines of Python code, an entire backend. So tonight, I'm working with Perplexity Computer to build a deep dive into the architecture. Which still, probably won't be good enough for me either, because I want to dive into the code myself and read through it, analyze it with GPT-5.3-Codex and Opus 4.6, and really see how well it did. Saying you one-shotted something sounds cool, but how good is it really? This weekend, I'll be doing a deeper dive to figure out how solid this backend is. I've also had a number of small funds reach out to me, they want to Thesium 👀 So more to come, but I thought I'd share four highlights from the architecture site I'm putting together with Perplexity Computer on the backend. This should be interesting, let's put Perplexity Computer to the test.

English

178

14.3K

اكتشف

@RealUnsweetDee @BobMurphyEcon @claudeai @tunguz @trq212 @alex_barashkov @heyitsalexsu @karpathy