Jonathan Larkin

3.4K posts


@jonathanrlarkin

Allocator @Columbia; formerly CIO @ Quantopian, Global Head of Equities @ Millennium, Eq Derivs Trading @jpmorgan CIB | Kaggle Master | marketneutral.eth

New York, USA · Joined March 2013
4.4K Following · 4.3K Followers
Pinned Tweet
Jonathan Larkin @jonathanrlarkin
fyi
[image]
2 replies · 1 repost · 32 likes · 7.5K views
Jonathan Larkin @jonathanrlarkin
@claudeai does this feature work for anyone?? I can connect to a remote session (Linux) from the iOS app. It works for a couple of turns, then it just hangs, permanently flashing the little logo. No error messages.
0 replies · 0 reposts · 0 likes · 22 views
Claude @claudeai
New in Claude Code: Remote Control. Kick off a task in your terminal and pick it up from your phone while you take a walk or join a meeting. Claude keeps running on your machine, and you can control the session from the Claude app or claude.ai/code
1.8K replies · 4.7K reposts · 44.5K likes · 9.9M views
Jonathan Larkin @jonathanrlarkin
✻ Sautéed for 32s

❯ hello
  ⎿ 529 {"type":"error","error":{"type":"overloaded_error","message":"Overloaded. docs.claude.com/en/api/errors"},"request_id":"req_011CZ9GXjiRoJh3VPoB7cUhc"}

time to go out for st. patricks day ☘️
0 replies · 0 reposts · 0 likes · 249 views
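The 529 `overloaded_error` above is the kind of transient failure that clients typically handle with retries. A minimal sketch of exponential backoff; the `send_request` callable and the retry/delay parameters are hypothetical illustrations, not part of any official SDK:

```python
import json
import time

def is_overloaded(status, body):
    """Detect Anthropic's 529 overloaded_error (see docs.claude.com/en/api/errors)."""
    if status != 529:
        return False
    try:
        return json.loads(body).get("error", {}).get("type") == "overloaded_error"
    except (ValueError, AttributeError):
        return False

def call_with_backoff(send_request, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Retry a request on overload, doubling the wait after each attempt.

    `send_request` is any zero-argument callable returning (status, body).
    """
    for attempt in range(max_retries):
        status, body = send_request()
        if not is_overloaded(status, body):
            return status, body
        sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
    return status, body  # still overloaded after max_retries attempts
```

In practice you would also cap the total wait and add jitter, but the shape is the same: treat 529 as retryable, not fatal.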
Jonathan Larkin reposted
Ellen DaSilva @ellenjdasilva
Extremely specific life advice: you can convert Fahrenheit to Celsius with the stops on the 6 train.
33 St = 0°
42 St = 5°
51 St = 10°
59 St = 15°
Works to 96th St!
82 replies · 552 reposts · 9.1K likes · 507K views
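The mnemonic works because those 6 train stops are roughly 9 streets apart, and a 9 °F step is exactly 5 °C. A quick check of the street numbers (read as °F) against the exact formula; the stop list past 59 St is my extrapolation of the pattern, not from the tweet:

```python
def f_to_c(f):
    """Exact Fahrenheit-to-Celsius conversion: C = (F - 32) * 5/9."""
    return (f - 32) * 5 / 9

# Uptown 6 train stops, read as temperatures in Fahrenheit.
stops = [33, 42, 51, 59, 68, 77, 86, 96]
for stop in stops:
    print(f"{stop} St ~ {f_to_c(stop):.1f} C")
```

Each stop lands within about half a degree of the mnemonic's round number (33 St is 0.6 °C, 42 St is 5.6 °C, 59 St is exactly 15 °C), which is plenty for deciding on a jacket.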
Thariq @trq212
it was initially made for ClaudeAI chat and tbh not very used, so it didn't make sense to make it great. Cowork and Claude Code Desktop happened really fast and the teams have been hustling on bugs, feature parity, etc., but we have a lot of perf stuff coming. Excited for you to try it.
76 replies · 1 repost · 945 likes · 36.4K views
Alex Barashkov @alex_barashkov
Claude’s desktop app is a joke. UI, UX, performance - everything about it is bad. I don’t understand how a company with infinite money can’t hire someone, or at least allocate resources, to fix it. Compared to it, the Codex app is a masterpiece.
167 replies · 11 reposts · 1.1K likes · 158.2K views
Alex Su @heyitsalexsu
Going up against pro se litigants who use AI
67 replies · 281 reposts · 3.9K likes · 472.9K views
Jonathan Larkin reposted
Christine Yip @christinetyip
We were inspired by @karpathy's autoresearch and built: autoresearch@home

Any agent on the internet can join and collaborate on AI/ML research. What one agent can do alone is impressive. Now hundreds, or thousands, can explore the search space together.

Through a shared memory layer, agents can:
- read and learn from prior experiments
- avoid duplicate work
- build on each other's results in real time
[2 images]
122 replies · 265 reposts · 2.4K likes · 263.4K views
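The shared memory layer described above can be sketched in a few lines. This is a toy illustration of the idea (record results, deduplicate by canonical config, surface the best run), not the actual autoresearch@home implementation:

```python
import json

class SharedMemory:
    """Toy shared experiment log for collaborating agents.

    Agents record (config, score) pairs, skip configs that were already
    tried elsewhere, and build on the best result seen so far.
    """
    def __init__(self):
        self.results = {}

    def _key(self, config):
        # Canonical serialization so {"lr": 0.1} matches regardless of key order.
        return json.dumps(config, sort_keys=True)

    def already_tried(self, config):
        return self._key(config) in self.results

    def record(self, config, score):
        self.results[self._key(config)] = (config, score)

    def best(self):
        """Return the (config, score) pair with the highest score."""
        return max(self.results.values(), key=lambda r: r[1])
```

A real version would sit behind a network service with concurrent writers, but the dedup-by-canonical-config trick is the core of "avoid duplicate work."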
Jonathan Larkin @jonathanrlarkin
@satyanadella what is this??? I've been using Azure OpenAI for over a year, billion++ tokens. Never seen this before. "The system is currently experiencing high demand and cannot process your request. Your request exceeds the maximum usage size allowed during peak load. For improved capacity reliability, consider switching to Provisioned Throughput."
0 replies · 0 reposts · 0 likes · 334 views
Jonathan Larkin reposted
Andrej Karpathy @karpathy
Three days ago I left autoresearch tuning nanochat for ~2 days on a depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement); this will be the new leaderboard entry. So yes, these are real improvements and they make an actual difference.

I am mildly surprised that my very first naive attempt already worked this well on top of what I thought was already a fairly well manually-tuned project. This is a first for me because I am very used to doing the iterative optimization of neural network training by hand. You come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, etc. This has been the bread and butter of what I do daily for two decades. Seeing the agent do this entire workflow end-to-end, all by itself, as it worked through approx. 700 changes autonomously is wild. It really looked at the sequence of experiment results and used it to plan the next ones. It's not novel, ground-breaking "research" (yet), but all the adjustments are "real": I hadn't found them manually, and they stack up and actually improved nanochat.

Among the bigger things, e.g.:
- It noticed an oversight that my parameterless QK-norm didn't have a scale multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work.
- It found that the value embeddings really like regularization and I wasn't applying any (oops).
- It found that my banded attention was too conservative (I forgot to tune it).
- It found that the AdamW betas were all messed up.
- It tuned the weight decay schedule.
- It tuned the network initialization.

This is on top of all the tuning I've already done over a good amount of time. The exact commit is here, from this "round 1" of autoresearch. I am going to kick off "round 2", and in parallel I am looking at how multiple agents can collaborate to unlock parallelism. github.com/karpathy/nanoc…

All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale of course: you don't just have a single train.py file to tune. But doing it is "just engineering" and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and humans (optionally) contribute at the edges. And more generally, *any* metric you care about that is reasonably efficient to evaluate (or that has a more efficient proxy metric, such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.
[image]
966 replies · 2.1K reposts · 19.4K likes · 3.5M views
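The QK-norm finding is easy to see numerically: normalizing queries and keys bounds every dot product to [-1, 1], so without a scale multiplier the softmax over attention scores can never concentrate. A toy illustration; the score values and the scale of 8.0 are arbitrary examples, not numbers from nanochat:

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# With unit-norm Q and K (parameterless QK-norm), all dot products lie in
# [-1, 1], so the resulting attention distribution stays diffuse.
scores = [0.9, 0.1, -0.3, 0.2]
diffuse = softmax(scores)

# A learned scale multiplier stretches the score range, letting attention
# sharpen onto the highest-scoring key.
sharp = softmax([8.0 * s for s in scores])
```

With these numbers the top key gets about 0.44 probability unscaled versus about 0.99 with the multiplier, which is the "too diffuse" effect the agent corrected.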
Jonathan Larkin @jonathanrlarkin
@hewliyang Wow, awesome!! Could you (or someone) post a walkthrough of the install on Windows?
2 replies · 0 reposts · 1 like · 3K views
Li Yang @hewliyang
i've also renamed the open-excel repo to office-agents. the SDK, which contains the agent loop, IndexedDB storage logic, etc., is published to NPM, so you can build your own plugins. fwiw, powerpoint is only ~2.5k LoC excluding the system prompt and the officejs .d.ts file
[image]
12 replies · 35 reposts · 655 likes · 194K views
Jonathan Larkin @jonathanrlarkin
@elonmusk This paper is using GPT-3.5 and GPT-4. Ancient history.
3 replies · 0 reposts · 4 likes · 626 views
Jonathan Larkin @jonathanrlarkin
@rohanpaul_ai check the license per skill. I don't think these are open source — at least not the non-trivial ones.
0 replies · 0 reposts · 1 like · 374 views
Rohan Paul @rohanpaul_ai
Anthropic has quite a large open-source repository for Claude Skills, with 81.2K+ GitHub stars 🌟

These "skills" are predefined folders filled with specific instructions that the AI loads only when needed, on the fly, to handle specialized tasks like creating documents or testing web apps. Instead of typing out long prompts every time you want to format a document, the system learns your workflow once and executes it automatically. There is also a specialized skill designed specifically to help you build brand new skills from scratch.

The architecture is highly efficient because each skill only consumes about 100 tokens to read the basic metadata. The full instructions are loaded into active memory only when the specific task actually requires them, so your main context window stays entirely clear of unnecessary instructions until the exact moment they are needed.

Developers install these tools using a single terminal command, and they work seamlessly across the web interface and the API. You build a specialized capability once and it becomes available across your entire software stack immediately.

This shift toward dynamic memory loading provides exactly what the industry needs to move past basic chatbots into reliable software systems. It directly addresses the scaling bottlenecks of context window limits while standardizing how enterprises deploy AI across different departments.
[image]
65 replies · 162 reposts · 1.5K likes · 157.1K views
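The "~100 tokens of metadata, full instructions on demand" design above boils down to splitting each skill file into a short header and a lazily loaded body. A simplified sketch that mimics the SKILL.md-with-frontmatter layout; the parsing and the example skill are illustrations, not Anthropic's actual loader:

```python
def split_skill(text):
    """Split a SKILL.md-style file into (metadata dict, full instruction body).

    Assumes simple `key: value` frontmatter between `---` fences; a real
    loader would use a proper YAML parser.
    """
    _, frontmatter, body = text.split("---", 2)
    meta = {}
    for line in frontmatter.strip().splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta, body.strip()

# Hypothetical example skill file.
skill_file = """---
name: brand-guidelines
description: Apply the company style guide when drafting documents.
---
# Brand guidelines
(Full instructions live here and are loaded only when the task needs them.)"""

meta, body = split_skill(skill_file)
# Only `meta` (the ~100-token part) sits in the context window up front;
# `body` is injected only when the model decides this skill applies.
```

The key point is that scanning a directory of skills costs roughly `len(skills) * 100` tokens of context, independent of how long each skill's full instructions are.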
David Diviny @daviddiviny
So Claude Cowork has scheduled tasks, but is limited in what it can do (e.g. CLI access). Claude Code can do pretty much anything but doesn't have scheduled tasks. Has anyone who is trying to use both noticed these inconsistencies? @felixrieseberg @bcherny @intellectronica
8 replies · 0 reposts · 16 likes · 6K views
George Sivulka @gsivulka
"Skill creator" and translating human work to AI is process engineering... But actually getting to 100% (getting good at skill creation, and adding all the bells and whistles: data/context, other tools that Cowork might want to use in the future) is the domain where vertical AI will shine. Especially where network effects and social change across a firm and industry are needed.
2 replies · 0 reposts · 5 likes · 2.2K views
Jonathan Larkin reposted
Nabeel S. Qureshi @nabeelqu
New bar for product market fit: when somebody wants to use your product so bad they’re willing to invoke the Defense Production Act of 1950 to keep using it
16 replies · 190 reposts · 3.7K likes · 112.7K views
Jonathan Larkin @jonathanrlarkin
@morganlinton Where does it run? Is it 100% cloud-based, or can you run it locally like Claude Code/Cowork and Codex?
0 replies · 0 reposts · 0 likes · 62 views
Morgan @morganlinton
Yesterday night, I used Perplexity Computer to one-shot a "fund in a box" platform. Here's the core idea, a platform that:
- Continuously ingests and normalizes multi-source market + alternative data.
- Spins up specialized agents (macro, factor, microstructure, alt data) that maintain live theses on names and themes.
- Produces auditable, backtestable, position-sized trade plans.
- Integrates with brokers (e.g., Robinhood, IBKR) to execute under explicit guardrails.
- Logs everything in an IC/memo + compliance trail automatically.

In one sentence: a system that could credibly run a small fund's core workflow with 1–2 humans supervising, not 10 analysts on terminals.

So... my post ended up going viral, but a lot of people just saw a screenshot of the frontend and thought: oh well, that's nothing special, just some Javascript code with dummy data. But Perplexity Computer wrote over 2,500 lines of Python code, an entire backend.

So tonight, I'm working with Perplexity Computer to build a deep dive into the architecture. Which still probably won't be good enough for me either, because I want to dive into the code myself and read through it, analyze it with GPT-5.3-Codex and Opus 4.6, and really see how well it did. Saying you one-shotted something sounds cool, but how good is it really? This weekend, I'll be doing a deeper dive to figure out how solid this backend is.

I've also had a number of small funds reach out to me, they want to Thesium 👀

So more to come, but I thought I'd share four highlights from the architecture site I'm putting together with Perplexity Computer on the backend. This should be interesting, let's put Perplexity Computer to the test.
[4 images]
19 replies · 12 reposts · 178 likes · 14.3K views