John D. Patterson: jdpttrsn@bsky.social
@jdpttrsn

332 posts

AI Engineer @ Hupside, Prev. Research Professor @ Penn State University—cognitive science, computational modeling, category learning, creativity, education.

Cleveland, OH · Joined November 2014
1.2K Following · 263 Followers
Chris Worsey@Chris_Worsey·
I took the @karpathy autoresearch loop and pointed it at markets. 25 AI agents debate macro, rates, commodities, sectors, and single stocks daily. Every recommendation scored against real outcomes. Worst agent by rolling Sharpe gets its prompt rewritten by the system. Keep or revert. Same loop, prompts are the weights, Sharpe is the loss function. Trained the agents on 18 months of market data. 378 iterations. 54 prompt modifications, 16 survived. The system learned which agents to trust using Darwinian weights — geopolitical, commodities, and the @BillAckman quality compounder rose to the top. The agents even figured out their own portfolio manager was the weakest link before we did! Deployed the trained agents. +22% in 173 days. Best pick: AVGO at $152, held for +128%. The final prompts are evolutionary products — shaped by market feedback, not human intuition. Now running live with my own capital. github.com/chrisworsey55/… Part hedge fund, part research experiment :)
Andrej Karpathy@karpathy

I packaged up the "autoresearch" project into a new self-contained minimal repo if people would like to play over the weekend. It's basically the nanochat LLM training core stripped down to a single-GPU, one-file version of ~630 lines of code, then:
- the human iterates on the prompt (.md)
- the AI agent iterates on the training code (.py)
The goal is to engineer your agents to make the fastest research progress indefinitely and without any of your own involvement. In the image, every dot is a complete LLM training run that lasts exactly 5 minutes. The agent works in an autonomous loop on a git feature branch and accumulates git commits to the training script as it finds better settings (of lower validation loss by the end) of the neural network architecture, the optimizer, all the hyperparameters, etc. You can imagine comparing the research progress of different prompts, different agents, etc. github.com/karpathy/autor… Part code, part sci-fi, and a pinch of psychosis :)

157 replies · 232 reposts · 3.9K likes · 773.3K views
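A minimal sketch of the loop both tweets describe, in hypothetical Python (none of this is code from either repo): each iteration scores every agent, has an LLM rewrite the prompt of the worst scorer, and keeps the rewrite only if it beats the old prompt on the same data, otherwise it reverts. rolling_sharpe, evaluate_agent, and rewrite_prompt are stand-ins for the real scoring and prompt-rewriting steps.

```python
import random
import statistics

def rolling_sharpe(returns):
    """Sharpe-like score for a window of returns (risk-free rate ignored)."""
    if len(returns) < 2 or statistics.pstdev(returns) == 0:
        return 0.0
    return statistics.mean(returns) / statistics.pstdev(returns)

def evaluate_agent(prompt, window):
    """Stand-in: run the prompted agent over the window and score its calls."""
    rng = random.Random(hash((prompt, window)))          # deterministic fake returns
    returns = [rng.gauss(0.0005, 0.01) for _ in window]
    return rolling_sharpe(returns)

def rewrite_prompt(prompt, iteration):
    """Stand-in: ask an LLM to revise the weakest prompt given its track record."""
    return f"{prompt} [revision {iteration}]"

def evolve(prompts, windows):
    """Keep-or-revert loop: the worst agent by score gets its prompt rewritten."""
    for i, window in enumerate(windows):
        scores = {name: evaluate_agent(p, window) for name, p in prompts.items()}
        worst = min(scores, key=scores.get)
        candidate = rewrite_prompt(prompts[worst], i)
        if evaluate_agent(candidate, window) > scores[worst]:
            prompts[worst] = candidate  # keep the rewrite
        # otherwise revert: the old prompt stays untouched
    return prompts

if __name__ == "__main__":
    agents = {
        "macro": "You are a macro analyst...",
        "commodities": "You are a commodities analyst...",
        "single_stocks": "You pick single stocks...",
    }
    days = tuple(f"day_{d}" for d in range(30))         # stand-in market data
    windows = [days[i:i + 5] for i in range(0, 25, 5)]  # rolling evaluation windows
    print(evolve(agents, windows))
```

The keep-or-revert gate mirrors the autoresearch loop's commit-only-if-validation-loss-improves step, with rolling Sharpe substituted as the objective.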
Chris Laub@ChrisLaubAI·
This Stanford paper just proved that 90% of prompt engineering advice is wrong. I spent 6 months testing every "expert" technique. Most of it is folklore. Here's what actually works (backed by real research):
40 replies · 78 reposts · 669 likes · 153.6K views
Ari Holtzman@universeinanegg·
I am teaching a ~60 person class that involves a lot of Transformers and Language Modeling in the new year. What is the cheapest and easiest solution to getting my students just a bit of compute to play around with?
64 replies · 13 reposts · 411 likes · 57K views
John D. Patterson: jdpttrsn@bsky.social
@karpathy Skeptical view: This is baked into some providers’ tools bc they’re paid by the token. A next area for competition is for one (or more) providers to streamline token usage. As a user, I’m frankly tired of ‘thinking’ models using inane numbers of tokens for a standard response.
0 replies · 0 reposts · 0 likes · 12 views
Andrej Karpathy@karpathy·
I'm noticing that due to (I think?) a lot of benchmarkmaxxing on long horizon tasks, LLMs are becoming a little too agentic by default, a little beyond my average use case. For example in coding, the models now tend to reason for a fairly long time, they have an inclination to start listing and grepping files all across the entire repo, they do repeated web searches, they over-analyze and over-think little rare edge cases even in code that is knowingly incomplete and under active development, and often come back ~minutes later even for simple queries. This might make sense for long-running tasks but it's less of a good fit for more "in the loop" iterated development that I still do a lot of, or if I'm just looking for a quick spot check before running a script, just in case I got some indexing wrong or made some dumb error. So I find myself quite often stopping the LLMs with variations of "Stop, you're way overthinking this. Look at only this single file. Do not use any tools. Do not over-engineer", etc. Basically as the default starts to slowly creep into the "ultrathink" super agentic mode, I feel a need for the reverse, and more generally good ways to indicate or communicate intent / stakes, from "just have a quick look" all the way to "go off for 30 minutes, come back when absolutely certain".
762 replies · 754 reposts · 10.3K likes · 1M views
Gary Marcus@GaryMarcus·
my feed right now
[image]
40 replies · 16 reposts · 517 likes · 40.9K views
John D. Patterson: jdpttrsn@bsky.social retweeted
Andy J. Wills@ajwills72·
3 x Funded Ph.D. positions at School of Psychology, University of Plymouth, UK. I am available for PhD supervision in my areas of expertise, as are many of my colleagues. Please RT. jobs.ac.uk/job/DHH203/tea…
1 reply · 8 reposts · 6 likes · 1.3K views
John D. Patterson: jdpttrsn@bsky.social retweeted
Roger Beaty@Roger_Beaty·
Postdoc opportunity! Join the Cognitive Neuroscience of Creativity Lab @PennState for NSF-funded projects on AI, creativity assessment, & neuroimaging. Send CV & research interests to rebeaty@psu.edu. Job ad coming soon. Come do creativity science with us!
GIF
2 replies · 31 reposts · 61 likes · 15.8K views
John D. Patterson: jdpttrsn@bsky.social
@WitteKristin @toby_wise @docqhuys @cpilab Excited to give this a read! I was very surprised by the findings showcased in the tweeprint at first, but after further consideration they made more sense. If the current state is bad (anxiety/depression), what's the opportunity cost of shifting? Depending on severity, how much worse can it get?
0 replies · 0 reposts · 1 like · 88 views
Kristin Witte@WitteKristin·
Anxiety and depression might be related to increased, not decreased, exploration behaviour. What could be possible mechanisms underlying this? Beyond thrilled to share my very first first-author preprint with @toby_wise @docqhuys and @cpilab. osf.io/preprints/psya… 🧵⬇️ [1/n]
8 replies · 41 reposts · 132 likes · 14.9K views
John D. Patterson: jdpttrsn@bsky.social retweeted
John D. Patterson: jdpttrsn@bsky.social
Metaphor is abstract and uniquely comprehensible to humans, right? In work led by @PaulVDiStefano3, we found that LLMs could be trained to predict human creativity judgments for metaphors (r >= .7), on par with arguably less complex and less abstract creativity tasks. Check it out!
Roger Beaty@Roger_Beaty

Here's our latest paper on automated creativity assessment, led by CNCL grad student @PaulVDiStefano3. We trained Large Language Models to predict human ratings of metaphor creativity, extending AI creativity scoring to figurative language. doi.org/10.1080/104004…

0 replies · 1 repost · 7 likes · 266 views
John D. Patterson: jdpttrsn@bsky.social retweeted
cocktail peanut@cocktailpeanut·
Dalai Alpaca is here! cocktailpeanut.github.io/dalai Now you can run the Alpaca LLM on your computer (Mac, Windows, Linux) with just ONE command! Best of all, all you need is just around 4.2GB of disk space! Just run this:
[image]
56 replies · 320 reposts · 1.6K likes · 294.2K views
John D. Patterson: jdpttrsn@bsky.social retweeted
steven t. piantadosi@spiantado·
Large language models change everything for linguistics, starting with Chomsky. Featuring: LLMs as scientific theories, response to prior takes, "why" questions in language, acquisition... and how the field should have seen this coming. Paper is here: lingbuzz.net/lingbuzz/007180
[image]
95 replies · 395 reposts · 2K likes · 877.1K views