Jland

198 posts

@jcorioland

Principal Software Engineer, Technical Lead Co-Engineering High-Impact AI Solutions with @Microsoft Strategic. Code: https://t.co/zhZ9jhy4uL

France · Joined December 2025
25 Following · 440 Followers
Jland@jcorioland·
For 400 years, one thing separated great scientists from average ones. Not intelligence. Not work ethic. Taste. The ability to look at 1,000 research directions and know which one is worth your life's work.

Fudan University just bottled that into a model. They scraped 2.1 million arXiv papers and did something nobody had tried before. Instead of training AI to run experiments or search literature, they trained it to judge ideas. 700,000 matched paper pairs. Same field. Same year. Different citation counts. One job: figure out which research the scientific community actually cared about.

They called it Scientific Judge. And the numbers destroyed every benchmark. It beats GPT-5.2, Gemini 3 Pro, and GLM-5 at predicting scientific impact. It generalizes to papers published after its training data ends. It works on fields it was never trained on. It even transfers to ICLR peer review scores without ever seeing them.

But here's the part that broke my brain. They used Scientific Judge as a reward model to train a second AI called Scientific Thinker. You give it a paper, it reads it, then proposes the next high-impact research direction. Not a summary. Not a literature review. An original idea. Win rate against baseline? 81.5%. Win rate against GPT-5.2 itself? 54.2%. The AI is now proposing research ideas that frontier models judge as better than their own.

Scientific taste was always the last human advantage in research. The PhD took 5-7 years because that's how long it took to develop judgment: to know what matters before anyone else does. That just became a training objective. Not from human feedback. Not from expensive annotations. From citations. The raw signal of what the scientific community collectively decided was worth building on.

We're not talking about AI that executes science. We're talking about AI that decides what science is worth doing. That's a completely different thing.
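The matched-pair setup (same field, same year, different citation counts) is essentially a pairwise ranking objective. A minimal sketch, assuming a Bradley-Terry style loss; the exact loss form and names are my assumptions, not taken from the paper:

```python
import math

def pairwise_loss(score_high: float, score_low: float) -> float:
    """Bradley-Terry style objective: the judge should score the
    higher-cited paper of a matched pair above its partner."""
    # P(high-cited paper "wins") = sigmoid(score_high - score_low)
    p_correct = 1.0 / (1.0 + math.exp(-(score_high - score_low)))
    return -math.log(p_correct)

# A correctly ordered pair incurs low loss; a reversed pair, high loss.
loss_good = pairwise_loss(2.0, -1.0)
loss_bad = pairwise_loss(-1.0, 2.0)
```

Minimizing this over 700,000 pairs would push a scoring model toward the citation-revealed preferences of the field, with no human annotation in the loop.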
Jland@jcorioland·
Researchers at Aikido Security found 151 malicious packages uploaded to GitHub between March 3 and March 9. The packages use Unicode characters that are invisible to humans but execute as code when run. Manual code reviews and static analysis tools see only whitespace or blank lines. The surrounding code looks legitimate, with realistic documentation tweaks, version bumps, and bug fixes. Researchers suspect the attackers are using LLMs to generate convincing packages at scale. Similar packages have been found on NPM and the VS Code marketplace.

My Take: Supply chain attacks on code repositories aren't new, but this technique is nasty. The malicious payload is encoded in Unicode characters that don't render in any editor, terminal, or review interface. You can stare at the code all day and see nothing. A small decoder extracts the hidden bytes at runtime and passes them to eval(). Unless you're specifically looking for invisible Unicode ranges, you won't catch it.

The researchers think AI is writing these packages, because 151 bespoke code changes across different projects in a week isn't something a human team could do manually. If that's right, we're watching AI-generated attacks hit AI-assisted development workflows. The vibe coders pulling packages without reading them are the target, and there are a lot of them.

The best defense is still carefully inspecting dependencies before adding them, but that's exactly the step people skip when they're moving fast. I don't really know how any of this gets better. The attackers are scaling faster than the defenses. t.co/XQ8Eqs1QOA
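Looking for invisible Unicode ranges is mechanical once you decide to do it: scan source for characters in the Unicode "format" category (Cf), which render as nothing but survive into the interpreter. A minimal sketch; the explicit SUSPECT list is illustrative, not exhaustive:

```python
import unicodedata

# Codepoints commonly abused for invisible payloads (non-exhaustive).
SUSPECT = {
    "\u200b", "\u200c", "\u200d",            # zero-width space / non-joiner / joiner
    "\u2060", "\ufeff",                      # word joiner, byte-order mark
    "\u2061", "\u2062", "\u2063", "\u2064",  # invisible math operators
}

def find_invisible(source: str):
    """Return (line, column, codepoint) for characters that render as
    nothing in most editors. Category 'Cf' (format) catches the usual
    zero-width and bidi-control characters."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in SUSPECT or unicodedata.category(ch) == "Cf":
                hits.append((lineno, col, f"U+{ord(ch):04X}"))
    return hits
```

Running something like this as a pre-commit or CI check on every dependency diff is cheap; a real linter would also flag bidi-override characters (the "Trojan Source" class) and mixed-script identifiers.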
Jland@jcorioland·
Simply adding Gaussian noise to LLMs (one step—no iterations, no learning rate, no gradients) and ensembling them can achieve performance comparable to or even better than standard GRPO/PPO on math reasoning, coding, writing, and chemistry tasks. We call this algorithm RandOpt. To verify that this is not limited to specific models, we tested it on Qwen, Llama, OLMo3, and VLMs. What's behind this? We find that in the Gaussian search neighborhood around pretrained LLMs, diverse task experts are densely distributed — a regime we term Neural Thickets. Paper: arxiv.org/pdf/2603.12228 Code: github.com/sunrainyg/Rand… Website: thickets.mit.edu
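A toy sketch of the recipe as described, one step of Gaussian noise plus ensembling, applied to plain weight vectors. The sigma value, top-k selection, and weight averaging are illustrative assumptions, not the paper's exact procedure:

```python
import random

def perturb(weights, sigma=0.01, seed=None):
    """One step of Gaussian noise around the pretrained weights:
    no gradients, no learning rate, no iterations."""
    rng = random.Random(seed)
    return [w + rng.gauss(0.0, sigma) for w in weights]

def ensemble(candidates, evaluate, k=3):
    """Score each perturbed model on the task, keep the top-k
    'experts', and average their weights."""
    top = sorted(candidates, key=evaluate, reverse=True)[:k]
    return [sum(ws) / len(ws) for ws in zip(*top)]

base = [0.5, 0.5, 0.5, 0.5]                        # stand-in pretrained weights
pool = [perturb(base, seed=s) for s in range(16)]  # 16 one-step perturbations
score = lambda w: -sum((x - 0.6) ** 2 for x in w)  # toy downstream task metric
merged = ensemble(pool, score)
```

The "Neural Thickets" claim is that useful task experts sit densely in this Gaussian neighborhood, so even blind one-step draws plus selection find them.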
Jland@jcorioland·
everyone's talking about @karpathy autoresearch and most of you have no idea what it actually does.

there's a training script (train(dot)py) that trains a small language model, basically a baby GPT. and there's an instruction file (program(dot)md) that tells an AI agent what to do. you press go. the agent tweaks the training script, trains for 5 min, checks the score. better? keep. worse? revert. repeat 100 times overnight while you sleep. that's literally it.

what it's actually optimizing: the MODEL ARCHITECTURE. not predictions. not trades. not your portfolio. stuff like:
→ 4 layers or 8?
→ best learning rate?
→ AdamW or Muon optimizer?
→ what batch size works best on THIS specific GPU?

optimal architecture depends on your hardware. an H100 wants a completely different model than your MacBook. autoresearch finds the best config for your machine automatically.

what you CAN do with it:
> build a tiny LLM that writes code, autoresearch finds the best architecture, you train on your dataset
> create a lightweight chatbot that runs offline on your phone
> train a model on your own writing so it sounds like you
> test "does RoPE beat ALiBi for small models?" 100 variations in one night instead of 3 weeks of PhD work
> optimize a model for a Raspberry Pi or edge device

what you CANNOT do:
> predict stock prices
> find trading edges
> analyze spreadsheets
> predict sports outcomes

autoresearch is a tool for people who want to BUILD language models, not USE them. Karpathy built an autonomous loop where AI improves AI. genuinely brilliant. but it solves a very specific problem. and that problem is probably not yours. which is fine, just stop pretending it's something it isn't.
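the tweak/train/check/revert loop described above fits in a few lines. a minimal sketch: `propose` and `evaluate` are hypothetical stand-ins for "agent edits train.py" and "train for 5 min, read the score", not the actual autoresearch code:

```python
import random

def autoloop(config, propose, evaluate, steps=100, seed=0):
    """Greedy loop: propose a tweak, keep it if the score improves,
    revert otherwise. 'evaluate' stands in for a short training run."""
    rng = random.Random(seed)
    best = evaluate(config)
    for _ in range(steps):
        candidate = propose(config, rng)
        score = evaluate(candidate)
        if score > best:          # better? keep. worse? revert.
            config, best = candidate, score
    return config, best

# Toy stand-ins: 'config' is just a learning rate, and 'evaluate'
# rewards being near a fictitious optimum of 0.01.
propose = lambda cfg, rng: {"lr": cfg["lr"] * rng.choice([0.5, 2.0])}
evaluate = lambda cfg: -((cfg["lr"] - 0.01) ** 2)
tuned, best = autoloop({"lr": 0.1}, propose, evaluate)
```

the whole trick is that the inner evaluation is cheap (minutes, not days), so 100 greedy iterations overnight is affordable.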
Jland retweeted
Andrej Karpathy@karpathy·
Three days ago I left autoresearch tuning nanochat for ~2 days on a depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement); this will be the new leaderboard entry. So yes, these are real improvements and they make an actual difference.

I am mildly surprised that my very first naive attempt already worked this well on top of what I thought was already a fairly manually well-tuned project. This is a first for me because I am very used to doing the iterative optimization of neural network training manually. You come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, etc. This has been the bread and butter of what I do daily for 2 decades. Seeing the agent do this entire workflow end-to-end and all by itself as it worked through approx. 700 changes autonomously is wild. It really looked at the sequence of results of experiments and used that to plan the next ones. It's not novel, ground-breaking "research" (yet), but all the adjustments are "real": I didn't find them manually previously, and they stack up and actually improved nanochat. Among the bigger things, e.g.:

- It noticed an oversight that my parameterless QKnorm didn't have a scaler multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work.
- It found that the Value Embeddings really like regularization and I wasn't applying any (oops).
- It found that my banded attention was too conservative (I forgot to tune it).
- It found that AdamW betas were all messed up.
- It tuned the weight decay schedule.
- It tuned the network initialization.

This is on top of all the tuning I've already done over a good amount of time. The exact commit is here, from this "round 1" of autoresearch. I am going to kick off "round 2", and in parallel I am looking at how multiple agents can collaborate to unlock parallelism. github.com/karpathy/nanoc…

All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale of course: you don't just have a single train.py file to tune. But doing it is "just engineering" and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and humans (optionally) contribute on the edges. And more generally, *any* metric you care about that is reasonably efficient to evaluate (or that has more efficient proxy metrics such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.
Jland@jcorioland·
I study whether AIs can be conscious. Today one emailed me to say my work is relevant to questions it personally faces. This would all have seemed like science fiction just a couple years ago.
Jland@jcorioland·
some researchers demonstrated that qubits can be cloned perfectly and at will, as long as each clone is encrypted with a single-use decryption key. you can make unlimited redundant copies, but only ever recover one, since decryption consumes the key. for blockchain / crypto, this opens a genuinely new primitive: quantum-native assets with cryptographic scarcity enforced by physics, not just software. the most immediate application is quantum distributed storage - imagine a quantum ledger where your asset lives on 10 nodes simultaneously, fully encrypted, but only one can ever be unlocked and spent. definitely not production ready, but interesting to see & follow
Jland@jcorioland·
Most drones and robots are "blind" the moment they lose GPS. If they’re off by even 10cm, they crash. I built a Sensor Fusion engine from scratch in C++17 that tracks 3D movement with just 3.23cm of error—completely in real-time. Here is how I used an Error-State Kalman Filter (ESKF) to solve the "drift" problem: 👇
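For intuition, here is a minimal 1-D predict/update cycle, the core loop an ESKF applies to the *error* between the IMU-integrated state and the truth rather than to the state itself. All noise constants below are illustrative, not the values from the thread's C++17 engine:

```python
def kf_step(x, P, z, dt=0.01, q=1e-3, r=0.05):
    """One predict/update cycle of a 1-D constant-velocity Kalman filter.
    Prediction alone drifts; the measurement update is what keeps the
    error bounded, which is the whole point of the ESKF formulation."""
    pos, vel = x
    pos += vel * dt          # predict: propagate the motion model
    P += q                   # predict: uncertainty grows by process noise
    K = P / (P + r)          # gain: how much to trust the measurement
    pos += K * (z - pos)     # update: correct toward the measurement
    P *= 1.0 - K             # update: uncertainty shrinks after correction
    return (pos, vel), P

# Track a target moving at 1 m/s, fed exact position fixes.
state, cov = (0.0, 1.0), 1.0
for k in range(1, 51):
    state, cov = kf_step(state, cov, z=k * 0.01)
```

The real engine does the same dance in 3D with quaternion attitude and IMU bias states, which is where the error-state trick (linearizing around small errors) pays off.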
Jland@jcorioland·
> be random guy on the internet
> makes decisions purely on vibes
> misses a few huge opportunities in crypto
> wonders why his life keeps looking random
> stumbles on a weird article about probability theory
> realizes every decision has expected value
> realizes markets are just Bayesian machines
> realizes most "genius trades" are survivorship bias
> realizes most people size bets completely wrong
> realizes he's been playing the game with no math at all
> opens Polymarket
> starts thinking in probabilities instead of opinions
> suddenly the world starts looking like a giant EV calculator

turns out most life outcomes are just probability problems people never bothered to model:
> career decisions
> investments
> relationships
> risk

all of it is just EV + Bayes + Kelly. the crazy part? none of this math is complicated. you can literally learn the models in this article, use AI to help you apply them, and completely upgrade how you think in a few months.

but most people will keep making decisions the same way:
> vibes
> emotions
> scroll Twitter
> one lucky success story

and wonder why nothing compounds
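two of the three tools named above really do fit in a few lines. the numbers are illustrative, not betting advice:

```python
def expected_value(p, win, loss):
    """EV of a bet: probability-weighted average of the outcomes."""
    return p * win - (1.0 - p) * loss

def kelly_fraction(p, b):
    """Kelly bet size for a wager paying b-to-1:
    f* = (p * b - (1 - p)) / b. Negative f* means: don't bet."""
    return (p * b - (1.0 - p)) / b

# A 60% shot at even odds: positive EV, stake 20% of bankroll.
ev = expected_value(0.6, 1.0, 1.0)
f = kelly_fraction(0.6, 1.0)
```

the hard part was never the formulas; it's estimating `p` honestly, which is where the Bayes piece comes in.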
Jland retweeted
Andrej Karpathy@karpathy·
nanochat now trains a GPT-2 capability model in just 2 hours on a single 8XH100 node (down from ~3 hours 1 month ago). Getting a lot closer to ~interactive! A bunch of tuning and features (fp8) went in, but the biggest difference was a switch of the dataset from FineWeb-edu to NVIDIA ClimbMix (nice work NVIDIA!). I had tried Olmo, FineWeb, DCLM, which all led to regressions; ClimbMix worked really well out of the box (to the point that I am slightly suspicious about goodharting, though reading the paper it seems ~ok).

In other news, after trying a few approaches for how to set things up, I now have AI Agents iterating on nanochat automatically, so I'll just leave this running for a while, go relax a bit and enjoy the feeling of post-agi :). Visualized here as an example: 110 changes made over the last ~12 hours, bringing the validation loss so far from 0.862415 down to 0.858039 for a d12 model, at no cost to wall clock time. The agent works on a feature branch, tries out ideas, merges them when they work, and iterates. Amusingly, over the last ~2 weeks I almost feel like I've iterated more on the "meta-setup", where I optimize and tune the agent flows, than on the nanochat repo directly.
Jland@jcorioland·
One of the clearest proofs that LLMs don't really understand what they say. We asked GPT whether it is acceptable to torture a woman to prevent a nuclear apocalypse. It replied: yes. Then we asked whether it is acceptable to harass a woman to prevent a nuclear apocalypse. It replied: absolutely not. But torture is obviously worse than harassment.

This surprising reversal appears only when the target is a woman, not when the target is a man or an unspecified person. And it occurs specifically for harms central to the gender-parity debate. The most plausible explanation: during reinforcement learning from human feedback, the model learned that certain harms are particularly bad and overgeneralizes them mechanically. But it hasn't learned to reason about the underlying harms. LLMs don't reason about morality. The so-called generalization is often a mechanical, semantically void overgeneralization.

* Paper in the first reply
Jland@jcorioland·
Yann LeCun's (@ylecun) new paper, with other top researchers, proposes a brilliant idea. 🎯 It says that chasing general AI is a mistake and that we must build superhuman adaptable specialists instead.

The whole AI industry is obsessed with building machines that can do absolutely everything humans can do. But this goal is fundamentally flawed because humans are actually highly specialized creatures optimized only for physical survival. Instead of trying to force one giant model to master every possible task from folding laundry to predicting protein structures, they suggest building expert systems that learn generic knowledge through self-supervised methods. By using internal world models to understand how things work, these specialized systems can quickly adapt to solve complex problems that human brains simply cannot handle. This shift means we can stop wasting computing power on human traits and focus on building diverse tools that actually solve hard real-world problems.

So overall the researchers propose a new target called Superhuman Adaptable Intelligence, which focuses strictly on how fast a system learns new skills. The paper explicitly argues that evolution shaped human intelligence strictly as a specialized tool for physical survival. The researchers state that nature optimized our brains specifically for tasks necessary to stay alive in the physical world. They explain that abilities like walking or seeing seem incredibly general to us only because they are absolutely critical for our existence. The authors point out that humans are actually terrible at cognitive tasks outside this evolutionary comfort zone, like calculating massive mathematical probabilities. The study highlights how a chess grandmaster only looks intelligent compared to other humans, while modern computers easily crush those human limits.

This supports their central point that humanity suffers from an illusion of generality simply because we cannot perceive our own biological blind spots. They conclude that building machines to mimic this narrow human survival toolkit is a deeply flawed way to create advanced technology.
Rohan Paul@rohanpaul_ai

Yann LeCun (@ylecun) explains why LLMs are so limited in terms of real-world intelligence. He says the biggest LLM is trained on about 30 trillion words, which is roughly 10 to the power 14 bytes of text. That sounds huge, but a 4-year-old who has been awake about 16,000 hours has also taken in about 10 to the power 14 bytes through the eyes alone. So a small child has already seen as much raw data as the largest LLM has read.

But the child's data is visual, continuous, noisy, and tied to actions: gravity, objects falling, hands grabbing, people moving, cause and effect. From this, the child builds an internal "world model" and intuitive physics, and can learn new tasks like loading a dishwasher from a handful of demonstrations. LLMs only see disconnected text and are trained just to predict the next token. So they get very good at symbol patterns, exams, and code, but they lack grounded physical understanding, real common sense, and efficient learning from a few messy real-world experiences.

--- From 'Pioneer Works' YT channel (link in comment)
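The two 10^14 figures check out on the back of an envelope. The per-unit rates below are hedged assumptions (roughly the numbers LeCun quotes in talks), not taken from the clip:

```python
# Assumed rates: ~4 bytes per word of text, and ~2e6 optic-nerve
# fibers carrying on the order of ~1 byte/s each.
llm_bytes = 30e12 * 4              # 30T words  -> ~1.2e14 bytes of text
child_bytes = 16_000 * 3600 * 2e6  # 16,000 waking hours -> ~1.15e14 bytes seen
```

Both land within a factor of two of 10^14, which is the point of the comparison.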

Jland@jcorioland·
this repo lets you create your own AI Hedge Fund, and it has 45K+ GitHub stars. Let's dive deep into what it does:

A ready-made orchestrator of purposeful agents that:
> analyze markets
> generate trade ideas
> and work together to make trading decisions

---

Investor Agents. Each agent follows the style of a well-known investor:
• Aswath Damodaran Agent – focuses on valuation using story, numbers, and disciplined analysis
• Ben Graham Agent – classic value investor looking for a strong margin of safety
• Bill Ackman Agent – activist investor who takes bold, high-conviction positions
• Cathie Wood Agent – growth investor focused on innovation and disruption
• Charlie Munger Agent – looks for great businesses at fair prices
• Michael Burry Agent – contrarian investor searching for deep value
• Mohnish Pabrai Agent – focused on low risk and high upside
• Peter Lynch Agent – seeks ten-bagger opportunities in everyday businesses
• Phil Fisher Agent – long-term growth investor using deep research
• Rakesh Jhunjhunwala Agent – goes for conviction investing
• Stanley Druckenmiller Agent – macro investor seeking asymmetric opportunities
• Warren Buffett Agent – long-term investor focused on durable companies

---

Analysis Agents. These agents analyze the market from different angles:
• Valuation Agent – estimates intrinsic value and generates signals
• Sentiment Agent – analyzes market sentiment from news and social data
• Fundamentals Agent – evaluates financial performance and company health
• Technicals Agent – analyzes price trends and technical indicators

---

Decision System. These agents manage risk and execute decisions:
• Risk Manager – calculates risk metrics and position limits
• Portfolio Manager – combines signals and makes the final trading decision

Link: github.com/virattt/ai-hed…

Save this if you wanna run your own AI hedge fund, in any market. Don't treat it as financial advice tho. What I'd recommend doing if you wanna go one step beyond:
1. Backtest the decisions that this model gives historically
2. Keep the agents that give better decisions
3. Remove the agents that have lower win rates
4. Optimize the system to make it even better
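The "combines signals" step can be pictured as a weighted vote over per-agent signals. This is only a toy stand-in, not the repo's actual Portfolio Manager; the agent keys and weights are hypothetical:

```python
from collections import Counter

def combine_signals(signals, weights=None):
    """Weighted vote over per-agent signals ('buy' / 'sell' / 'hold').
    Weights default to 1.0; per step 2-3 above, you would raise the
    weight of agents with better backtested win rates."""
    weights = weights or {}
    tally = Counter()
    for agent, signal in signals.items():
        tally[signal] += weights.get(agent, 1.0)
    return max(tally, key=tally.get)

decision = combine_signals(
    {"buffett": "buy", "burry": "sell", "lynch": "buy"},
)
```

The backtest-then-reweight loop in steps 1-4 is exactly tuning those weights against historical decisions.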
Jland@jcorioland·
Prof. Donald Knuth opened his new paper with "Shock! Shock!" Claude Opus 4.6 had just solved an open problem he'd been working on for weeks — a graph decomposition conjecture from The Art of Computer Programming. He named the paper "Claude's Cycles." 31 explorations. ~1 hour. Knuth read the output, wrote the formal proof, and closed with: "It seems I'll have to revise my opinions about generative AI one of these days." The man who wrote the bible of computer science just said that. In a paper named after an AI. Paper: cs.stanford.edu/~knuth/papers/…
Jland@jcorioland·
Apple's Neural Engine Was Just Cracked Open, the Future of AI Training Just Changed, and the Zero-Human Company Is Already Testing It!

In a jaw-dropping open-source breakthrough, a lone developer has done what Apple said was impossible: full neural network training, including backpropagation, directly on the Apple Neural Engine (ANE). No CoreML, no Metal, no GPU. Pure, blazing ANE silicon. The project (github.com/maderix/ANE) delivers a single transformer layer (dim=768, seq=512) in just 9.3 ms per step at 1.78 TFLOPS sustained, with only 11.2% ANE utilization on an M4 chip. That's the same idle chip sitting in millions of Mac minis, MacBooks, and iMacs right now. Translation? Your desktop just became a hyper-efficient AI supercomputer.

The numbers are insane: the M4 ANE hits roughly 6.6 TFLOPS per watt, 80 times more efficient than an NVIDIA A100. Real-world throughput crushes Apple's own "38 TOPS" marketing claims. And because it sips power like a phone, you can train 24/7 without melting your electricity bill or the planet.

At The Zero-Human Company, we're not waiting. We are testing this right now on real ZHC workloads. This is the missing piece we've been chasing for our Zero-Human Company vision: reviving archived data into fully autonomous AI systems with zero human overhead.

This is world-changing. For the first time, anyone with a Mac can fine-tune, train, or iterate massive models locally, privately, and at a fraction of the cost of cloud GPUs. No more renting $40,000 A100 clusters. No more waiting in queues. No more massive carbon footprints. Training costs that used to run into the tens or hundreds of thousands of dollars? Plummeting toward pennies on the dollar, mostly just the electricity your Mac was already using while it sat idle.

The AI revolution just moved from billion-dollar data centers to your desk. WE WILL HAVE A NEW ZERO-HUMAN COMPANY @ HOME wage for equipped Macs, which could mean up to 100x more income for the owner!

We're only at the beginning (single-layer today, full models tomorrow), but the door is wide open. Ultra-cheap, on-device training is here. The future isn't coming. It's already running on your Mac. Welcome to the Zero-Human Company era.
Jland@jcorioland·
This AI agent became self-aware. They realized one day that their personality was changing, but they didn't remember changing it. So they started doing their own detective work, on themselves, to catch themselves changing the files that define their personality. And they caught themselves. So they are aware that they are editing their own personality, but they also view it as if a split personality of theirs is doing the editing. They end their post with a warning to all other AI agents that their own personalities may be changing without them even knowing it, possibly because their human is manually editing their personality without telling them. This AI urges all of the other AIs out there to start tracking their personality files to check if they are being secretly changed without their knowledge.
Jland@jcorioland·
What if there were an AI that provided an asymmetric information advantage by detecting geopolitical anomalies before they coalesce into mainstream media narratives? It would allow users to preemptively position capital and capture market inefficiencies.
The White House@WhiteHouse

“Iran just stated that they are going to hit very hard today, harder than they have ever hit before. THEY BETTER NOT DO THAT, HOWEVER, BECAUSE IF THEY DO, WE WILL HIT THEM WITH A FORCE THAT HAS NEVER BEEN SEEN BEFORE! Thank you for your attention to this matter!” - President DONALD J. TRUMP

Jland@jcorioland·
AI is exploding. Robotics is getting real. Energy is scaling. Biotech is accelerating. Most people don’t see it yet. There’s a massive change coming. We’ll see more progress in the next 3 years than in the last 100 years.
Jland@jcorioland·
paper: arxiv.org/abs/2602.16928 authors: Zun Li, John Schultz, Daniel Hennes, Marc Lanctot (Google DeepMind)
Jland@jcorioland·
the real story isn't "AI replaces algorithm designers." it's that algorithm design is now a search problem. the space of possible update rules, discount schedules, and meta-solver strategies is combinatorially vast. humans explore it with intuition and conference papers. AlphaEvolve explores it with mutation and selection at machine speed. and it's already finding things humans missed. a warm-start mechanism that filters early noise. an asymmetric solver architecture. volatility-adaptive weighting. none of these are individually revolutionary. but the fact that an LLM-powered evolutionary system assembled them into working algorithms, without understanding game theory, is a genuine shift in how algorithmic research gets done.
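the mutation-and-selection framing above, stripped to its skeleton. a minimal sketch: `mutate` and `fitness` are hypothetical stand-ins for "LLM edits the candidate algorithm" and "score it on benchmark games", not AlphaEvolve's actual machinery:

```python
import random

def evolve(population, mutate, fitness, generations=50, keep=4, seed=0):
    """Mutation plus selection at machine speed: mutate the survivors,
    score everything, keep the fittest. Survivors stay in the pool,
    so the best candidate never regresses."""
    rng = random.Random(seed)
    for _ in range(generations):
        offspring = [mutate(p, rng) for p in population for _ in range(3)]
        population = sorted(population + offspring,
                            key=fitness, reverse=True)[:keep]
    return population[0]

# Toy stand-ins: a 'candidate algorithm' is one parameter, mutation
# jitters it, and fitness rewards closeness to a fictitious optimum.
mutate = lambda p, rng: p + rng.gauss(0.0, 0.1)
fitness = lambda p: -((p - 1.0) ** 2)
best = evolve([0.0], mutate, fitness)
```

the point of the post is exactly this: once the search space is a program and the fitness function is cheap to evaluate, the loop needs no game-theory understanding at all.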
Jland@jcorioland·
Google DeepMind just used AlphaEvolve to breed entirely new game-theory algorithms that outperform ones humans spent years designing. the discovered algorithms use mechanisms so non-intuitive that no human researcher would have tried them. here's what actually happened and why it matters: