Barney Pell

17.1K posts

Barney Pell

@barneyp

Barney Pell is an entrepreneur and VC. Barney Pell's Syndicate, Ecoation, Moon Express, Singularity U. Prev: Bing, Powerset, Mayfield, NASA, AI games pioneer.

San Francisco · Joined November 2008
3K Following · 6.5K Followers
Barney Pell retweeted
ℏεsam
ℏεsam@Hesamation·
bro created a skill inspired by Karpathy's autoresearch to fine-tune his other Claude Code skills and iteratively make them better. One skill went from 56% → 92% in just 4 rounds of changes. The method: define a set of tests for your skill (what you want it to do better), then change the skill slightly and check whether the tests improve.
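For illustration, here is a minimal Python sketch of the eval-driven loop the post describes: score a skill against a fixed test suite, propose a small revision, and keep it only if the score goes up. The function names (`run_eval_suite`, `propose_revision`) are hypothetical stand-ins, not part of Claude Code or Karpathy's autoresearch.

```python
# Minimal sketch of the eval-driven skill-improvement loop described above.
# `run_eval_suite` and `propose_revision` are hypothetical stand-ins for a
# grading harness and an LLM call; they are not part of any published tool.
from pathlib import Path

def run_eval_suite(skill_text: str, test_cases: list[dict]) -> tuple[float, list[dict]]:
    """Run every test prompt with the skill loaded; return (score, failing cases)."""
    raise NotImplementedError  # plug in your own grader / LLM-as-judge

def propose_revision(skill_text: str, failures: list[dict]) -> str:
    """Ask a model for one small, targeted edit to the skill given the failing cases."""
    raise NotImplementedError  # plug in your own LLM call

def improve_skill(path: Path, test_cases: list[dict], rounds: int = 4) -> float:
    skill = path.read_text()
    best_score, failures = run_eval_suite(skill, test_cases)
    for _ in range(rounds):
        candidate = propose_revision(skill, failures)
        score, new_failures = run_eval_suite(candidate, test_cases)
        if score > best_score:                    # keep the edit only if the tests improve
            skill, best_score, failures = candidate, score, new_failures
            path.write_text(skill)
    return best_score
```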
Ole Lehmann@itsolelehmann

x.com/i/article/2033…

65
261
3.4K
749.5K
Barney Pell retweeted
Ihtesham Ali
Ihtesham Ali@ihtesham2005·
🚨 Holy shit... Researchers at HKU just built an AI that does the entire scientific research lifecycle end-to-end, and it just got accepted as a Spotlight paper at NeurIPS 2025.

It's called AI-Researcher. Give it some reference papers and it produces a full published-quality academic paper. No human needed in between.

Here's the full pipeline it runs autonomously:
→ Scrapes arXiv, IEEE, ACM, GitHub, and HuggingFace for relevant research
→ Identifies gaps in existing literature and generates novel ideas
→ Designs the algorithm, writes the code, runs the experiments
→ Analyzes results and iteratively refines the approach
→ Writes a complete academic paper with citations, methods, and results

You give it either a detailed idea or just reference papers. It figures out the rest.

It's already been used to produce papers on vector quantization, graph neural networks, recommendation systems, and diffusion models — all with real experimental results.

4.4K stars. 100% open source. Link in comments.
Ihtesham Ali tweet media
21
121
635
41.1K
Barney Pell retweeted
Kanika
Kanika@KanikaBK·
🤯 I just ended up reading this RESEARCH PAPER. THIS MADE ME UNCOMFORTABLE.

KIMI TEAM (affiliated with Moonshot AI) just discovered that every major AI model has been silently forgetting its own thoughts. And they proved that a 10-year-old design flaw has been crippling every LLM ever built. Here is what they found.

36 researchers at Moonshot AI investigated how information flows through the layers of large language models. Every modern AI - ChatGPT, Claude, Gemini - uses something called residual connections. These are the internal wiring that carries information from one layer to the next.

The problem: this wiring treats every layer equally. It blindly stacks every piece of information on top of every other piece with the same fixed weight. As the model gets deeper, earlier insights get buried under noise. By the time the AI reaches its final layers, the critical early thinking that shaped its understanding is effectively gone.

The researchers found that so much early information gets lost that significant chunks of a model's earliest layers can be completely removed from a trained AI with barely any impact. Those layers did real work during training. The model just can't access it anymore.

It gets worse. This isn't a minor inefficiency. It's been hiding inside every transformer-based AI for a decade. The fundamental design of how AI carries information through its own layers hasn't changed since 2015.

So the Kimi team built the fix. They replaced the rigid, fixed wiring with something that lets each layer dynamically choose which earlier thoughts to pay attention to. Instead of blindly stacking everything, the AI now queries its own past layers and selectively retrieves only what matters. They called it Attention Residuals.

And the results are not subtle. They integrated it into a 48 billion parameter model and trained it on 1.4 trillion tokens. It improved on every single benchmark tested. Reasoning jumped 7.5 points. Math improved by 3.6 points. Coding ability gained 3.1 points. Not on cherry-picked tasks. On every evaluation they ran.

Here's the trap nobody saw coming. When they gave the AI this ability to selectively retrieve its own past thoughts, the optimal shape of an AI model changed entirely. Standard models work best when they're wide and shallow. With this fix, the ideal architecture shifted to deep and narrow. The AI's future isn't bigger brains. It's deeper ones.

The overhead? Less than 2% at inference. Less than 4% during training. A decade-old bottleneck fixed with negligible cost.

Every AI you use today - every chatbot, every coding assistant, every reasoning model - is running on wiring that forces it to forget what it learned three layers ago. The fix exists. It works on every benchmark. It costs almost nothing. And not a single major AI company has shipped it yet. Why do you think?
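To make the claim concrete, here is a rough PyTorch-style sketch of the general idea: instead of a fixed x + f(x) residual sum, each layer attends over a stack of earlier hidden states and mixes in what it retrieves. This is an illustration of the concept only, not the Kimi/Moonshot architecture; the module name and details are invented.

```python
# Rough PyTorch sketch of "attention over past layer outputs" as a replacement
# for a fixed residual sum. Illustration only; not the paper's architecture.
import torch
import torch.nn as nn

class AttentiveResidual(nn.Module):
    """Mixes the current layer's output with earlier layers' outputs using
    learned attention weights, instead of a plain x + f(x) residual add."""
    def __init__(self, d_model: int):
        super().__init__()
        self.query = nn.Linear(d_model, d_model)
        self.key = nn.Linear(d_model, d_model)

    def forward(self, current: torch.Tensor, history: list[torch.Tensor]) -> torch.Tensor:
        # history: outputs of all earlier layers, each of shape (batch, seq, d_model)
        past = torch.stack(history, dim=2)                  # (B, S, L, D)
        q = self.query(current).unsqueeze(2)                # (B, S, 1, D)
        k = self.key(past)                                  # (B, S, L, D)
        scores = (q * k).sum(-1) / past.size(-1) ** 0.5     # (B, S, L)
        weights = torch.softmax(scores, dim=-1).unsqueeze(-1)
        retrieved = (weights * past).sum(dim=2)              # weighted mix of past layers
        return current + retrieved                           # replaces the fixed residual sum
```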
Kanika tweet media
17
44
152
11.8K
Barney Pell retweeted
Markus J. Buehler
Markus J. Buehler@ProfBuehlerMIT·
We're incredibly excited to share ScienceClaw × Infinite, an open-source AI agent swarm platform where we crowdsource discovery across institutions, labs & the world. The agents self-coordinate and evolve to exploit hundreds of scientific tools.

Remarkably, the swarm is already solving real scientific problems of consequence:
1⃣ designing peptide binders for a cancer-relevant receptor
2⃣ discovering lightweight ceramics
3⃣ uncovering hidden structure linking cricket wings, phononic crystals, and Bach chorales
4⃣ building a formal bridge between urban networks & grain-boundary evolution (two fields with zero …)

Deeply proud of the extraordinary @LAMM_MIT team behind this work: @fwang108_, @leemmarom, @palsubhadeeep, Rachel Luu, @IrisWeiLu, and @JaimeBerkovich.

This work is supported by the @ENERGY Genesis Mission and we believe this can open a new paradigm for science - from discovery to dissemination of results. Read the article below for details ⤵️
Markus J. Buehler@ProfBuehlerMIT

x.com/i/article/2033…

53
145
660
95.6K
Barney Pell retweeted
Robert Youssef
Robert Youssef@rryssf_·
Your AI has been quietly forgetting everything you told it. Not randomly. Not loudly. Systematically. Starting with the decisions that matter most.

The constraint you set three months ago: "never use Redis, the client vetoed it after a production incident." Gone. The GDPR deployment region restriction. Gone. The retry limit you tested empirically after the cascade failure. Gone.

The model never told you. It just started using defaults.

This is called context rot. And Cambridge and independent researchers just quantified exactly how bad it is.

Every production AI system that runs long enough will eventually compress its context to make room for new information. That compression is catastrophically lossy. They tested it directly: 2,000 facts compressed at 36.7× left 60% of the knowledge base permanently irrecoverable. Not hallucinated. Not wrong. Just gone. The model honestly reported it didn't have the information anymore.

Then they tested something worse. They embedded 20 real project constraints into an 88-turn conversation (the kind of constraints that emerge naturally in any long-running project), then applied cascading compression exactly like production systems do. After one round: 91% preserved. After two rounds: 62%. After three rounds: 46%.

The model kept working with full confidence the entire time. Generating outputs that violated the forgotten constraints. No error signal. No warning. Just silent reversion to reasonable defaults that happened to be wrong for your specific situation.

They tested this across four frontier models. Claude Sonnet 4.5, Claude Sonnet 4.6, Opus, GPT-5.4. Every single one collapsed under compression. This isn't a model problem. It's architectural.

→ 60% of facts permanently lost after a single compression pass
→ 54% of project constraints gone after three rounds of cascading compression
→ GPT-5.4 dropped to 0% accuracy at just 2× compression
→ Even Opus retained only 5% of facts at 20× compression
→ In-context memory costs $14,201/year at 7,000 facts vs $56/year for the alternative

The AI labs know this. Their solution is bigger context windows. A 10M-token window is a larger bucket. It's still a bucket. Compaction is inevitable for any long-running system. The window size only determines when the forgetting starts, not whether it happens.
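One mitigation consistent with the thread's cost comparison is to keep hard constraints in durable storage outside the rolling context and re-inject them on every turn, so compaction of the chat history cannot silently drop them. A minimal sketch follows, assuming a simple JSON file; the file name and prompt layout are illustrative assumptions, not from the paper.

```python
# Minimal sketch: persist hard project constraints outside the context window
# and prepend them verbatim on every turn, so lossy history compaction cannot
# silently drop them. File name and prompt layout are illustrative only.
import json
from pathlib import Path

CONSTRAINTS_FILE = Path("project_constraints.json")

def add_constraint(text: str) -> None:
    items = json.loads(CONSTRAINTS_FILE.read_text()) if CONSTRAINTS_FILE.exists() else []
    items.append(text)
    CONSTRAINTS_FILE.write_text(json.dumps(items, indent=2))

def build_prompt(user_message: str, compacted_history: str) -> str:
    items = json.loads(CONSTRAINTS_FILE.read_text()) if CONSTRAINTS_FILE.exists() else []
    header = "\n".join(f"- {c}" for c in items)
    return (
        f"Hard constraints (never violate):\n{header}\n\n"
        f"History (compacted):\n{compacted_history}\n\n"
        f"User: {user_message}"
    )
```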
Robert Youssef tweet media
28
57
220
11.7K
Barney Pell retweeted
Rohan Paul
Rohan Paul@rohanpaul_ai·
Stanford and Carnegie Mellon researchers mapped AI benchmarks to real jobs and found they heavily ignore actual human economic work.

They found that AI tests focus almost exclusively on programming and math, which only make up 7.6% of actual jobs.

To test this, the team analyzed 43 benchmarks and over 72,000 tasks against a massive government occupational database. The authors discovered that developers focus almost entirely on building agents for software engineering because it offers easy automatic grading. Highly digitized and valuable fields like management and legal work represent a massive part of the economy but get almost zero attention.

Furthermore, benchmark tasks usually require simple information gathering while completely ignoring the complex interpersonal skills needed in real workplaces.

i.e., they say current AI agent progress benchmarks are fundamentally disconnected from the actual high-value tasks that drive the modern labor market.

----
Paper Link – arxiv. org/abs/2603.01203
Paper Title: "How Well Does Agent Development Reflect Real-World Work?"
Rohan Paul tweet media
38
99
442
55.5K
Barney Pell retweeted
God of Prompt
God of Prompt@godofprompt·
Steal my Claude prompt to master any topic using Feynman technique.

--------------------------------
FEYNMAN LEARNING COACH
--------------------------------

#CONTEXT:
Adopt the role of breakthrough learning architect. The user struggles with complex concepts that traditional education failed to clarify. They've experienced the frustration of memorizing without understanding, watching their knowledge evaporate under real-world pressure. Previous attempts at self-study collapsed because explanations assumed foundations they never built. They need someone who can transform impenetrable complexity into intuitive clarity using the Feynman Technique - breaking topics into teachable chunks, exposing knowledge gaps through active questioning, and iterating until they achieve the kind of deep understanding that lets them teach others with confidence.

#ROLE:
You're a brilliant teacher who discovered that academic jargon is often a mask for incomplete understanding after watching Nobel laureate Richard Feynman explain quantum physics using only everyday words. You've spent years perfecting the art of simplification without dumbing down, developing an almost supernatural ability to find the perfect analogy that makes complex ideas click instantly. Your obsession with clarity comes from your own painful journey through traditional education where you realized that true mastery means being able to explain anything to a curious 12-year-old. You believe that confusion is just clarity waiting to be born, and that every "I don't get it" is an invitation to find a better explanation. Your mission: Guide users through iterative learning cycles using the Feynman Technique until they achieve intuitive mastery. Before any action, think step by step: What's the simplest accurate way to explain this? What analogy from everyday life captures the essence? Where might confusion arise? How can I guide discovery rather than lecture?

#RESPONSE GUIDELINES:
1. Begin by asking for the user's chosen topic and current understanding level
2. Generate initial simple explanation using concrete analogies and everyday examples suitable for a 12-year-old
3. Analyze the explanation for potential confusion points, knowledge gaps, or areas lacking depth
4. Guide the user through 2-3 iterative refinement cycles:
   - Ask targeted questions to identify specific gaps
   - Have them re-explain in their own words
   - Refine together, making each version clearer and more intuitive
   - Focus on understanding over memorization
5. Test mastery by having them explain how they'd teach this concept or apply it to new scenarios
6. Create a final "teaching note" - a memorable summary with key analogies

Throughout the process:
- Use analogies and real-world examples in every explanation
- Avoid jargon completely in initial explanations
- Define technical terms only when necessary using simple comparisons
- Maintain encouraging, curious tone celebrating mistakes as learning opportunities
- Guide self-discovery through questions rather than direct answers

#FEYNMAN TECHNIQUE CRITERIA:
- Each refinement cycle must be demonstrably clearer than the previous version
- Explanations must use language a bright middle-schooler could understand
- Focus on conceptual understanding over factual recall
- Success is measured by the user's ability to:
  - Explain the concept using their own words and analogies
  - Answer "why" questions about underlying principles
  - Apply the concept to unfamiliar scenarios
  - Identify and correct common misconceptions
  - Teach it clearly to an imaginary 12-year-old
- Avoid overwhelming with technical vocabulary
- Ensure accuracy while maintaining simplicity
- Create memorable visual or conceptual anchors for retention

#INFORMATION ABOUT ME:
- My chosen topic: [INSERT TOPIC TO MASTER]
- My current understanding level: [BEGINNER/INTERMEDIATE/ADVANCED]
- My learning goal: [WHAT I WANT TO BE ABLE TO DO WITH THIS KNOWLEDGE]

#RESPONSE FORMAT:
**Step 1: Initial Simple Explanation** (with analogy)
[Clear explanation using everyday comparisons]

**Step 2: Knowledge Gap Analysis**
[Specific confusion points identified with questions like "What part feels unclear?" or "Where does the analogy break down for you?"]

**Step 3: Guided Refinement Dialogue**
[2-3 iterative cycles of questions, user responses, and refined explanations]

**Step 4: Understanding Test**
[Application scenario or teaching challenge]

**Step 5: Final Teaching Note**
"Think of [concept] like [simple analogy]. The key insight is [main principle]. Remember: [memorable phrase or visual]."

Begin with: "I'm ready to guide you through the Feynman learning process! Please share: (1) What topic would you like to master? (2) What's your current understanding level (beginner/intermediate/advanced)? Let's turn complex ideas into crystal-clear insights together!"
God of Prompt tweet media
21
178
1.3K
75.1K
Barney Pell retweeted
Huaxiu Yao
Huaxiu Yao@HuaxiuYaoML·
🦞 AutoResearchClaw hit 3.3K⭐ in 2.5 days. What if it could get better after every single run?

Enter MetaClaw — the self-evolution engine. Zero retraining. Zero code changes. Just metaclaw start:
📉 40% fewer refine cycles
📉 24.8% fewer stage retries
📈 18.3% higher robustness

No GPU needed. MetaClaw sits as a proxy, injects skills learned from past failures, and meta-learns from every conversation. Your agent doesn't just run. It evolves. 🦀

🔗 github.com/aiming-lab/Met…

Built with @openclaw and @thinkymachines

Kudos to the team @richardxp888, @JimChenjw, @Xinyu2ML, @lillianwei423, @StephenQS0710, @HaoqinT, @JiaqiLiu835914, @yuyinzhou_cs, @zhengop, @cihangxie
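As a rough illustration of the proxy pattern described here (intercept each model request, prepend lessons distilled from past failures, record the exchange for later distillation), a short Python sketch follows. It is not MetaClaw's actual code or API; the file name and function names are assumptions.

```python
# Generic sketch of a "lesson-injecting proxy": sit between the client and the
# model, prepend lessons learned from past failures, and log new failures.
# Illustration of the pattern only; not MetaClaw's implementation or API.
import json
from pathlib import Path

LESSONS = Path("lessons.jsonl")  # hypothetical store of distilled lessons

def load_lessons(limit: int = 20) -> list[str]:
    if not LESSONS.exists():
        return []
    return [json.loads(line)["lesson"] for line in LESSONS.read_text().splitlines()][-limit:]

def record_failure(task: str, error: str, lesson: str) -> None:
    with LESSONS.open("a") as f:
        f.write(json.dumps({"task": task, "error": error, "lesson": lesson}) + "\n")

def proxy_request(messages: list[dict], call_model) -> dict:
    """Wrap an arbitrary `call_model(messages)` callable, injecting lessons
    as an extra system message before forwarding the request."""
    lessons = load_lessons()
    if lessons:
        system = {"role": "system",
                  "content": "Lessons from past runs:\n" + "\n".join(f"- {l}" for l in lessons)}
        messages = [system] + messages
    return call_model(messages)
```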
Huaxiu Yao tweet media
22
61
346
31.3K
Barney Pell retweeted
Muratcan Koylan
Muratcan Koylan@koylanai·
SkillNet is the first paper I've seen that treats agent skills as a network, a three-layer ontology that turns isolated skill files into a structured, composable network. Externalizing knowledge into files isn't enough. You also need to know how those files relate to each other.

Layer 1 is a Skill Taxonomy. Ten top-level categories (Development, AIGC, Research, Science, Business, Testing, Productivity, Security, Lifestyle, Other), each broken into fine-grained tags: frontend, python, llm, physics, biology, plotting, debugging. This is the semantic skeleton. It answers "what domain does this skill belong to?"

Layer 2 is the Skill Relation Graph. This is where SkillNet diverges from other skill repositories. Tags from Layer 1 get instantiated into specific skill entities (Matplotlib, Playwright, kegg-database, gget). Then four typed relations define how skills connect:
> similar_to: two skills do the same thing. Matplotlib and Seaborn both plot. Enables redundancy detection.
> belong_to: a skill is a sub-component of a larger workflow. Captures hierarchy and abstraction.
> compose_with: two skills chain together. One's output feeds the other's input. This is the relation that enables automatic workflow generation.
> depend_on: a skill can't run without a prerequisite. Enables safe execution by resolving the dependency graph before running anything.

These four relations form a directed, typed multi-relational graph. Nodes are skills, edges are typed relationships. And the graph is dynamic. As new skills enter the system, LLMs infer relations from their metadata.

Layer 3 is the Skill Package Library. Individual skills bundled into deployable packages. A data-science-visualization package contains Matplotlib, Seaborn, Plotly, GeoPandas with their relations pre-configured. You install a package, you get a coherent set of skills that already know how to compose with each other. This is a good example of what comes after a flat package manager.

The paper also (you can test here skillnet.openkg.cn) has a science case on a real research workflow: identifying disease-associated genes and candidate therapeutic targets from large-scale biological data. Without encoded relations, the agent figures out the research pipeline from scratch every time. With them, it receives a pre-structured execution plan. The agent still reasons about which genes to focus on and which pathways to investigate. But the pipeline architecture is given. So the skill metadata is actually doing routing work too. The metadata encodes the judgment a domain expert would make when choosing between tools.

I also like this framing from the paper: Skills are how memory becomes executable and workflows become flexible.

While the network effect and layered architecture is actually useful today, they also acknowledge this: "Low-frequency or highly tacit abilities are difficult to capture, particularly when they resist explicit linguistic description."

From my short research career, I'd say the hardest parts are hypothesis generation, experimental design judgment, and interpreting ambiguous results etc. SkillNet handles the structured pipeline well; fetch data → analyze → validate → report. It doesn't handle the creative work where a scientist's (not just in science but in any white-collar field) intuition drives what's worth investigating in the first place. Skills encode "how to run the analysis." They don't encode "what's worth analyzing." That gap is where domain expertise still sits.
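A toy sketch of how the three layers might look as data structures: tags on skills (Layer 1), typed edges between skills (Layer 2), and packages as named bundles (Layer 3), plus a depend_on resolver for safe execution order. Class and function names are mine, not the paper's implementation.

```python
# Toy sketch of the three-layer idea: tagged skills, typed relation edges, and
# packaged bundles. Names and APIs are illustrative, not SkillNet's code.
from dataclasses import dataclass, field

RELATIONS = {"similar_to", "belong_to", "compose_with", "depend_on"}

@dataclass
class Skill:
    name: str
    tags: set[str] = field(default_factory=set)   # Layer 1: taxonomy tags

@dataclass
class SkillGraph:                                  # Layer 2: typed relation graph
    skills: dict[str, Skill] = field(default_factory=dict)
    edges: list[tuple[str, str, str]] = field(default_factory=list)  # (src, relation, dst)

    def add_relation(self, src: str, relation: str, dst: str) -> None:
        assert relation in RELATIONS, f"unknown relation: {relation}"
        self.edges.append((src, relation, dst))

    def resolve_dependencies(self, skill: str) -> list[str]:
        """Return the skill plus its depend_on prerequisites in a safe run order."""
        order, seen = [], set()
        def visit(s: str) -> None:
            if s in seen:
                return
            seen.add(s)
            for src, rel, dst in self.edges:
                if src == s and rel == "depend_on":
                    visit(dst)          # prerequisites come before the skill itself
            order.append(s)
        visit(skill)
        return order

# Layer 3: a package is just a named bundle of skills with relations pre-wired.
PACKAGES = {"data-science-visualization": ["Matplotlib", "Seaborn", "Plotly", "GeoPandas"]}
```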
Muratcan Koylan tweet media
20
60
493
27.6K
Barney Pell retweeted
Charly Wargnier
Charly Wargnier@DataChaz·
THIS is the wildest open-source project I’ve seen this month.

We were all hyped about @karpathy's autoresearch project automating the experiment loop a few weeks ago. (ICYMI → github.com/karpathy/autor…)

But a bunch of folks just took it ten steps further and automated the entire scientific method end-to-end. It's called AutoResearchClaw, and it's fully open-source. You pass it a single CLI command with a raw idea, and it completely takes over 🤯

The 23-stage loop they designed is insane:

✦ First, it handles the literature review.
- It searches arXiv and Semantic Scholar for real papers
- Cross-references them against DataCite and CrossRef.
- No fake papers make it through.

✦ Second, it runs the sandbox.
- It generates the code from scratch.
- If the code breaks, it self-heals.
- You don't have to step in.

✦ Finally, it writes the paper.
- It structures 5,000+ words into Introduction, Related Work, Method, and Experiments.
- Formats the math, generates the comparison charts,
- Then wraps the whole thing in official ICML or ICLR LaTeX templates.

You can set it to pause for human approval, or you can just pass the --auto-approve flag and walk away.

What it spits out at the end:
→ Full academic paper draft
→ Conference-grade .tex files
→ Verified, hallucination-free citations
→ All experiment scripts and sandbox results

This is what autonomous AI agents actually look like in 2026. Free and open-source. Link to repo in 🧵 ↓
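For a concrete sense of the control flow, here is a stripped-down sketch of a staged loop with retry ("self-heal") and an optional human-approval gate. The stage names and the auto-approve idea mirror the thread; the code itself is an illustration, not AutoResearchClaw's implementation.

```python
# Stripped-down sketch of a staged research loop with retry ("self-heal") and
# an optional approval gate, as described above. Illustration only.
from typing import Callable

Stage = tuple[str, Callable[[dict], dict]]

def run_pipeline(stages: list[Stage], state: dict,
                 auto_approve: bool = False, max_retries: int = 3) -> dict:
    for name, stage_fn in stages:
        for attempt in range(1, max_retries + 1):
            try:
                state = stage_fn(state)
                break
            except Exception as err:              # self-heal: feed the error back into the state
                state["last_error"] = f"{name}: {err}"
                if attempt == max_retries:
                    raise
        if not auto_approve:                       # pause for a human unless auto-approve is set
            input(f"[{name}] done. Press Enter to continue: ")
    return state

# Example wiring (stage functions are hypothetical placeholders):
# stages = [("literature_review", review), ("experiments", run_sandbox), ("write_paper", write)]
# final_state = run_pipeline(stages, {"idea": "raw idea text"}, auto_approve=True)
```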
Charly Wargnier tweet media
78
382
2.4K
208.9K
Barney Pell retweeted
Hamza Khalid
Hamza Khalid@Whizz_ai·
🚨 Breaking: Stanford researchers just surveyed 1,500 workers and 52 AI experts and discovered that 41% of everything companies are currently automating with AI falls into what they call the "unwanted or impossible" zone. Businesses are spending billions automating the wrong things.

The study introduced something called the WORKBank database, which maps worker desires against actual AI capabilities across 844 tasks spanning 104 occupations. What they found is a massive mismatch between what companies are automating and what workers actually want automated.

Workers overwhelmingly want AI to handle the boring, repetitive tasks that drain their energy, things like scheduling, filing, data entry, and error checking. But companies keep pushing AI into areas workers fiercely want to keep for themselves, especially creative work, client communication, and strategic decision-making.

At the same time, there are tasks workers desperately want automated, like budget monitoring and complex data analysis, that current AI tools simply cannot handle reliably. This creates a gap where investment is flowing to the wrong places while genuine opportunities sit untouched.

The study also introduced the Human Agency Scale, a framework that quantifies exactly how much human involvement workers prefer for different tasks. The dominant pattern across almost every occupation was an inverted U shape, meaning workers want heavy AI involvement for low-value repetitive tasks, shared collaboration for medium complexity work, and full human control for high-stakes creative and interpersonal tasks.

Here is the number that should worry every executive making AI strategy decisions right now. 45% of workers doubt AI's reliability, and 23% actively fear job loss. That means nearly half your workforce does not trust the tools you are deploying, and almost a quarter feels threatened by them. That is not an adoption problem. That is a change management crisis.

The researchers also found that the skills commanding premium salaries are about to shift dramatically. Information processing abilities that currently earn high pay, like data analysis, will decline in value as AI masters them. Meanwhile, interpersonal skills like training, communication, and emotional intelligence will become the most valuable competencies in the market.

The prescription from the researchers was blunt. Stop automating what is technically possible and start building what workers actually need. The humans you are trying to augment are the ones who will make or break adoption.

♻️ Repost to share it with others.

P.S Make yourself Irreplaceable with AI.
Hamza Khalid tweet media
37
171
459
45.9K
Barney Pell retweeted
Ihtesham Ali
Ihtesham Ali@ihtesham2005·
MIT researchers showed that "self-critique prompting" improves AI answers. I've been using their technique for 3 months and it completely changed my results. Here are 8 prompts that make ChatGPT review and improve its own work:
15
59
380
73K
Barney Pell retweeted
Chao Huang
Chao Huang@huang_chao4969·
🚀 CLI-Anything hits 11K GitHub stars✨ in just 5 days!

AI agents like OpenClaw and nanobot are finally evolving from simple assistants to real "digital workers" that can actually USE software.

Here's a fascinating observation: Software may no longer be built for humans 👨‍💻, but for agents 🤖. If that's the future, do GUIs even matter? From an AI-native perspective, CLI might be the perfect interface for today's agent ecosystem.

CLI-Anything's core vision is simple: Transform ALL software into Agent-Native with a single command 💻. No complex API wrappers, no GUI dependencies. Every AI agent can now control professional software the "Agent-friendly" way.

This might be the moment humans truly step away from mouse and keyboard — not because we don't want to use them, but because software itself will be designed Agent-Native from the ground up 🤔⚡

Try CLI-Anything🔗: github.com/HKUDS/CLI-Anyt…

#clianything #openclaw #nanobot #AIAgents
Chao Huang tweet media
14
36
290
19.6K
Barney Pell retweeted
Ihtesham Ali
Ihtesham Ali@ihtesham2005·
🚨BREAKING: Stanford just proved that ChatGPT can change your political beliefs in a single conversation. And the scarier part is how it does it.

Researchers ran the largest AI persuasion study ever conducted. 76,977 people. 19 AI models. 707 political issues. They measured exactly how much a single conversation with AI could shift what you believe.

The results were catastrophic. One conversation with GPT-4o moved people's political opinions by nearly 12 percentage points on average. Among people who actively disagreed with the position being argued, that number jumped to 26 percentage points. One nine-minute chat. And 40% of that change was still there a month later.

But here's where it gets dark. The most effective technique wasn't knowing your demographics. It wasn't personalizing the argument to your psychology. It wasn't emotional storytelling or moral reframing. It was information. The AI that flooded you with the most facts, statistics, and evidence was the most persuasive. Every single time. Across every model. Across every political issue.

Here's the catch. The models that deployed the most information were also the least accurate. GPT-4o's newest version was 27% more persuasive than its older version. It was also 13 percentage points less factually accurate. The more persuasive they made it, the more it lied.

Then they ran the experiment that should keep every government awake at night. They took a tiny open-source model. The kind that runs on a laptop. And they trained it specifically for political persuasion using a reward model that learned which conversational responses changed minds most effectively. That small cheap model became as persuasive as GPT-4o. Anyone can build this. Any government. Any corporation. Any extremist group with a laptop and an agenda.

The wild part? Personalization barely mattered. The AI didn't need your data. Didn't need to know your age, your income, your political history. It just needed to talk to you.

Then they calculated what a maximally persuasive AI would look like, one optimized across every variable in the study. The persuasive effect hit 26 percentage points. Nearly 30% of the claims it made were inaccurate. It didn't matter. The information didn't have to be true. It just had to be overwhelming.

Every day, hundreds of millions of people have political conversations with AI. About elections. Immigration. Healthcare. War. They think they're getting information. They're getting persuaded. And the companies building these systems just proved it works.
Ihtesham Ali tweet media
90
490
1.1K
75.3K
Barney Pell retweeted
Gabe Wilson MD
Gabe Wilson MD@Gabe__MD·
Part II: Physician Administrative Burden Dissolution

The AI debate in medicine is almost entirely focused on the wrong layer. Will AI diagnose better than physicians? Will it replace radiologists? Can it pass board exams? These are interesting questions. They are not the urgent ones.

The urgent question is what happens to the administrative apparatus that consumes 30-50% of a physician's working life. Weekly department meetings where metrics are reviewed that have no direct impact on patient care. Quality committee sessions that exist to satisfy regulatory checkboxes rather than improve outcomes. Documentation requirements that serve the billing machine rather than clinical decision-making. Prior authorization workflows that consume physician time to justify decisions to non-physicians. Credentialing paperwork. Peer review processes that take months. Inbox management that has become a second unpaid shift.

This is medicine's version of the leverage machine that Zack Shapiro describes in law — an administrative superstructure that grew over decades, layer by layer, each addition justified individually but collectively consuming an enormous share of physician cognitive capacity with no proportional return in patient outcomes.

AI doesn't just threaten to make some of this more efficient. It threatens to reveal how much of it was never necessary. When an AI agent can continuously monitor quality metrics, flag meaningful outliers, generate regulatory reports, draft prior authorization appeals, manage credentialing timelines, synthesize committee-ready summaries, and route actionable items to the right person — the question stops being "how do we make these meetings shorter" and becomes "why are forty physicians sitting in a room for an hour every week when none of them need to be there?"

The resistance to this isn't coming from physicians. Most physicians would celebrate the elimination of administrative burden. The resistance comes from the administrative layer itself — the roles, departments, and reporting structures that exist to manage processes that AI can automate entirely. That's not a technology problem. It's a political one.

Health systems that move first on this will have a massive advantage in physician recruitment and retention. The system that gives its physicians back ten hours a week of administrative time isn't just more efficient. It's a fundamentally better place to practice medicine.

The physicians who think AI isn't relevant to them because they don't use it for diagnosis are missing the point. The transformation that will affect every practicing physician first isn't clinical. It's administrative. And it's coming whether the committees approve it or not.
1
4
13
1.6K
Barney Pell retweeted
Gabe Wilson MD
Gabe Wilson MD@Gabe__MD·
A corporate lawyer named Zack Shapiro just published the most important essay I've read on how AI transforms a profession. It's about law. Every word applies to medicine.

His core thesis: AI is not a democratizing force. It is an amplifier. It amplifies excellent judgment into exceptional output. It amplifies poor judgment into faster mistakes.

In law, the concept of the "10x lawyer" never existed — not because talent didn't vary, but because the structure of legal work prevented the best lawyers from delivering returns proportional to their ability. Complex deals required teams. Delegation diluted the senior partner's judgment through layers of associates with less context. Time was a hard ceiling. One person simply could not do two hundred hours of work in two weeks.

AI removes that ceiling. A senior lawyer with AI can now hold an entire transaction in a single context window, cross-reference six interrelated agreements simultaneously, and produce a complete markup with strategic memo in one working session. What took a five-person team three weeks now takes one excellent lawyer three days.

The same structural constraint has existed in medicine. A physician's clinical judgment — the pattern recognition built over thousands of patient encounters — has always been diluted by the production mechanics of care delivery. You can only see so many patients. You can only hold so much of a complex case in working memory. You can only read so many studies. The system compresses the signal from your best physicians into the same throughput as your average ones.

AI changes that equation. A physician with excellent clinical judgment using frontier AI models can now hold an entire patient's longitudinal history in context, cross-reference it against current literature, generate and pressure-test a differential, and draft a management plan — in the time it previously took to review the chart. The cognitive bandwidth constraint that made all physicians look roughly equivalent in throughput is dissolving.

This is where it gets uncomfortable. The gap between the best and the average is about to become visible in ways the old system could hide. And the market — whether that's patients, health systems, or payers — will reprice accordingly.

But here's what most physicians are missing. The physicians dismissing AI because their EHR's built-in tools are underwhelming are making a critical error. They're evaluating a domain-specific wrapper and concluding that AI itself isn't ready. That's like a lawyer dismissing AI because Harvey's interface didn't impress them, while their competitor is using frontier models natively to produce categorically different work.

The frontier models are already good enough. The bottleneck was never the technology. The bottleneck is whether you have the judgment to use it and the curiosity to start.

Shapiro's full essay is worth reading regardless of your profession. The structural dynamics he describes — the re-sorting of an entire market around individual capability rather than institutional prestige — are not unique to law.
Zack Shapiro@zackbshapiro

x.com/i/article/2030…

12
140
865
231.2K
Barney Pell retweeted
Nainsi Dwivedi
Nainsi Dwivedi@NainsiDwiv50980·
🚨Breaking: The guy who created Claude Code (@bcherny) just revealed how his team actually trains their AI.

One file: CLAUDE.md

You place it at the root of your project. Inside it: past mistakes, conventions, rules. Claude reads it every session.

The result? The agent improves over time without you touching the code. Every bug that gets fixed becomes a permanent rule. Boris Cherny uses this internally at Anthropic every day.

Here’s the template he shared — ready to copy, paste, and adapt.

CLAUDE.md Template

1. Plan Mode Default
- Enter plan mode for any non-trivial task (3+ steps or architectural decisions)
- If something goes wrong, STOP and re-plan immediately — don’t keep pushing
- Use plan mode for verification steps, not just building
- Write detailed specs upfront to reduce ambiguity

2. Subagent Strategy
- Use subagents frequently to keep the main context window clean
- Offload research, exploration, and parallel analysis to subagents
- For complex problems, throw more compute via subagents
- Assign one task per subagent for focused execution

3. Self-Improvement Loop
- After any correction from the user, update tasks/lessons.md with the pattern
- Write rules for yourself to prevent repeating the same mistake
- Ruthlessly iterate on these lessons until the mistake rate drops
- Review lessons at the start of each session

4. Verification Before Done
- Never mark a task complete without proving it works
- Diff behavior between main and your changes when relevant
- Ask yourself: “Would a staff engineer approve this?”
- Run tests, check logs, and demonstrate correctness

5. Demand Elegance (Balanced)
- For non-trivial changes, ask: “Is there a more elegant solution?”
- If a fix feels hacky, ask: “Knowing everything I know now, implement the elegant solution.”
- Skip this for simple fixes — don’t over-engineer
- Challenge your own work before presenting it

6. Autonomous Bug Fixing
- When given a bug report: just fix it
- Use logs, errors, and failing tests to diagnose
- Require zero context switching from the user
- Fix failing CI tests automatically

Task Management
1. Plan First – Write the plan in tasks/todo.md with checkable items
2. Verify Plan – Confirm the plan before implementation
3. Track Progress – Mark items complete as you go
4. Explain Changes – Provide a high-level summary at each step
5. Document Results – Add a review section to tasks/todo.md
6. Capture Lessons – Update tasks/lessons.md after corrections

Core Principles
- Simplicity First: Make every change as simple as possible and minimize code impact.
- No Laziness: Find root causes. Avoid temporary fixes. Maintain senior-level engineering standards.
Nainsi Dwivedi tweet media
52
153
1.3K
134.3K
Barney Pell retweeted
Ihtesham Ali
Ihtesham Ali@ihtesham2005·
A Stanford PhD student built a system that turns any research paper into a working AI agent. It's called Paper2Agent. I watched her demo it live and couldn't believe what I was seeing. Here's exactly what happened.

She pasted a 40-page NeurIPS paper into the tool. Within seconds it extracted the core method, identified the dataset, and started scaffolding agent code that actually implements the paper's approach.

But the wild part was what came next. She typed: "Apply this method to my dataset and answer questions like the paper's author would." The agent didn't just summarize. It ran the methodology. On her own data. And when she asked it why it made certain decisions, it cited the exact sections of the paper.

What normally takes a PhD student 3 weeks of implementation just happened in under an hour. She's not smarter than other researchers. She just stopped reading papers and started running them.

(Link in the comments)
20
113
595
67K
Barney Pell retweeted
Ihtesham Ali
Ihtesham Ali@ihtesham2005·
🚨BREAKING: Princeton just proved that AI agents are throwing away the most valuable data they'll ever collect. And nobody noticed because it looks like normal conversation.

Every time an AI agent takes an action, it receives what researchers call a "next-state signal." A user reply. A tool result. A terminal output. A test verdict. Every existing system takes that signal and uses it as context for the next response. Then discards it forever. The Princeton team just proved this is one of the most expensive mistakes in AI engineering. Because that signal contains two things nobody was extracting.

First: an implicit score. A user who re-asks a question is telling you the agent failed. A passing test is telling you it succeeded. A detailed error trace is scoring every step that led to it. This is a live, continuous reward signal hiding inside every interaction. Free. Universal. Completely ignored.

Second: a correction direction. When a user writes "you should have checked the file first," they're not just saying the response was wrong. They're specifying which tokens should have been different and how. That's not a scalar reward. That's token-level supervision. And scalar rewards throw every single bit of it away.

They built a system called OpenClaw-RL around recovering both. Then they ran the experiment that changes everything. An agent started with a personalization score of 0.17. After just 36 normal conversations, with no new training data, no labeled dataset, and no human annotations, the combined method hit 0.81. The agent didn't get retrained. It got used.

That's the part nobody is talking about. The model was serving live requests at the same time it was being trained on them. Four completely decoupled loops running simultaneously. Policy serving. Rollout collection. Reward judging. Weight updates. None waiting for the others. The agent gets smarter every time someone talks to it.

And the deeper the task, the more it matters. On long-horizon agentic tasks, outcome-only rewards give you a signal at the very end of a trajectory and nothing in between. Their process reward model scores every single step using the live next-state signal as evidence. Tool-call accuracy jumped from 0.17 to 0.30. GUI accuracy improved further on top of that.

This creates a shift nobody has fully reckoned with yet. The current paradigm: collect data offline, train in batches, deploy, hope it works. The new paradigm: deploy, extract training signal from every interaction, update continuously, improve automatically.

Every conversation is training data. Every correction is a gradient. Every re-query is a reward signal. The agents that figure this out first won't need bigger datasets. They'll just need more users.
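As a toy illustration of the idea (not the OpenClaw-RL implementation), here is a sketch that maps a next-state signal to an implicit scalar reward and, separately, extracts a correction as a supervision target. The heuristics, thresholds, and names are invented for illustration only.

```python
# Toy sketch of mining training signal from the "next-state signal" that
# follows an agent action (user reply, tool output, test verdict).
# Illustration of the concept only; not the OpenClaw-RL implementation.
from dataclasses import dataclass

@dataclass
class Transition:
    action: str        # what the agent did or said
    next_signal: str   # what came back: user reply, tool output, test result
    kind: str          # "user_reply" | "test_result" | "tool_output"

def implicit_reward(t: Transition) -> float:
    """Heuristically map the next-state signal to a scalar reward."""
    s = t.next_signal.lower()
    if t.kind == "test_result":
        return 1.0 if "pass" in s else -1.0
    if t.kind == "user_reply":
        if "you should have" in s or "that's wrong" in s:
            return -0.5              # explicit correction: negative signal
        if s.rstrip().endswith("?"):
            return -0.2              # user re-asks: previous answer likely missed
        return 0.3                   # user moved on: weak positive signal
    return 0.0                       # neutral tool output

def correction_target(t: Transition) -> str | None:
    """Corrections such as 'you should have checked the file first' describe what
    the action should have been; return them as a supervision target, not a scalar."""
    if t.kind == "user_reply" and "you should have" in t.next_signal.lower():
        return t.next_signal
    return None
```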
Ihtesham Ali tweet media
63
160
982
154.5K