dave jan

258 posts

@prometx3

Austria · Joined May 2013

342 Following · 51 Followers

dave jan@prometx3·9 May

@mitsuhiko @rachelnabors @badlogicgames @antirez It's so great! I can never go back.

English

133

Armin Ronacher ⇌@mitsuhiko·9 May

@rachelnabors @badlogicgames @antirez Nanotexture in Europe is a must for full euromaxxing.

English

20.2K

Mario Zechner@badlogicgames·8 May

recommended reading by our junior developer @mitsuhiko. my name is pidalf, and i support this message. would love to have some spare cycles to join @antirez's effort and use some of my GPU knowledge. alas. lucumr.pocoo.org/2026/5/8/local…

English

308

23.3K

dave jan@prometx3·7 May

@mitsuhiko I don't even know how to force the clanker to blow it up like this, that's actually a skill 😂

English

111

Armin Ronacher ⇌@mitsuhiko·7 May

This one is only 1370 files, but an impressive 1.4 million added lines of code. Also, somehow, more than 1,000 commits.

English

9.7K

Armin Ronacher ⇌@mitsuhiko·7 May

A selection of great PRs that were submitted to Pi — a thread.

English

414

56.3K

dave jan@prometx3·7 May

@aliouftw @shirtwascash Interesting workflow. Do you use any extension to roll back code changes that might have occurred while investigating the bug? As far as I know, there is no built-in way in Pi to revert changes, right?

English

aliou@aliouftw·7 May

it really depends on how you work. I usually use trees when I'm investigating bugs: let's say you have a bug, but the issue could come from different places. the agent gives you a message A that summarizes the different candidates 1, 2 and 3. I'll then tell the agent to focus on candidate 1. we'll investigate, repro, try to fix it, etc., until we figure out whether it's the bug. Then I rewind back to message A with a summary of our findings. The summary is a new message, A-1. You then do the same for each candidate and rewind back to the summary of the previous branch: you then get A-2 and A-3. From there, you can continue on a new branch, refine your understanding of the bug and start brainstorming an actual fix, and/or even investigate further, for example if the bug is actually a mix of the candidates you identified first. Usually, I would have the agent add a failing test or create a repro script, and then explain the bug fully with the full context. then I would spawn a new session with only that bug explanation plus the repro (none of the research above) and ask another agent to research and implement the fix. In some cases, there might be multiple ways to do so, and this might trigger a new multi-tree session.
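The rewind-and-summarize loop described above can be sketched roughly as follows. This is a minimal illustration, not Pi's actual API: the `Message` class and the `context` and `investigate` helpers are hypothetical names, and the sketch assumes each summary (A-1, A-2, A-3) hangs off the previous anchor while the exploratory messages stay on a side branch.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Message:
    """One message in a session tree (hypothetical model, not Pi's API)."""
    text: str
    parent: Optional["Message"] = None
    children: List["Message"] = field(default_factory=list)

    def reply(self, text: str) -> "Message":
        """Append a child message and return it."""
        child = Message(text, parent=self)
        self.children.append(child)
        return child


def context(tip: Message) -> List[str]:
    """Texts from the root down to `tip`: what the agent sees at that point."""
    path = []
    node: Optional[Message] = tip
    while node is not None:
        path.append(node.text)
        node = node.parent
    return path[::-1]


def investigate(anchor: Message, candidate: int, findings: str) -> Message:
    """Explore a candidate on a side branch, then rewind to `anchor`
    and continue from there with only a summary of the findings."""
    side = anchor.reply(f"focus on candidate {candidate}")
    side.reply("repro, attempted fix, dead ends ...")  # stays on the side branch
    return anchor.reply(f"summary of candidate {candidate}: {findings}")


a = Message("A: candidates 1, 2 and 3")
tip = a
for cand, result in [(1, "ruled out"), (2, "ruled out"), (3, "reproduces the bug")]:
    tip = investigate(tip, cand, result)  # yields A-1, then A-2, then A-3

for line in context(tip):
    print(line)
```

The point of the structure is that the context at the final tip holds only A and the three summaries; the noisy investigation messages remain reachable in the tree but are not carried forward.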

English

1.3K

aliou@aliouftw·6 May

i love pi.dev's tree and fork, but i also like having a clean context when moving from investigation to implementation. so i made a `/spawn` command that simply creates a new child session with the last message of the parent session. bonus: it looks nice in `/resume`
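A rough sketch of what such a spawn step might look like, assuming a session is just a named list of messages with a link to its parent. The `Session` class and `spawn` function are hypothetical and do not reflect how the real `/spawn` command is implemented.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Session:
    """A named list of messages with an optional parent (hypothetical model)."""
    name: str
    messages: List[str] = field(default_factory=list)
    parent: Optional["Session"] = None


def spawn(parent: Session, name: str) -> Session:
    """Create a child session seeded with only the parent's last message."""
    if not parent.messages:
        raise ValueError("parent session has no messages to hand over")
    return Session(name=name, messages=[parent.messages[-1]], parent=parent)


investigation = Session(
    "investigation",
    ["candidate list", "dead end", "bug explanation + repro script"],
)
implementation = spawn(investigation, "implementation")
print(implementation.messages)
```

Keeping a reference to the parent is what would let a resume view show the child nested under the session it came from, while the child's own context starts clean.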

English

dave jan@prometx3·3 May

@paraddox @badlogicgames It's not all about writing "better code"

English

Ddox@paraddox·3 May

@badlogicgames now it's just a meme. also, there's still that 20% of developers who write better code than 5.5 or 4.7. But next model iteration it will be only 15% of developers. The iteration after that it will be 10%. And so on, until this article will be seen as "ahead of its time" :)

English

1.3K

Mario Zechner@badlogicgames·3 May

i actually don't want this "but you don't review compiler output either" meme to die. it's the perfect signal for being immediately able to ignore someone in this space.

solst/ICE of Astarte@IceSolst

Interesting article on treating agent output like compiler output (and why) skiplabs.io/blog/codegen_a…

English

1.8K

129.6K

dave jan@prometx3·26 Apr

test

English

dave jan@prometx3·25 Apr

@zeeg Nothing

English

616

David Cramer@zeeg·25 Apr

Ran a dozen agents overnight shipping new features and you’ll never guess what happened 👇

English

30.4K

dave jan@prometx3·14 Apr

@HarveenChadha The Chandra OCR pipeline is pretty advanced at preventing errors. Same with all the newest OCR pipelines. If you ran something like Gemini Flash naively, it would produce some degree of hallucination, but I think pipelines like Chandra are normally 99% hallucination-free.

English

Harveen Singh Chadha@HarveenChadha·14 Apr

Disappointed that the article says nothing about the OCR validation part. OCR-ing 27k arXiv papers with a VLM will inevitably introduce repeated-token errors and hallucinations. At scale, the quality check of OCR output is a bigger challenge than OCR itself.

clem 🤗@ClementDelangue

We just OCR'd 27,000 arxiv papers into Markdown using an open 5B model, 16 parallel HF Jobs on L40S GPUs, and a mounted bucket.
Total cost: $850
Total time: ~29 hours
Jobs that crashed: 0
This now powers "Chat with your paper" on hf.co/papers
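For a sense of scale, the figures quoted in the post can be turned into per-unit numbers. This is back-of-the-envelope arithmetic only. It treats "~29 hours" as exactly 29 and assumes all 16 jobs ran for the full wall-clock time with one GPU each, neither of which the post states.

```python
papers = 27_000
total_cost_usd = 850
wall_hours = 29        # the post says "~29 hours"
parallel_jobs = 16

cost_per_paper = total_cost_usd / papers
papers_per_hour = papers / wall_hours

# Assumption: every job ran for the whole wall-clock time on a single GPU.
gpu_hours = parallel_jobs * wall_hours
cost_per_gpu_hour = total_cost_usd / gpu_hours
papers_per_gpu_hour = papers / gpu_hours

print(f"cost per paper:      ${cost_per_paper:.4f}")
print(f"papers per hour:     {papers_per_hour:.0f}")
print(f"GPU-hours (assumed): {gpu_hours}")
print(f"cost per GPU-hour:   ${cost_per_gpu_hour:.2f}")
print(f"papers per GPU-hour: {papers_per_gpu_hour:.1f}")
```

Under those assumptions the run works out to roughly three cents per paper.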

English

101

8.3K

dave jan@prometx3·13 Apr

@ryanvogel @opencode How to apply? 🌚 For real, are there open positions?

English

1.1K

vogel@ryanvogel·13 Apr

a little insight into our @opencode company meetings

English

383

29.4K

dave jan@prometx3·13 Apr

@HammadTime @trychroma thanks. @HammadTime is there any channel where this will be posted? I guess you will post/repost it?

English

hammad 🔍@HammadTime·5 Apr

@prometx3 @trychroma targeting next 1-2 weeks

English

hammad 🔍@HammadTime·30 Mar

My favorite part of working on the @trychroma Context-1 report was how easy interactive explanations have become with AI coding. As a longtime fan of sites like explorabl.es and ciechanow.ski, the barrier to quickly iterating on and building interactive explainers is now so absurdly low. No excuse for any developer-facing company not to invest in these.

English

911

dave jan@prometx3·12 Apr

@bensig Stop calling it the "highest-scoring" AI memory system. It's not. gist.github.com/roman-rr/0569f…

English

Ben Sigman@bensig·12 Apr

MemPalace just crossed 42K stars and 5.4K forks. v3.1.0 already shipped. Milla and I have barely slept this week. The response has been overwhelming in the best way. We’re running on parallel tracks right now - fixing bugs and reviewing PRs from the community on one side, building the next generation of storage and retrieval on the other. Both are getting better fast. To everyone who has starred, forked, opened issues, submitted PRs, or just sent kind words - thank you. This thing belongs to all of us now. More soon. ✨ github.org/mempalace/memp…

English

383

23.7K

dave jan@prometx3·12 Apr

@diblacksmith @bensig It has been done, and it confirmed faked benchmarks. It's basically vibe-coded slop that they present as something shiny and new.

English

diego@diblacksmith·12 Apr

@bensig I'm super curious! but still waiting for someone to rerun benchmarks and publish the *actual* results

English

498

dave jan@prometx3·10 Apr

@chalish_b @karpathy @kepano this is the Marker-converted version, one-shot. as you said, it looks pretty clean. github.com/prometixX/pape…

English

chalish@chalish_b·10 Apr

@karpathy @kepano I was gonna upload it to github, but this version shared by Datalab's founder is much cleaner. There is apparently a newer model x.com/VikParuchuri/s…

Vik Paruchuri@VikParuchuri

@karpathy @chalish_b @kepano Here you go - github.com/datalab-to/res… This was with github.com/datalab-to/cha…

English

2.3K

kepano@kepano·9 Apr

I wrote about Microsoft's Markitdown back in 2024, but it's grown into a big messy project now :/ It would be more valuable if Microsoft provided high-quality official libraries for converting their proprietary formats to Markdown (.docx, .xlsx, .pptx, OneNote, etc).
For now Obsidian's Markdown conversion options are:
1. Obsidian Web Clipper for converting URLs
2. Obsidian Importer for converting from apps like Notion, Apple Notes, Google Keep, Microsoft OneNote, Evernote, etc

Vaishnavi@_vmlops

MICROSOFT BUILT A TOOL THAT CONVERTS LITERALLY ANYTHING INTO CLEAN MARKDOWN FOR YOUR LLM
pdfs. word docs. excel. powerpoint. audio. youtube urls
one pip install and your AI pipeline stops choking on raw files forever
no custom parsers. no broken layouts. no garbled text. just clean, structured markdown your LLM can actually read
github.com/microsoft/mark…

English

1.2K

348.5K

dave jan@prometx3·9 Apr

@karpathy @kepano SOTA are probably LlamaParse and Datalab, but those are hosted services and pretty expensive for a large corpus. For a few larger PDFs you can probably get away with the free plans.

English

580

Andrej Karpathy@karpathy·9 Apr

@kepano I just tried it this morning on the 245-page Mythos pdf and it failed badly and the outputs were all mangled. Converting pdfs is really hard, I think it has to probably be a Skill not a program, for a SOTA LLM for it to work properly.

English

170

1.7K

276.5K

dave jan@prometx3·9 Apr

@karpathy @chalish_b @kepano Marker is much, much better than anything else I tried. Newer OCR VLMs are also pretty good; look into paddleOcr-vl and dots.mocr.

English

684

Andrej Karpathy@karpathy·9 Apr

@chalish_b @kepano In my experience there are approx. one thousand different pdf converters that are all equally terrible for anything except the simplest documents. Post the converted Mythos pdf, figures, tables and all. If good, happy to retweet as this is essential and missing infrastructure.

English

150

14.6K

dave jan@prometx3·8 Apr

@ALEngineered If those models are really that good at finding exploits (especially zero-days), don't you think this will be used by China? (They have the resources to train such a model too.)

English

684

Steve Huynh@ALEngineered·8 Apr

You guys realize that Claude Mythos can’t be ethically released to the general public ever, right? That is, we just have to wait until the entire internet has been patched of all critical exploits, and all future code is forever scanned going forward. So no software should be released until it has been scanned by Mythos. But you have to be part of the handful of companies that have access to it. We are in a genie-out-of-the-bottle moment. When there’s a new major 0-day exploit, teams of agents will race to compromise systems while the means to stop them will be dependent on whether you are in the club or not (you are likely not in the club)

English

270

112

210.8K

dave jan@prometx3·8 Apr

@BrianRoemmele @bensig He is just trying to get a piece of the cake. Tells you quite a lot.

English

Brian Roemmele@BrianRoemmele·7 Apr

@bensig Ben wow, thank you. And thank you for this amazing work. Love it!

English

541

Ben Sigman@bensig·7 Apr

If you're into AI - follow Brian. He has great content.

Brian Roemmele@BrianRoemmele

We at The Zero-Human Company have been testing MemPalace by the amazing @bensig and Milla Jovovich and are absolutely blown away! It is a freaking masterpiece and we have deployed it to 79 employees at the company. Each worker will be testing and expanding on MemPalace. I will have a lot to say about how we are using it and how you should too.

English

241

49.9K

dave jan@prometx3·8 Apr

@jeffreyhuber Really want to test the context-1 harness!

English

Jeff Huber@jeffreyhuber·7 Apr

use context-1 for the searching and get an instant 25x cost reduction with better performance

Marc Andreessen 🇺🇸@pmarca

Magical OpenClaw experiences that use frontier models cost $300-1,000/day today, heading to $10,000/day and more. The future shape of the entire technology industry will be how to drive that to $20/month.

English

dave jan@prometx3·8 Apr

@redtachyon It was a good signal for finding out who is just jumping on the hype train, sharing BS without even checking once. And oh boy, many jumped on that train.

English

942

Ariel@redtachyon·7 Apr

Honestly, if you thought this was legit even for a moment, you're a lost cause

Ben Sigman@bensig

30 second explanation of the MemPalace by Milla Jovovich. By day she’s filming action movies, walking Miu Miu fashion shows, and being a mom. By night she’s coding. She’s the most creative, brilliant, and hilarious person I know. I’m honored to be working with her on this project… more to come.

English

417

25.7K

dave jan@prometx3·7 Apr

@thekitze They faked the benchmark; it's basically just Claude-generated slop. But it's interesting how many people are jumping on the hype train without even reading the code, or even what others have found.

English

133

kitze@thekitze·7 Apr

yes, that mila jovovich

Ben Sigman@bensig

My friend Milla Jovovich and I spent months creating an AI memory system with Claude. It just posted a perfect score on the standard benchmark - beating every product in the space, free or paid.
It's called MemPalace, and it works nothing like anything else out there. Instead of sending your data to a background agent in the cloud, it mines your conversations locally and organizes them into a palace - a structured architecture with wings, halls, and rooms that mirrors how human memory actually works.
Here is what that gets you:
→ Your AI knows who you are before you type a single word - family, projects, preferences, loaded in ~120 tokens
→ Palace architecture organizes memories by domain and type - not a flat list of facts, a navigable structure
→ Semantic search across months of conversations finds the answer in position 1 or 2
→ AAAK compression fits your entire life context into 120 tokens - 30x lossless compression any LLM reads natively
→ Contradiction detection catches wrong names, wrong pronouns, wrong ages before you ever see them
The benchmarks:
100% recall on LongMemEval - first perfect score ever recorded. 500/500 questions. Every question type at 100%.
92.9% on ConvoMem - more than 2x Mem0's score.
100% on LoCoMo - every multi-hop reasoning category, including temporal inference which stumps most systems.
No API key. No cloud. No subscription. One dependency. Runs on your machine. Your memories never leave.
MIT License. 100% Open Source.
github.com/milla-jovovich…

English

14.1K

dave jan@prometx3·7 Apr

@kenwheeler I wonder how one would prevent massive hallucinations with this

English

170

patagucci perf papi@kenwheeler·7 Apr

i set it up it’s honestly naive and mid

Nav Toor@heynavtoor

🚨 Andrej Karpathy thinks RAG is broken. He published the replacement 2 days ago. 5,000 stars in 48 hours.
It's called LLM Wiki. A pattern where your AI doesn't retrieve information from scratch every time. It builds and maintains a persistent, compounding knowledge base. Automatically.
RAG re-discovers knowledge on every question. LLM Wiki compiles it once and keeps it current.
Here's the difference:
RAG: You ask a question. AI searches your documents. Finds fragments. Pieces them together. Forgets everything. Starts over next time.
LLM Wiki: You add a source. AI reads it, extracts key information, updates entity pages, revises topic summaries, flags contradictions, strengthens the synthesis. The knowledge compounds. Every source makes the wiki smarter. Permanently.
Here's how it works:
→ Drop a source into your raw collection. Article, paper, transcript, notes.
→ AI reads it, writes a summary, updates the index
→ Updates every relevant entity and concept page across the wiki
→ One source can touch 10 to 15 wiki pages simultaneously
→ Cross-references are built automatically
→ Contradictions between sources get flagged
→ Ask questions against the wiki. Good answers get filed back as new pages.
→ Your explorations compound in the knowledge base. Nothing disappears into chat history.
Here's the wildest part: Karpathy's use case examples:
→ Personal: track goals, health, psychology. File journal entries and articles. Build a structured picture of yourself over time.
→ Research: read papers for months. Build a comprehensive wiki with an evolving thesis.
→ Reading a book: build a fan wiki as you read. Characters, themes, plot threads. All cross-referenced.
→ Business: feed it Slack threads, meeting transcripts, customer calls. The wiki stays current because the AI does the maintenance nobody wants to do.
Think of it like this: Obsidian is the IDE. The LLM is the programmer. The wiki is the codebase. You never write the wiki yourself. You source, explore, and ask questions. The AI does all the grunt work.
NotebookLM, ChatGPT file uploads, and most RAG systems re-derive knowledge on every query. This compiles it once and builds on it forever.
5,000+ stars. 1,294 forks. Published by Andrej Karpathy. 2 days ago. 100% Open Source.

English

166

34.5K
