dave jan

258 posts

@prometx3

Austria · Joined May 2013
342 Following · 51 Followers
dave jan @prometx3
@mitsuhiko I don't even know how to force the clanker to blow it up like this, that's actually a skill 😂
English
0
0
2
111
Armin Ronacher ⇌ @mitsuhiko
This one is only 1370 files but an impressive 1.4 million added lines of code. Also somehow more than 1000 commits.
English
2
0
87
9.7K
Armin Ronacher ⇌ @mitsuhiko
A selection of great PRs that were submitted to Pi — a thread.
English
36
15
414
56.3K
dave jan @prometx3
@aliouftw @shirtwascash Interesting workflow. Do you use any extension to roll back code changes that might have occurred while investigating the bug? As far as I know there is no built-in way in Pi to revert changes, right?
English
1
0
0
86
aliou @aliouftw
it really depends on how you work. I usually use trees when I'm investigating bugs.

let's say you have a bug but the issue could come from different places. the agent gives you a message A that summarizes the different candidates 1, 2 and 3. I'll then tell the agent to focus on candidate 1. we'll investigate, repro, try to fix it etc. until we figure out if it's the bug. Then, I rewind back to message A, with a summary of our findings. The summary is a new message, A-1. You then do the same for each candidate and rewind back to the summary of the previous branch: you then get A-2 and A-3.

From there, you can continue on a new branch and refine your understanding of the bug and start brainstorming an actual fix, and/or even investigate further: for example, if the bug is actually a mix of the candidates you identified first.

Usually, I would have the agent add a failing test or create a repro script, and then explain the bug fully with the full context. then I would spawn a new session with only that bug explanation (none of the research above) plus the repro, and ask another agent to research / implement the fix. In some cases, there might be multiple ways to do so, and this might trigger a new multi-tree session.
English
2
0
4
1.3K
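The rewind-with-summary workflow aliou describes can be sketched as a small tree of messages. This is only an illustration of the idea: the `Message` class and the `investigate` helper are hypothetical names, not Pi's actual data model or API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Message:
    """One node in a session tree (hypothetical structure, not Pi's real data model)."""
    text: str
    parent: Optional["Message"] = None
    children: list["Message"] = field(default_factory=list)

    def reply(self, text: str) -> "Message":
        """Append a child message and return it."""
        child = Message(text, parent=self)
        self.children.append(child)
        return child

    def path(self) -> list[str]:
        """Context an agent would see at this node: every message from the root down."""
        node, out = self, []
        while node is not None:
            out.append(node.text)
            node = node.parent
        return out[::-1]


def investigate(anchor: Message, candidate: str, steps: list[str], summary: str) -> Message:
    """Explore one candidate on a side branch, then rewind to `anchor` with a summary."""
    node = anchor.reply(f"focus on {candidate}")
    for step in steps:
        node = node.reply(step)    # the detailed back-and-forth stays on the side branch
    return anchor.reply(summary)   # only the summary continues the main line


a = Message("A: candidates 1, 2 and 3")
anchor = a
for i in (1, 2, 3):
    anchor = investigate(anchor, f"candidate {i}", ["repro", "try fix"], f"A-{i}: findings")

print(anchor.path())
```

The printed path contains only message A and the three summaries, which is the point of the approach: the context stays small while each side branch remains available in the tree.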
aliou @aliouftw
i love pi.dev's tree and fork but i also like having a clean context when moving from investigation to implementation. so i made a `/spawn` command that simply creates a new child session with the last message of the parent session. bonus: looks nice in `/resume`
English
7
2
60
6K
Ddox @paraddox
@badlogicgames now it's just a meme
also now there's still that 20% of developers that write better code than 5.5 or 4.7
But next model iteration it will be only 15% of developers
The iteration after that it will be 10%
And so on until this article will be seen as "ahead of its time" :)
English
1
0
3
1.3K
dave jan @prometx3
test
English
0
0
0
3
David Cramer @zeeg
Ran a dozen agents over night shipping new features and you’ll never guess what happened 👇
English
27
3
93
30.4K
dave jan @prometx3
@HarveenChadha The Chandra OCR pipeline is pretty advanced in preventing errors. Same with all the newest OCR pipelines. If you ran something like Gemini Flash naively it would produce some degree of hallucinations, but I think pipelines like Chandra are normally 99% hallucination-free.
English
0
0
0
84
Harveen Singh Chadha @HarveenChadha
Disappointed that the article says nothing about the OCR validation part.
OCR-ing 27k arXiv papers with a VLM will inevitably introduce repeated-token errors and hallucinations.
At scale, the quality check of OCR output is a bigger challenge than OCR itself.
clem 🤗 @ClementDelangue

We just OCR'd 27,000 arxiv papers into Markdown using an open 5B model, 16 parallel HF Jobs on L40S GPUs, and a mounted bucket.
Total cost: $850
Total time: ~29 hours
Jobs that crashed: 0
This now powers "Chat with your paper" on hf.co/papers

English
12
2
101
8.3K
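One cheap check for the repeated-token errors Harveen mentions is to measure how many word n-grams in a page repeat an earlier n-gram. The sketch below is an assumption-heavy illustration: the function names, the n-gram size of 4 and the 0.3 threshold are made up for this example, and it only catches repetition loops, not hallucinated content in general.

```python
from collections import Counter


def repeated_ngram_ratio(text: str, n: int = 4) -> float:
    """Fraction of word n-grams that duplicate an n-gram seen elsewhere in the text."""
    words = text.split()
    if len(words) < n:
        return 0.0
    grams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    counts = Counter(grams)
    repeats = sum(c - 1 for c in counts.values())  # every occurrence beyond the first
    return repeats / len(grams)


def looks_degenerate(text: str, n: int = 4, threshold: float = 0.3) -> bool:
    """Flag OCR output whose repetition ratio exceeds an (assumed) threshold."""
    return repeated_ngram_ratio(text, n) > threshold


clean = "the quick brown fox jumps over the lazy dog near the river bank"
looped = "table 1 shows " * 20

print(looks_degenerate(clean), looks_degenerate(looped))
```

Pages flagged this way would still need a second look, since legitimate tables and reference lists can also be repetitive.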
vogel @ryanvogel
a little insight into our @opencode company meetings
English
13
8
383
29.4K
hammad 🔍 @HammadTime
My favorite part of working on the @trychroma Context-1 report was how easy interactive explanations have become with AI coding. As a longtime fan of sites like explorabl.es and ciechanow.ski, the barrier to quickly iterating on and building interactive explainers is now so absurdly low. No excuse for any developer-facing company not to invest in these.
English
1
1
21
911
Ben Sigman @bensig
MemPalace just crossed 42K stars and 5.4K forks. v3.1.0 already shipped. Milla and I have barely slept this week. The response has been overwhelming in the best way. We’re running on parallel tracks right now - fixing bugs and reviewing PRs from the community on one side, building the next generation of storage and retrieval on the other. Both are getting better fast. To everyone who has starred, forked, opened issues, submitted PRs, or just sent kind words - thank you. This thing belongs to all of us now. More soon. ✨ github.org/mempalace/memp…
English
32
32
383
23.7K
dave jan @prometx3
@diblacksmith @bensig That has been done, and it confirmed faked benchmarks. It's basically vibe-coded slop that they present as something new and shiny.
English
0
0
2
37
diego @diblacksmith
@bensig I'm super curious! But still waiting for someone to rerun the benchmarks and publish the *actual* results
English
2
0
5
498
kepano @kepano
I wrote about Microsoft's Markitdown back in 2024, but it's grown into a big messy project now :/
It would be more valuable if Microsoft provided high-quality official libraries for converting their proprietary formats to Markdown (.docx, .xlsx, .pptx, OneNote, etc).
For now Obsidian's Markdown conversion options are:
1. Obsidian Web Clipper for converting URLs
2. Obsidian Importer for converting from apps like Notion, Apple Notes, Google Keep, Microsoft OneNote, Evernote, etc
Vaishnavi @_vmlops

MICROSOFT BUILT A TOOL THAT CONVERTS LITERALLY ANYTHING INTO CLEAN MARKDOWN FOR YOUR LLM
pdfs. word docs. excel. powerpoint. audio. youtube urls
one pip install and your AI pipeline stops choking on raw files forever
no custom parsers. no broken layouts. no garbled text.
just clean, structured markdown your LLM can actually read
github.com/microsoft/mark…

English
42
37
1.2K
348.5K
dave jan @prometx3
@karpathy @kepano SOTA are probably LlamaParse and Datalab, but those are hosted services and pretty expensive for a large corpus. For a few larger PDFs you can probably get away with the free plans.
English
0
0
0
580
Andrej Karpathy @karpathy
@kepano I just tried it this morning on the 245-page Mythos pdf and it failed badly and the outputs were all mangled. Converting pdfs is really hard, I think it has to probably be a Skill not a program, for a SOTA LLM for it to work properly.
English
170
37
1.7K
276.5K
dave jan @prometx3
@karpathy @chalish_b @kepano Marker is much, much better than anything else I tried. Newer OCR VLMs are also pretty good; look into PaddleOCR-VL, dots.mocr
English
0
0
2
684
Andrej Karpathy @karpathy
@chalish_b @kepano In my experience there are approx. one thousand different pdf converters that are all equally terrible for anything except the simplest documents. Post the converted Mythos pdf, figures, tables and all. If good, happy to retweet as this is essential and missing infrastructure.
English
30
3
150
14.6K
dave jan @prometx3
@ALEngineered If those models are really that good at finding exploits (especially zero-days), don't you think this will be used by China? (They have the resources to train such a model too.)
English
0
0
2
684
Steve Huynh @ALEngineered
You guys realize that Claude Mythos can’t be ethically released to the general public ever, right? That is, we just have to wait until the entire internet has been patched of all critical exploits, and all future code is forever scanned going forward. So no software should be released until it has been scanned by Mythos. But you have to be part of the handful of companies that have access to it. We are in a genie-out-of-the-bottle moment. When there’s a new major 0-day exploit, teams of agents will race to compromise systems while the means to stop them will be dependent on whether you are in the club or not (you are likely not in the club)
English
270
112
2K
210.8K
Brian Roemmele @BrianRoemmele
@bensig Ben wow, thank you. And thank you for this amazing work. Love it!
English
3
0
4
541
dave jan @prometx3
@redtachyon It was a good signal to find out who is just jumping on the hype train, sharing BS without even checking once. And oh boy, many jumped on that train
English
0
0
6
942
dave jan @prometx3
@thekitze They faked the benchmark, it's basically just Claude-generated slop. But it's interesting how many people are jumping on the hype train without even reading the code or even what others have found
English
0
0
4
133
kitze @thekitze
yes, that Milla Jovovich
Ben Sigman @bensig

My friend Milla Jovovich and I spent months creating an AI memory system with Claude. It just posted a perfect score on the standard benchmark - beating every product in the space, free or paid.

It's called MemPalace, and it works nothing like anything else out there. Instead of sending your data to a background agent in the cloud, it mines your conversations locally and organizes them into a palace - a structured architecture with wings, halls, and rooms that mirrors how human memory actually works.

Here is what that gets you:
→ Your AI knows who you are before you type a single word - family, projects, preferences, loaded in ~120 tokens
→ Palace architecture organizes memories by domain and type - not a flat list of facts, a navigable structure
→ Semantic search across months of conversations finds the answer in position 1 or 2
→ AAAK compression fits your entire life context into 120 tokens - 30x lossless compression any LLM reads natively
→ Contradiction detection catches wrong names, wrong pronouns, wrong ages before you ever see them

The benchmarks:
100% recall on LongMemEval - first perfect score ever recorded. 500/500 questions. Every question type at 100%.
92.9% on ConvoMem - more than 2x Mem0's score.
100% on LoCoMo - every multi-hop reasoning category, including temporal inference which stumps most systems.

No API key. No cloud. No subscription. One dependency. Runs on your machine. Your memories never leave.
MIT License. 100% Open Source.
github.com/milla-jovovich…

Czech
16
1
66
14.1K
dave jan @prometx3
@kenwheeler I wonder how one would prevent massive hallucinations with this
English
0
0
0
170
patagucci perf papi @kenwheeler
i set it up it’s honestly naive and mid
Nav Toor @heynavtoor

🚨 Andrej Karpathy thinks RAG is broken. He published the replacement 2 days ago. 5,000 stars in 48 hours.

It's called LLM Wiki. A pattern where your AI doesn't retrieve information from scratch every time. It builds and maintains a persistent, compounding knowledge base. Automatically. RAG re-discovers knowledge on every question. LLM Wiki compiles it once and keeps it current.

Here's the difference:
RAG: You ask a question. AI searches your documents. Finds fragments. Pieces them together. Forgets everything. Starts over next time.
LLM Wiki: You add a source. AI reads it, extracts key information, updates entity pages, revises topic summaries, flags contradictions, strengthens the synthesis. The knowledge compounds. Every source makes the wiki smarter. Permanently.

Here's how it works:
→ Drop a source into your raw collection. Article, paper, transcript, notes.
→ AI reads it, writes a summary, updates the index
→ Updates every relevant entity and concept page across the wiki
→ One source can touch 10 to 15 wiki pages simultaneously
→ Cross-references are built automatically
→ Contradictions between sources get flagged
→ Ask questions against the wiki. Good answers get filed back as new pages.
→ Your explorations compound in the knowledge base. Nothing disappears into chat history.

Here's the wildest part: Karpathy's use case examples:
→ Personal: track goals, health, psychology. File journal entries and articles. Build a structured picture of yourself over time.
→ Research: read papers for months. Build a comprehensive wiki with an evolving thesis.
→ Reading a book: build a fan wiki as you read. Characters, themes, plot threads. All cross-referenced.
→ Business: feed it Slack threads, meeting transcripts, customer calls. The wiki stays current because the AI does the maintenance nobody wants to do.

Think of it like this: Obsidian is the IDE. The LLM is the programmer. The wiki is the codebase. You never write the wiki yourself. You source, explore, and ask questions. The AI does all the grunt work.

NotebookLM, ChatGPT file uploads, and most RAG systems re-derive knowledge on every query. This compiles it once and builds on it forever.

5,000+ stars. 1,294 forks. Published by Andrej Karpathy. 2 days ago. 100% Open Source.

English
27
1
166
34.5K
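The "compile once, then query" pattern described in the quoted tweet can be sketched in a few lines. Everything here is a toy under stated assumptions: `Wiki`, `toy_extract` and the "entity: fact" line format are hypothetical, the extractor stands in for an LLM call, and the contradiction check is deliberately naive (any new, different fact about a known entity gets flagged). It is not the actual LLM Wiki implementation.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical extractor: maps source text to {entity: fact}.
# In a real setup this would be an LLM call.
Extractor = Callable[[str], dict[str, str]]


@dataclass
class Wiki:
    extract: Extractor
    pages: dict[str, list[str]] = field(default_factory=dict)  # entity -> accumulated facts
    index: list[str] = field(default_factory=list)             # names of ingested sources
    flags: list[str] = field(default_factory=list)             # naive contradiction notes

    def ingest(self, name: str, text: str) -> None:
        """Compile a source into the wiki once, instead of re-reading it per question."""
        self.index.append(name)
        for entity, fact in self.extract(text).items():
            facts = self.pages.setdefault(entity, [])
            if fact in facts:
                continue
            if facts:  # naive: any differing fact is flagged for review
                self.flags.append(f"{entity}: '{fact}' ({name}) differs from earlier notes")
            facts.append(fact)

    def ask(self, entity: str) -> list[str]:
        """Answer from the compiled pages; no source is re-scanned here."""
        return self.pages.get(entity, [])


def toy_extract(text: str) -> dict[str, str]:
    """Stand-in for the LLM: parse lines of the form 'entity: fact'."""
    out = {}
    for line in text.splitlines():
        if ":" in line:
            entity, fact = line.split(":", 1)
            out[entity.strip()] = fact.strip()
    return out


wiki = Wiki(extract=toy_extract)
wiki.ingest("notes-1", "alice: lives in Vienna")
wiki.ingest("notes-2", "alice: lives in Graz\nbob: works on OCR")
print(wiki.ask("alice"), wiki.flags)
```

This also makes dave jan's question concrete: whatever the extractor gets wrong is written into `pages` and persists, so the quality of the compile step matters more than in per-query retrieval.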