PageIndex
@PageIndexAI

Agentic AI for Documents. Try https://t.co/yRBqagX9Na.

London · Joined June 2023
277 Following · 978 Followers · 1.4K posts
PageIndex @PageIndexAI:
@heynavtoor Thanks for sharing! We're betting on "Reasoning-as-Retrieval" with PageIndex: the LLM agentically reasons over a document tree index to find the right context — no vector DB, no chunking. github.com/VectifyAI/Page…
0 replies · 0 reposts · 0 likes · 69 views
Nav Toor @heynavtoor:
🚨 This Python tool just made vector databases optional for RAG.

It's called PageIndex. It reads documents the way you do. No embeddings. No chunking. No vector database needed.

Here's the problem with normal RAG: it takes your document, cuts it into tiny pieces, turns those pieces into numbers, and searches for the closest match. But closest match doesn't mean best answer.

PageIndex works completely differently:
→ It reads your full document
→ Builds a tree structure like a table of contents
→ When you ask a question, the AI walks through that tree
→ It thinks step by step until it finds the exact right section

Same way you'd find an answer in a textbook. You don't read every page. You check the chapters, pick the right one, and go straight to the answer. That's exactly what PageIndex teaches AI to do.

Here's the wildest part: it scored 98.7% accuracy on FinanceBench. That's a test where AI answers real questions from SEC filings and earnings reports. Most traditional RAG systems can't touch that number.

Works with PDFs, markdown, and even raw page images without OCR. 100% open source. MIT license.
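The navigation loop described above can be sketched in a few lines. This is a hypothetical illustration, not PageIndex's actual code: where PageIndex has an LLM reason over the tree, a simple keyword-overlap score stands in here so the example runs on its own.

```python
# Hypothetical sketch of tree-based retrieval, NOT the real PageIndex code.
# PageIndex uses an LLM to decide which branch to follow; a keyword-overlap
# score stands in for that reasoning step so this example is self-contained.

def score(title: str, query: str) -> int:
    # Count query words appearing in the node title (stand-in for LLM judgment).
    return sum(w in title.lower() for w in query.lower().split())

def navigate(node: dict, query: str) -> dict:
    # Walk the table-of-contents tree, taking the best-scoring child at each
    # level, until a leaf section is reached.
    while node.get("children"):
        node = max(node["children"], key=lambda c: score(c["title"], query))
    return node

toc = {
    "title": "Annual Report",
    "children": [
        {"title": "Business Overview", "children": []},
        {"title": "Financial Statements", "children": [
            {"title": "Balance Sheet", "children": []},
            {"title": "Income Statement", "children": []},
        ]},
    ],
}

hit = navigate(toc, "income statement revenue")
print(hit["title"])  # -> Income Statement
```

The point of the sketch is the control flow: retrieval is a descent through document structure rather than a nearest-neighbor lookup over chunks.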
[image] · 51 replies · 103 reposts · 742 likes · 61.5K views
i5ting @i5ting:
Problems I'm focused on right now, and some open-source projects:
1. AI developing in parallel inside a monorepo: web3infra-foundation/mega
2. Multi-instance agents and how to isolate them (Docker is too slow): vm0-ai/vm0
3. Auditing and replaying prompts etc. based on git commit history: entireio/cli
4. Some new approaches, e.g., RAG without vector embeddings: VectifyAI/PageIndex
2 replies · 3 reposts · 75 likes · 8.7K views
Rohan Paul @rohanpaul_ai:
Yann LeCun's (@ylecun) new paper, along with other top researchers, proposes a brilliant idea. 🎯 It says that chasing general AI is a mistake and we must build superhuman adaptable specialists instead.

The whole AI industry is obsessed with building machines that can do absolutely everything humans can do. But this goal is fundamentally flawed because humans are actually highly specialized creatures optimized only for physical survival.

Instead of trying to force one giant model to master every possible task from folding laundry to predicting protein structures, they suggest building expert systems that learn generic knowledge through self-supervised methods. By using internal world models to understand how things work, these specialized systems can quickly adapt to solve complex problems that human brains simply cannot handle. This shift means we can stop wasting computing power on human traits and focus on building diverse tools that actually solve hard real-world problems.

So overall, the researchers propose a new target called Superhuman Adaptable Intelligence, which focuses strictly on how fast a system learns new skills.

The paper explicitly argues that evolution shaped human intelligence strictly as a specialized tool for physical survival. The researchers state that nature optimized our brains specifically for tasks necessary to stay alive in the physical world. They explain that abilities like walking or seeing seem incredibly general to us only because they are absolutely critical for our existence.

The authors point out that humans are actually terrible at cognitive tasks outside this evolutionary comfort zone, like calculating massive mathematical probabilities. The study highlights how a chess grandmaster only looks intelligent compared to other humans, while modern computers easily crush those human limits.

This supports their central point that humanity suffers from an illusion of generality simply because we cannot perceive our own biological blind spots. They conclude that building machines to mimic this narrow human survival toolkit is a deeply flawed way to create advanced technology.
[image]

Quoted tweet from Rohan Paul @rohanpaul_ai:

Yann LeCun (@ylecun ) explains why LLMs are so limited in terms of real-world intelligence. Says the biggest LLM is trained on about 30 trillion words, which is roughly 10 to the power 14 bytes of text. That sounds huge, but a 4 year old who has been awake about 16,000 hours has also taken in about 10 to the power 14 bytes through the eyes alone. So a small child has already seen as much raw data as the largest LLM has read. But the child’s data is visual, continuous, noisy, and tied to actions: gravity, objects falling, hands grabbing, people moving, cause and effect. From this, the child builds an internal “world model” and intuitive physics, and can learn new tasks like loading a dishwasher from a handful of demonstrations. LLMs only see disconnected text and are trained just to predict the next token. So they get very good at symbol patterns, exams, and code, but they lack grounded physical understanding, real common sense, and efficient learning from a few messy real-world experiences. --- From 'Pioneer Works' YT channel (link in comment)

118 replies · 315 reposts · 1.6K likes · 208.3K views
Nav Toor @heynavtoor:
Zhipu AI just dropped a 744 billion parameter model and nobody is talking about it. It's called GLM-5. Built by Zhipu AI and Tsinghua University. And it doesn't just match Claude Opus 4.5 and GPT-5.2 on benchmarks. It matches them in ways that expose what's actually happening in the AI race right now. Let me explain why this paper matters more than the leaderboard scores.

The subtitle tells you everything: "From Vibe Coding to Agentic Engineering." That's not marketing. That's a philosophical claim about where AI development is heading. Vibe coding is when a human prompts an AI to write code. Agentic engineering is when the AI writes the code itself. It plans. It implements. It iterates. It debugs. It ships. GLM-5 is built for the second world.

Here are the raw numbers: 744B total parameters. 40B active at any time. Trained on 28.5 trillion tokens. That's double the size of their previous model GLM-4.5 (355B total, 32B active). Context window pushed to 200K tokens. First open-weights model to score 50 on the Artificial Analysis Intelligence Index v4.0. The previous version scored 42; that's an 8-point jump in a single generation. Number 1 open model on LMArena Text Arena. Number 1 open model on LMArena Code Arena. Comparable to Claude Opus 4.5 and GPT-5.2 overall. On SWE-bench Verified, it beats Gemini 3 Pro. On SWE-bench Multilingual, it beats both Gemini 3 Pro and GPT-5.2. On BrowseComp, it achieves state-of-the-art among ALL frontier models, open and closed, in both English and Chinese.

But here's where the paper gets interesting. Not in what GLM-5 can do. In HOW they built it. Three technical innovations the industry should be paying attention to.

First: DeepSeek Sparse Attention. Traditional attention is O(L²). At 128K context, that becomes prohibitively expensive. DSA replaces dense attention with a dynamic selection mechanism. Instead of attending to everything, the model looks at the content to decide which tokens actually matter. Not a fixed sliding window. Not a static pattern. Content-aware sparsity. The result: 90% of attention entries in long contexts are redundant, and DSA cuts attention computation by 1.5 to 2x for long sequences.

And here's what makes it elegant. They didn't train from scratch. They adapted it via continued pre-training from their dense base model. Two stages: a 1000-step warmup training only the indexer, then 20B tokens of joint training. That's a fraction of the cost DeepSeek spent (943.7B tokens), and they matched the original model's performance. They proved this empirically: they fine-tuned both the DSA and the original MLA models with identical SFT data. Same training loss. Same evaluation benchmarks. The sparse model lost nothing.

Second: fully asynchronous reinforcement learning for agents. This is the infrastructure innovation nobody is talking about. Standard synchronous RL is crippled by long-horizon agent tasks. An agent needs to browse the web, write code, execute it, check results, iterate. That takes minutes per trajectory. During that time, GPUs sit idle.

Zhipu's solution decouples the inference engine from the training engine entirely. The inference engine generates trajectories continuously. Once enough trajectories accumulate, they are batch-sent to the training engine. Model weights sync back periodically.

But asynchronous RL introduces a brutal problem: off-policy drift. Different trajectories get generated by different versions of the model. Their solution is a token-in-token-out gateway that preserves exact action-level correspondence between what was sampled and what is optimized. No re-tokenization. No boundary mismatches. No lossy text round-trips. They also developed double-sided importance sampling with token-level clipping that controls off-policy bias without tracking historical policy checkpoints. This is the kind of systems engineering that separates models that work on benchmarks from models that work in production.

Third: the multi-stage RL pipeline. Not one RL phase. Four sequential stages: Reasoning RL, then Agentic RL, then General RL, then cross-stage distillation. Each stage optimizes for different capabilities. Reasoning RL covers math, science, code, and tool-integrated reasoning. Agentic RL handles software engineering, terminal tasks, and multi-hop search. General RL optimizes for human-style alignment across three dimensions: foundational correctness, emotional intelligence, and task-specific quality. The final distillation stage uses the checkpoints from all previous stages as teachers, preventing catastrophic forgetting. The model retains its reasoning edge while becoming a robust generalist.

Here's the part most people will miss: this model runs natively on seven different Chinese chip platforms. Huawei Ascend. Moore Threads. Hygon. Cambricon. Kunlunxin. MetaX. Enflame. From day one. That's not an afterthought. That's a strategic decision. They developed custom fusion kernels for sparse attention on Ascend NPUs. They implemented W4A8 mixed-precision quantization to fit 744B parameters onto a single Chinese node. They claim performance comparable to dual-GPU international clusters while cutting deployment costs by 50% for long-sequence scenarios. This is what chip independence looks like in practice. Not announcements. Not roadmaps. A 744B parameter frontier model running in production on domestic hardware.

And then there's the Easter egg. Before revealing GLM-5, Zhipu released it anonymously on OpenRouter under the codename "Pony Alpha." No brand name. No hype. Just the model. Within days it became a sensation. Developers noticed its exceptional performance in coding tasks, agentic workflows, and roleplay. The community speculated wildly: 25% guessed it was Claude Sonnet 5. 20% guessed DeepSeek. 10% guessed Grok. It was none of them. It was GLM-5, from a Chinese lab most Western developers had never heard of.

The paper's own words: "The eventual confirmation that it was indeed our GLM-5 was a profound moment for us, effectively silencing doubts about whether Chinese LLMs could compete at the frontier level."

That's the real story here. Not the benchmark scores. The fact that when stripped of branding, a Chinese open-weights model was indistinguishable from the world's best proprietary systems. The gap between open and closed models isn't narrowing. It's collapsing. And the gap between Western and Chinese AI labs isn't what anyone assumed it was.

GLM-5 is open-weights. Apache 2.0. Code, models, and weights all available.
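The content-aware sparsity idea in the thread can be illustrated with a toy top-k attention pass in NumPy. This is an assumption-laden sketch, not the DSA kernel: a real implementation uses a cheap learned indexer precisely so it never forms the dense score matrix, while this toy forms it only to make the selection step visible.

```python
import numpy as np

# Toy illustration of content-aware sparse attention: each query attends only
# to its top-k most relevant keys instead of all L positions. NOT the actual
# DSA implementation; real systems avoid materializing the dense score matrix.

def sparse_attention(q, k, v, top_k=4):
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)               # (Lq, Lk) relevance scores
    # Mask everything except each query's top_k keys to -inf.
    drop = np.argsort(scores, axis=-1)[:, :-top_k]
    masked = scores.copy()
    np.put_along_axis(masked, drop, -np.inf, axis=-1)
    # Softmax over the surviving keys only.
    w = np.exp(masked - masked.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

rng = np.random.default_rng(0)
q = rng.standard_normal((8, 16))
k = rng.standard_normal((8, 16))
v = rng.standard_normal((8, 16))
out = sparse_attention(q, k, v, top_k=4)
print(out.shape)  # (8, 16)
```

The selection depends on the query/key contents themselves, which is what distinguishes this from a fixed sliding window or static sparsity pattern.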
[image] · 21 replies · 28 reposts · 126 likes · 17K views
Rimsha Bhardwaj @heyrimsha:
🚨 Holy shit… Google published one of the cleanest demonstrations of real multi-agent intelligence I've seen so far. Not another "look, two chatbots are talking" demo. An actual framework for how agents can infer who they're interacting with and adapt on the fly.

The paper is "Multi-agent cooperation through in-context co-player inference."

The core idea is deceptively simple: in multi-agent environments, performance doesn't just depend on the task. It depends on who you're paired with. Most current systems ignore this. They optimize against an average opponent. Or assume fixed partner behavior. Or hard-code roles.

Google does something smarter. They let the model infer its co-player's strategy directly from the interaction history inside the context window. No retraining, no separate belief model, and no explicit opponent classifier. Just in-context inference. The agent observes a few rounds of behavior. Forms an implicit hypothesis about its partner's type. Then updates its own strategy accordingly. This turns static policies into adaptive ones.

The experiments are structured around cooperative and social dilemma games where partner types vary: some partners are fully cooperative. Some are selfish. Some are stochastic. Some strategically defect. Agents without co-player inference treat all partners the same. Agents with inference adjust. And the performance gap is significant.

What makes this paper uncomfortable for a lot of current "multi-agent" hype is how clearly it shows what real coordination requires. First, coordination is not just communication. It's modeling the incentives and likely actions of others. Second, robustness matters. An agent that cooperates blindly gets exploited. An agent that defects blindly loses cooperative gains. The system must dynamically balance trust and caution. Third, adaptation must happen at inference time. In real deployments, you cannot retrain every time the population changes.

The most interesting part is that this capability emerges purely from structured context. The model isn't fine-tuned to classify opponent types explicitly. It uses behavioral traces embedded in the prompt to infer latent strategy. That's belief modeling through language. And it scales.

Think about where this matters outside toy games: autonomous trading systems reacting to different market participants. Negotiation agents interacting with unpredictable humans. Distributed AI workflows coordinating across departments. Swarm robotics where teammate reliability varies. In all these settings, static competence is not enough. Strategic awareness is the bottleneck.

The deeper shift is philosophical. We've been treating LLM agents as isolated optimizers. This paper moves us toward agents that reason about other agents reasoning about them. That's recursive modeling. And once that loop becomes stable, you no longer have "a chatbot." You have a participant in a strategic ecosystem.

The takeaway isn't that multi-agent AI is solved. It's that most current systems aren't even attempting the hard part. Real multi-agent intelligence isn't multiple prompts in parallel. It's adaptive belief formation under uncertainty. And this paper is one of the first clean proofs that large models can do that using nothing but context.

Paper: Multi-agent cooperation through in-context co-player inference
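The adaptive loop the thread describes (observe a few rounds, infer the partner's type, adjust) can be sketched as a hand-rolled toy policy in a repeated cooperation game. The paper performs this inference inside an LLM's context window; the function names and threshold here are illustrative assumptions, not the paper's method.

```python
# Toy sketch of co-player inference in a repeated cooperation game.
# The real paper does this inference in-context inside an LLM; this
# hand-rolled version only illustrates the observe -> infer -> adapt loop.

def infer_partner_type(history, threshold=0.5):
    # history: the partner's past moves, "C" (cooperate) or "D" (defect).
    if not history:
        return "unknown"
    coop_rate = history.count("C") / len(history)
    return "cooperative" if coop_rate >= threshold else "selfish"

def choose_action(history):
    # Cooperate with cooperative partners, defend against selfish ones;
    # open with cooperation when there is no evidence yet.
    kind = infer_partner_type(history)
    return "D" if kind == "selfish" else "C"

print(choose_action([]))                    # C  (no evidence: cooperate)
print(choose_action(["C", "C", "D", "C"]))  # C  (mostly cooperative partner)
print(choose_action(["D", "D", "D", "C"]))  # D  (mostly defecting partner)
```

An agent with a fixed policy would play the same move against all three histories; the inference step is what makes the policy adaptive.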
[image] · 35 replies · 81 reposts · 518 likes · 36.9K views
kajuKatli_Kavya @KavyaKapoor420:
RAG without vectors? Yes. 𝗩𝗲𝗰𝘁𝗼𝗿𝗹𝗲𝘀𝘀 𝗥𝗔𝗚 is a real thing, and it's surprisingly effective. Today I explored 𝗣𝗮𝗴𝗲𝗜𝗻𝗱𝗲𝘅, a reasoning-based, vectorless RAG framework that transforms documents into a tree-like structure of nodes and subnodes.
[images] · 3 replies · 0 reposts · 3 likes · 424 views
Damien Noir @damienoir:
Most AI search tools are just fancy ctrl+F. They find text that looks similar. They don't actually think about what you're asking. Reasoning-based RAG changes everything.
[image] · 2 replies · 1 repost · 1 like · 71 views
Venkat @Venkatpachalaa:
Found a great way to reduce retrieval drift and hallucination: blending vector search (recall) with structure-based reasoning (precision) keeps answers tighter to the query. So basically, AI systems are won in architecture, not prompts. github.com/VectifyAI/Page…
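That recall-then-precision split can be sketched under heavy assumptions: bag-of-words counts stand in for real embeddings, and a keyword match on section paths stands in for structure-based LLM reasoning.

```python
import math

# Toy sketch of a hybrid retriever: a cheap similarity pass for recall,
# then a structure-aware re-rank for precision. Bag-of-words counts replace
# real embeddings here; this is an illustration, not a production design.

def embed(text):
    vec = {}
    for w in text.lower().split():
        vec[w] = vec.get(w, 0) + 1
    return vec

def cosine(a, b):
    dot = sum(a[w] * b.get(w, 0) for w in a)
    na = math.sqrt(sum(x * x for x in a.values()))
    nb = math.sqrt(sum(x * x for x in b.values()))
    return dot / (na * nb) if na and nb else 0.0

sections = [
    {"path": "Report > Overview", "text": "company overview and strategy"},
    {"path": "Report > Financials > Revenue", "text": "revenue grew this year"},
    {"path": "Report > Risks", "text": "competition and market risks"},
]

def retrieve(query, top_n=2):
    qv = embed(query)
    # Recall: keep the top_n most similar sections.
    pool = sorted(sections, key=lambda s: cosine(qv, embed(s["text"])),
                  reverse=True)[:top_n]
    # Precision: prefer the candidate whose section path also matches the query.
    return max(pool, key=lambda s: sum(w in s["path"].lower()
                                       for w in query.lower().split()))

print(retrieve("revenue growth")["path"])  # Report > Financials > Revenue
```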
1 reply · 1 repost · 4 likes · 72 views
Abhay @abhxy03:
Wake up babe, a better alternative to vector databases is here!

PageIndex is a vectorless Retrieval-Augmented Generation (RAG) framework that replaces traditional vector embeddings and document chunking with a hierarchical tree index for precise, reasoning-based retrieval.

Core concept: it builds a "tree index" mimicking a table of contents, where documents are organized into multi-level nodes (e.g., chapters → sections → paragraphs) with LLM-generated summaries at each level. This preserves original structure and enables agentic navigation by large language models, simulating human-like document exploration.

How it works: the process starts by parsing a document into a JSON-based tree structure, linking each node to raw content like text or tables. During queries, an LLM reasons over this in-context index to select relevant paths, retrieving exact sections without similarity searches.

Key advantages:
• No vectors or databases: avoids embedding costs and storage; retrieval is deterministic and auditable.
• High accuracy: reports 98.7% on benchmarks like FinanceBench for complex docs (e.g., legal or financial reports).
• Explainable: traces retrieval paths, unlike opaque vector matches.
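A hypothetical example of what such a JSON tree node might look like, and how it could be flattened into the in-context index the LLM reasons over. The field names (`title`, `summary`, `pages`, `children`) are assumptions for illustration, not PageIndex's actual schema.

```python
# Illustrative node schema for a PageIndex-style tree index. The field names
# are assumptions for this sketch, not the library's real JSON format.

tree = {
    "title": "10-K Filing",
    "summary": "Annual report covering business, risks, and financials.",
    "pages": [1, 120],
    "children": [
        {"title": "Risk Factors",
         "summary": "Key risks to the business.",
         "pages": [10, 25], "children": []},
        {"title": "Financial Statements",
         "summary": "Audited statements and notes.",
         "pages": [60, 110], "children": []},
    ],
}

def render_index(node, depth=0):
    # Flatten the tree into the textual index an LLM would see in-context:
    # one line per node, indented by depth, with page range and summary.
    line = (f'{"  " * depth}- {node["title"]} '
            f'(pp. {node["pages"][0]}-{node["pages"][1]}): {node["summary"]}')
    lines = [line]
    for child in node["children"]:
        lines.extend(render_index(child, depth + 1))
    return lines

print("\n".join(render_index(tree)))
```

Because each node carries a page range, a retrieval decision maps directly back to specific pages, which is where the traceability claim comes from.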
[image] · 2 replies · 0 reposts · 9 likes · 130 views
PageIndex @PageIndexAI:
@hackernoon Thanks for sharing PageIndex! We are betting on "Reasoning-as-Retrieval" with PageIndex: the LLM agentically reasons over a document tree index to find the right context. No vector DB, no chunking. github.com/VectifyAI/Page…
0 replies · 0 reposts · 0 likes · 41 views
Yuri Quintana, PhD, FACMI @yuriquintana:
A new “vectorless RAG” approach, PageIndex, replaces chunking & embeddings with a hierarchical table-of-contents tree. The model navigates sections step-by-step; it reports 98.7% accuracy on FinanceBench, outperforming traditional RAG. Tradeoff: higher cost & latency. #AIResearch pageindex.ai/blog/Mafin2.5
1 reply · 0 reposts · 1 like · 78 views
PageIndex @PageIndexAI:
@oyik_ai Thanks for sharing PageIndex! We are betting on "Reasoning-as-Retrieval" with PageIndex: the LLM agentically reasons over a document tree index to find the right context. No vector DB, no chunking. github.com/VectifyAI/Page…
0 replies · 0 reposts · 0 likes · 23 views
Oyik.ai @oyik_ai:
RAG is broken for long docs. Chunk → embed → vector DB → pray you didn't miss the footnote on page 247. PageIndex just open-sourced a fix: hierarchical tree over your full PDF, LLM-guided search, no vector DB needed. It hit GitHub Top 10 trending. Worth watching. #AI #RAG #LLM
[image] · 3 replies · 0 reposts · 4 likes · 139 views
Eddy_Job @Eddy_Kerario:
Most AI tools search documents by matching patterns (vector matching). PageIndex does something smarter: it reads the document, builds a smart outline, then thinks through it step by step to find your answer. Like a human expert flipping through a report. We finally have something that addresses the drawbacks of RAG systems. 98.7% accuracy. Open source. No cost. 🔗 github.com/VectifyAI/Page…
1 reply · 1 repost · 3 likes · 47 views
Hadj Hadji @elhadjx__:
🚨 Vector DBs are dead. PageIndex just killed them.

Instead of mangling your docs into floating-point soup, PageIndex preserves full document structure: sections, hierarchy, context, exactly how an LLM needs to read it.

✅ PROS:
- No chunking artifacts destroying your context
- Document structure fully preserved
- Retrieval is coherent, not fragmented
- Dramatically fewer hallucinations in prod
- Simpler pipeline, less infra to maintain

❌ CONS:
- Higher token usage per retrieval (you're pulling pages, not snippets)
- Latency can spike on large documents
- Not ideal for massive corpora at scale (yet)
- Smaller ecosystem vs. mature vector DB tooling

We've been duct-taping embeddings together for 3 years. It's over. #RAG #LLM #VectorDB #AIEngineering
[image] · 1 reply · 0 reposts · 1 like · 43 views
PageIndex @PageIndexAI:
@techwith_ram Thanks for sharing PageIndex! We are betting on "Reasoning-as-Retrieval" with PageIndex: the LLM agentically reasons over a document tree index to find the right context. No vector DB, no chunking. github.com/VectifyAI/Page…
0 replies · 0 reposts · 0 likes · 20 views
𝗿𝗮𝗺𝗮𝗸𝗿𝘂𝘀𝗵𝗻𝗮— 𝗲/𝗮𝗰𝗰
Vector database is not needed anymore for RAG?

𝗣𝗮𝗴𝗲𝗜𝗻𝗱𝗲𝘅: 𝗗𝗼𝗰𝘂𝗺𝗲𝗻𝘁 𝗜𝗻𝗱𝗲𝘅 𝗳𝗼𝗿 𝗩𝗲𝗰𝘁𝗼𝗿𝗹𝗲𝘀𝘀, 𝗥𝗲𝗮𝘀𝗼𝗻𝗶𝗻𝗴-𝗯𝗮𝘀𝗲𝗱 𝗥𝗔𝗚

Repo: github.com/VectifyAI/Page…

𝗖𝗼𝗿𝗲 𝗙𝗲𝗮𝘁𝘂𝗿𝗲𝘀: compared to traditional vector-based RAG, PageIndex features:
- No vector DB: uses document structure and LLM reasoning for retrieval instead of vector similarity search.
- No chunking: documents are organized into natural sections, not artificial chunks.
- Human-like retrieval: simulates how human experts navigate and extract knowledge from complex documents.
- Better explainability and traceability: retrieval is based on reasoning, so it is traceable and interpretable, with page and section references. No more opaque, approximate vector search ("vibe retrieval").
[image] · 1 reply · 14 reposts · 64 likes · 2.9K views
PageIndex @PageIndexAI:
Thanks for sharing PageIndex! We are betting on Reasoning-as-Retrieval with PageIndex: the LLM agentically reasons over a document tree index to find the right context — no vector DB, no chunking. github.com/VectifyAI/Page…
0 replies · 0 reposts · 3 likes · 69 views
Tech with Mak @techNmak:
Chunking is the original sin of RAG.

You take a beautifully structured document. Slice it into arbitrary 512-token pieces. Destroy all context. Then wonder why retrieval is bad.

PageIndex doesn't chunk. Documents stay organized in natural sections. Hierarchy preserved. Context intact. Instead of similarity search over chunks, it uses reasoning over structure.

→ Build a tree index (like a smart table of contents)
→ Navigate with LLM reasoning
→ Find relevant sections through tree search

98.7% accuracy on FinanceBench. No vectors. No chunks. No destroyed context. 18.2K stars. Worth a look. GitHub repo in comments.
[image] · 31 replies · 66 reposts · 600 likes · 36.5K views
TrendSpider @TrendSpider:
$NVDA THE WORLD IS SAVED
[image] · 55 replies · 80 reposts · 1K likes · 163.9K views