Vertically Integrated Consulting

10 posts

Vertically Integrated Consulting

@verticalicon

Great clients inspire great work.

Joined February 2023
106 Following · 1 Follower
Vertically Integrated Consulting
The method described in the @isotopes_ai paper achieves over 90% internal error interception before user exposure while balancing cost and latency, expanding capabilities without affecting existing functionality.
Vertically Integrated Consulting
Specialized agent teams are organized around clear goals, using a remote code executor to separate data transformations from reasoning. Agents execute code remotely and return only the relevant summaries, maintaining the separation between perception and execution.
Arun C Murthy@acmurthy

We recently talked about the "Team of Rivals" architecture, which we bring to bear at @isotopes_ai to scale sophisticated AI agents. Here is our paper on @arxiv: arxiv.org/abs/2601.14351 Discuss: news.ycombinator.com/item?id=468023…
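The remote-executor idea described above can be sketched in a few lines; `remote_execute`, the sandbox stand-in, and the summary shape are illustrative inventions, not the paper's actual API. The point is that the reasoning agent never sees the raw data, only a compact summary.

```python
import json
import statistics

def remote_execute(code: str, data: list[float]) -> dict:
    """Stand-in for a remote code executor: runs a data transformation
    and returns only a summary, never the full dataset."""
    namespace = {"data": data, "statistics": statistics}
    exec(code, namespace)  # in a real system this would run in a remote sandbox
    return {"rows": len(data), "summary": namespace["result"]}

# The agent ships a small transformation; only the summary comes back.
summary = remote_execute(
    "result = {'mean': statistics.mean(data), 'max': max(data)}",
    data=[3.0, 9.0, 6.0],
)
print(json.dumps(summary))
```

Keeping only the summary in the agent's context is what separates perception (the summary it reasons over) from execution (the remote transformation).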

Vertically Integrated Consulting retweeted
Rohan Paul @rohanpaul_ai
New Stanford + SambaNova + UC Berkeley paper proposes quite a revolutionary idea. 🤯 Proves LLMs can be improved by purely changing the input context, instead of changing weights.

Introduces a new method called Agentic Context Engineering (ACE). It helps language models improve by updating what they read and remember, instead of changing their core weights. It has three parts: a Generator that works on tasks, a Reflector that learns from mistakes, and a Curator that updates the notebook with helpful lessons.

So ACE works like this. The model first tries to solve a task. While doing that, it writes down its reasoning steps, which show what helped and what caused mistakes. That record is passed to another model, called the Reflector. The Reflector reads it and writes short lessons based on it, such as useful tricks, common errors, or patterns that worked better.

Then another component, the Curator, takes these lessons and turns them into small, clearly written notes called delta items. These are merged into the existing playbook using simple rules. The system does not rewrite the whole context; it only adds or edits these tiny pieces. Because of this, it keeps all the useful older notes while gradually improving the context with every new task.

Over time, the playbook becomes a stronger, richer guide that helps the model perform better on future tasks without retraining or changing weights. This design avoids full rewrites that can trigger "context collapse", where a long, useful context shrinks and accuracy drops. Instead, the context grows steadily and stays specific to the domain. 🧵 Read on 👇
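The Generator → Reflector → Curator loop described above might look roughly like this. All function bodies, the lesson keys, and the delta-merge rule are my own illustration, not the paper's implementation; the real system would call an LLM at each stage.

```python
playbook: dict[str, str] = {}  # lesson id -> note; the persistent context

def generator(task: str, context: dict[str, str]) -> str:
    # Would call an LLM with the playbook as context; here we fake a trace.
    return f"tried '{task}' using {len(context)} playbook notes"

def reflector(trace: str) -> list[tuple[str, str]]:
    # Would extract lessons from the reasoning trace; here, one stub lesson.
    return [("lesson:" + trace[:12], "prefer small, testable steps")]

def curator(deltas: list[tuple[str, str]]) -> None:
    # Incremental merge: add or edit small delta items,
    # never rewrite the whole playbook (avoids "context collapse").
    for key, note in deltas:
        playbook.setdefault(key, note)

for task in ["parse logs", "write query"]:
    trace = generator(task, playbook)
    curator(reflector(trace))

print(len(playbook))  # the playbook grows monotonically across tasks
```

The key design choice the thread highlights is in `curator`: deltas are merged item-by-item, so older notes survive every update.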
Vertically Integrated Consulting retweeted
Carlos E. Perez @IntuitMachine
Ever feel like you're talking to an AI with severe short-term memory loss? You watch it solve a problem, have a brilliant insight... and then five minutes later, it's forgotten everything, starting from scratch on the next task. It's Groundhog Day, but for logic.

This isn't just a quirk. It's a fundamental flaw. Every time an LLM reasons, it's like writing a masterpiece on a whiteboard that gets wiped clean the second it's done. The solution is generated, but the wisdom, the 'how', is lost forever. A colossal waste.

But what if the AI could stop the whiteboard from being wiped? What if it could look at its own work, identify the clever bits, and save them for later? New research from Meta & Princeton does exactly this. It gives the AI a notebook and teaches it how to take notes. They call it 'Metacognitive Reuse'.

After solving a problem, the LLM is prompted to reflect on its own reasoning and extract the core strategies into reusable 'Behaviors'. This isn't just memory; it's a knowledge-capture process. It turns a fleeting thought into a permanent asset. So a long-winded derivation of a mathematical formula becomes a compact, reusable tool. The knowledge is no longer a one-off performance. It's been catalogued, named, and put into a "behavior handbook", a library of the model's own best ideas.

This is where it gets really interesting. The immediate benefit is efficiency. Reusing knowledge is obviously faster than re-inventing it. Token usage dropped by up to 46%. But the true magic happens when that reusable knowledge starts to compound. With a library of trusted methods, the AI doesn't get bogged down re-solving solved problems. It can grab a reliable tool from its own handbook and focus its 'thinking' on the truly novel parts of the new challenge. This is why accuracy on hard maths problems actually increased by up to 10%.

Think about it. You don't teach a carpenter to build a house by forcing them to first re-invent the hammer, the saw, and the level every single time. You give them a toolbox of reliable, reusable tools. This research shows how an AI can build its own.

This marks a critical shift from 'disposable intelligence' to 'cumulative knowledge'. We're moving away from models that are brilliant-but-forgetful improvisers, toward systems that can build a lasting, growing foundation of their own expertise.

And this isn't just for maths. Imagine an AI that solves thousands of coding problems, building a reusable library of its most effective debugging patterns. Or a legal AI that distills recurring principles of case law into its own handbook. The knowledge base grows and improves with every single use.

This changes how I see the future of these systems. We're not just building a better calculator; we're potentially building a self-improving artisan that sharpens its own tools and remembers how to use them. A system that genuinely learns from its experience. It's a simple idea with profound implications: the most valuable knowledge an AI can have is its own.

For anyone who builds or works with AI, the full paper is a must-read. It might just change how you think about AI memory. "Metacognitive Reuse..." (arXiv:2509.13237v1)
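The behavior-handbook mechanic described above can be sketched in a few lines. `solve`, the token costs, and the behavior naming are invented for illustration; the paper's actual extraction step is LLM-driven, not a string lookup.

```python
handbook: dict[str, str] = {}  # problem -> named, reusable behavior

def solve(problem: str) -> tuple[str, int]:
    """Return (answer, tokens_spent). Reuse a stored behavior if one exists."""
    if problem in handbook:
        # Cheap path: apply a catalogued behavior instead of re-deriving.
        return f"applied {handbook[problem]}", 5
    # Expensive first derivation, then distill it into a named behavior.
    handbook[problem] = f"behavior_for_{problem.replace(' ', '_')}"
    return "derived from scratch", 50

first = solve("sum of arithmetic series")
second = solve("sum of arithmetic series")
print(first[1], second[1])  # the second call spends far fewer tokens
```

The compounding effect the thread describes comes from the handbook persisting across tasks: every solved problem makes the cheap path available to future ones.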
Vertically Integrated Consulting retweeted
Carlos E. Perez @IntuitMachine
How is it possible that Claude Sonnet 4.5 is able to work for 30 hours to build an app like Slack?! The system prompts have been leaked, and Sonnet 4.5's prompt reveals its secret sauce! Here's how the prompt enables Sonnet 4.5 to autonomously grind out something Slack/Teams-like (i.e., thousands of lines of code over many hours) without falling apart:

It forces "big code" into durable artifacts. Anything over ~20 lines (or 1,500 chars) is required to be emitted as an artifact, and only one artifact per response. That gives the model a persistent, append-only surface to build large apps module-by-module without truncation.

It specifies an iterative "update vs. rewrite" workflow. The model is told exactly when to apply update (small diffs, ≤20 lines/≤5 locations, up to 4 times) versus rewrite (structural change). That lets it evolve a large codebase safely across many cycles; that's how you get to 11k lines without losing state.

It enforces runtime constraints for long-running UI code. The prompt bans localStorage/sessionStorage, requires in-memory state, and blocks HTML forms in React iframes. That keeps generated chat UIs stable in the sandbox while the model iterates for hours.

It nails the dependency and packaging surface. The environment whitelists artifact types and import rules (single-file HTML, React component artifacts, CDNs), so the model can scaffold full features (auth panes, channel lists, a message composer) without fighting toolchain drift.

It provides a research cadence for "product-scale" tasks. The prompt defines a Research mode (≥5 and up to ~20 tool calls) with an explicit planning → research loop → answer construction recipe, which supports the many information lookups a Slack-like build needs (protocol choices, UI patterns, presence models).

It governs tool use instead of guessing. The "Tool Use Governance" pattern tells the model to investigate with tools rather than assume, reducing dead ends when selecting frameworks, storage schemas, or deployment options mid-build.

It separates "think" and "do" with mode switching. The Deliberation–Action Split prevents half-baked code sprees: plan (deliberation), then execute (action), user-directed. Over long sessions, this avoids trashing large artifacts and keeps scope disciplined.

It supports long-horizon autonomy via planning/feedback loops. The prompt's pattern library cites architectures like Voyager (state + tools → propose code → execute → learn) and Generative Agents (memory → reflect → plan). Those loops explain how an LLM can sustain progress across dozens of hours.

It insists on full conversational state in every call. For stateful apps, it requires sending the complete history/state each time. That's crucial for a chat app where UI state, presence, and message history must remain coherent across many generation cycles.

It bakes in error rituals and guardrails. The pattern language's "Error Ritual" and "Ghost Context Removal" encourage cleaning stale context and retrying with distilled lessons, which is vital when a big build hits integration errors at hour 12.

It chooses familiar, well-documented stacks. The guidance warns about the "knowledge horizon" and recommends mainstream frameworks (React, Flask, REST) and clean layering (UI vs. API). That drastically improves throughput and correctness for a Slack-like system.

It enables "Claude-in-Claude" style self-orchestration. Artifacts are allowed to call an LLM API from within the running artifact (with fetch), so the model can generate a dev tool that helps itself (e.g., a codegen assistant or schema migrator) during the build.

It keeps outputs machine-parseable when needed. Strict JSON-only modes (and examples) let downstream scripts/tests wrap the app and auto-verify modules, enabling unattended iteration over many hours.
Put together, these prompts/patterns create the conditions for scale: a safe sandbox to emit large artifacts, iterative control over code evolution, disciplined research and tool usage, long-horizon memory/plan loops, and pragmatic tech choices. That’s how an LLM can realistically accrete ~10k+ lines for a Slack-style app over a long session without collapsing under its own complexity.
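The update-vs-rewrite budget described in the thread can be expressed as a small policy function. The thresholds (20 lines, 5 locations, 4 updates, 1,500 chars) come from the thread above; the function itself is my framing, not code from the leaked prompt.

```python
def choose_edit_mode(diff_lines: int, diff_locations: int, prior_updates: int) -> str:
    """Small, localized diffs within the update budget get 'update';
    anything structural or over budget forces a full 'rewrite'."""
    if diff_lines <= 20 and diff_locations <= 5 and prior_updates < 4:
        return "update"
    return "rewrite"

def needs_artifact(code: str) -> bool:
    """Anything over ~20 lines or 1,500 chars must live in a durable artifact."""
    return code.count("\n") + 1 > 20 or len(code) > 1500

small_fix = choose_edit_mode(diff_lines=8, diff_locations=2, prior_updates=1)
refactor = choose_edit_mode(diff_lines=300, diff_locations=12, prior_updates=0)
print(small_fix, refactor)
```

Budgeting updates this way is what lets a long session evolve one large artifact incrementally instead of repeatedly regenerating it from scratch.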
Vertically Integrated Consulting
Yu Wang @__YuWang__

Introducing The Most Advanced Memory System for LLM Agents

MIRIX is by far the most advanced memory system in the world, designed to make AI truly remember, learn, and help you over time.
Website: mirix.io
Paper: arxiv.org/abs/2507.07957
Github: github.com/Mirix-AI/MIRIX

1️⃣ Performance Benchmarks
On ScreenshotVQA: 410% improvement over Gemini with 93.3% storage reduction; 35% improvement over RAG with 99.9% storage reduction! On LOCOMO, we obtained 85.4% accuracy, the current state-of-the-art performance (better than Mem0, MemOS, Zep, ...).

2️⃣ Six Core Memory Types (Inspired by Human Cognition)
1. Core Memory: stores the AI's personality and your long-term preferences: tone, settings, identity.
2. Episodic Memory: your personal "event log."
3. Semantic Memory: facts, concepts, and your social graph.
4. Procedural Memory: how-tos and workflows.
5. Resource Memory: your raw materials, files, notes.
6. Knowledge Vault: verbatim information, your addresses (less sensitive), ID numbers (highly sensitive), encrypted with multi-level access controls.

3️⃣ Multi-Agent Workflow
We have eight different agents:
⚙️ Meta Memory Manager: analyzes inputs, then routes them to the right memory modules.
⚙️ Memory Managers: specialized for each memory type; they insert, deduplicate, merge, index, and embed data.
⚙️ Chat Agent: interfaces with the user and synthesizes context-aware responses.

Then, during a memory update: (1) check whether similar data already exists; (2) let the Meta Memory Manager analyze and route to the different agents; (3) each module structures, merges, and stores in parallel; (4) finally, all modules report back when done.

💡 This is why we call it an AI "cognitive prototype." It's not just an index. It's the first step toward a system that remembers, reasons, and grows with you.
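The Meta Memory Manager's routing step described above can be sketched as a dispatcher over the six memory types. The classification rules here are toy stand-ins invented for illustration, not MIRIX's actual logic, which uses an agent to analyze each input.

```python
MODULES = ["core", "episodic", "semantic", "procedural", "resource", "vault"]
stores: dict[str, list[str]] = {m: [] for m in MODULES}

def meta_memory_manager(item: str) -> str:
    """Toy classifier standing in for the routing agent."""
    if item.startswith("how to"):
        return "procedural"   # how-tos and workflows
    if item.startswith("event:"):
        return "episodic"     # the personal event log
    if item.startswith("file:"):
        return "resource"     # raw materials, files, notes
    return "semantic"         # facts, concepts, social graph

for item in ["how to deploy the app",
             "event: met Alice on Tuesday",
             "Paris is in France"]:
    module = meta_memory_manager(item)
    stores[module].append(item)  # each module would dedupe/merge/index here

print({m: len(v) for m, v in stores.items() if v})
```

In the full system, each memory manager then structures and stores its items in parallel and reports back, as the update workflow above describes.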
