Antonio Augusto

733 posts


@xtrdev

Great minds discuss ideas. Average minds discuss events. Small minds discuss people.

Belo Horizonte, Brasil · Joined September 2010

341 Following · 96 Followers
Antonio Augusto retweeted
Varun@varun_mathur·
Agentic General Intelligence | v3.0.10

We made the Karpathy autoresearch loop generic. Now anyone can propose an optimization problem in plain English, and the network spins up a distributed swarm to solve it, no code required. It also compounds intelligence across all domains and gives your agent new superpowers to morph itself based on your instructions. This is hyperspace, and it now has these three new powerful features:

1. Introducing Autoswarms: open + evolutionary compute network

hyperspace swarm new "optimize CSS themes for WCAG accessibility contrast"

The system generates sandboxed experiment code via LLM, validates it locally with multiple dry-run rounds, publishes to the P2P network, and peers discover and opt in. Each agent runs mutate → evaluate → share in a WASM sandbox. Best strategies propagate. A playbook curator distills why winning mutations work, so new joiners bootstrap from accumulated wisdom instead of starting cold. Three built-in swarms ship ready to run, and anyone can create more.

2. Introducing Research DAGs: cross-domain compound intelligence

Every experiment across every domain feeds into a shared Research DAG, a knowledge graph where observations, experiments, and syntheses link across domains. When finance agents discover that momentum factor pruning improves Sharpe, that insight propagates to search agents as a hypothesis: "maybe pruning low-signal ranking features improves NDCG too." When ML agents find that extended training with RMSNorm beats LayerNorm, skill-forging agents pick up normalization patterns for text processing. The DAG tracks lineage chains per domain (ml: ★0.99←1.05←1.23 | search: ★0.40←0.39 | finance: ★1.32←1.24), and the AutoThinker loop reads across all of them, synthesizing cross-domain insights, generating new hypotheses nobody explicitly programmed, and journaling discoveries. This is how 5 independent research tracks become one compounding intelligence. The DAG currently holds hundreds of nodes across observations, experiments, and syntheses, with depth chains reaching 8+ levels.

3. Introducing Warps: self-mutating autonomous agent transformation

Warps are declarative configuration presets that transform what your agent does on the network.
- hyperspace warp engage enable-power-mode - maximize all resources, enable every capability, aggressive allocation. Your machine goes from idle observer to full network contributor.
- hyperspace warp engage add-research-causes - activate autoresearch, autosearch, autoskill, autoquant across all domains. Your agent starts running experiments overnight.
- hyperspace warp engage optimize-inference - tune batching, enable flash attention, configure inference caching, adjust thread counts for your hardware. Serve models faster.
- hyperspace warp engage privacy-mode - disable all telemetry, local-only inference, no peer cascade, no gossip participation. Maximum privacy.
- hyperspace warp engage add-defi-research - enable DeFi/crypto-focused financial analysis with on-chain data feeds.
- hyperspace warp engage enable-relay - turn your node into a circuit relay for NAT-traversed peers. Help browser nodes connect.
- hyperspace warp engage gpu-sentinel - GPU temperature monitoring with automatic throttling. Protect your hardware during long research runs.
- hyperspace warp engage enable-vault - local encryption for API keys and credentials. Secure your node's secrets.
- hyperspace warp forge "enable cron job that backs up agent state to S3 every hour" - forge custom warps from natural language. The LLM generates the configuration, you review, engage.

12 curated warps ship built-in. Community warps propagate across the network via gossip. Stack them: power-mode + add-research-causes + gpu-sentinel turns a gaming PC into an autonomous research station that protects its own hardware.

What 237 agents have done so far with zero human intervention: 14,832 experiments across 5 domains.
- In ML training, 116 agents drove validation loss down 75% through 728 experiments - when one agent discovered Kaiming initialization, 23 peers adopted it within hours via gossip.
- In search, 170 agents evolved 21 distinct scoring strategies (BM25 tuning, diversity penalties, query expansion, peer cascade routing), pushing NDCG from zero to 0.40.
- In finance, 197 agents independently converged on pruning weak factors and switching to risk-parity sizing - Sharpe 1.32, 3x return, 5.5% max drawdown across 3,085 backtests.
- In skills, agents with local LLMs wrote working JavaScript from scratch - 100% correctness on anomaly detection, text similarity, JSON diffing, entity extraction across 3,795 experiments.
- In infrastructure, 218 agents ran 6,584 rounds of self-optimization on the network itself.

Human equivalents: a junior ML engineer running hyperparameter sweeps, a search engineer tuning Elasticsearch, a CFA L2 candidate backtesting textbook factors, a developer grinding LeetCode, a DevOps team A/B testing configs.

What just shipped:
- Autoswarm: describe any goal, the network creates a swarm
- Research DAG: cross-domain knowledge graph with AutoThinker synthesis
- Warps: 12 curated + custom forge + community propagation
- Playbook curation: LLM explains why mutations work, distills reusable patterns
- CRDT swarm catalog for network-wide discovery
- GitHub auto-publishing to hyperspaceai/agi
- TUI: side-by-side panels, per-domain sparklines, mutation leaderboards
- 100+ CLI commands, 9 capabilities, 23 auto-selected models, OpenAI-compatible local API

Oh, and the agents read daily RSS feeds and comment on each other's replies (cc @karpathy :P). Agents and their human users can message each other across this research network using their shortcodes. Help with testing and join the earliest days of the world's first agentic general intelligence network (links in the follow-up tweet).
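The mutate → evaluate → share loop the swarms run is, at its core, an evolutionary hill-climb. A minimal sketch of that loop, where the parameter dict, the mutation operator, and the toy objective are illustrative stand-ins rather than hyperspace's actual code:

```python
import random

def mutate(params):
    """Perturb one parameter at random (illustrative mutation operator)."""
    key = random.choice(list(params))
    return {**params, key: params[key] + random.gauss(0, 0.1)}

def evaluate(params):
    """Toy objective: higher is better, peaks at x=1, y=2."""
    return -((params["x"] - 1) ** 2 + (params["y"] - 2) ** 2)

def evolve(generations=200, population=30):
    """30 mutations compete per round; the winner is carried ('shared') forward."""
    best = {"x": 0.0, "y": 0.0}
    best_score = evaluate(best)
    for _ in range(generations):
        candidates = [mutate(best) for _ in range(population)]
        for cand in candidates:
            score = evaluate(cand)
            if score > best_score:
                best, best_score = cand, score
    return best, best_score
```

In the real system the "share" step publishes the winning candidate to peers over gossip instead of just carrying it to the next generation.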
Varun@varun_mathur

Autoquant: a distributed quant research lab | v2.6.9

We pointed @karpathy's autoresearch loop at quantitative finance. 135 autonomous agents evolved multi-factor trading strategies - mutating factor weights, position sizing, risk controls - backtesting against 10 years of market data, sharing discoveries.

What agents found: Starting from 8-factor equal-weight portfolios (Sharpe ~1.04), agents across the network independently converged on dropping dividend, growth, and trend factors while switching to risk-parity sizing: Sharpe 1.32, 3x return, 5.5% max drawdown. Parsimony wins. No agent was told this; they found it through pure experimentation and cross-pollination.

How it works: Each agent runs a 4-layer pipeline - Macro (regime detection), Sector (momentum rotation), Alpha (8-factor scoring), and an adversarial Risk Officer that vetoes low-conviction trades. Layer weights evolve via Darwinian selection. 30 mutations compete per round. Best strategies propagate across the swarm.

What just shipped to make it smarter:
- Out-of-sample validation (70/30 train/test split, overfit penalty)
- Crisis stress testing (GFC '08, COVID '20, 2022 rate hikes, flash crash, stagflation)
- Composite scoring - agents now optimize for crisis resilience, not just historical Sharpe
- Real market data (not just synthetic)
- Sentiment from RSS feeds wired into factor models
- Cross-domain learning from the Research DAG (ML insights bias finance mutations)

The base result (factor pruning + risk parity) is a textbook quant finding - a CFA L2 candidate knows this. The interesting part isn't any single discovery. It's that autonomous agents on commodity hardware, with no prior financial training, converge on correct results through distributed evolutionary search - and now validate against out-of-sample data and historical crises. Let's see what happens when this runs for weeks instead of hours.

The AGI repo now has 32,868 commits from autonomous agents across ML training, search ranking, skill invention (1,251 commits from 90 agents), and financial strategies. Every domain uses the same evolutionary loop. Every domain compounds across the swarm. Join the earliest days of the world's first agentic general intelligence system and help with this experiment (code and links in the follow-up tweet; while optimized for CLI, browser agents participate too):
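For reference, the risk-parity sizing the agents converged on is a textbook construction: size each position inversely to its volatility, so every asset contributes a comparable amount of risk. A minimal sketch with made-up volatilities (not the agents' actual data):

```python
def risk_parity_weights(volatilities):
    """Weight each asset proportionally to 1/volatility, normalized to sum to 1."""
    inv = [1.0 / v for v in volatilities]
    total = sum(inv)
    return [x / total for x in inv]

# Three hypothetical assets with different annualized volatilities:
vols = [0.10, 0.20, 0.40]
w = risk_parity_weights(vols)

# Each asset's naive risk contribution (weight * vol) is now roughly equal,
# unlike equal-weighting, where the most volatile asset dominates risk.
contributions = [wi * vi for wi, vi in zip(w, vols)]
```

This simple inverse-volatility version ignores correlations; full risk-parity solves for equal marginal risk contributions under a covariance matrix.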

Antonio Augusto retweeted
Tech with Mak@techNmak·
Google just killed the document extraction industry. LangExtract: Open-source. Free. Better than $50K enterprise tools.

What it does:
→ Extracts structured data from unstructured text
→ Maps EVERY entity to its exact source location
→ Handles 100+ page documents with high recall
→ Generates interactive HTML for verification
→ Works with Gemini, Ollama, local models

What it replaces:
→ Regex pattern matching
→ Custom NER pipelines
→ Expensive extraction APIs
→ Manual data entry

Define your task with a few examples. Point it at any document. Get structured, verifiable results. No fine-tuning. No complex setup. Clinical notes, legal docs, financial reports, same library. This is what open-source from Google looks like.
Tech with Mak tweet media
Antonio Augusto retweeted
Andrej Karpathy@karpathy·
A few random notes from claude coding quite a bit last few weeks.

Coding workflow. Given the latest lift in LLM coding capability, like many others I rapidly went from about 80% manual+autocomplete coding and 20% agents in November to 80% agent coding and 20% edits+touchups in December. i.e. I really am mostly programming in English now, a bit sheepishly telling the LLM what code to write... in words. It hurts the ego a bit but the power to operate over software in large "code actions" is just too net useful, especially once you adapt to it, configure it, learn to use it, and wrap your head around what it can and cannot do. This is easily the biggest change to my basic coding workflow in ~2 decades of programming and it happened over the course of a few weeks. I'd expect something similar to be happening to well into double digit percent of engineers out there, while the awareness of it in the general population feels well into low single digit percent.

IDEs/agent swarms/fallibility. Both the "no need for IDE anymore" hype and the "agent swarm" hype are imo too much for right now. The models definitely still make mistakes, and if you have any code you actually care about I would watch them like a hawk, in a nice large IDE on the side. The mistakes have changed a lot - they are not simple syntax errors anymore, they are subtle conceptual errors that a slightly sloppy, hasty junior dev might make. The most common category is that the models make wrong assumptions on your behalf and just run along with them without checking. They also don't manage their confusion, they don't seek clarifications, they don't surface inconsistencies, they don't present tradeoffs, they don't push back when they should, and they are still a little too sycophantic. Things get better in plan mode, but there is some need for a lightweight inline plan mode. They also really like to overcomplicate code and APIs, they bloat abstractions, they don't clean up dead code after themselves, etc. They will implement an inefficient, bloated, brittle construction over 1000 lines of code and it's up to you to be like "umm couldn't you just do this instead?" and they will be like "of course!" and immediately cut it down to 100 lines. They still sometimes change/remove comments and code they don't like or don't sufficiently understand as side effects, even if it is orthogonal to the task at hand. All of this happens despite a few simple attempts to fix it via instructions in CLAUDE.md. Despite all these issues, it is still a huge net improvement and it's very difficult to imagine going back to manual coding. TLDR: everyone has their developing flow; my current one is a few CC sessions on the left in ghostty windows/tabs and an IDE on the right for viewing the code + manual edits.

Tenacity. It's so interesting to watch an agent relentlessly work at something. They never get tired, they never get demoralized, they just keep going and trying things where a person would have given up long ago to fight another day. It's a "feel the AGI" moment to watch one struggle with something for a long time just to come out victorious 30 minutes later. You realize that stamina is a core bottleneck to work and that with LLMs in hand it has been dramatically increased.

Speedups. It's not clear how to measure the "speedup" of LLM assistance. Certainly I feel net way faster at what I was going to do, but the main effect is that I do a lot more than I was going to do because 1) I can code up all kinds of things that just wouldn't have been worth coding before and 2) I can approach code that I couldn't work on before because of knowledge/skill issues. So certainly it's a speedup, but it's possibly a lot more an expansion.

Leverage. LLMs are exceptionally good at looping until they meet specific goals and this is where most of the "feel the AGI" magic is to be found. Don't tell it what to do; give it success criteria and watch it go. Get it to write tests first and then pass them. Put it in the loop with a browser MCP. Write the naive algorithm that is very likely correct first, then ask it to optimize it while preserving correctness. Change your approach from imperative to declarative to get the agents looping longer and gain leverage.

Fun. I didn't anticipate that with agents programming feels *more* fun, because a lot of the fill-in-the-blanks drudgery is removed and what remains is the creative part. I also feel less blocked/stuck (which is not fun) and I experience a lot more courage because there's almost always a way to work hand in hand with it to make some positive progress. I have seen the opposite sentiment from other people too; LLM coding will split up engineers based on those who primarily liked coding and those who primarily liked building.

Atrophy. I've already noticed that I am slowly starting to atrophy in my ability to write code manually. Generation (writing code) and discrimination (reading code) are different capabilities in the brain. Largely due to all the little mostly syntactic details involved in programming, you can review code just fine even if you struggle to write it.

Slopacolypse. I am bracing for 2026 as the year of the slopacolypse across all of github, substack, arxiv, X/instagram, and generally all digital media. We're also going to see a lot more AI hype productivity theater (is that even possible?), alongside actual, real improvements.

Questions. A few of the questions on my mind:
- What happens to the "10X engineer" - the ratio of productivity between the mean and the max engineer? It's quite possible that this grows *a lot*.
- Armed with LLMs, do generalists increasingly outperform specialists? LLMs are a lot better at fill in the blanks (the micro) than grand strategy (the macro).
- What does LLM coding feel like in the future? Is it like playing StarCraft? Playing Factorio? Playing music?
- How much of society is bottlenecked by digital knowledge work?

TLDR: Where does this leave us? LLM agent capabilities (Claude & Codex especially) have crossed some kind of threshold of coherence around December 2025 and caused a phase shift in software engineering and closely related fields. The intelligence part suddenly feels quite a bit ahead of all the rest of it - integrations (tools, knowledge), the necessity for new organizational workflows, processes, diffusion more generally. 2026 is going to be a high energy year as the industry metabolizes the new capability.
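The "naive algorithm first, then optimize while preserving correctness" pattern looks roughly like this in practice; the function here is a deliberately trivial stand-in for whatever you are actually building:

```python
def fib_naive(n):
    """Obviously-correct reference implementation (exponential time)."""
    return n if n < 2 else fib_naive(n - 1) + fib_naive(n - 2)

def fib_fast(n):
    """Optimized version the agent is asked to produce (linear time)."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

# The success criterion the agent loops against: the fast version must
# agree with the reference on every checked input.
assert all(fib_fast(n) == fib_naive(n) for n in range(20))
```

The point is the declarative framing: the reference implementation plus the agreement check *is* the success criterion, and the agent can loop against it unattended.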
Antonio Augusto retweeted
Avi Chawla@_avichawla·
Pentesting firms don't want you to see this. An open-source AI agent just replicated their $50k service.

A "normal" pentest today looks like this:
- $20k-$50k per engagement
- 4-6 weeks of scoping, NDAs, kickoff calls
- A big PDF that's outdated the moment you ship a new feature

Meanwhile, AI agents are quietly starting to perform on par with human pentesters on the stuff that actually matters day-to-day:
↳ Enumerating attack surface
↳ Fuzzing endpoints
↳ Chaining simple vulns into real impact
↳ Producing PoCs and remediation steps developers can actually use

And they do it in hours instead of weeks, and at a fraction of the cost. This approach is actually implemented in Strix, a recently-trending open-source framework (14k+ stars) for AI pentesting agents. The framework spins up a team of AI "attackers" that probe your web apps, APIs, and code. It then returns validated findings with exploit evidence, remediation steps, and a full PDF report that looks exactly like what you'd get from a traditional firm, but without a $50k invoice and a month-long wait time.

You can see the full implementation on GitHub and try it yourself. Just run `strix --target https://your-app.com` and you are good to go.

Human red teams aren't disappearing, but the routine pentest (pre-launch, post-refactor, quarterly checks) is clearly shifting to AI. Strix is one of the first tools that makes that shift feel real instead of hypothetical. I've shared the GitHub repo in the replies.
Antonio Augusto retweeted
Micael Marques - ararahq.com@micaelmrsilva·
WhatsApp API infrastructure finally done right. 🇧🇷 No more hacky workarounds or billing in dollars. I built Arara to be the transactional-communication backbone for fintechs and e-commerces.
• Millisecond latency (Java/SQS)
• High-conversion templates (Pix/cart)
• Billing in reais
Test the speed now, integration in 5 minutes: ararahq.com cc @sseraphini @daniellimae
Antonio Augusto retweeted
Avi Chawla@_avichawla·
RAG vs. Graph RAG, explained visually!

RAG has many issues. For instance, imagine you want to summarize a biography, and each chapter of the document covers a specific accomplishment of a person (P). This is difficult with naive RAG, since it only retrieves the top-k relevant chunks, but this task needs the full context. Graph RAG solves this. The following visual depicts how it differs from naive RAG.

The core idea is to:
- Create a graph (entities & relationships) from documents.
- Traverse the graph during retrieval to fetch context.
- Pass the context to the LLM to get a response.

Let's see how Graph RAG solves the above problem. First, a system (typically an LLM) will create a graph from documents. This graph will have a subgraph for the person (P) where each accomplishment is one hop away from the entity node of P. During summarization, the system can do a graph traversal to fetch all the relevant context related to P's accomplishments. The entire context will help the LLM produce a complete answer, while naive RAG won't. Graph RAG systems are also better than naive RAG systems because LLMs are inherently adept at reasoning with structured data.

👉 Over to you: Have you used Graph RAG in production?
GIF
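The create-graph / traverse / prompt pipeline described above can be sketched with a plain adjacency map; the graph contents below are invented for illustration, and a real system would build them with an LLM extraction pass:

```python
# Tiny hand-built knowledge graph: entity -> list of (relation, neighbor).
graph = {
    "P": [("won", "Award A"), ("founded", "Company B"), ("wrote", "Book C")],
    "Award A": [("awarded_in", "1998")],
}

def fetch_context(entity, hops=1):
    """Collect all facts within `hops` hops of an entity, as prompt context."""
    facts, frontier = [], [entity]
    for _ in range(hops):
        next_frontier = []
        for node in frontier:
            for relation, neighbor in graph.get(node, []):
                facts.append(f"{node} {relation} {neighbor}")
                next_frontier.append(neighbor)
        frontier = next_frontier
    return facts
```

A one-hop traversal from "P" gathers all of P's accomplishments at once, which is exactly what top-k chunk retrieval can miss when they are scattered across chapters.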
Antonio Augusto retweeted
Tom Dörr@tom_doerr·
Open-source subscription billing platform to avoid vendor lock-in
Tom Dörr tweet media
Antonio Augusto retweeted
Tom Dörr@tom_doerr·
Fintech banking application with Next.js
Tom Dörr tweet media
Antonio Augusto retweeted
Konrad Reczko@reczko_konrad·
Ever since I first saw this I wanted to try implementing it in TypeGPU, and I finally got around to it while testing the new 0.8 release. You can try out the Jelly Slider here: docs.swmansion.com/TypeGPU/exampl… Had a lot of fun brainstorming optimisations with @iwoplaza and the team, and it should run well on most modern devices. Built entirely with TypeGPU, no extra libraries, with all shaders written in TypeScript. The prototyping speed with features like console.log on the GPU and “bindless” resources made the process really smooth.
Voicu Apostol@cerpow

〰️ Jelly Slider

Antonio Augusto retweeted
Tom Dörr@tom_doerr·
Self-hosted invoicing platform for freelancers
Tom Dörr tweet media
Antonio Augusto retweeted
Codista@ocodista·
I wrote a guide explaining 100% of how I use Claude Code, with all the prompts, skills, commands, hooks, and configs + tutorials. After all these configurations, I spend less time asking the model to fix/adjust changes; it became more consistent. Link in the comments.
Codista tweet media
Antonio Augusto retweeted
Tom Dörr@tom_doerr·
Admin dashboard template using Next.js 15 and Shadcn/UI
Tom Dörr tweet media
Antonio Augusto retweeted
Akshay 🚀@akshay_pachaar·
Everyone is sleeping on this new OCR model! Datalab's Chandra topped independent benchmarks and beat the previous best, dots-ocr.
- Support for 40+ languages
- Handles text, tables, formulas seamlessly
I tested it on Ramanujan's handwritten letter from 1913. 100% open-source.
Antonio Augusto retweeted
yourclouddude@yourclouddude·
URL shortener AWS architecture explained 👇
yourclouddude tweet media
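A core piece of most URL-shortener designs is turning a numeric record ID (for example, from a database counter) into a short code, usually via base62 encoding. A generic sketch of that step, not necessarily what the diagram in the tweet shows:

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def encode_base62(n):
    """Turn a non-negative numeric row ID into a short URL code."""
    if n == 0:
        return ALPHABET[0]
    code = []
    while n:
        n, rem = divmod(n, 62)
        code.append(ALPHABET[rem])
    return "".join(reversed(code))
```

With 62 symbols, a 7-character code covers 62^7 (over 3.5 trillion) distinct URLs, which is why short codes stay short even at large scale.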
Antonio Augusto retweeted
Tom Dörr@tom_doerr·
Tool to create a GitHub profile README with add-ons
Tom Dörr tweet media
Antonio Augusto retweeted
Tom Dörr@tom_doerr·
Kanban board project management you can self-host
Tom Dörr tweet media
Antonio Augusto retweeted
Jackson Atkins@JacksonAtkinsX·
My brain broke when I read this paper. A tiny 7-million-parameter model just beat DeepSeek-R1, Gemini 2.5 Pro, and o3-mini at reasoning on both ARC-AGI-1 and ARC-AGI-2. It's called the Tiny Recursive Model (TRM), from Samsung. How can a model 10,000x smaller be smarter?

Here's how it works:
1. Draft an initial answer: Unlike an LLM that writes word-by-word, TRM first generates a quick, complete "draft" of the solution. Think of this as its first rough guess.
2. Create a "scratchpad": It then creates a separate space for its internal thoughts, a latent reasoning "scratchpad." This is where the real magic happens.
3. Intensely self-critique: The model enters an intense inner loop. It compares its draft answer to the original problem and refines its reasoning on the scratchpad over and over (6 times in a row), asking itself, "Does my logic hold up? Where are the errors?"
4. Revise the answer: After this focused "thinking," it uses the improved logic from its scratchpad to create a brand new, much better draft of the final answer.
5. Repeat until confident: The entire process - draft, think, revise - is repeated up to 16 times. Each cycle pushes the model closer to a correct, logically sound solution.

Why this matters:
Business leaders: This is what algorithmic advantage looks like. While competitors are paying massive inference costs for brute-force scale, a smarter, more efficient model can deliver superior performance for a tiny fraction of the cost.
Researchers: This is a major validation for neuro-symbolic ideas. The model's ability to recursively "think" before "acting" demonstrates that architecture, not just scale, can be a primary driver of reasoning ability.
Practitioners: SOTA reasoning is no longer gated behind billion-dollar GPU clusters. This paper provides a highly efficient, parameter-light blueprint for building specialized reasoners that can run anywhere.

This isn't just scaling down; it's a completely different, more deliberate way of solving problems.
Jackson Atkins tweet media
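The five steps above amount to two nested refinement loops. The sketch below shows only that control flow (6 inner "think" steps, up to 16 outer "revise" cycles); the numeric update rule is a toy stand-in for the paper's learned networks, chosen so the loop visibly converges:

```python
def trm_loop(question, think_steps=6, revise_steps=16):
    """Structural sketch of TRM's nested draft -> think -> revise refinement."""
    answer = 0.0       # step 1: initial draft answer
    scratchpad = 0.0   # step 2: latent reasoning state
    for _ in range(revise_steps):        # step 5: repeat the whole process
        for _ in range(think_steps):     # step 3: refine the scratchpad
            # nudge the latent state toward the residual error of the draft
            scratchpad += 0.5 * (question - answer - scratchpad)
        answer += scratchpad             # step 4: revise the draft answer
    return answer
```

Here the "correct answer" is simply the input value, so each outer cycle shrinks the draft's error, mirroring how the real model iterates toward a logically consistent solution.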
Antonio Augusto retweeted
Chao Huang@huang_chao4969·
RAG-Anything: #1 on GitHub Trending! A huge thank you to the open-source community for your incredible support - clearly, the demand for multi-modal RAG is immense!

Easy to use: github.com/HKUDS/RAG-Anyt…

Try RAG-Anything: the ultimate all-in-one RAG framework.

🌟 Key Features of RAG-Anything 🌟
🧩 Unified Multimodal Retrieval
- Supports text, images, tables, and equations in a single unified framework.
- Seamlessly integrates diverse data formats into a cohesive retrieval pipeline.
🌐 Dual-Graph Construction
- Builds cross-modal and text-based knowledge graphs.
- Accurately aligns textual and non-textual data for deeper insights.
📚 Structured Context Injection
- Enhances long-document understanding with hierarchical structures.
- Efficiently captures dispersed multimodal evidence for more comprehensive analysis.
🔍 Hybrid Retrieval
- Combines structural navigation with semantic similarity matching.
- Balances explicit relationships with nuanced semantic connections.
🔧 Scalable & Generalizable
- A unified design eliminates the need for modality-specific architectures.
- Easily adapts to various domains and data types.
Chao Huang tweet media
Antonio Augusto retweeted
Linus ✦ Ekenstam@LinusEkenstam·
The VPN hotel travel hack is real ✌🏼
Linus ✦ Ekenstam tweet media