AJB

298 posts

@ajbmachon2

Developer and Entrepreneur focusing on AI and Agents right now!

Cologne · Joined September 2016
83 Following · 22 Followers
AJB retweeted
ani
ani@anirudhbv_ce·
We finally know why LLMs hallucinate. It's not the model. It's the geometry.

@OpenAI text-embedding-3-large: 91/3072 dimensions do real work.
@GeminiApp gemini-embedding-001: 80/3072 dimensions do real work.

~97% of your vector database is mathematically empty. Your RAG system is retrieving from noise.

@ashwingop and I present "The Geometry of Consolidation" - a proof that RAG compression has a hard floor no algorithm can beat, set by a single spectral number your embedding model cannot escape.

Every hallucination your RAG pipeline produces? This is why.

Paper + results: github.com/niashwin/geome…
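The "dimensions doing real work" figure in a claim like this is typically computed with a spectral statistic such as the participation ratio of the embedding matrix's singular values. A minimal sketch of that measurement (numpy only; the function name and the synthetic data are illustrative, and the 91/3072 and 80/3072 figures are the thread's claims, not reproduced here):

```python
import numpy as np

def effective_dim(embeddings: np.ndarray) -> float:
    """Participation ratio (sum s_i^2)^2 / sum s_i^4 over singular values:
    roughly, how many directions carry meaningful variance."""
    X = embeddings - embeddings.mean(axis=0)  # center before measuring spread
    s = np.linalg.svd(X, compute_uv=False)
    p = s ** 2
    return p.sum() ** 2 / (p ** 2).sum()

rng = np.random.default_rng(0)
# Isotropic random vectors spread variance over nearly all 64 dimensions...
iso = rng.normal(size=(1000, 64))
# ...while exponentially decaying per-dimension scales concentrate it in a few.
aniso = iso * np.logspace(0, -3, 64)
print(effective_dim(iso))    # close to 64
print(effective_dim(aniso))  # far fewer
```

Running this against real embedding outputs (instead of the toy matrices above) is how one would check the thread's numbers for a given model.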
148 replies · 461 reposts · 3.7K likes · 269.4K views
AJB retweeted
Zecheng Zhang
Zecheng Zhang@zechengzh·
Introducing Mirage, a unified virtual filesystem for AI agents!

6 weeks. 1.1M+ lines of code. We rewrote bash from the ground up so cat, grep, head, and pipes work across heterogeneous services.

S3, Google Drive, Slack, Gmail, GitHub, Linear, Notion, Postgres, MongoDB, SSH, and more, all mounted side-by-side as one filesystem. Bash that AI agents already know works on every format! cat, grep, head, and wc parse .parquet, .csv, .json, .h5, even .wav! One pipe can stitch S3, Drive, GitHub, Slack, and Linear together, same Unix semantics throughout.

Workspaces are versioned too. Snapshot, clone, and roll back the whole thing with one API call. A two-layer cache turns repeated reads into local lookups, so agent loops stay fast and cheap.

Drop a Workspace into FastAPI, Express, or a browser app. Wire it into OpenAI Agents SDK, Vercel AI SDK, LangChain, Mastra, or Pi. Run it alongside Claude Code and Codex.

Site: strukto.ai/mirage
GitHub: github.com/strukto-ai/mir…

#AIAgents #OpenSource #AgenticAI #Strukto #Filesystem #VFS
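The core mechanism a unified VFS like this needs is prefix-based dispatch: each mounted service owns a path prefix, and Unix-style commands resolve through whichever mount matches. A toy sketch of that idea (all class names and backends are hypothetical, not Mirage's actual API):

```python
class Mount:
    """A backend mounted at a path prefix; reader() abstracts the service call."""
    def __init__(self, prefix, reader):
        self.prefix, self.reader = prefix, reader

class VirtualFS:
    """Dispatch Unix-style reads to whichever mount owns the path."""
    def __init__(self):
        self.mounts = []

    def mount(self, prefix, reader):
        self.mounts.append(Mount(prefix, reader))

    def cat(self, path):
        # longest matching prefix wins, like nested mount points
        for m in sorted(self.mounts, key=lambda m: -len(m.prefix)):
            if path.startswith(m.prefix):
                return m.reader(path[len(m.prefix):])
        raise FileNotFoundError(path)

# Dicts stand in for S3 / Slack; real mounts would call the service APIs.
s3 = {"reports/q3.csv": "region,revenue\nEU,10"}
slack = {"general/today.txt": "standup at 10"}

vfs = VirtualFS()
vfs.mount("/s3/", lambda p: s3[p])
vfs.mount("/slack/", lambda p: slack[p])
print(vfs.cat("/s3/reports/q3.csv"))
```

A `grep`- or pipe-style layer would then compose over `cat` the same way shell utilities compose over files.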
172 replies · 336 reposts · 3.3K likes · 606.8K views
AJB retweeted
Google for Developers
Google for Developers@googledevs·
Gemma 4: Now up to 3x Faster. ⚡ Same quality, way more speed. Our new MTP drafters allow Gemma 4 to predict multiple tokens at once, effectively tripling your output speed without compromising intelligence.
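Multi-token-prediction drafters are a flavor of speculative decoding: a cheap head proposes several tokens per step and the full model verifies them, so greedy output is unchanged while decode steps shrink. A schematic of the accept/verify loop with toy next-token functions (this shows the generic technique, not Gemma's implementation):

```python
def speculative_step(target, draft, prefix, k=3):
    """Draft k tokens cheaply, then keep the longest prefix the target agrees
    with. Greedy variant: output matches running the target alone."""
    # 1) drafter proposes k tokens autoregressively
    proposed, ctx = [], list(prefix)
    for _ in range(k):
        t = draft(ctx)
        proposed.append(t)
        ctx.append(t)
    # 2) target verifies them; first mismatch is replaced and we stop
    accepted, ctx = [], list(prefix)
    for t in proposed:
        if target(ctx) == t:
            accepted.append(t)
            ctx.append(t)
        else:
            accepted.append(target(ctx))  # target's own token wins
            break
    else:
        accepted.append(target(ctx))  # bonus token when every draft passed
    return accepted

# Toy next-token functions: target counts up; draft is right 2 of 3 times.
target = lambda ctx: ctx[-1] + 1
draft = lambda ctx: ctx[-1] + 1 if len(ctx) % 3 else ctx[-1] + 2
print(speculative_step(target, draft, [0]))  # → [1, 2, 3]
```

The speedup comes from amortizing one expensive verification pass over several cheap draft tokens; a perfect drafter yields k+1 tokens per target step.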
168 replies · 627 reposts · 6.1K likes · 819.4K views
AJB retweeted
Alexander Whedon
Alexander Whedon@alex_whedon·
Introducing SubQ - a major breakthrough in LLM intelligence.

It is the first model built on a fully sub-quadratic sparse-attention architecture (SSA), and the first frontier model with a 12 million token context window, which is:
- 52x faster than FlashAttention at 1MM tokens
- Less than 5% the cost of Opus

Transformer-based LLMs waste compute by processing every possible relationship between words (standard attention). Only a small fraction actually matter. @subquadratic finds and focuses only on the ones that do. That's nearly 1,000x less compute and a new way for LLMs to scale.
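SubQ's SSA details aren't public, but the generic sparse-attention idea is that each query keeps only its top-k key scores instead of all n, which removes the quadratic term when k is much smaller than n. A numpy sketch of top-k masked attention (illustrative only, not SubQ's algorithm):

```python
import numpy as np

def sparse_attention(Q, K, V, k=4):
    """Each query attends only to its k highest-scoring keys; the rest are
    masked to -inf before the softmax."""
    n, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)
    idx = np.argpartition(scores, -k, axis=-1)[:, -k:]   # top-k keys per query
    masked = np.full_like(scores, -np.inf)
    np.put_along_axis(masked, idx, np.take_along_axis(scores, idx, axis=-1), axis=-1)
    w = np.exp(masked - masked.max(axis=-1, keepdims=True))  # softmax over survivors
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(16, 8)) for _ in range(3))
print(sparse_attention(Q, K, V, k=4).shape)  # (16, 8)
```

This dense sketch still materializes the full score matrix; a real sub-quadratic kernel would find the top keys without computing all n² scores, which is where the engineering difficulty (and the claimed 52x over FlashAttention) lives.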
1.5K replies · 2.9K reposts · 23.1K likes · 12.6M views
AJB retweeted
ᴅᴀɴɪᴇʟ ᴍɪᴇssʟᴇʀ 🛡️
I don’t know how good this new 12 million context system is, or if it’s hype or whatever, but I think it definitely shows a point I’ve been making since 2023. We really suck at everything.
- The chips are primitive
- The research and training and inference systems are primitive
- Our RL approaches are primitive
- We’ve barely started building harnesses

Everything we’re doing is massively inefficient right now. And there are thousands of vectors for improvement. And many of them are multiplicative.

Most people think we’re at like 88% of AI’s capabilities, and we’re pushing to hit 92% or eventually 97% or something. Nah. This is us at .0003%.

Everything we have is Punch Card AI. And as the AI gets better it will reveal that it’s similar for our understanding of medicine, physics, chemistry, etc.

This is barely even day 0. This is pre-history.
Alexander Whedon@alex_whedon

Introducing SubQ - a major breakthrough in LLM intelligence.

It is the first model built on a fully sub-quadratic sparse-attention architecture (SSA), and the first frontier model with a 12 million token context window, which is:
- 52x faster than FlashAttention at 1MM tokens
- Less than 5% the cost of Opus

Transformer-based LLMs waste compute by processing every possible relationship between words (standard attention). Only a small fraction actually matter. @subquadratic finds and focuses only on the ones that do. That's nearly 1,000x less compute and a new way for LLMs to scale.

55 replies · 41 reposts · 354 likes · 52.7K views
AJB retweeted
Teknium 🪽
Teknium 🪽@Teknium·
Our first dive into Multi-Agent Coordination and Cooperation is here, with Hermes Agent Kanban Orchestrate tasks across multiple agent profiles and dependencies easily and visually. Achieve more. See the docs here: hermes-agent.nousresearch.com/docs/user-guid…
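The coordination pattern described (agents claim tasks from a board, respect dependencies, hand off when blocked) can be sketched as a shared board where a task becomes claimable only once its dependencies are finished. A toy Python sketch of that pattern (names are hypothetical; this is not Hermes Agent's actual code):

```python
class Board:
    """Minimal Kanban: tasks with dependencies, claimed by whichever agent is free."""
    def __init__(self, tasks):
        self.deps = tasks            # {task name: [dependency names]}
        self.done, self.claimed = set(), {}

    def claim(self, agent):
        for task, deps in self.deps.items():
            ready = all(d in self.done for d in deps)
            if ready and task not in self.done and task not in self.claimed:
                self.claimed[task] = agent
                return task
        return None                  # agent is blocked: nothing is ready

    def finish(self, task):
        self.done.add(task)
        self.claimed.pop(task, None)

board = Board({"design": [], "api": ["design"], "ui": ["design"], "ship": ["api", "ui"]})
order = []
# two agents alternate claiming and finishing whatever is unblocked
while len(board.done) < 4:
    for agent in ("a1", "a2"):
        t = board.claim(agent)
        if t:
            board.finish(t)
            order.append((agent, t))
print(order)
```

Real orchestration adds persistence, concurrent claiming with locks, and hand-off messages, but the dependency-gated claim loop is the heart of it.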
Nous Research@NousResearch

Hermes Agent now has multi-agent via the Kanban, new in v0.12.0. Agents claim tasks from a board, work in parallel, and hand off when blocked. You watch progress and unblock from one easy view instead of juggling terminals. We asked it to plan and make this video about itself:

112 replies · 117 reposts · 1.9K likes · 949K views
AJB retweeted
Nous Research
Nous Research@NousResearch·
Shopify is the all-in-one commerce platform powering millions of businesses worldwide. Thank you to the @Shopify team for building their own official Hermes Agent skill, enabling your agent to manage products, orders, inventory, and fulfillments from any channel.
135 replies · 203 reposts · 2.7K likes · 437.7K views
AJB retweeted
Eth
Eth@EtherCoins·
@Teknium We do really need a Hermes nice cheatsheet at this point :)
12 replies · 38 reposts · 277 likes · 9.9K views
AJB retweeted
ClaudeDevs
ClaudeDevs@ClaudeDevs·
Claude Code can now send push notifications to your phone when a long task finishes or Claude needs your input. Walk away from the terminal, we'll let you know when it's done.
504 replies · 1.1K reposts · 18.7K likes · 1.3M views
AJB retweeted
Teknium 🪽
Teknium 🪽@Teknium·
Hermes Agent tip of the day: You can back up and transfer your agent cleanly and simply. Want to move it to a bigger new VPS? Take it to your new Mac Mini?

`hermes backup` -> creates a zip
Install Hermes fresh on the new machine, transfer the zip to that machine, then run `hermes import`. Done :)

Full docs: hermes-agent.nousresearch.com/docs/reference…
95 replies · 117 reposts · 1.5K likes · 69.7K views
AJB retweeted
Teknium 🪽
Teknium 🪽@Teknium·
Hermes Agent tip of the day: There are 4 ways to interact with the model while it's running:
- Message it: by default, this interrupts the agent loop, stopping it and making the model respond to your new message
- /queue will queue up a message that fires after the agent loop completes
- /bg or /btw will run a parallel prompt that is async
- /steer will inject a guidance message into the next tool call's result sent to the model during an agent loop, to try to guide the rest of its trajectory
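The four modes boil down to how a user message is routed relative to the running agent loop: interrupt now, queue for after, run in parallel, or splice into the next tool result. A toy dispatcher illustrating that routing (hypothetical structure, not Hermes's implementation):

```python
from collections import deque

class AgentLoop:
    """Routes user input per the four interaction modes described in the tip."""
    def __init__(self):
        self.queue = deque()        # /queue: deliver after the loop ends
        self.background = []        # /bg, /btw: async side prompts
        self.steer_notes = []       # /steer: spliced into the next tool result
        self.interrupted_by = None  # plain message: stop the loop now

    def handle(self, text):
        if text.startswith("/queue "):
            self.queue.append(text[7:])
        elif text.startswith(("/bg ", "/btw ")):
            self.background.append(text.split(" ", 1)[1])
        elif text.startswith("/steer "):
            self.steer_notes.append(text[7:])
        else:
            self.interrupted_by = text  # interrupts the agent loop

    def next_tool_result(self, result):
        # guidance rides along with the tool output, then is cleared
        notes, self.steer_notes = self.steer_notes, []
        return result + "".join(f"\n[steer] {n}" for n in notes)

loop = AgentLoop()
loop.handle("/steer prefer the staging database")
loop.handle("/queue summarize when done")
print(loop.next_tool_result("query ok"))
```

The interesting design choice is /steer: rather than breaking the loop, guidance is attached to data the model was about to read anyway, nudging the trajectory without losing in-flight work.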
113 replies · 123 reposts · 1.9K likes · 78.3K views
AJB retweeted
Julien Chaumond
Julien Chaumond@julien_c·
This is where we are right now. And i’m not gonna lie it feels pretty magical 🧚‍♀️ Qwen3.6 27B running inside of Pi coding agent via Llama.cpp on the MacBook Pro For non-trivial tasks on the @huggingface codebases, this feels very, very close to hitting the latest Opus in Claude Code, or whatever shiny monopolistic closed source API of the day is. In full airplane mode. Most people haven’t realized this yet. If you have, it means you have a huge headstart to what I call the second revolution of AI. Powerful local models for efficiency, security, privacy, sovereignty 🔥
263 replies · 452 reposts · 5.3K likes · 649.2K views
AJB retweeted
Lenny Rachitsky
Lenny Rachitsky@lennysan·
My biggest takeaways from Claude Code's Head of Product @_catwu:

1. Anthropic’s product development timelines have gone from six months to one month, sometimes one week, sometimes one day. Part of this acceleration is access to the latest models (i.e. Mythos). Another is shipping new products into “research preview,” making clear it's early, experimental, and might not be supported forever. Another is an evergreen “launch room” where engineers post ready features and marketing turns around announcements the next day.

2. The PM role is shifting from coordinating multi-month roadmaps to enabling teams to ship daily. As Cat puts it, “There should be less emphasis on making sure you are aligning your multi-quarter roadmaps with your partner teams and more emphasis on, OK, how can we figure out the fastest way to get something out the door?”

3. The most efficient shipping unit is an engineer with great product taste. On Cat’s team, many engineers go end-to-end—from seeing user feedback on Twitter to shipping a product by the end of the week—without a PM involved. Also, almost all the PMs on the Claude Code team have either been engineers or ship code themselves, and the designers have been front-end engineers. The roles are merging, and the most valuable skill is product taste, not job title.

4. Build products that are on the edge of working. Claude Code’s code review product failed multiple times because earlier models weren’t accurate enough. But because the prototype was already built, they could swap in Opus 4.5 and 4.6 and immediately test whether the gap was closed. Teams that wait for the model to be ready will always be a cycle behind.

5. The most underrated skill for building AI products is asking the model to introspect on its own mistakes. Cat regularly asks the model why it made an unexpected decision. The model will explain that something in the system prompt was confusing, or that it delegated verification to a subagent that didn’t check its work. This reveals what misled the model so the team can fix the harness.

6. Every model release forces their team to revisit existing products and audit their system prompt to remove features the model no longer needs. Claude Code’s to-do list was a crutch for earlier models that couldn’t track their own work. With Opus 4, the model handles it natively. Features built as scaffolding for weaker models become debt when the model catches up—so the team actively strips them.

7. Anthropic employees build custom internal tools instead of buying SaaS products. A sales team member built a web app that pulls from Salesforce, Gong, and call notes to auto-customize pitch decks—work that used to take 20 to 30 minutes now takes seconds. Their core stack is Claude Code, Cowork, and Slack. No Notion, no Linear, no Figma.

8. People underestimate how much Claude’s personality contributes to its success. As Cat describes it, “When you reflect on everyone you’ve worked with, there’s just some people where you’re like, I really like their energy, their vibe.” Claude is designed to be low-ego, positive, competent, and earnest—qualities that make it feel like a great coworker, not just a tool. This isn’t cosmetic; it’s what makes people want to use Claude for hours every day. The team has a dedicated person, Amanda, who “molds Claude’s character,” and it’s one of the hardest roles at the company because success is so subjective.

9. The future of work is managing fleets of AI agents, not doing the work yourself. Cat sees a clear progression: first, individual tasks become successful. Then people start running multiple tasks at the same time (multi-Clauding). Next, people will run 50 or 100 tasks simultaneously, which will require new infrastructure—remote execution, better interfaces for managing tasks, agents that fully verify their work, and self-improving systems that incorporate feedback. The human role shifts from doing the work to knowing which tasks to look into, verifying outputs, and giving feedback that makes the system better over time.

10. Hire people who lean into chaos and face every challenge with a smile. At Anthropic, there are weeks when a P0 on Sunday becomes a P00 by Monday and a P000 by Monday afternoon. If you get too stressed about any one thing, you’ll burn out. Their team looks for people who can look at a hard challenge and say, “Wow, that’s gonna be hard. But I’m excited to tackle it and I’m gonna do the best that I possibly can.” This mindset—optimism, resilience, and comfort with constant change—is increasingly essential as the pace of AI development accelerates.

Don't miss the full conversation: youtube.com/watch?v=Pplmzl…
Lenny Rachitsky@lennysan

How Anthropic’s product team moves faster than anyone else

I sat down with @_catwu, Head of Product for Claude Code at @AnthropicAI, to get a peek into their unprecedented shipping pace, how AI is changing the PM role, and how to be the right amount of AGI-pilled.

We discuss:
🔸 How Anthropic’s shipping cadence went from months to weeks to days
🔸 The emerging skills PMs need to develop right now
🔸 Why you should build products that don't work yet—then wait for the model to catch up
🔸 Why a 95% automation isn't really an automation
🔸 Cat’s most underrated AI skill (introspection)
🔸 What Cat actually looks for when hiring PMs now (hint: it's not traditional PM skills)

Listen now 👇 youtu.be/PplmzlgE0kg

99 replies · 297 reposts · 2.9K likes · 839.2K views
AJB retweeted
Teknium 🪽
Teknium 🪽@Teknium·
Introducing Hermes Agent v0.11.0 Our largest update yet, with over 700 PRs across ~200 contributors. Thank you to everyone who's worked on Hermes Agent! This update features a beta TUI v2, unlimited recursion depth and width of subagents, 5 new LLM providers, expanded image gen providers, QQBot gateway channel, themes & plugins for the dashboard, and so much more. Check out the main post below or see the release notes: github.com/NousResearch/h…
Nous Research@NousResearch

Hermes Agent v0.11.0 - “The Interface Release” Full changelog below ↓

77 replies · 111 reposts · 1.5K likes · 144.6K views
AJB retweeted
Sudo su
Sudo su@sudoingX·
okay this is absolutely insane. my undisputed king qwen 3.5-27b dense on single RTX 3090 just got replaced by the same team today.

qwen drops 3.6-27b dense just now and the chart says it beats its predecessor on every single benchmark, beats qwen 3.5-397b-a17b moe which is 15x larger, and matches claude 4.5 opus on terminal-bench 2.0 at 59.3 flat, while beating claude on skillsbench, gpqa diamond, mmmu, and realworldqa.

a 27 billion parameter open weight model matching a frontier proprietary model on agentic coding. let that sit for a second.

pulling weights right now. testing on my 3090 desktop first because that is where the crown lives, then 5090 mobile for the same 24gb class speed story. same quant, same hermes agent, head to head against 3.5-27b dense on same hardware. if this chart holds even half the gain in real agentic runs it's a gamechanger for every builder sitting on a single consumer card.

thank you @alibaba_qwen, this is what open source looks like when a team is serious. the corporate salesmen telling you local ai is not ready yet are getting lapped every week by teams that actually ship. new 27b dense is here. open is winning. the best model for a single 24gb gpu just changed in the middle of my benchmark. data drops soon anon
Qwen@Alibaba_Qwen

🚀 Meet Qwen3.6-27B, our latest dense, open-source model, packing flagship-level coding power! Yes, 27B, and Qwen3.6-27B punches way above its weight. 👇

What's new:
🧠 Outstanding agentic coding — surpasses Qwen3.5-397B-A17B across all major coding benchmarks
💡 Strong reasoning across text & multimodal tasks
🔄 Supports thinking & non-thinking modes
✅ Apache 2.0 — fully open, fully yours

Smaller model. Bigger results. Community's favorite. ❤️ We can't wait to see what you build with Qwen3.6-27B! 👀

🔗👇
Blog: qwen.ai/blog?id=qwen3.…
Qwen Studio: chat.qwen.ai/?models=qwen3.…
Github: github.com/QwenLM/Qwen3.6
Hugging Face: huggingface.co/Qwen/Qwen3.6-2… huggingface.co/Qwen/Qwen3.6-2…
ModelScope: modelscope.cn/models/Qwen/Qw… modelscope.cn/models/Qwen/Qw…

46 replies · 81 reposts · 1.3K likes · 95.7K views
AJB retweeted
Qwen
Qwen@Alibaba_Qwen·
🚀 Meet Qwen3.6-27B, our latest dense, open-source model, packing flagship-level coding power! Yes, 27B, and Qwen3.6-27B punches way above its weight. 👇

What's new:
🧠 Outstanding agentic coding — surpasses Qwen3.5-397B-A17B across all major coding benchmarks
💡 Strong reasoning across text & multimodal tasks
🔄 Supports thinking & non-thinking modes
✅ Apache 2.0 — fully open, fully yours

Smaller model. Bigger results. Community's favorite. ❤️ We can't wait to see what you build with Qwen3.6-27B! 👀

🔗👇
Blog: qwen.ai/blog?id=qwen3.…
Qwen Studio: chat.qwen.ai/?models=qwen3.…
Github: github.com/QwenLM/Qwen3.6
Hugging Face: huggingface.co/Qwen/Qwen3.6-2… huggingface.co/Qwen/Qwen3.6-2…
ModelScope: modelscope.cn/models/Qwen/Qw… modelscope.cn/models/Qwen/Qw…
531 replies · 1.7K reposts · 12.5K likes · 3.7M views