Noah

156 posts

@noahirzinger

Joined January 2023

121 Following, 14 Followers

Noah@noahirzinger·2d

@TheEthanDing the IC who convinced their excited manager this was a cutting-edge idea in 2026 should get a bonus though

1.2K

ethan ding 📊@TheEthanDing·2d

we've literally been doing this for 3 years lmao this is like the first thing you realize when you get into analytics agents

Databricks AI Research@DbrxMosaicAI

New research from Databricks: the context window is the only persistent substrate today's LLM agents have, and it floods fast. A single SQL query can return millions of rows that ride along in every subsequent turn, even when only one cell ever mattered. We hit this constraint every day in the agents we run in production, from Genie to Agent Bricks' Supervisor Agent to KARL.

In a new post from the Databricks research team, we introduce MemEx: a programmable Python scratchpad that lets agents transform, slice, and persist tool outputs as typed objects in a live kernel. Same observe-act loop. Different action space.

Across nine frontier and open-weight models on two enterprise agentic tasks (OfficeQA Pro and Enterprise Structured Retrieval):
• Frontier models (Opus 4.6, Sonnet 4.6, Gemini 3.1 Pro) gain 2 to 5 accuracy points at 25 to 30% lower cost
• Qwen 122B and Qwen 397B nearly double accuracy at 40 to 50% lower cost
• Four of the five points on the OfficeQA Pro cost-accuracy Pareto frontier are MemEx configurations

MemEx extends the code-as-action line (CodeAct, Anthropic Programmatic Tool Calling, Cloudflare Code Mode) with persistent scope across turns, eager spawn_agent for parallel sub-agents that share the parent's namespace, typed submit() for validated returns, and live-object scope injection. Built on aroll, the same Databricks agentic rollouts framework already powering those production systems.

MemEx is rolling out across Databricks first-party agents and Agent Bricks soon. If you build on Databricks agents today, you'll be able to try it. Full write-up: databricks.com/blog/memex-pro…

269

123.9K
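
The scratchpad pattern described in the quoted Databricks post can be illustrated with a minimal sketch. This is not MemEx or its API: `Scratchpad`, `inject`, `run`, and the sample rows are hypothetical names made up for illustration. It only shows the general idea of keeping a large tool output in a persistent namespace while surfacing a small printed result to the context window.

```python
import contextlib
import io


class Scratchpad:
    """Toy persistent namespace: variables survive across agent turns.

    Hypothetical illustration only, not the MemEx implementation.
    """

    def __init__(self):
        self.namespace = {}

    def inject(self, name, value):
        # Store a tool output as a live object instead of pasting it into the prompt.
        self.namespace[name] = value

    def run(self, code):
        # Execute agent-written code against the shared namespace and return
        # only what it printed (the part that would enter the context window).
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            exec(code, self.namespace)
        return buffer.getvalue().strip()


pad = Scratchpad()
# Pretend this is a large SQL result where only one cell matters downstream.
pad.inject("rows", [{"region": f"r{i}", "revenue": i * 10} for i in range(100_000)])

# Turn 1: slice the result inside the scratchpad and keep the slice around.
turn1 = pad.run("top = max(rows, key=lambda r: r['revenue']); print(top['region'])")
# Turn 2: a later turn reuses `top` without re-reading the rows.
turn2 = pad.run("print(top['revenue'])")
```

Only the two short printed strings would travel with the conversation; the 100,000 rows stay in `pad.namespace`. A real system would also need sandboxing, since `exec` runs arbitrary code.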

Noah@noahirzinger·2d

@ryanzhuuuu Honestly this is sick, trying it out this weekend.

Ryan Zhu@ryanzhuuuu·4d

went to a conference and found out i don’t know any companies there. so i built a sales navigator in iMessage in 10min

618

Noah@noahirzinger·3d

@a16z in spite of its utility, there's very little incentive on the side of the consumer to actually give up this much personal information for surveillance without any remuneration. this doesn't work in our economy.

458

a16z@a16z·3d

A tale of two cities with and without Flock over the weekend in Texas: "Austin had Flock and then turned it off. And as a consequence, they were not able to find these guys." "These guys drove into some adjacent town up against Austin. And Flock was live in that town, and so Flock tagged them the minute they drove into that town, and then they caught the guys." "It's crazy to have the ability to solve crimes and stop crimes and not be able to use it." @pmarca with @joerogan

Garrett Langley@glangley

This is the debate we should have everywhere in America. In the richest communities. In the poorest. No community should live with this kind of senseless violence if we have solutions that stop it. "The certainty of being caught is the #1 deterrent of violent crime." -National Institute of Justice

692

170.9K

Noah@noahirzinger·15 May

@tobi Yet another signal that Canada is not a serious market for startup growth or tech investment. This is extremely disappointing.

1.1K

tobi lutke@tobi·15 May

C-22 is looking like a huge mistake. It worries me a great deal. There is so much nonsense in there that It may well end up dealing a death blow to Canadian tech viability.

Windscribe@windscribecom

We won't be far behind if C-22 passes. In its current state, VPNs would almost certainly require us to log identifying user data. Signal isn't headquartered in Canada so they can just shut off Canadian servers, but our HQ is. We pay an ungodly amount of taxes to this corrupt government, and in return they want to destroy the entire essence of our service to basically spy on its own citizens. Not happening. We'll move HQ and take our taxes elsewhere.

173

831

4.6K

389.2K

Noah@noahirzinger·7 May

@bcherny Hey @bcherny one quick thing, can we get CC to automatically re-index files in its session's working directory? New files written to disk don't get picked up by the "@<file_name>" reference command until you restart the session. 🙏

129

Boris Cherny@bcherny·6 May

It's been an amazing start to Code w/ Claude! Love hearing what people are building with Claude Code and getting feedback on what we can do better.

149

1.9K

111.2K

Noah@noahirzinger·5 May

@bhalligan Illegibility will be akin to privacy. Flooding the zone with synthetic, junk data is a strategy I’ve been thinking about for a while.

321

Brian Halligan@bhalligan·4 May

x.com/i/article/2051…

367

212.2K

Noah@noahirzinger·4 May

@migueldeicaza @mitsuhiko I prefer “Make it so.”

424

Miguel de Icaza ᯅ🍉@migueldeicaza·3 May

This is how a distinguished engineer becomes a burger flipper:

255

31.1K

Noah@noahirzinger·27 Apr

@signulll Without any new breakthroughs in interface design we’re probably going to keep refining until they match and exceed a Star Trek LCARS interface. Voice/intent input and rich Gen UI output stitched together with agents in display units.

213

signüll@signulll·27 Apr

the future interface is probably three layers:

1. ambient intent capture
voice, location, calendar, screen context, messages, habits, biometrics, etc. the system understands what you’re trying to do before you explicitly “open” anything or augments your intent deeply.

2. agentic execution
the actual work happens through agents operating software, apis, browsers, documents, email, calendars, workflows, payments, support systems, whatever. most “computer use” becomes machine to machine clerical labor.

3. ephemeral verification ux
humans still need to inspect, compare, approve, edit, reject, or enjoy things. that’s where gui survives but as disposable, task specific surfaces generated for the moment.

146

235

2.5K

180.5K

Noah@noahirzinger·27 Apr

@nghoihin This looks awesome, I’m going to check it out. Thanks @nghoihin!

Jack H. Ng@nghoihin·27 Apr

@noahirzinger we shipped github.com/Beever-AI/beev… around this same bet last week — typed Neo4j + MCP, Apache 2.0.

Noah@noahirzinger·9 Apr

LLMs are incredible at reasoning but have zero memory of *your* world. every new chat starts from zero. you repeat clients, projects, decisions, and preferences over and over. Backpack fixes that. it turns your conversations into a real knowledge base using lightweight knowledge engineering. knowledge that actually persists and compounds across sessions: 🧵

130

Noah@noahirzinger·26 Apr

x.com/i/article/2048…

337

850.1K

Noah@noahirzinger·17 Apr

@jondelarroz Season 1-3 before they re-did it was peak.

Jon Del Arroz | Pop Culture & Gaming 🎮@jondelarroz·15 Apr

ANNOUNCEMENT: I apologize for not posting about Faith of the Heart for the last 3 days. I've been getting sick, and tho that's not a real excuse, sometimes it's hard, because... IT'S BEEN A LONG ROAD...

262

5.8K

Noah@noahirzinger·16 Apr

@shannholmberg If you’re ever looking to experiment with a knowledge engine system that is based on indexing relationships give backpack a shot! github.com/NoahIrzinger/b…

127

Shann³@shannholmberg·15 Apr

I connected my knowledge base to every project I work on. every agent reads my wiki before doing anything

I built a knowledge base in obsidian with 230+ pages. my tweets, bookmarks, articles, ideas, notes, all compiled into structured wiki pages with cross-references

the knowledge only worked when I was inside that folder. if I started a new project or opened a different codebase, the agent had no idea what I know or how I think

so I set up qmd (by tobi lutke) to index the wiki. hybrid BM25 + vector search with LLM re-ranking, runs locally. then I wrote a global skill that any agent in any project can call

now before an agent starts brainstorming, planning, or writing, it searches my entire knowledge base first. voice rules, content performance data, frameworks, past thinking on the topic

1. agent in any project calls /knowledge-shann "topic"
2. qmd hybrid-searches 230+ wiki pages
3. returns relevant concept pages, source summaries, and metrics
4. agent reads brand foundation (banned AI words, visual style, voice rules)
5. agent starts working with that context loaded

the same pattern works for company knowledge bases too. /knowledge-espressio for agency knowledge, /knowledge-lunar for client work. different collections, same architecture

the whole knowledge layer is just markdown files indexed by qmd. one CLI command, plain text back. token efficient and works with any agent that can run bash

Shann³@shannholmberg

x.com/i/article/2044…

707

112.5K
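
The "hybrid BM25 + vector search" step in the quoted post can be sketched roughly as follows. This is not qmd's implementation. The term-count vectors are a stand-in for real embeddings, reciprocal rank fusion is an assumed way of combining the two rankings (the post does not say how they are merged), and the LLM re-ranking stage is left out. All function names and sample pages are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with the Okapi BM25 formula."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            df = sum(1 for t in tokenized if term in t)  # document frequency
            idf = math.log((n - df + 0.5) / (df + 0.5) + 1)
            tf = counts[term]
            denom = tf + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / denom
        scores.append(score)
    return scores


def cosine_scores(query, docs):
    """Cosine similarity over term-count vectors (stand-in for embeddings)."""
    q = Counter(tokenize(query))
    q_norm = math.sqrt(sum(c * c for c in q.values()))
    scores = []
    for d in docs:
        v = Counter(tokenize(d))
        dot = sum(q[t] * v[t] for t in q)
        norm = q_norm * math.sqrt(sum(c * c for c in v.values()))
        scores.append(dot / norm if norm else 0.0)
    return scores


def ranks(scores):
    """Map each document index to its 1-based rank (highest score first)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return {doc: position for position, doc in enumerate(order, start=1)}


def hybrid_search(query, docs, k=60):
    """Fuse the lexical and vector rankings with reciprocal rank fusion (assumed)."""
    lexical = ranks(bm25_scores(query, docs))
    semantic = ranks(cosine_scores(query, docs))
    fused = {i: 1 / (k + lexical[i]) + 1 / (k + semantic[i]) for i in range(len(docs))}
    return sorted(fused, key=fused.get, reverse=True)


pages = [
    "voice rules and banned words for the brand",
    "content performance metrics by topic",
    "client onboarding checklist",
]
results = hybrid_search("brand voice rules", pages)  # list of page indices, best first
```

Rank fusion is used here because BM25 scores and cosine similarities live on different scales, so combining ranks avoids having to normalize them against each other.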

Noah@noahirzinger·15 Apr

@mil000 Looks great but I don’t think you can beat the muscle memory of tmux + cc. I don’t really see how a user can be more efficient with the same two-dimensional interface, even if it’s styled very prettily.

1.5K

Milo Smith@mil000·14 Apr

Why this took so long is beyond me. TUIs were never the future

Claude@claudeai

We've redesigned Claude Code on desktop. You can now run multiple Claude sessions side by side from one window, with a new sidebar to manage them all.

1.8K

256.2K

Noah@noahirzinger·15 Apr

@zeeg @mitsuhiko The most exciting thing I’ve been working on is a knowledge engineering framework for LLMs. I think this is going to be very relevant in the near future of AI adoption. Check out backpack! github.com/NoahIrzinger/b…

188

David Cramer@zeeg·14 Apr

What is the most legitimately exciting thing you've seen in the engineering AI space that hasn't yet gone mainstream?

128

38.7K

Noah@noahirzinger·10 Apr

@PawelHuryn agreed the utility of raw intelligence definitely needs a sophisticated enough harness to manage interactions with it. balancing determinism with tools/hooks and context engineering to keep it on the right track...

602

Paweł Huryn@PawelHuryn·10 Apr

Karpathy nails the gap. But I'd attribute it differently. Most people use LLMs as chatbots. Claude Code, when used right, is an operating system - CLAUDE.md, hooks, subagents, MCPs, skills, and knowledge that compounds. The "awe gap" isn't model intelligence or what you use it for. It's the system you build around it.

Andrej Karpathy@karpathy

Judging by my tl there is a growing gap in understanding of AI capability. The first issue I think is around recency and tier of use. I think a lot of people tried the free tier of ChatGPT somewhere last year and allowed it to inform their views on AI a little too much. This is a group of reactions laughing at various quirks of the models, hallucinations, etc. Yes I also saw the viral videos of OpenAI's Advanced Voice mode fumbling simple queries like "should I drive or walk to the carwash". The thing is that these free and old/deprecated models don't reflect the capability in the latest round of state of the art agentic models of this year, especially OpenAI Codex and Claude Code. But that brings me to the second issue. Even if people paid $200/month to use the state of the art models, a lot of the capabilities are relatively "peaky" in highly technical areas. Typical queries around search, writing, advice, etc. are *not* the domain that has made the most noticeable and dramatic strides in capability. Partly, this is due to the technical details of reinforcement learning and its use of verifiable rewards. But partly, it's also because these use cases are not sufficiently prioritized by the companies in their hillclimbing because they don't lead to as much $$$ value. The goldmines are elsewhere, and the focus comes along. So that brings me to the second group of people, who *both* 1) pay for and use the state of the art frontier agentic models (OpenAI Codex / Claude Code) and 2) do so professionally in technical domains like programming, math and research. This group of people is subject to the highest amount of "AI Psychosis" because the recent improvements in these domains as of this year have been nothing short of staggering. When you hand a computer terminal to one of these models, you can now watch them melt programming problems that you'd normally expect to take days/weeks of work. 
It's this second group of people that assigns a much greater gravity to the capabilities, their slope, and various cyber-related repercussions. TLDR the people in these two groups are speaking past each other. It really is simultaneously the case that OpenAI's free and I think slightly orphaned (?) "Advanced Voice Mode" will fumble the dumbest questions in your Instagram's reels and *at the same time*, OpenAI's highest-tier and paid Codex model will go off for 1 hour to coherently restructure an entire code base, or find and exploit vulnerabilities in computer systems. This part really works and has made dramatic strides because 2 properties: 1) these domains offer explicit reward functions that are verifiable meaning they are easily amenable to reinforcement learning training (e.g. unit tests passed yes or no, in contrast to writing, which is much harder to explicitly judge), but also 2) they are a lot more valuable in b2b settings, meaning that the biggest fraction of the team is focused on improving them. So here we are.

174

26.7K

Noah@noahirzinger·10 Apr

x.com/i/article/2042…

376

Noah@noahirzinger·9 Apr

@forgebitz the dynamic between claude being naive but solving 90% of the problem and codex being overconfident and spewing 50x more code than it needs to patch a particular problem is pretty funny right now

136

Klaas@forgebitz·9 Apr

been using codex for the past couple of days instead of copus and it's a lot better, it feels smarter, code quality can be a bit meh, but if you steer it + good linting with examples it's pretty solid

5.1K

Noah@noahirzinger·9 Apr

@clbswrs @karpathy @kepano Oh wow this looks like a great way to mulch up target data sets so it’s easier to ingest into your knowledge base system. Will be checking this out.

caleb 🐮@clbswrs·9 Apr

@karpathy @kepano I’ve been happy with marker-pdf, though I am fairly insensitive to layout. Then agent can put the textual copy as attachment back on the Zotero record. github.com/datalab-to/mar…

1.1K

kepano@kepano·9 Apr

I wrote about Microsoft's Markitdown back in 2024, but it's grown into a big messy project now :/

It would be more valuable if Microsoft provided high-quality official libraries for converting their proprietary formats to Markdown (.docx, .xlsx, .pptx, OneNote, etc).

For now Obsidian's Markdown conversion options are:

1. Obsidian Web Clipper for converting URLs
2. Obsidian Importer for converting from apps like Notion, Apple Notes, Google Keep, Microsoft OneNote, Evernote, etc

Vaishnavi@_vmlops

MICROSOFT BUILT A TOOL THAT CONVERTS LITERALLY ANYTHING INTO CLEAN MARKDOWN FOR YOUR LLM pdfs. word docs. excel. powerpoint. audio. youtube urls one pip install and your AI pipeline stops choking on raw files forever no custom parsers. no broken layouts. no garbled text. just clean, structured markdown your LLM can actually read github.com/microsoft/mark…

1.2K

349.1K

Noah@noahirzinger·9 Apr

@karpathy @kepano I’ve been experimenting with the idea of having a layer between markdown and the LLM in a loose-schema JSON structure DB to act as an index back to the raw knowledge base. Have you iterated on anything like this in your tool design? github.com/NoahIrzinger/b…

530
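
The idea of a loose-schema JSON layer that indexes back into raw markdown could look something like the sketch below. This is not backpack's actual schema or code: the field names (`title`, `source`, `links`), the `[[wikilink]]` convention, and the functions `build_index` and `lookup` are all assumptions made for illustration.

```python
import json
import re


def build_index(notes):
    """Build a loose-schema index: one node per markdown note, edges from [[wikilinks]].

    `notes` maps a file path to its markdown text. Field names are illustrative.
    """
    nodes = []
    for path, text in notes.items():
        first_line = text.strip().splitlines()[0]
        title = first_line.lstrip("# ").strip()
        links = sorted(set(re.findall(r"\[\[([^\]]+)\]\]", text)))
        nodes.append({"title": title, "source": path, "links": links})
    return {"nodes": nodes}


def lookup(index, title):
    """Return the source path for a note plus the paths of the notes it links to."""
    by_title = {node["title"]: node for node in index["nodes"]}
    node = by_title.get(title)
    if node is None:
        return []
    paths = [node["source"]]
    for linked in node["links"]:
        if linked in by_title:
            paths.append(by_title[linked]["source"])
    return paths


# Hypothetical notes standing in for a markdown knowledge base.
notes = {
    "clients/acme.md": "# Acme\nPrefers weekly updates. See [[Pricing Decision]].",
    "decisions/pricing.md": "# Pricing Decision\nFlat fee agreed in March.",
}
index = build_index(notes)
index_json = json.dumps(index, indent=2)  # what would be stored on disk
paths = lookup(index, "Acme")
```

With an index like this, an agent could read the small JSON structure first and open only the markdown files that `lookup` returns, instead of loading the whole knowledge base.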

Andrej Karpathy@karpathy·9 Apr

@kepano I just tried it this morning on the 245-page Mythos pdf and it failed badly and the outputs were all mangled. Converting pdfs is really hard, I think it has to probably be a Skill not a program, for a SOTA LLM for it to work properly.

170

1.7K

277.1K

Noah@noahirzinger·9 Apr

the goal is to bridge the gap between LLM knowledge collection and the human ability to connect the dots between relations, concepts and abstractions over your environment. build this once and it will stay with you forever, check it out: github.com/NoahIrzinger/b…

Noah@noahirzinger·9 Apr

try the local version in 30 seconds:

claude mcp add backpack-local -s user -- npx backpack-ontology@latest

Then just say: "start a learning graph about [your project]". Completely local. Data stored as readable JSON files.

then "show me how these concepts connect on the graph" and backpack will open the viewer on your localhost and visualize the connection.

no accounts. no config files.
