Noah

156 posts

@noahirzinger

Joined January 2023
121 Following · 14 Followers
Noah @noahirzinger
@TheEthanDing that IC who convinced their excited manager this was a cutting-edge idea in 2026 should get a bonus though
ethan ding 📊 @TheEthanDing
we've literally been doing this for 3 years lmao. this is like the first thing you realize when you get into analytics agents
Databricks AI Research @DbrxMosaicAI

New research from Databricks: the context window is the only persistent substrate today's LLM agents have, and it floods fast. A single SQL query can return millions of rows that ride along in every subsequent turn, even when only one cell ever mattered. We hit this constraint every day in the agents we run in production, from Genie to Agent Bricks' Supervisor Agent to KARL.

In a new post from the Databricks research team, we introduce MemEx: a programmable Python scratchpad that lets agents transform, slice, and persist tool outputs as typed objects in a live kernel. Same observe-act loop. Different action space.

Across nine frontier and open-weight models on two enterprise agentic tasks (OfficeQA Pro and Enterprise Structured Retrieval):
• Frontier models (Opus 4.6, Sonnet 4.6, Gemini 3.1 Pro) gain 2 to 5 accuracy points at 25 to 30% lower cost
• Qwen 122B and Qwen 397B nearly double accuracy at 40 to 50% lower cost
• Four of the five points on the OfficeQA Pro cost-accuracy Pareto frontier are MemEx configurations

MemEx extends the code-as-action line (CodeAct, Anthropic Programmatic Tool Calling, Cloudflare Code Mode) with persistent scope across turns, eager spawn_agent for parallel sub-agents that share the parent's namespace, typed submit() for validated returns, and live-object scope injection. Built on aroll, the same Databricks agentic rollouts framework already powering those production systems.

MemEx is rolling out across Databricks first-party agents and Agent Bricks soon. If you build on Databricks agents today, you'll be able to try it.

Full write-up: databricks.com/blog/memex-pro…

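The core idea in the quoted MemEx post, keeping a large tool output as a live object in a persistent namespace and feeding only a short summary or a computed slice back into the context window, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Databricks' MemEx or aroll API: `Scratchpad`, `store`, and `run` are hypothetical names, and the generated rows stand in for a real SQL tool result.

```python
from typing import Any


class Scratchpad:
    """Hypothetical persistent namespace that survives across agent turns.

    Large tool outputs are kept here as live Python objects; only a short
    summary string is returned for the model's context window.
    """

    SAFE_BUILTINS = {"len": len, "sum": sum, "min": min, "max": max, "sorted": sorted}

    def __init__(self) -> None:
        self.scope: dict[str, Any] = {}

    def store(self, name: str, value: Any) -> str:
        # Persist the full object and summarize it instead of echoing it.
        self.scope[name] = value
        size = len(value) if hasattr(value, "__len__") else 1
        return f"stored {name!r}: {type(value).__name__} with {size} items"

    def run(self, expr: str) -> Any:
        # Evaluate an expression against everything stored so far.
        # eval() keeps the sketch short; it is NOT a real sandbox.
        env = {"__builtins__": self.SAFE_BUILTINS, **self.scope}
        return eval(expr, env)


# Turn 1: a (fake) SQL tool returns 100,000 rows; only the summary enters context.
rows = [{"region": f"r{i % 10}", "revenue": i} for i in range(100_000)]
pad = Scratchpad()
summary = pad.store("rows", rows)

# Turn 2: the agent slices the stored object instead of re-reading every row.
answer = pad.run("sum(r['revenue'] for r in rows if r['region'] == 'r3')")
print(summary)
print(answer)
```

The design point is that the context window carries one short line per tool call, while the full result stays addressable by name in later turns.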
Noah @noahirzinger
@ryanzhuuuu Honestly this is sick, trying it out this weekend.
Ryan Zhu @ryanzhuuuu
went to a conference and found out i don’t know any companies there. so i built a sales navigator in iMessage in 10min
Noah @noahirzinger
@a16z in spite of its utility, there's very little incentive on the side of the consumer to actually give up this much personal information for surveillance without any remuneration. this doesn't work in our economy.
a16z @a16z
A tale of two cities with and without Flock over the weekend in Texas: "Austin had Flock and then turned it off.  And as a consequence, they were not able to find these guys." "These guys drove into some adjacent town up against Austin. And Flock was live in that town, and so Flock tagged them the minute they drove into that town, and then they caught the guys." "It's crazy to have the ability to solve crimes and stop crimes and not be able to use it." @pmarca with @joerogan
Garrett Langley @glangley

This is the debate we should have everywhere in America. In the richest communities. In the poorest. No community should live with this kind of senseless violence if we have solutions that stop it. "The certainty of being caught is the #1 deterrent of violent crime." -National Institute of Justice

Noah @noahirzinger
@tobi Yet another signal that Canada is not a serious market for startup growth or tech investment. This is extremely disappointing.
Noah @noahirzinger
@bcherny Hey @bcherny, one quick thing: can we get CC to automatically re-index files in its session's working directory? New files written to disk don't get picked up by the "@<file_name>" reference command until you restart the session. 🙏
Boris Cherny @bcherny
It's been an amazing start to Code w/ Claude! Love hearing what people are building with Claude Code and getting feedback on what we can do better.
Noah @noahirzinger
@bhalligan Illegibility will be akin to privacy. Flooding the zone with synthetic, junk data is a strategy I’ve been thinking about for a while.
Miguel de Icaza ᯅ🍉 @migueldeicaza
This is how a distinguished engineer becomes a burger flipper:
Noah @noahirzinger
@signulll Without any new breakthroughs in interface design, we’re probably going to keep refining our interfaces until they match and exceed Star Trek's LCARS. Voice/intent input and rich Gen UI output, stitched together with agents in display units.
signüll @signulll
the future interface is probably three layers:

1. ambient intent capture
voice, location, calendar, screen context, messages, habits, biometrics, etc. the system understands what you’re trying to do before you explicitly “open” anything or augments your intent deeply.

2. agentic execution
the actual work happens through agents operating software, apis, browsers, documents, email, calendars, workflows, payments, support systems, whatever. most “computer use” becomes machine to machine clerical labor.

3. ephemeral verification ux
humans still need to inspect, compare, approve, edit, reject, or enjoy things. that’s where gui survives but as disposable, task specific surfaces generated for the moment.
Noah @noahirzinger
@nghoihin This looks awesome, I’m going to check it out. Thanks @nghoihin!
Noah @noahirzinger
LLMs are incredible at reasoning but have zero memory of *your* world. every new chat starts from zero: you repeat clients, projects, decisions, and preferences over and over. Backpack fixes that. it turns your conversations into a real knowledge base using lightweight knowledge engineering, knowledge that actually persists and compounds across sessions 🧵
Noah @noahirzinger
@jondelarroz Seasons 1-3, before they redid it, were peak.
Jon Del Arroz | Pop Culture & Gaming 🎮
ANNOUNCEMENT: I apologize for not posting about Faith of the Heart for the last 3 days. I've been getting sick, and tho that's not a real excuse, sometimes it's hard, because... IT'S BEEN A LONG ROAD...
Shann³ @shannholmberg
I connected my knowledge base to every project I work on. every agent reads my wiki before doing anything

I built a knowledge base in obsidian with 230+ pages. my tweets, bookmarks, articles, ideas, notes, all compiled into structured wiki pages with cross-references

the knowledge only worked when I was inside that folder. if I started a new project or opened a different codebase, the agent had no idea what I know or how I think

so I set up qmd (by tobi lutke) to index the wiki. hybrid BM25 + vector search with LLM re-ranking, runs locally. then I wrote a global skill that any agent in any project can call

now before an agent starts brainstorming, planning, or writing, it searches my entire knowledge base first. voice rules, content performance data, frameworks, past thinking on the topic

1. agent in any project calls /knowledge-shann "topic"
2. qmd hybrid-searches 230+ wiki pages
3. returns relevant concept pages, source summaries, and metrics
4. agent reads brand foundation (banned AI words, visual style, voice rules)
5. agent starts working with that context loaded

the same pattern works for company knowledge bases too. /knowledge-espressio for agency knowledge, /knowledge-lunar for client work. different collections, same architecture

the whole knowledge layer is just markdown files indexed by qmd. one CLI command, plain text back. token efficient and works with any agent that can run bash
Shann³ @shannholmberg

x.com/i/article/2044…

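Shann's setup relies on qmd's hybrid retrieval. The general idea of hybrid search, a BM25 keyword score blended with a similarity score, can be sketched over a few toy wiki pages. This is not qmd's implementation: the bag-of-words cosine standing in for a real embedding model, the max-normalization, the 0.5 blend weight, and the page contents are all assumptions, and the LLM re-ranking step is omitted.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase alphanumeric tokens; good enough for a toy corpus.
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25 keyword score of `query` against each document."""
    toks = [tokenize(d) for d in docs]
    n = len(toks)
    avgdl = sum(len(t) for t in toks) / n
    df = Counter(term for t in toks for term in set(t))
    scores = []
    for t in toks:
        tf = Counter(t)
        score = 0.0
        for q in tokenize(query):
            if q not in tf:
                continue
            idf = math.log((n - df[q] + 0.5) / (df[q] + 0.5) + 1)
            norm = k1 * (1 - b + b * len(t) / avgdl)
            score += idf * tf[q] * (k1 + 1) / (tf[q] + norm)
        scores.append(score)
    return scores


def cosine(a: Counter, b: Counter) -> float:
    # Stand-in for embedding similarity: cosine over bag-of-words counts.
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def hybrid_search(query: str, pages: dict[str, str], alpha: float = 0.5) -> list[tuple[str, float]]:
    """Blend max-normalized BM25 with cosine similarity; higher is better."""
    names = list(pages)
    docs = [pages[name] for name in names]
    kw = bm25_scores(query, docs)
    top = max(kw)
    kw = [s / top if top > 0 else 0.0 for s in kw]
    qv = Counter(tokenize(query))
    vec = [cosine(qv, Counter(tokenize(d))) for d in docs]
    fused = [alpha * k + (1 - alpha) * v for k, v in zip(kw, vec)]
    return sorted(zip(names, fused), key=lambda pair: pair[1], reverse=True)


pages = {
    "voice-rules.md": "voice rules: short sentences, no banned AI words, lowercase tone",
    "content-performance.md": "content performance metrics for threads and articles",
    "frameworks.md": "writing frameworks for hooks, threads and long articles",
}
ranked = hybrid_search("voice rules for threads", pages)
for name, score in ranked:
    print(f"{score:.3f}  {name}")
```

Blending the two signals lets rare exact terms ("voice", "rules") dominate when they match while the similarity term still orders pages that only share common words; a re-ranker would then reorder the top few.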
Noah @noahirzinger
@mil000 Looks great, but I don’t think you can beat the muscle memory of tmux + CC. I don’t really see how a user can be more efficient with the same two-dimensional interface, even if it’s styled very prettily.
Noah @noahirzinger
@zeeg @mitsuhiko The most exciting thing I’ve been working on is a knowledge engineering framework for LLMs. I think this is going to be very relevant in the near future of AI adoption. Check out backpack! github.com/NoahIrzinger/b…
David Cramer @zeeg
What is the most legitimately exciting thing you've seen in the engineering AI space that hasn't yet gone mainstream?
Noah @noahirzinger
@PawelHuryn agreed, the utility of raw intelligence definitely needs a sophisticated enough harness to manage interactions with it: balancing determinism with tools/hooks and context engineering to keep it on the right track...
Paweł Huryn @PawelHuryn
Karpathy nails the gap. But I'd attribute it differently. Most people use LLMs as chatbots. Claude Code, when used right, is an operating system - CLAUDE.md, hooks, subagents, MCPs, skills, and knowledge that compounds. The "awe gap" isn't model intelligence or what you use it for. It's the system you build around it.
Andrej Karpathy @karpathy

Judging by my tl there is a growing gap in understanding of AI capability. The first issue I think is around recency and tier of use. I think a lot of people tried the free tier of ChatGPT somewhere last year and allowed it to inform their views on AI a little too much. This is a group of reactions laughing at various quirks of the models, hallucinations, etc. Yes I also saw the viral videos of OpenAI's Advanced Voice mode fumbling simple queries like "should I drive or walk to the carwash". The thing is that these free and old/deprecated models don't reflect the capability in the latest round of state of the art agentic models of this year, especially OpenAI Codex and Claude Code. But that brings me to the second issue. Even if people paid $200/month to use the state of the art models, a lot of the capabilities are relatively "peaky" in highly technical areas. Typical queries around search, writing, advice, etc. are *not* the domain that has made the most noticeable and dramatic strides in capability. Partly, this is due to the technical details of reinforcement learning and its use of verifiable rewards. But partly, it's also because these use cases are not sufficiently prioritized by the companies in their hillclimbing because they don't lead to as much $$$ value. The goldmines are elsewhere, and the focus comes along. So that brings me to the second group of people, who *both* 1) pay for and use the state of the art frontier agentic models (OpenAI Codex / Claude Code) and 2) do so professionally in technical domains like programming, math and research. This group of people is subject to the highest amount of "AI Psychosis" because the recent improvements in these domains as of this year have been nothing short of staggering. When you hand a computer terminal to one of these models, you can now watch them melt programming problems that you'd normally expect to take days/weeks of work. 
It's this second group of people that assigns a much greater gravity to the capabilities, their slope, and various cyber-related repercussions. TLDR the people in these two groups are speaking past each other. It really is simultaneously the case that OpenAI's free and I think slightly orphaned (?) "Advanced Voice Mode" will fumble the dumbest questions in your Instagram's reels and *at the same time*, OpenAI's highest-tier and paid Codex model will go off for 1 hour to coherently restructure an entire code base, or find and exploit vulnerabilities in computer systems. This part really works and has made dramatic strides because of 2 properties: 1) these domains offer explicit reward functions that are verifiable, meaning they are easily amenable to reinforcement learning training (e.g. unit tests passed yes or no, in contrast to writing, which is much harder to explicitly judge), but also 2) they are a lot more valuable in b2b settings, meaning that the biggest fraction of the team is focused on improving them. So here we are.

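Karpathy's point about verifiable rewards ("unit tests passed yes or no") can be made concrete with a tiny binary reward function. This is a hypothetical illustration of the idea, not how any lab actually computes rewards: a candidate solution is executed, checked against test cases, and scored 1.0 only if every case passes. The function name `verifiable_reward` and the test format are assumptions.

```python
from typing import Callable


def verifiable_reward(candidate_src: str, func_name: str,
                      tests: list[tuple[tuple, object]]) -> float:
    """Return 1.0 if the candidate passes every test case, else 0.0.

    Hypothetical sketch: exec() is used for brevity; a real training pipeline
    would run untrusted model output in an isolated sandbox with time limits.
    """
    namespace: dict = {}
    try:
        exec(candidate_src, namespace)
        fn: Callable = namespace[func_name]
        return 1.0 if all(fn(*args) == expected for args, expected in tests) else 0.0
    except Exception:
        # Syntax errors, a missing function, or runtime crashes all earn zero.
        return 0.0


tests = [((2, 3), 5), ((-1, 1), 0), ((0, 0), 0)]
good = "def add(a, b):\n    return a + b\n"
buggy = "def add(a, b):\n    return a - b\n"
print(verifiable_reward(good, "add", tests))
print(verifiable_reward(buggy, "add", tests))
```

There is no equally mechanical check for "is this essay good", which is the contrast with writing that the quoted post draws.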
Noah @noahirzinger
@forgebitz the dynamic between claude being naive but solving 90% of the problem and codex being overconfident and spewing 50x more code than it needs to patch a particular problem is pretty funny right now
Klaas @forgebitz
been using codex for the past couple of days instead of opus and it's a lot better, it feels smarter, code quality can be a bit meh, but if you steer it + good linting with examples it's pretty solid
Noah @noahirzinger
@clbswrs @karpathy @kepano Oh wow, this looks like a great way to mulch up target data sets so they’re easier to ingest into your knowledge base system. Will be checking this out.
kepano @kepano
I wrote about Microsoft's Markitdown back in 2024, but it's grown into a big messy project now :/

It would be more valuable if Microsoft provided high-quality official libraries for converting their proprietary formats to Markdown (.docx, .xlsx, .pptx, OneNote, etc).

For now Obsidian's Markdown conversion options are:
1. Obsidian Web Clipper for converting URLs
2. Obsidian Importer for converting from apps like Notion, Apple Notes, Google Keep, Microsoft OneNote, Evernote, etc
Vaishnavi @_vmlops

MICROSOFT BUILT A TOOL THAT CONVERTS LITERALLY ANYTHING INTO CLEAN MARKDOWN FOR YOUR LLM

pdfs. word docs. excel. powerpoint. audio. youtube urls

one pip install and your AI pipeline stops choking on raw files forever

no custom parsers. no broken layouts. no garbled text. just clean, structured markdown your LLM can actually read

github.com/microsoft/mark…

Noah @noahirzinger
@karpathy @kepano I’ve been experimenting with a layer between the markdown and the LLM: a loose-schema JSON structure DB that acts as an index back to the raw knowledge base. Have you iterated on anything like this in your tool design? github.com/NoahIrzinger/b…
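One way to picture the loose-schema JSON index described above: scan the raw markdown once, record which files and lines mention each concept, and hand the LLM the small index instead of the whole knowledge base. This is a hypothetical sketch, not Backpack's actual design; the `[[wikilink]]` convention for marking concepts, the entry fields (`file`, `line`, `snippet`), and the sample notes are all assumptions.

```python
import json
import re
from collections import defaultdict

WIKILINK = re.compile(r"\[\[([^\]]+)\]\]")


def build_index(notes: dict[str, str]) -> dict[str, list[dict]]:
    """Map each [[concept]] to the notes and line numbers that mention it."""
    index: dict[str, list[dict]] = defaultdict(list)
    for path, text in notes.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            for concept in WIKILINK.findall(line):
                # Entries are loose dicts, so new fields can be added later
                # without migrating a fixed schema.
                index[concept.strip().lower()].append(
                    {"file": path, "line": lineno, "snippet": line.strip()}
                )
    return dict(index)


notes = {
    "clients/acme.md": "# Acme\nPrefers [[weekly reports]].\nOwner of [[Project Atlas]].",
    "projects/atlas.md": "# Atlas\n[[Project Atlas]] ships in Q3.\nSee [[weekly reports]] format.",
}

index = build_index(notes)
print(json.dumps(index["project atlas"], indent=2))
```

Each entry points back to a file and line in the raw notes, so the model can pull the full source only when a concept turns out to matter.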
Andrej Karpathy @karpathy
@kepano I just tried it this morning on the 245-page Mythos pdf and it failed badly and the outputs were all mangled. Converting pdfs is really hard, I think it has to probably be a Skill not a program, for a SOTA LLM for it to work properly.
Noah @noahirzinger
the goal is to bridge the gap between LLM knowledge collection and the human ability to connect the dots between relations, concepts, and abstractions over your environment. build this once and it will stay with you forever. check it out: github.com/NoahIrzinger/b…
Noah @noahirzinger
try the local version in 30 seconds:
claude mcp add backpack-local -s user -- npx backpack-ontology@latest
Then just say: "start a learning graph about [your project]"
Completely local. Data stored as readable JSON files.
then say "show me how these concepts connect on the graph" and backpack will open the viewer on your localhost and visualize the connections.
no accounts. no config files.
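The post says Backpack keeps its learning graph as readable JSON files and can show how concepts connect. A concept graph of that kind might be stored and queried roughly as below. This is a hypothetical sketch, not Backpack's real file format or API: the `nodes`/`edges` layout, the relation labels, the file name, and the `add_edge`/`path_between` helpers are assumptions.

```python
import json
import tempfile
from collections import deque
from pathlib import Path
from typing import Optional


def add_edge(graph: dict, src: str, relation: str, dst: str) -> None:
    # Nodes are plain concept names; each edge carries a free-form relation label.
    for node in (src, dst):
        if node not in graph["nodes"]:
            graph["nodes"].append(node)
    graph["edges"].append({"from": src, "relation": relation, "to": dst})


def path_between(graph: dict, start: str, goal: str) -> Optional[list[str]]:
    """Shortest chain of concepts linking start to goal (edges treated as undirected)."""
    neighbors: dict[str, list[str]] = {n: [] for n in graph["nodes"]}
    for edge in graph["edges"]:
        neighbors[edge["from"]].append(edge["to"])
        neighbors[edge["to"]].append(edge["from"])
    prev: dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            chain = []
            while node is not None:
                chain.append(node)
                node = prev[node]
            return chain[::-1]
        for nxt in neighbors.get(node, []):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None


graph: dict = {"nodes": [], "edges": []}
add_edge(graph, "Acme", "client_of", "Project Atlas")
add_edge(graph, "Project Atlas", "uses", "Postgres")
add_edge(graph, "Postgres", "decided_in", "2024 infra review")

with tempfile.TemporaryDirectory() as tmp:
    graph_file = Path(tmp) / "learning_graph.json"  # hypothetical file name
    graph_file.write_text(json.dumps(graph, indent=2))  # human-readable on disk
    loaded = json.loads(graph_file.read_text())

route = path_between(loaded, "Acme", "2024 infra review")
print(" -> ".join(route))
```

Breadth-first search returns the shortest chain of relations, which is one plausible reading of "show me how these concepts connect".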