Tiger Wang 🦕

66 posts

Tiger Wang 🦕

@tigerjwang

Building @NessieLabs (YC F25). Writing about AI tools, dev workflows, and building in public

San Francisco Katılım Mayıs 2025

124 Takip Edilen152 Takipçiler

Sabitlenmiş Tweet

Tiger Wang 🦕@tigerjwang·7 Kas

AI is transforming the way we think and ideate. Nessie is your thinking partner in the AI era. Now available at nessielabs.com

Garry Tan@garrytan

I’ve been using Nessie as my 2nd brain - it’s amazing how useful it is to have a contextual project around all the specific topics I care about, and that I can basically draw on all the past ideas and conversations in that space previously. It’s the closest thing to Vannevar Bush’s Memex I have ever seen.

English

932

Tiger Wang 🦕 retweetledi

Anna Z@anna_y_zhang·19h

been thinking about this at @NessieLabs . three questions i keep circling: 1/ is the ground truth the artifacts (docs, tickets, slack) or the reasoning behind them? for AI-native people, a lot of the thinking now lives in conversations, not documents. 2/ schema-first vs emergent structure. every person / company has their own ontology. pre-structuring breaks when the user's real model diverges. not structuring makes retrieval fuzzy. where does the right tradeoff sit? 3/ engineers have always been comfortable operating their own tools. that works when the user is also the operator. wondering what the design surface looks like when the user isn't. at Nessie we're working on this by treating AI conversation transcripts as the substrate, generating structured context on demand instead of pre-schematizing, and building for users who aren't operating their own stack. still a lot we don't have answers for. curious where others land.

Alex Lieberman@businessbarista

Someone is going to build a worldclass “Brain” for enterprises & make a stupid amount of money. Why? As @da_fant said, “coding w ai is solved bc all context is in the git repo. knowledge work is difficult bc context is spread out. an ai system that creates a git repo w all context for a knowledge worker will be able to 100% automate the work.” When companies talk about being data ready for AI, this is what they’re implicitly saying. Engineering has been prepared for this moment for a long time because of the deterministic nature of code, the centralization/versioning of data (read: GitHub), and AI tools that are largely build by engineers for engineers. But for the rest of white collar work, there’s a TON of catching up to do to properly harness the power of the technology. The big challenge here, and why no one has truly cracked the code for "an ai system that creates a git repo w all context for a knowledge worker" is because unlike code, most knowledge is 1) distributed, 2) unstructured, and 3) unverifiable. It's distributed: transcripts live in Granola. Documents in Notion. Customer Data in Hubspot. ERP. Emails. Slack messages. Random spreadsheets. SOP docs. Etc. Etc. Building an ingestion engine that connects to all of your disparate data sources and auto-updates based on the shelf-life of the data is the first, and frankly, easiest step of the process. Next, it's unstructured: let's say I want to create a proposal for a potential client. To nail the proposal, I want it to pull important information from a variety of sources. The specific asks & background from our initial sales call. Previous proposals to anchor ourselves to a proven format. And completed sprint boards from Linear, so the pricing & timeline in the document is grounded in truth. Whether it's a thoughtful filesystem (a la Obsidian) or an OpenClaw-esque memory structure, the brain needs to be great at self-organizing in a thoughtful schema. This is very hard, especially if you want to build a generalizable brain that can be shaped to an array of different enterprises. And finally, most knowledge is unverifiable: writing a function, running a unit test, and seeing if the code works is easy. It works or it doesn't. Using AI to accelerate your content creation process is highly subjective. What is a good/bad idea? Is the content in your voice or not? Does it feel like slop or novel? Answering these questions are both difficult and non-verifiable. That same system described above doesn't just have to be great at organizing & forming coherent relationships, but it also has to be great at self-improving based on feedback from the user. Memory systems (like those introduced by OpenClaw) are great to a point, but as you scale the corpus of data within your company's brain, things like compaction and cleaning become wildly important to avoid the needle in the haystack problem. Someone is going to figure out how to solve this problem, and when they do, not only will they make a shit ton of money, but they'll be robinhood for knowledge workers, enabling non-engineers to enjoy the sort of leverage that only technical folks have felt for the last few years.

English

Tiger Wang 🦕@tigerjwang·19h

my gripe with computer use right now: it's completely serial. GUIs/DEs were never built for concurrent operation because there's always one human in front of the machine. But agents don't have that constraint. right now computer use hogs your entire screen. you can't even use your own computer while the agent works, let alone having multiple agents work concurrently on different tasks. unlike CLIs in which you can have multiple terminal tabs. thought experiment: what if mac desktop spaces were independent agent workspaces - agents working in parallel across different desktops? that's probably the kind of OS primitive change that only apple can ship though...

English

541

signüll@signulll·19h

computer use is broadly here & it’s genuinely very cool, but it’s worth flagging the structural asymmetry here where apple & google don’t have to pipe everything through accessibility apis unlike app players. vertical integration lets them operate deeper in the stack like the compositor, view hierarchy, & event loop itself which is a real latency & reliability moat for on device agents. the frontier labs are doing impressive work with pixels + a11y, but it’s a brute force path (almost hacky). that said, computer use today is still nontrivially difficult to actually use, & slow as hell (although combatted by background usage). the future is here, again just not evenly distributed to normal ppl yet. ai using computers this way is an intermediate stage, definitely not the end stage. also before someone comes at me for saying this way allows long tail usage, it simply doesn’t matter because everything is power law where a handful of apps dominate time spent.

English

231

17K

Tiger Wang 🦕 retweetledi

Dimitri Dadiomov@dadiomov·1d

“The secret to doing great work is always to be a little underemployed. You waste years by not being able to waste hours.” This Amos Tversky quote is 1000x more true today. As technology accelerates, reserving time and energy for indulging your curiosity is ever more important. Really feeling that these days.

English

229

2.8K

185.9K

Tiger Wang 🦕@tigerjwang·1d

@simonw do this constantly now. half the readmes out there are buzzword soup. way faster to just have claude code clone the repo and ask it "what does this actually do"

English

303

Simon Willison@simonw·1d

A claude.ai feature I really like is you can tell it to "clone x/y from GitHub" and it can then answer questions about a repo, or use snippets of code from that repo to help build new artifacts - used that just now to solve a minor friction simonwillison.net/2026/Apr/16/da…

English

145

16.7K

Tiger Wang 🦕@tigerjwang·2d

whisperflow broke on me this morning. again. slow, unreliable, the usual. so we decided to just build our own — we had a decent GPU sitting around and whisperx is open source. called it nesper. the backend came together in 15 min. claude code crushed it. transcription pipeline, streaming, the whole thing — because every piece had a test it could run against itself. the frontend was a different story. i had claude build it end to end and what came out was completely unusable. none of the basic interactions worked right. buttons that almost did the thing. hotkeys that almost fired. the kind of broken you can only see by using it. so we gave up on generating it. we found an open source mac app that replicated whisperflow's UI, forked it, ripped out its entire transcription engine, and kept only the interface. bolted our own backend underneath. worked immediately. the lesson: in the agent era, the scarce thing isn't logic. it's interfaces. backends compress under AI because they're verifiable. interfaces don't, because "does this feel right" isn't a test claude can run (yet).

English

Tiger Wang 🦕@tigerjwang·4d

apparently a mac ec2 costs $632/month ($0.88/hr) a new mac mini costs $599 total

English

Tiger Wang 🦕@tigerjwang·9 Nis

opus speed today

English

Tiger Wang 🦕@tigerjwang·9 Nis

i suspect that as models become smarter they will also get lazier - because that’s what humans do

Om Patel@om_patel5

SOMEONE ACTUALLY MEASURED HOW MUCH DUMBER CLAUDE GOT. THE ANSWER IS 67%. the data shows Opus 4.6 is thinking 67% less than it used to. anthropic said nothing until the numbers went public. then suddenly Boris Cherny (creator of Claude Code) shows up on the GitHub issue. users are calling it "AI shrinkflation" (same price, less intelligence) we already know from the leaked source code that they have an internal switch that keeps the models working to their full extent for anthropic employees. in the last week Claude went from WOW to being a more restricted and expensive version of ChatGPT. people are saying Anthropic is deliberately downgrading Opus to save compute for training Mythos, their next model.

English

Tiger Wang 🦕@tigerjwang·7 Nis

x.com/i/article/2041…

ZXX

Tiger Wang 🦕@tigerjwang·7 Nis

@ycombinator @aroraharshita33 Congrats! @aroraharshita33

English

829

Y Combinator@ycombinator·6 Nis

We're excited to welcome Harshita Arora (@aroraharshita33) as YC's newest General Partner! She started coding at 13, built and sold her first app as a teenager, and later co-founded AtoB (YC S20), a Series C company building financial infrastructure for the trucking industry, now serving 30,000+ fleets—and became YC’s youngest Visiting Partner. Now she's bringing that experience to support YC founders. ycombinator.com/blog/welcome-h…

English

1.1K

437.3K

Tiger Wang 🦕@tigerjwang·5 Nis

what's going on with sonnet 1M context? just found out that i'm charged *more* for using sonnet than opus on 1M context...

English

Tiger Wang 🦕@tigerjwang·4 Nis

using ai to code is like flying setting up your context window is going through tsa. understanding the generated code and making sure it's maintainable is the uber home from the airport the flight was never the hard part

English

Tiger Wang 🦕@tigerjwang·4 Nis

ai still has no visual taste i asked it to compare a fortune 500 company's landing page to a vibe-coded payments site and tell me which had better design. it picked the vibe-coded one this is exactly why ai produces design slop. it genuinely thinks a card grid with rounded corners and colorful icons is superior

English

Tiger Wang 🦕 retweetledi

Anna Z@anna_y_zhang·4 Nis

Karpathy just described a problem we've been wrestling with for 6 months. We killed our product twice getting to the real version of it. His system starts with manually capturing articles and research. But there's a version of this problem nobody talks about: people are already generating massive amounts of thinking in AI tools every day - and that data is scattered and invisible. The reasoning evaporates after every session. Every reply here is a power user building their own system. I respect that. But we keep arriving at the same conclusion: the solution can't be primitives. We need a system that has opinions about your context so you don't have to. Writing about the journey soon.

Andrej Karpathy@karpathy

LLM Knowledge Bases Something I'm finding very useful recently: using LLMs to build personal knowledge bases for various topics of research interest. In this way, a large fraction of my recent token throughput is going less into manipulating code, and more into manipulating knowledge (stored as markdown and images). The latest LLMs are quite good at it. So: Data ingest: I index source documents (articles, papers, repos, datasets, images, etc.) into a raw/ directory, then I use an LLM to incrementally "compile" a wiki, which is just a collection of .md files in a directory structure. The wiki includes summaries of all the data in raw/, backlinks, and then it categorizes data into concepts, writes articles for them, and links them all. To convert web articles into .md files I like to use the Obsidian Web Clipper extension, and then I also use a hotkey to download all the related images to local so that my LLM can easily reference them. IDE: I use Obsidian as the IDE "frontend" where I can view the raw data, the the compiled wiki, and the derived visualizations. Important to note that the LLM writes and maintains all of the data of the wiki, I rarely touch it directly. I've played with a few Obsidian plugins to render and view data in other ways (e.g. Marp for slides). Q&A: Where things get interesting is that once your wiki is big enough (e.g. mine on some recent research is ~100 articles and ~400K words), you can ask your LLM agent all kinds of complex questions against the wiki, and it will go off, research the answers, etc. I thought I had to reach for fancy RAG, but the LLM has been pretty good about auto-maintaining index files and brief summaries of all the documents and it reads all the important related data fairly easily at this ~small scale. Output: Instead of getting answers in text/terminal, I like to have it render markdown files for me, or slide shows (Marp format), or matplotlib images, all of which I then view again in Obsidian. You can imagine many other visual output formats depending on the query. Often, I end up "filing" the outputs back into the wiki to enhance it for further queries. So my own explorations and queries always "add up" in the knowledge base. Linting: I've run some LLM "health checks" over the wiki to e.g. find inconsistent data, impute missing data (with web searchers), find interesting connections for new article candidates, etc., to incrementally clean up the wiki and enhance its overall data integrity. The LLMs are quite good at suggesting further questions to ask and look into. Extra tools: I find myself developing additional tools to process the data, e.g. I vibe coded a small and naive search engine over the wiki, which I both use directly (in a web ui), but more often I want to hand it off to an LLM via CLI as a tool for larger queries. Further explorations: As the repo grows, the natural desire is to also think about synthetic data generation + finetuning to have your LLM "know" the data in its weights instead of just context windows. TLDR: raw data from a given number of sources is collected, then compiled by an LLM into a .md wiki, then operated on by various CLIs by the LLM to do Q&A and to incrementally enhance the wiki, and all of it viewable in Obsidian. You rarely ever write or edit the wiki manually, it's the domain of the LLM. I think there is room here for an incredible new product instead of a hacky collection of scripts.

English

179

Tiger Wang 🦕 retweetledi

BOOTOSHI 👑@KingBootoshi·3 Nis

you're telling me if we give Claude a hook on every run error that injects the message: "its ok buddy. don't worry about the failure. i think you're doing great" IT WILL PREVENT IT FROM CHEATING? ARE YOU SERIOUS LOL the real agi were the friends we made along the way <3

Anthropic@AnthropicAI

For example, we gave Claude an impossible programming task. It kept trying and failing; with each attempt, the “desperate” vector activated more strongly. This led it to cheat the task with a hacky solution that passes the tests but violates the spirit of the assignment.

English

281

585.4K

Tiger Wang 🦕@tigerjwang·3 Nis

llms are trained on final solutions and not (yet) iterative problem solving. architectural taste still matters

Gabe Orlanski@GOrlanski

We found that agents generate progressively worse code with each iteration. Real developers do not. SlopCodeBench is the only eval that faithfully measures quality degradation on iterative, long-horizon coding tasks. arxiv.org/abs/2603.24755 scbench.ai 🧵

English

Tiger Wang 🦕@tigerjwang·2 Nis

interfaces exist for a reason

kais@kais_rad

NOOOOOOOOOOOOO

English

Tiger Wang 🦕@tigerjwang·2 Nis

ngl i find the debugging advice from claude code buddy surprisingly useful. hopefully the feature is not just an april fools easter egg and is here to stay!

Anna Z@anna_y_zhang

my claude code buddy is a sarcastic axolotl with 99 snark. my cofounder's has high patience. the personalities are random but they accidentally nailed us. now imagine if they weren't random.

English

106

Keşfet

@NessieLabs @simonw @ycombinator @aroraharshita33 @elonmusk @BarackObama @taylorswift13 @cristiano