Nick
@nick_kango
PM @ Kaggle (Google DeepMind). My own opinions

this is the only bench i can still beat anthropic at: incline barbell 245lbs x 5 @ ~165bw


Singapore's Foreign Minister published the architecture for his "second brain for a diplomat" yesterday. Architecture diagrams, design rationale, the works. A developer-style writeup of his own system.

It runs on a Raspberry Pi. It connects to his WhatsApp and Gmail, transcribes voice notes locally, ingests speeches and articles, and builds up a knowledge graph over time. It answers questions, drafts speeches, condenses information. He says he doesn't dare switch it off.

What @VivianBala built is one-of-one. There's no other setup like it. But what he built it from isn't. He composed four open-source pieces:
- @NanoClaw_AI, the agent framework: github.com/qwibitai/nanoc…
- Mnemon, the persistent memory layer: github.com/mnemon-dev/mne…
- OneCLI, the credential proxy that keeps API keys out of the containers: github.com/onecli/onecli
- The LLM Wiki pattern by Andrej Karpathy, the synthesis approach: x.com/karpathy/statu…

None of them are his. The composition is his. And then he published the composition: gist.github.com/VivianBalakris…

He didn't keep it internal as Singapore's edge. He didn't spin it into a product. He didn't gatekeep. He wrote it up and put it on GitHub.

There are tens of thousands of doctors, lawyers, researchers, investors, and operators building one-of-one setups for themselves right now. Some simpler than Vivian's, some more elaborate. The impulse will be to sit on it. Treat it as your edge. Think about what product or company you could spin out of it. Resist that impulse.

Vivian put it directly: "The diplomat who learns to work with AI will have a meaningful edge. I think that edge is now."

The specific thing Vivian composed will be obsolete in months. His real edge isn't the system. It's his ability to build it. Being plugged in, up to speed, able to cut through the noise and connect the right pieces into something that brings real value.

Sharing the blueprint doesn't give that away. It amplifies it. You become a beacon. Other people working on the same things find you. They share what they're building, suggest improvements, point at things you didn't know existed. You learn faster. You stay in the center of where things are happening. Publishing isn't giving away your edge. It's doubling down on it.



@kaggle is how you create open benchmarks :) Make evaluations open!!

Every company building on top of AI should be making their own benchmarks. This is the way if you want model progress to disproportionately benefit your company.



Announcing the launch of ParseBench on Kaggle. I'm excited for the first of many partnerships together with the great team at @llama_index

ParseBench is now live on Kaggle Benchmarks! 🚀 Developed by @llama_index, this benchmark evaluates PDF-to-structured-data conversion, featuring ~2k human-verified pages from real enterprise docs across 5 capability dimensions. 🥇Gemini 3 Flash: 79.3% 🥈GPT 5.4: 72.9% 🥉Gemma 4 31B: 66.4%

ParseBench is now live on @Kaggle. The first document OCR benchmark built for AI agents — 2,000 enterprise pages, 167K+ test rules, 5 dimensions that actually break downstream agents. Benchmark your parser against 14 methods including GPT-5 Mini, Gemini 3, Textract, and LlamaParse. Read the full story → llamaindex.ai/blog/llamainde…


We're sharing a new method for scoring models on agentic coding tasks. Here's how models in Cursor compare on intelligence and efficiency:








Lots of catching up to do. xAI is half the age of its competitors, or less.



