Nick

115 posts

@nick_kango

PM @ Kaggle (Google DeepMind). My own opinions

San Francisco, joined October 2025
81 Following, 137 Followers
Vivian Balakrishnan@VivianBala·
Thanks @Gavriel_Cohen. You’re right. I never used an IDE. Claude Code made all edits. No @karpathy ‘vibe coding’. All I did was ‘tool assembly’ to create a utility that worked in my domain!
Gavriel Cohen@Gavriel_Cohen

Singapore's Foreign Minister published the architecture for his "second brain for a diplomat" yesterday. Architecture diagrams, design rationale, the works. A developer-style writeup of his own system.

It runs on a Raspberry Pi. It connects to his WhatsApp and Gmail, transcribes voice notes locally, ingests speeches and articles, and builds up a knowledge graph over time. It answers questions, drafts speeches, condenses information. He says he doesn't dare switch it off.

What @VivianBala built is one-of-one. There's no other setup like it. But what he built it from isn't. He composed four open-source pieces:

- @NanoClaw_AI, the agent framework: github.com/qwibitai/nanoc…
- Mnemon, the persistent memory layer: github.com/mnemon-dev/mne…
- OneCLI, the credential proxy that keeps API keys out of the containers: github.com/onecli/onecli
- The LLM Wiki pattern by Andrej Karpathy, the synthesis approach: x.com/karpathy/statu…

None of them are his. The composition is his. And then he published the composition: gist.github.com/VivianBalakris…

He didn't keep it internal as Singapore's edge. He didn't spin it into a product. He didn't gatekeep. He wrote it up and put it on GitHub.

There are tens of thousands of doctors, lawyers, researchers, investors, and operators building one-of-one setups for themselves right now. Some simpler than Vivian's, some more elaborate. The impulse will be to sit on it. Treat it as your edge. Think about what product or company you could spin out of it. Resist that impulse.

Vivian put it directly: "The diplomat who learns to work with AI will have a meaningful edge. I think that edge is now."

The specific thing Vivian composed will be obsolete in months. His real edge isn't the system. It's his ability to build it. Being plugged in, up to speed, able to cut through the noise and connect the right pieces into something that brings real value. Sharing the blueprint doesn't give that away. It amplifies it. You become a beacon.

Other people working on the same things find you. They share what they're building, suggest improvements, point at things you didn't know existed. You learn faster. You stay in the center of where things are happening. Publishing isn't giving away your edge. It's doubling down on it.

Nick reposted
Omar Sanseviero@osanseviero·
ParseBench: A benchmark for document parsing agents

@llama_index just shipped a benchmark with 2k verified pages for real enterprise documents. Benchmarks are the major underrated component in the ML ecosystem, so I'm excited to see more entities doing open work in the space.
Nick@nick_kango·
100%. Open source the implementation & run it with Kaggle. Every lab today is reporting different numbers for the same benchmark + models, and it benefits no one to have this info asymmetry and confusion
Ivan Leo@ivanleomk

@kaggle is how you create open benchmarks :) Make evaluations open!!

Nick@nick_kango·
@ivanleomk Ty:) My wife is happier than me that I’m done!
Nick@nick_kango·
After three years of studying part-time for my master's in computer science, I just finished my final exam for my final course. Learning felt vastly different as I progressed through the course with the genAI explosion. Lots to think about on the implications of learning.
Nick@nick_kango·
@TomasMann1878 Headed for a short weekend holiday, but really just looking forward to not studying every day after an intense day of work 🥲
Nick@nick_kango·
@ivanleomk 💯 we’re always looking to partner with anyone who wants to improve transparency in the open source ecosystem
Nick@nick_kango·
Gemini, GPT, and Gemma killed it on this real-world benchmark. Multimodality (especially vision) is so important for accomplishing real-world tasks. Particularly impressed with the small models punching above their weight!
Kaggle@kaggle

ParseBench is now live on Kaggle Benchmarks! 🚀 Developed by @llama_index, this benchmark evaluates PDF-to-structured-data conversion, featuring ~2k human-verified pages from real enterprise docs across 5 capability dimensions.

🥇 Gemini 3 Flash: 79.3%
🥈 GPT 5.4: 72.9%
🥉 Gemma 4 31B: 66.4%

Nick@nick_kango·
Announcing the launch of ParseBench on Kaggle. I'm excited for the first of many partnerships together with the great team at @llama_index
LlamaIndex 🦙@llama_index

ParseBench is now live on @Kaggle. The first document OCR benchmark built for AI agents — 2,000 enterprise pages, 167K+ test rules, 5 dimensions that actually break downstream agents. Benchmark your parser against 14 methods including GPT-5 Mini, Gemini 3, Textract, and LlamaParse. Read the full story → llamaindex.ai/blog/llamainde…

Nick@nick_kango·
It’s mind-boggling that we’ve condensed a large part of humanity’s collective knowledge into a bunch of numbers. Makes you wonder about the interface theory of perception: that our view is a GUI overlaid on a world of numbers.
Kevin Hartnett@KSHartnett·
Happy to share for the first time data from CursorBench, @cursor_ai's internal benchmark suite. We think CursorBench is better at showing separation between models than public benchmarks, and more aligned with real developer outcomes. The more precise eval results we generate from CursorBench feed back into how we tune our agent harness and train our model, Composer.
Cursor@cursor_ai

We're sharing a new method for scoring models on agentic coding tasks. Here's how models in Cursor compare on intelligence and efficiency:

Nick@nick_kango·
@ivanleomk Get a solid pull-up/dip rack + dip belt + weights!
Ivan Leo@ivanleomk·
About to buy a personal squat rack and home gym in SF. Any tips? I've been eyeing the PR-4000 Rack Builder from REP Fitness, but does anyone have other suggestions? :)
Nick@nick_kango·
@lukaslevert Email me at nicholaskang@kaggle.com, including how to get access to your endpoint and your terms of service!
Nick@nick_kango·
@ExaAILabs We'd love to add you guys to our official leaderboard, which we authored with the benchmark publishing team: kaggle.com/benchmarks/goo… We'll just need access to an endpoint. How best can we collab?
Exa@ExaAILabs·
We're excited to share that by combining frontier LLMs with dozens of calls to Exa Search, we achieve state-of-the-art performance on agentic search evals. This is at 20x faster latencies owing to parallel tool calling targeting different clusters of information, the token efficiency of our returned text, and the sheer speed of our in-house search. Deep Max is coming soon: exa.ai/blog/deep-max
Nick@nick_kango·
@xdotli Huge! Congrats Xiangyi. Killin it
Xiangyi Li@xdotli·
SkillsBench is the fastest benchmark repo to reach 1k GitHub stars. Very proud to achieve this, especially since this is 100% organic. We have also been cited 3+ times in frontier model cards and have 30+ academic citations within 1.5 months of release. 👇🧵
Nick@nick_kango·
@BoxyInADream @MeganRisdal Glad to hear!! We’re soon going to launch off-platform capabilities, which will allow you to create, run, and build tasks locally in your IDE or with coding agents :)
Jack Assery@BoxyInADream·
@nick_kango @MeganRisdal Awesome! I really enjoyed making my first benchmark with the task generation agent. I ended up making a task that I want to expand into another benchmark itself and test more models 🙌