Nick

115 posts

@nick_kango

PM @ Kaggle (Google DeepMind). My own opinions

San Francisco · Joined October 2025

81 Following · 137 Followers

Nick@nick_kango·28 Apr

@spencershum Time to benchmaxx

Spencer Shumway@spencershum·28 Apr

@nick_kango I nominate you!

Nick@nick_kango·28 Apr

BENCHmark idea: every frontier lab sends 1 representative to compete

Kyle Jeong@kylejeong

this is the only bench i can still beat anthropic at
incline barbell 245lbs x 5 @ ~165bw

Nick@nick_kango·27 Apr

@VivianBala @Gavriel_Cohen @karpathy You should come to the SF Bay Area and talk with us Singaporeans in AI :)

Vivian Balakrishnan@VivianBala·25 Apr

Thanks @Gavriel_Cohen. You’re right. I never used an IDE. Claude Code made all edits. No @karpathy ‘vibe coding’. All I did was ‘tool assembly’ to create a utility that worked in my domain!

Gavriel Cohen@Gavriel_Cohen

Singapore's Foreign Minister published the architecture for his "second brain for a diplomat" yesterday. Architecture diagrams, design rationale, the works. A developer-style writeup of his own system.
It runs on a Raspberry Pi. It connects to his WhatsApp and Gmail, transcribes voice notes locally, ingests speeches and articles, and builds up a knowledge graph over time. It answers questions, drafts speeches, condenses information. He says he doesn't dare switch it off.
What @VivianBala built is one-of-one. There's no other setup like it. But what he built it from isn't. He composed four open-source pieces:
- @NanoClaw_AI, the agent framework: github.com/qwibitai/nanoc…
- Mnemon, the persistent memory layer: github.com/mnemon-dev/mne…
- OneCLI, the credential proxy that keeps API keys out of the containers: github.com/onecli/onecli
- The LLM Wiki pattern by Andrej Karpathy, the synthesis approach: x.com/karpathy/statu…
None of them are his. The composition is his. And then he published the composition: gist.github.com/VivianBalakris…
He didn't keep it internal as Singapore's edge. He didn't spin it into a product. He didn't gatekeep. He wrote it up and put it on GitHub.
There are tens of thousands of doctors, lawyers, researchers, investors, and operators building one-of-one setups for themselves right now. Some simpler than Vivian's, some more elaborate. The impulse will be to sit on it. Treat it as your edge. Think about what product or company you could spin out of it. Resist that impulse.
Vivian put it directly: "The diplomat who learns to work with AI will have a meaningful edge. I think that edge is now."
The specific thing Vivian composed will be obsolete in months. His real edge isn't the system. It's his ability to build it. Being plugged in, up to speed, able to cut through the noise and connect the right pieces into something that brings real value.
Sharing the blueprint doesn't give that away. It amplifies it. You become a beacon. Other people working on the same things find you. They share what they're building, suggest improvements, point at things you didn't know existed. You learn faster. You stay in the center of where things are happening.
Publishing isn't giving away your edge. It's doubling down on it.

Nick reposted

Omar Sanseviero@osanseviero·27 Apr

ParseBench: A benchmark for document parsing agents
@llama_index just shipped a benchmark with 2k verified pages for real enterprise documents. Benchmarks are the major underrated component in the ML ecosystem, so I'm excited to see more entities doing open work in the space

Nick@nick_kango·27 Apr

100%. Open source the implementation & run it with Kaggle. Every lab today is reporting different numbers for the same benchmark + models, and it benefits no one to have this info asymmetry and confusion

Ivan Leo@ivanleomk

@kaggle is how you create open benchmarks :) Make evaluations open!!

Nick@nick_kango·27 Apr

And on top of that, run and publish it publicly on a place like Kaggle. That’s how other researchers discover your work, build on it, and hill climb against what you care about

Logan Kilpatrick@OfficialLoganK

Every company building on top of AI should be making their own benchmarks. This is the way if you want model progress to disproportionally benefit your company.

Nick@nick_kango·24 Apr

@ivanleomk Ty:) My wife is happier than me that I’m done!

Ivan Leo@ivanleomk·24 Apr

@nick_kango Damn dude congrats!

Nick@nick_kango·24 Apr

After three years of studying part-time for my master's in computer science, I just finished my final exam for my final course. Learning felt vastly different as I progressed through the course with the genAI explosion. Lots to think about on the implications of learning.

Nick@nick_kango·24 Apr

@TomasMann1878 Headed for a short weekend holiday, but really just looking forward to not studying every day after an intense day of work 🥲

Tom Mann@TomasMann1878·24 Apr

@nick_kango Congrats! Doing anything to celebrate?

Nick@nick_kango·23 Apr

@ivanleomk 💯 we’re always looking to partner with anyone who wants to improve transparency in the open source ecosystem

Ivan Leo@ivanleomk·23 Apr

Kaggle is super exciting as a place for huge huge public open model benchmarks

Nick@nick_kango

Announcing the launch of ParseBench on Kaggle. I'm excited for the first of many partnerships together with the great team at @llama_index

Nick@nick_kango·23 Apr

Gemini, GPT, and Gemma killed it on this real-world benchmark. Multimodality (esp vision) is so important for accomplishing real world tasks. Particularly impressed with the small models punching above their weight!

Kaggle@kaggle

ParseBench is now live on Kaggle Benchmarks! 🚀 Developed by @llama_index, this benchmark evaluates PDF-to-structured-data conversion, featuring ~2k human-verified pages from real enterprise docs across 5 capability dimensions.
🥇 Gemini 3 Flash: 79.3%
🥈 GPT 5.4: 72.9%
🥉 Gemma 4 31B: 66.4%

Nick@nick_kango·23 Apr

Announcing the launch of ParseBench on Kaggle. I'm excited for the first of many partnerships together with the great team at @llama_index

LlamaIndex 🦙@llama_index

ParseBench is now live on @Kaggle. The first document OCR benchmark built for AI agents — 2,000 enterprise pages, 167K+ test rules, 5 dimensions that actually break downstream agents. Benchmark your parser against 14 methods including GPT-5 Mini, Gemini 3, Textract, and LlamaParse. Read the full story → llamaindex.ai/blog/llamainde…

Nick@nick_kango·23 Apr

It’s mind boggling that we’ve condensed a large part of humanity’s collective knowledge into a bunch of numbers. Makes you wonder about the interface theory of perception — that our view is a visual GUI overlaid over a world of numbers

Nick@nick_kango·21 Apr

@KSHartnett @cursor_ai @KSHartnett - Late to the game, but do you guys want to work with us to publish it on kaggle.com/benchmarks ?

Kevin Hartnett@KSHartnett·12 Mar

Happy to share for the first time data from CursorBench, @cursor_ai's internal benchmark suite. We think CursorBench is better at showing separation between models than public benchmarks, and more aligned with real developer outcomes. The more precise eval results we generate from CursorBench feed back into how we tune our agent harness and train our model, Composer.

Cursor@cursor_ai

We're sharing a new method for scoring models on agentic coding tasks. Here's how models in Cursor compare on intelligence and efficiency:

Nick@nick_kango·21 Apr

@ivanleomk Get a solid pull up/dip rack + dip belt + weights!

Ivan Leo@ivanleomk·20 Apr

About to buy a personal squat rack and home gym in SF. Any tips? I've been eyeing the PR-4000 Rack Builder from REP fitness but does anyone have any suggestions? :)

Nick@nick_kango·21 Apr

@lukaslevert Email me nicholaskang@kaggle.com including how to get access to your endpoint and your terms of service!

Lukas Levert@lukaslevert·21 Apr

@nick_kango what’s the best way to go about getting added to kaggle.com/benchmarks/goo…? We recently published our results with the Task API on Ultra through Ultra 8x. Endpoint: docs.parallel.ai/task-api/guide… Results: parallel.ai/blog/deepsearc…

Nick@nick_kango·21 Apr

@ExaAILabs we'd love to add you guys to our official leaderboard that we authored with the benchmark publishing team: kaggle.com/benchmarks/goo… we'll just need access to an endpoint - how best can we collab?

Exa@ExaAILabs·20 Apr

We're excited to share that by combining frontier LLMs with dozens of calls to Exa Search, we achieve state-of-the-art performance on agentic search evals. This is at 20x faster latencies owing to parallel tool calling targeting different clusters of information, the token efficiency of our returned text, and the sheer speed of our in-house search. Deep Max is coming soon: exa.ai/blog/deep-max

Nick@nick_kango·21 Apr

@xdotli Huge! Congrats Xiangyi. Killin it

Xiangyi Li@xdotli·20 Apr

SkillsBench is the fastest benchmark repo that reached 1k GitHub stars. Very proud to achieve this, especially since this is 100% organic. We are also cited 3+ times in frontier model cards, with 30+ academic citations within 1.5 months of release. 👇🧵

Nick@nick_kango·18 Apr

@BoxyInADream @MeganRisdal Glad to hear!! We’re soon going to launch off-platform capabilities, which will allow you to create, run, and build tasks locally on your IDE / coding agents:)

Jack Assery@BoxyInADream·18 Apr

@nick_kango @MeganRisdal Awesome! I really enjoyed making my first benchmark with the task generation agent. I ended up making a task that I want to expand into another benchmark itself and test more models 🙌

Jack Assery@BoxyInADream·18 Apr

Is it available to test on community benchmarks on Kaggle? I think Grok would do well on mine.

Elon Musk@elonmusk

Lot of catching up to do. xAI is half the age or less of competitors.
