Collov Labs
@CollovLabs
434 posts
Creator of @collov_ai (https://t.co/SokMt4mdHE) and Cozy AI. One agentic diffusion engine that powers multiple real-world applications.

Check it here 👉
Joined November 2023
4.6K Following · 1.5K Followers
Collov Labs @CollovLabs ·
We are happy to announce our $23M Series A.

Collov Labs is betting on Visual AI as the next interface, for the 6.8 billion people who've never used AI because text was never the answer.

The lab came out of what we saw building our products. People who struggled to write prompts would point their phone at a room and just get it. Real estate agents. Small business owners. First-time AI users. Visual removed friction that text never could.

The milestones so far:
→ 1M+ users worldwide
→ ~1,000 five-star App Store ratings
→ Covered the same day by @FortuneMagazine, @axios, @theinformation, @pulse2news, and @UniteAi

Led by @MindWorksCap, Taihill Ventures, Brightway Future Capital, and others.

The next interface will be the camera. This is just the beginning.

— Collov Labs, San Francisco

#fundingannouncement #siliconvalley #collovlabs
[image]
0 replies · 2 reposts · 6 likes · 391 views
Collov Labs @CollovLabs ·
Last night felt special. 🧧✨

@CollovLabs hosted another intimate founder & exec gathering, co-hosted by our Head of Growth @laura_llin and Gavin (@N) of Llama Venture.

In the middle of AI agent chaos, model launches, and nonstop hype… we chose to sit down and talk long-term.

Thank you @cmigos, Jinjin, @ZiqiPeng, Kelly, Qinming, @4lili_lili4, Chris, @jayfunggy for making the room sharp, honest, and generous. 🤝

We talked about:
• Real AI agent progress (not Twitter demos 🚀)
• What OpenClaw changes in the ecosystem
• How founders stay grounded while everything accelerates ⚡

#ChineseNewYear is about reunion and momentum reset. 🏮

AI is competitive. Brutal sometimes. But nights like this remind me: we're building an ecosystem, not just products.

Wishing everyone clarity, courage, and compounding breakthroughs this year. 🥂
[image]
0 replies · 0 reposts · 3 likes · 578 views
Collov Labs @CollovLabs ·
This framing — optimizing for a family of models rather than a single checkpoint — feels underappreciated. The fact that nanochat recovers clean Chinchilla-style exponents (≈0.5 / 0.5) at such small scale is especially encouraging. It suggests scaling laws are structural, not an artifact of massive budgets.
[Quoted post by Andrej Karpathy @karpathy; reproduced in full below]
0 replies · 0 reposts · 1 like · 448 views
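The claim above is easy to sanity-check numerically. Below is a minimal sketch (our own illustration, not nanochat code) of how a Chinchilla-style exponent would be recovered from a sweep: fit a line to log N_opt vs log C. The (C, N_opt) pairs are hypothetical.

```python
import numpy as np

# Hypothetical (compute, best-model-size) pairs from a sweep; illustrative only.
C = np.array([1e17, 1e18, 1e19, 1e20])          # FLOPs budgets
N_opt = np.array([1.1e7, 3.4e7, 1.1e8, 3.5e8])  # compute-optimal params per budget

# Fit log N_opt = a * log C + b. Chinchilla-style behavior gives a ≈ 0.5,
# and by symmetry D_opt ∝ C^(1-a), so D_opt / N_opt is compute-independent.
a, b = np.polyfit(np.log(C), np.log(N_opt), 1)
print(f"exponent on compute: a ≈ {a:.2f}")  # ≈ 0.50 for these numbers
```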
Andrej Karpathy @karpathy ·
New post: nanochat miniseries v1

The correct way to think about LLMs is that you are not optimizing for a single specific model but for a family of models controlled by a single dial (the compute you wish to spend) to achieve monotonically better results. This allows you to do careful science of scaling laws, and ultimately this is what gives you the confidence that when you pay for "the big run", the extrapolation will work and your money will be well spent.

For the first public release of nanochat my focus was on an end-to-end harness that runs the whole LLM pipeline with all of its stages. Now, after YOLOing a few runs earlier, I'm coming back around to flesh out some of the parts that I sped through, starting of course with pretraining, which is both computationally heavy and critical as the foundation of intelligence and knowledge in these models.

After locally tuning some of the hyperparameters, I swept out a number of models at a fixed FLOPs budget. (For every FLOPs target you can train a small model for a long time, or a big model for a short time.) It turns out that nanochat obeys very nice scaling laws, basically reproducing the Chinchilla paper plots, which are just a baby version of the corresponding plot from Chinchilla. Very importantly and encouragingly, the exponent on N (parameters) and D (tokens) is equal at ~0.5, so just like Chinchilla we get a single (compute-independent) constant that relates the model size to token training horizons. In Chinchilla, this was measured to be 20. In nanochat it seems to be 8!

Once we could train compute-optimal models, I swept out a miniseries from d10 to d20, which are nanochat sizes that can do 2**19 ~= 0.5M batch sizes on an 8XH100 node without gradient accumulation. We get pretty, non-intersecting training plots for each model size.

Then the fun part is relating this miniseries v1 to the GPT-2 and GPT-3 miniseries so that we know we're on the right track. Validation loss has many issues and is not comparable, so instead I use the CORE score (from the DCLM paper). I calculated it for GPT-2 and estimated it for GPT-3, which allows us to finally put nanochat nicely on the same scale.

The total cost of this miniseries is only ~$100 (~4 hours on 8XH100). These experiments give us confidence that everything is working fairly nicely and that if we pay more (turn the dial), we get increasingly better models.

TLDR: we can train compute-optimal miniseries and relate them to GPT-2/3 via objective CORE scores, but further improvements are desirable and needed. E.g., matching GPT-2 currently needs ~$500, but imo should be possible to do for <$100 with more work.

Full post with a lot more detail is here: github.com/karpathy/nanoc… And all of the tuning and code is pushed to master; people can reproduce these with the scaling_laws.sh and miniseries.sh bash scripts.
[four images: scaling plots]
227 replies · 681 reposts · 5.4K likes · 708.2K views
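For readers following along, here is a hedged sketch of the fixed-FLOPs sweep described in the post. The C ≈ 6·N·D cost approximation is the standard one; the function and numbers below are illustrative, not the actual scaling_laws.sh logic.

```python
# For each FLOPs target, the standard approximation C ≈ 6*N*D converts a
# candidate model size into an affordable token horizon.
def token_horizon(flops_budget: float, n_params: float) -> float:
    """Tokens trainable for a model of n_params within flops_budget."""
    return flops_budget / (6.0 * n_params)

C = 1e18  # one FLOPs target in the sweep
for n in [5e6, 1e7, 2e7, 4e7, 8e7]:
    d = token_horizon(C, n)
    print(f"N={n:.0e} -> D={d:.2e} tokens (D/N = {d / n:.0f})")
# Training each (N, D) pair and plotting loss vs N at fixed C gives a U-shaped
# curve whose minimum is the compute-optimal model for that budget; repeating
# across budgets traces out N_opt(C) and D_opt(C).
```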
Collov Labs @CollovLabs ·
Love the “compute dial” framing. One subtle win here is showing that the N/D ≈ 0.5 balance holds even when training horizons and architectures are constrained — that’s nontrivial. Using CORE instead of validation loss to align miniseries across GPT-2/3 also feels like the right move. Do you think similar compute-optimal curves will hold once objectives move beyond next-token prediction (e.g. interaction or stateful settings)?
0 replies · 0 reposts · 0 likes · 30 views
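As a back-of-envelope illustration of that compute-independent constant (model sizes hypothetical, just plugging in the two measured ratios):

```python
# The measured D/N ratios imply very different token and compute budgets
# for the same model size.
for n_params in [1e8, 1e9]:
    d_nano = 8 * n_params     # nanochat's measured D/N ratio (~8)
    d_chin = 20 * n_params    # Chinchilla's measured D/N ratio (20)
    print(f"N={n_params:.0e}: D≈{d_nano:.1e} tokens (nanochat) "
          f"vs D≈{d_chin:.1e} tokens (Chinchilla), "
          f"C≈{6 * n_params * d_nano:.1e} vs {6 * n_params * d_chin:.1e} FLOPs")
```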
shirish @shiri_shh ·
X algorithm, only show this to people who are into
> building side projects
> early-stage startups
> product design
> shipping fast
> startup founders
> cracked developers building cool stuff on the internet for fun and freedom
@grok I need more of these people on my timeline.
321 replies · 17 reposts · 936 likes · 46.2K views
Collov Labs @CollovLabs ·
On the video side, V-JEPA 2 pushes the same principle toward physical prediction + planning, mixing internet-scale video with a small amount of interaction data to get models that start to act, not just recognize. Refs: arxiv.org/abs/2301.08243, arxiv.org/abs/2506.09985
0 replies · 0 reposts · 2 likes · 115 views
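A rough PyTorch sketch of the idea, in our own words rather than the V-JEPA 2 code: an action-conditioned predictor over latent video states, which is what lets a model roll candidate actions forward and plan. All module names and dimensions below are made up for illustration.

```python
import torch
import torch.nn as nn

class ActionConditionedPredictor(nn.Module):
    """Hypothetical stand-in for a latent dynamics model over video states."""
    def __init__(self, state_dim=256, action_dim=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 512), nn.GELU(),
            nn.Linear(512, state_dim),
        )

    def forward(self, z_t, a_t):
        # Predict the next latent state from the current state plus an action.
        return self.net(torch.cat([z_t, a_t], dim=-1))

pred = ActionConditionedPredictor()
z = torch.randn(1, 256)           # latent state from a video encoder (assumed)
for a in torch.randn(5, 1, 8):    # a candidate 5-step action sequence
    z = pred(z, a)                # roll the latent forward, no new frames
# Scoring such rollouts against a goal embedding is the planning loop.
```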
Collov Labs @CollovLabs ·
This is why we're excited about the recent JEPA line of thinking: instead of reconstructing pixels, you predict representations of missing regions. I-JEPA framed this cleanly for images (predict target-block embeddings from a context block, with masking strategies that force semantic abstractions).
1 reply · 0 reposts · 1 like · 140 views
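A heavily simplified sketch of that objective (not the official I-JEPA implementation; the linear layers stand in for the ViT encoders and the crude pooling stands in for the positional-query predictor):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

D = 128
encoder = nn.Linear(768, D)          # stand-in for the context ViT encoder
target_encoder = nn.Linear(768, D)   # EMA copy of the encoder (updated elsewhere)
predictor = nn.Linear(D, D)          # stand-in for the narrow predictor network

patches = torch.randn(4, 196, 768)   # (batch, patches, patch_dim): a 14x14 grid
ctx_idx = torch.arange(0, 100)       # visible context block
tgt_idx = torch.arange(150, 180)     # masked target block

z_ctx = encoder(patches[:, ctx_idx])             # context embeddings, with grad
with torch.no_grad():                            # targets never receive gradient
    z_tgt = target_encoder(patches)[:, tgt_idx]
# Crude pooling stands in for the paper's positionally queried predictor.
z_pred = predictor(z_ctx.mean(dim=1, keepdim=True)).expand(-1, len(tgt_idx), -1)
loss = F.smooth_l1_loss(z_pred, z_tgt)           # regress embeddings, not pixels
```

Even in this toy form the key properties survive: the loss lives in embedding space rather than pixel space, and the target encoder runs without gradient (in the paper, an EMA copy of the context encoder, which prevents representational collapse).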
Collov Labs @CollovLabs ·
World model ≠ next-token prediction A “world model” isn’t a bigger sequence model that predicts the next frame/token. It’s a latent state machine: it must compress observations into a state that persists, and it must learn dynamics that are stable enough to roll forward under actions.
2 replies · 2 reposts · 2 likes · 439 views
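A toy sketch of that framing (hypothetical code, not any particular paper's architecture): an encoder compresses an observation into a latent state, and a learned transition rolls it forward under actions without re-observing.

```python
import torch
import torch.nn as nn

class TinyWorldModel(nn.Module):
    """Illustrative latent state machine: encode once, roll forward open-loop."""
    def __init__(self, obs_dim=64, action_dim=4, state_dim=128):
        super().__init__()
        self.encode = nn.Linear(obs_dim, state_dim)        # obs -> latent state
        self.dynamics = nn.GRUCell(action_dim, state_dim)  # action-conditioned transition
        self.decode = nn.Linear(state_dim, obs_dim)        # optional readout

    def rollout(self, obs, actions):
        s = torch.tanh(self.encode(obs))  # initial state from one observation
        preds = []
        for a in actions:                 # roll forward under actions only
            s = self.dynamics(a, s)       # state persists; no new observations
            preds.append(self.decode(s))
        return torch.stack(preds)

wm = TinyWorldModel()
obs = torch.randn(1, 64)
actions = torch.randn(10, 1, 4)           # a 10-step action sequence
future = wm.rollout(obs, actions)         # (10, 1, 64) predicted observations
```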
Collov Labs @CollovLabs ·
🚀 Today we’re launching the @collov_ai Design Center, the world’s first interior design AI agent built for #realestate. Not a tool. Not a filter. An AI agent that understands space, style, and intent.
23 replies · 6 reposts · 42 likes · 7K views