Andrey Kolesnikov
251 posts

@minviable_org
Dad, husband, immigrant, nerd. Built, bought and sold companies. Default to code, law degree is a bonus.
San Francisco, CA · Joined May 2026
95 Following · 57 Followers

At $NVDA GTC/Computex in Taipei:
I think we’ll hear about the next AI bottleneck.
That’s owned by a 0.6 P/B potato-farming company in Japan with a 180-year history.
Their owner cooks those potatoes in night markets for 160 yen apiece.
But the same potato-farming equipment they used to grow potatoes under optimal sunlight
is now required for optical alignment in CPO.
And their unique cooking technique is mandatory to address thermal requirements for Rubin.
Can anyone guess?

@asaio87 I’m one of those idiots I guess.
1M MAU app. Most of our growth and feature innovation came after AI usage spread throughout the company. Exited too, made investors happy. All AI.

@DanielSmidstrup They are capacity-constrained. The rest of the world is fine using Nvidia; for Google, it’s declaring an L in the GPU race. Hence the focus on TPU8.

Finally got my visa sorted out and I’m moving to San Francisco, just in time for MS Build and OpenClaw’s after hours! luma.com/OpenClaw-GitHub

For the last 6 years I’ve been buying well-run small businesses for 5x earnings.
In the first 30 days, I take the websites offline, move the companies to sad office parks with drop ceilings, install fax machines at the front desk, and bring in 75-year-old actors to pose as the CEO.
I then sell the companies to people with MBAs for 10x revenue so that they can feel useful “turning the company around.”

@TheChiefNerd BG is spot on with his Frankenstein take. The loudest case for local AI: freedom of intelligence.

@antirez How can you forget when reminders are everywhere? I keep cancelling subscriptions, banks, and other services that haven’t evolved. Some are still running JSX, ASP, and other ancient frameworks.

@witcheer @NousResearch 27B isn’t really designed for multi-step tasks and context recall. When something bigger feeds it isolated chunks of bounded context, it rips.

Which local LLM best drives an agent?
I built a benchmark for pairing models with Hermes Agent (@NousResearch) - a CodeAct agent that writes Python to call its tools, not JSON function calls.
4 models, RTX 5090, tested under Hermes's real system prompt.
~~ here is the final leaderboard:
🥇 Qwopus-18B — 92.7
🥈 Qwen3.6-27B — 92.4
🥉 Nemotron-Cascade-2-30B — 90.5
4️⃣ Hermes-4.3-36B — 84.3
~~ no model wins all four axes:
- Qwen 27B = perfect multi-step loops + instruction-following, but weakest long-context recall (~70%)
- Nemotron + Qwopus = flawless long-context (100%) but worst at multi-step (50%)
- Hermes 36B = solid, but OOMs at 64K context on 32GB of VRAM → that 0 tanks its score
the "best agent model" genuinely depends on your workload.
~~ methodology
most "function-calling" benchmarks score JSON tool calls. Hermes is code-as-action: the model writes Python.
I tested that, under the real ~3.5K-token agent prompt.
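The methodology hinges on the gap between JSON function calling and code-as-action. A minimal sketch of the two dispatch styles, assuming one hypothetical `get_weather` tool; `run_code_action` and `run_json_call` are illustrative names, not Hermes Agent's actual executor. It shows why CodeAct stresses multi-step ability: one reply can loop over tool calls and keep intermediate variables, while a JSON call is a single structured invocation per turn.

```python
import json
import re

FENCE = "`" * 3  # built at runtime so this example can sit inside a Markdown fence

# Hypothetical tool; in a CodeAct-style agent, tools are exposed as plain Python functions.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

# Matches the first fenced Python block in a model reply.
CODE_BLOCK = re.compile(FENCE + r"python\n(.*?)" + FENCE, re.DOTALL)


def run_code_action(model_output: str) -> dict:
    """CodeAct style: pull Python out of the reply and execute it with tools in scope."""
    match = CODE_BLOCK.search(model_output)
    if match is None:
        raise ValueError("no Python code block in model output")
    namespace = dict(TOOLS)  # tools become globals for the generated code
    exec(match.group(1), namespace)  # a real agent must sandbox this
    return namespace  # holds whatever variables the generated code defined


def run_json_call(model_output: str) -> str:
    """JSON function-calling style: one structured call per model turn."""
    call = json.loads(model_output)
    return TOOLS[call["name"]](**call["arguments"])


if __name__ == "__main__":
    reply = (
        "Checking both cities.\n"
        f"{FENCE}python\n"
        "results = [get_weather(c) for c in ['Taipei', 'Tokyo']]\n"
        f"{FENCE}"
    )
    print(run_code_action(reply)["results"])  # ['Sunny in Taipei', 'Sunny in Tokyo']
    print(run_json_call('{"name": "get_weather", "arguments": {"city": "Taipei"}}'))  # Sunny in Taipei
```

A benchmark in this style would check whether the generated Python runs and produces the right values, rather than whether a JSON object matches a schema.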

@TheAhmadOsman traffic-driven quant downcast. PoS as the inverse of QoS.

Opus 4.8 could be the same nerfed Opus 4.6 at 4-bit rather than 1.58-bit 🤡
I don't trust those clowns
Don't waste your money on a Claude Max subscription; they will keep rug-pulling you

Ahmad @TheAhmadOsman
Claude Code is so good at night/early morning before they start serving it quantized at 1.58-bit for the masses 🤡

@LottoLabs @ntbrown01 It’s wicked fast, but they need to polish the rough edges around reliability.

@LottoLabs 27B-written code will power the software innovation of the next few years. Talk about a model punching way above its weight. Pun intended.

@dakshgup CPU compute that generates training data for subsequent GPU evolution

@Hikari_07_jp Try @papercliping; it helped me regain my sanity. It supports Hermes, which I now only use directly for esoteric hand surgeries.

I have 64GB of low-CAS non-ECC RAM and it’s fine. Color me uneducated, but I honestly don’t know why you’d want more RAM (unless it’s a Mac). A CPU with a large cache is much more consequential; X3D chips have a noticeable edge in Ryzen builds.
The choke point is cross-card tensor parallelism over PCIe without NVLink. If the 6Ks had NVLink, it would cannibalize a lot of their DC market.
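The PCIe point can be put in rough numbers. A back-of-envelope sketch, assuming Megatron-style tensor parallelism (two all-reduces per layer in the forward pass), a ring all-reduce, and hypothetical model dimensions and link bandwidths; none of these figures come from the post, so plug in your own measurements. It counts bandwidth only and ignores latency, which dominates single-token decode.

```python
def tp_allreduce_seconds(tokens: int, hidden: int, layers: int, n_gpus: int,
                         link_gb_per_s: float, bytes_per_elem: int = 2) -> float:
    """Bandwidth-only estimate of forward-pass all-reduce time under tensor parallelism.

    Assumptions (illustrative, not measured):
    - Megatron-style TP: 2 all-reduces per layer (after attention and after the MLP)
    - ring all-reduce: each GPU sends 2 * (N - 1) / N times the payload
    - link latency and compute/communication overlap are ignored
    """
    payload = tokens * hidden * bytes_per_elem        # bytes per all-reduce (activations)
    per_gpu = 2 * (n_gpus - 1) / n_gpus * payload     # ring all-reduce traffic per GPU
    total_bytes = per_gpu * 2 * layers                # two all-reduces per layer
    return total_bytes / (link_gb_per_s * 1e9)


if __name__ == "__main__":
    # Hypothetical 2-GPU setup: hidden size 5120, 64 layers, 8K-token prefill in 16-bit.
    for name, bw in [("PCIe (assumed 50 GB/s)", 50.0), ("NVLink-class (assumed 450 GB/s)", 450.0)]:
        t = tp_allreduce_seconds(8192, 5120, 64, 2, bw)
        print(f"{name}: {t * 1000:.1f} ms")
```

With those assumed numbers each GPU ships about 10.7 GB per prefill: roughly 214.7 ms over the assumed PCIe link versus about 23.9 ms over the NVLink-class link, before any latency is counted.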

Catching up on latest @theallinpod. It seems like local AI is becoming mainstream. I feel seen.

@rohitdotmittal What do you mean by AI? Camera roll, noise cancelling, and SMS code autofill from Messages are AI, and we can’t function without those.

@usr_bin_roygbiv Always run if the HM interview is 30 minutes. That’s exactly how much time they’d invest in you.