Paul Nazarov

334 posts

@panazarov

Tech lead, Yandex Taxi backend. Beach volleyball player. Likes photography and psychology.

Moscow · Joined March 2012
161 Following · 36 Followers
Paul Nazarov reposted
Mitchell Hashimoto (@mitchellh):
I strongly believe there are entire companies right now under heavy AI psychosis, and it's impossible to have rational conversations about it with them. I can't name any specific people because they include personal friends I deeply respect, but I worry about how this plays out.

I lived through the great MTBF vs. MTTR (mean-time-between-failure vs. mean-time-to-recovery) reckoning of infrastructure during the transition to cloud and cloud automation. All those arguments are rearing their ugly heads again, but now it's... the whole software development industry (maybe the whole world, really). It's frightening, because the psychosis folks operate under an almost absolute "MTTR is all you need" mentality: "it's fine to ship bugs because the agents will fix them so quickly and at a scale humans can't do!" We learned in infrastructure that MTTR is great but you can't yeet resilient systems entirely.

The main issue is I don't even know how to bring this up to people I know personally, because bringing this topic up leads to immediate dismissals like "no no, it has full test coverage" or "bug reports are going down" or something, which just don't paint the whole picture.

We already learned this lesson once in infrastructure: you can automate yourself into a very resilient catastrophe machine. Systems can appear healthy by local metrics while globally becoming incomprehensible. Bug reports can go down while latent risk explodes. Test coverage can rise while semantic understanding falls. Changes happen so fast that nobody notices the underlying architecture decaying. I worry.
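The MTBF vs. MTTR argument can be made concrete with the textbook steady-state availability formula, A = MTBF / (MTBF + MTTR). In the Python sketch below, the two systems and their numbers are hypothetical, not taken from the post: a system that fails 100x more often but recovers 100x faster ends up with the same availability, even though it fails about 875 times a year instead of about 9. That is one way a single health metric can look fine while the underlying failure rate explodes.

```python
HOURS_PER_YEAR = 8760


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: fraction of time the system is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def failures_per_year(mtbf_hours: float, mttr_hours: float) -> float:
    """Expected number of failure/recovery cycles in one year of operation."""
    return HOURS_PER_YEAR / (mtbf_hours + mttr_hours)


# Hypothetical system A: fails rarely, recovers slowly.
a = availability(mtbf_hours=1000.0, mttr_hours=1.0)
# Hypothetical system B: fails 100x as often, recovers 100x as fast.
b = availability(mtbf_hours=10.0, mttr_hours=0.01)

print(f"A: availability={a:.5f}, failures/year={failures_per_year(1000.0, 1.0):.1f}")
print(f"B: availability={b:.5f}, failures/year={failures_per_year(10.0, 0.01):.1f}")
```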
Paul Nazarov reposted
pc (@pcshipp):
Claude: You've used 90% of session limit
Me instantly:
[image]
Paul Nazarov reposted
Simon Willison (@simonw):
This is so confusing. Did Anthropic really just drop Claude Code from their $20/month plan? Why would they do that through a pricing page update without making a proper announcement? Plus, $20/month still gets you Cowork, which is just Claude Code wearing a non-threatening hat!
Paul Nazarov reposted
Simon Willison (@simonw):
I upgraded my Claude token counter tool to compare different models, and Opus 4.7 does appear to use 1.46x the tokens for text and up to 3x the tokens for images. It's priced the same as Opus 4.6 on a per-token basis, so this is actually a pretty big price bump.
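Because the per-token price is unchanged, the cost of the same input scales directly with the token count. A minimal Python sketch of that arithmetic, using the 1.46x (text) and up-to-3x (images) multipliers from the post; the 100,000-token prompt is a hypothetical example:

```python
def cost_ratio(token_multiplier: float, price_per_token_ratio: float = 1.0) -> float:
    """New cost / old cost for identical input.

    token_multiplier: new token count / old token count for the same content.
    price_per_token_ratio: new per-token price / old per-token price (1.0 = unchanged).
    """
    return token_multiplier * price_per_token_ratio


text_ratio = cost_ratio(1.46)   # text: ~1.46x tokens, same per-token price
image_ratio = cost_ratio(3.0)   # images: "up to" 3x tokens, same per-token price

old_tokens = 100_000                          # hypothetical prompt size
new_tokens = round(old_tokens * text_ratio)   # the same text under the new tokenizer

print(f"text: {text_ratio:.2f}x cost ({text_ratio - 1:.0%} more), "
      f"{old_tokens:,} -> {new_tokens:,} tokens")
print(f"images: up to {image_ratio:.0f}x cost")
```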
Paul Nazarov reposted
Guri Singh (@heygurisingh):
NVIDIA just dropped a 120B parameter model that only uses 12B at inference. It's called Nemotron 3 Super.

60.47% on SWE-Bench Verified, highest open-weight model ever for real-world coding.
85.6% on PinchBench, best open model as an AI agent brain.
91.75% on RULER at 1M tokens while GPT-OSS-120B collapses to 22.3%.
2.2x faster than GPT-OSS-120B.
7.5x faster than Qwen3.5-122B.

Here's what makes this different from every other open model. It fuses 3 architectures into one:
→ Mamba-2 layers for linear-time sequence processing
→ LatentMoE, a new expert routing system with 512 total experts, 22 active per token
→ Strategic Transformer attention layers as "global anchors"

LatentMoE is the real breakthrough. It compresses tokens into a latent space before routing to experts. This cuts memory bandwidth and communication costs by 4x while activating MORE experts per token. More experts. Less compute. Better accuracy.

The model was trained on 25 TRILLION tokens. Natively in 4-bit precision (NVFP4) from the very first gradient update. Not quantized after training. Trained in 4-bit from day one.

Post-training used 21 different RL environments across math, code, STEM, safety, tool use, and long-horizon agentic tasks.

It also has built-in speculative decoding via Multi-Token Prediction. Average acceptance length of 3.45 tokens per step, beating DeepSeek-R1's 2.70 across every category. No external draft model needed. The speed is baked into the architecture.

CodeRabbit, Factory, and Greptile already shipped integrations.

Open weights. Open datasets. Open training recipes. All on HuggingFace. 100% Open Source.
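Only 22 of the 512 experts run for any given token, which is broadly how a 120B-parameter model can get by with about 12B active parameters at inference. The Python sketch below shows generic top-k expert routing (softmax over the k highest router scores) with the 512/22 figures from the post. It is not NVIDIA's LatentMoE: the latent-space compression step the post describes is not modeled, and the router logits are random stand-ins.

```python
import math
import random


def top_k_route(router_logits: list[float], k: int) -> list[tuple[int, float]]:
    """Generic top-k MoE gating.

    Returns (expert_index, weight) pairs for the k highest-scoring experts,
    with weights from a softmax restricted to those k logits (they sum to 1).
    """
    # Indices of the k experts with the largest router logits, best first.
    top = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:k]
    # Softmax over the selected logits; subtract the max for numerical stability.
    m = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


NUM_EXPERTS = 512  # total experts, as quoted in the post
ACTIVE = 22        # experts active per token, as quoted in the post

rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]  # stand-in router output for one token
routes = top_k_route(logits, ACTIVE)

print(len(routes), round(sum(w for _, w in routes), 6))
print(f"fraction of experts used per token: {ACTIVE / NUM_EXPERTS:.1%}")
```

Each token's output would then be the weighted sum of only those 22 experts' outputs, so per-token compute tracks the active experts rather than the full expert pool.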
Paul Nazarov reposted
Claude (@claudeai):
Introducing Claude Managed Agents: everything you need to build and deploy agents at scale. It pairs an agent harness tuned for performance with production infrastructure, so you can go from prototype to launch in days. Now in public beta on the Claude Platform.
Paul Nazarov reposted
Afiz ⚡️ (@itsafiz):
LangChain just open-sourced Deep Agents, an agent harness that’s opinionated and ready to run out of the box. Instead of wiring up prompts, tools, and context management yourself, you get a working agent immediately and customize what you need.

It’s an MIT-licensed system that’s perfect for anyone trying to understand how high-end coding agents are structured. @LangChain

What’s inside the harness:
- Planning: write_todos for task breakdown and progress tracking.
- Filesystem: Full context control via read_file, write_file, edit_file, ls, glob, and grep.
- Shell Access: execute for running commands (with sandboxing).
- Sub-agents: task tool for delegating work with isolated context windows.
- Smart Defaults: Optimized prompts that teach the model how to use these tools effectively.
- Context Management: Auto-summarization for long threads and large outputs saved directly to files.

Link in the comments
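To make the "write_todos for task breakdown and progress tracking" item concrete, here is a hypothetical Python sketch of such a planning tool, assuming the agent resends its full plan on every call and the harness echoes back a progress summary for the model's context. The `TodoState` class, its status values, and the summary format are illustrative assumptions, not the Deep Agents API.

```python
from dataclasses import dataclass, field
from typing import Literal

# Hypothetical status values for a plan item.
Status = Literal["pending", "in_progress", "completed"]


@dataclass
class Todo:
    content: str
    status: Status = "pending"


@dataclass
class TodoState:
    """Hypothetical planning state a harness could keep between tool calls."""

    todos: list[Todo] = field(default_factory=list)

    def write_todos(self, todos: list[Todo]) -> str:
        # The agent replaces the whole plan on each call; the harness stores it
        # and returns a summary that goes back into the model's context.
        self.todos = list(todos)
        return self.summary()

    def summary(self) -> str:
        done = sum(t.status == "completed" for t in self.todos)
        lines = [f"[{t.status}] {t.content}" for t in self.todos]
        return f"{done}/{len(self.todos)} done\n" + "\n".join(lines)


state = TodoState()
summary_text = state.write_todos([
    Todo("read the failing test", "completed"),
    Todo("patch the parser", "in_progress"),
    Todo("run the test suite"),
])
print(summary_text)
```

Rewriting the full list on every call keeps the tool stateless from the model's point of view: whatever it sent last is the plan.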
Paul Nazarov reposted
tweet davidson (@andyreed):
when claude creates subagents this is what i picture
[image]
Paul Nazarov reposted
ᥙ𝓬ƙρы ß 𝓬ყᙏҽρƙαχ:
@SwedPaul First Claude writes the code itself, and then it asks for 25 bucks to find the bugs in that very code. Sneaky, I'll give it that.
Paul Nazarov reposted
Andrej Karpathy (@karpathy):
nanochat now trains a GPT-2 capability model in just 2 hours on a single 8XH100 node (down from ~3 hours 1 month ago). Getting a lot closer to ~interactive! A bunch of tuning and features (fp8) went in, but the biggest difference was a switch of the dataset from FineWeb-edu to NVIDIA ClimbMix (nice work NVIDIA!). I had tried Olmo, FineWeb, DCLM which all led to regressions; ClimbMix worked really well out of the box (to the point that I am slightly suspicious about goodharting, though reading the paper it seems ~ok). In other news, after trying a few approaches for how to set things up, I now have AI Agents iterating on nanochat automatically, so I'll just leave this running for a while, go relax a bit and enjoy the feeling of post-agi :). Visualized here as an example: 110 changes made over the last ~12 hours, bringing the validation loss so far from 0.862415 down to 0.858039 for a d12 model, at no cost to wall clock time. The agent works on a feature branch, tries out ideas, merges them when they work and iterates. Amusingly, over the last ~2 weeks I almost feel like I've iterated more on the "meta-setup" where I optimize and tune the agent flows than on the nanochat repo directly.
[image]
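The loop described above (work on a feature branch, try an idea, merge it if it helps, repeat) is essentially greedy hill climbing on validation loss. A hypothetical Python sketch follows; the candidate names and their loss effects are made up, and `toy_evaluate` stands in for a real training run. The reported drop from 0.862415 to 0.858039 is about 0.0044 in absolute terms, roughly 0.5% relative, which the last lines compute.

```python
from typing import Callable, Iterable


def greedy_merge(
    baseline_loss: float,
    candidates: Iterable[str],
    evaluate: Callable[[list[str], str], float],
) -> tuple[list[str], float]:
    """Try each candidate on top of the already-merged changes and keep it
    only if validation loss strictly improves."""
    merged: list[str] = []
    best = baseline_loss
    for change in candidates:
        loss = evaluate(merged, change)  # "train + eval" with merged changes plus this one
        if loss < best:                  # merge only strict improvements
            merged.append(change)
            best = loss
    return merged, best


# Hypothetical per-change effects on validation loss (negative = better).
EFFECT = {"tweak-lr": -0.002, "bigger-batch": +0.001, "fp8-matmul": -0.0015}


def toy_evaluate(merged: list[str], change: str) -> float:
    """Stand-in for a training run: baseline loss plus the effects of the changes."""
    return 0.862415 + sum(EFFECT[c] for c in merged + [change])


merged, best = greedy_merge(0.862415, EFFECT, toy_evaluate)
print(merged, round(best, 6))

# Size of the improvement reported in the post.
reported_delta = 0.862415 - 0.858039
print(f"{reported_delta:.6f} absolute, {reported_delta / 0.862415:.2%} relative")
```

Because real training runs are noisy, a strict `loss < best` check can merge changes that only look better by chance; repeated runs or a minimum-improvement margin are common safeguards.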
Paul Nazarov reposted
Andrej Karpathy (@karpathy):
Cool chart showing the ratio of Tab complete requests to Agent requests in Cursor. With improving capability, every point in time has an optimal setup that keeps changing and evolving and the community average tracks the point. None -> Tab -> Agent -> Parallel agents -> Agent Teams (?) -> ??? If you're too conservative, you're leaving leverage on the table. If you're too aggressive, you're net creating more chaos than doing useful work. The art of the process is spending 80% of the time getting work done in the setup you're comfortable with and that actually works, and 20% exploration of what might be the next step up even if it doesn't work yet.
[image]
Quoting Michael Truell (@mntruell): x.com/i/article/2026…
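The 80/20 split resembles the epsilon-greedy strategy from multi-armed bandits with epsilon = 0.2: exploit the known-good setup most of the time and explore a next-step setup the rest. This is an analogy, not something the post states; the Python sketch below, with setup names taken from the post's progression, only illustrates the allocation.

```python
import random


def choose_setup(comfortable: str, frontier: list[str], rng: random.Random,
                 explore_share: float = 0.2) -> str:
    """Epsilon-greedy analogy for the 80/20 split.

    With probability explore_share, try a setup from the frontier;
    otherwise use the setup that already works.
    """
    if rng.random() < explore_share:
        return rng.choice(frontier)  # exploration: maybe the next step up
    return comfortable               # exploitation: the setup that works today


rng = random.Random(42)
picks = [choose_setup("Agent", ["Parallel agents", "Agent Teams"], rng) for _ in range(10_000)]
explore_rate = sum(p != "Agent" for p in picks) / len(picks)
print(f"explored {explore_rate:.1%} of the time")
```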
Paul Nazarov reposted
Thariq (@trq212):
We've rolled out a new auto-memory feature. Claude now remembers what it learns across sessions — your project context, debugging patterns, preferred approaches — and recalls it later without you having to write anything down.
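One simple way to picture cross-session memory is notes persisted to disk by one session and reloaded by the next. The Python sketch below is hypothetical: `SessionMemory`, the JSON file, and the topic keys are illustrative assumptions, not how Claude's auto-memory is implemented.

```python
import json
import tempfile
from pathlib import Path


class SessionMemory:
    """Hypothetical cross-session memory backed by a JSON file."""

    def __init__(self, path: Path):
        self.path = path
        # Reload whatever earlier sessions saved, or start empty.
        self.notes: dict[str, list[str]] = (
            json.loads(path.read_text()) if path.exists() else {}
        )

    def remember(self, topic: str, note: str) -> None:
        bucket = self.notes.setdefault(topic, [])
        if note not in bucket:  # skip exact duplicates
            bucket.append(note)
        self.path.write_text(json.dumps(self.notes, indent=2))

    def recall(self, topic: str) -> list[str]:
        return list(self.notes.get(topic, []))


NOTE = "project uses pytest; run with -x to stop at the first failure"

with tempfile.TemporaryDirectory() as d:
    path = Path(d) / "memory.json"
    first = SessionMemory(path)        # session 1 learns something
    first.remember("debugging", NOTE)
    first.remember("debugging", NOTE)  # duplicate is ignored
    second = SessionMemory(path)       # session 2 reloads it from disk
    recalled = second.recall("debugging")

print(recalled)
```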
Paul Nazarov reposted
Lydia Hallie ✨ (@lydiahallie):
Excited to announce Claude for Open Source ❤️ We're giving 6 months of free Claude Max 20x to open source maintainers and core contributors. If you maintain a popular project or contribute across open source, please apply! claude.com/contact-sales/…
Paul Nazarov reposted
Anthropic (@AnthropicAI):
New Engineering blog: We tasked Opus 4.6 using agent teams to build a C compiler. Then we (mostly) walked away. Two weeks later, it worked on the Linux kernel. Here's what it taught us about the future of autonomous software development. Read more: anthropic.com/engineering/bu…
Paul Nazarov reposted
Andrej Karpathy (@karpathy):
Claude has been running my nanochat experiments since morning. It writes implementations, debugs them with toy examples, writes tests and makes them fail/pass, launches training runs, babysits them by tailing logs and pulling stats from wandb, keeps a running markdown file of highlights, keeps a running record of runs and results so far, and presents results in nice tables. We just finished some profiling, noticed inefficiencies in the optimizer, resolved them, and measured improvements. It looked at all PRs to the repo, categorized and prioritized them, made commits against some of them, etc. I'm still very much in the loop. It made subtle mistakes that I had to point out. It got confused a few times and (amusingly) admitted that what it said was a "brain fart" (verbatim quote hah). It has missed a few ideas that I had to pitch. It made a bunch of bad design decisions that bloat the code and coupled abstractions that I had to revert. It's not perfect, but I'm used to doing all of these things manually, so just seeing it running on the side cranking away at larger scope problems and coordinating all these flows in relatively coherent ways is definitely a new experience and a complete change of workflow.
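Keeping "a running record of runs and results" and presenting it "in nice tables" can be as simple as a list of run records rendered to Markdown. A hypothetical Python sketch; the `Run` fields, run names, and numbers are placeholders, not Karpathy's data.

```python
from dataclasses import dataclass


@dataclass
class Run:
    name: str
    val_loss: float
    minutes: float  # wall clock time of the run


def runs_to_markdown(runs: list[Run]) -> str:
    """Render a run log as a Markdown table, best (lowest) validation loss first."""
    header = ["| run | val loss | wall clock (min) |", "|---|---|---|"]
    rows = [
        f"| {r.name} | {r.val_loss:.4f} | {r.minutes:.1f} |"
        for r in sorted(runs, key=lambda r: r.val_loss)
    ]
    return "\n".join(header + rows)


# Placeholder runs for illustration only.
log = [Run("baseline", 3.3012, 62.0), Run("optimizer-fix", 3.2954, 58.5)]
table = runs_to_markdown(log)
print(table)
```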
Paul Nazarov reposted
Simon Willison (@simonw):
@mattyglesias Claude Code turns out to be misnamed: Anthropic thought it was a tool for software developers, it turns out it can help perform any task that can be accomplished by executing commands on a computer
Paul Nazarov reposted
Inżynier-jasnowidz (@vitali_today):
A reminder, friends, of what drove RAM prices up, and what it was all for.
Paul Nazarov reposted
DeepSeek (@deepseek_ai):
🚀 Launching DeepSeek-V3.2 & DeepSeek-V3.2-Speciale — Reasoning-first models built for agents! 🔹 DeepSeek-V3.2: Official successor to V3.2-Exp. Now live on App, Web & API. 🔹 DeepSeek-V3.2-Speciale: Pushing the boundaries of reasoning capabilities. API-only for now. 📄 Tech report: huggingface.co/deepseek-ai/De… 1/n
Paul Nazarov reposted
Andrej Karpathy (@karpathy):
I’m starting to get into a habit of reading everything (blogs, articles, book chapters,…) with LLMs. Usually pass 1 is manual, then pass 2 “explain/summarize”, pass 3 Q&A. I usually end up with a better/deeper understanding than if I moved on. Growing to among top use cases. On the flip side, if you’re a writer trying to explain/communicate something, we may increasingly see less of a mindset of “I’m writing this for another human” and more “I’m writing this for an LLM”. Because once an LLM “gets it”, it can then target, personalize and serve the idea to its user.
Paul Nazarov reposted
Andrej Karpathy (@karpathy):
I took delivery of a beautiful new shiny HW4 Tesla Model X today, so I immediately took it out for an FSD test drive, a bit like I used to do almost daily for 5 years. Basically... I'm amazed - it drives really, really well, smooth, confident, noticeably better than what I'm used to on HW3 (my previous car) and eons ahead of the version I remember driving up highway 280 on my first day at Tesla ~9 years ago, where I had to intervene every time the road mildly curved or sloped. (Note: this is v13; my car hasn't been offered the latest v14 yet.)

On the highway, I felt like a passenger in some super high tech Maglev train pod - the car is locked in the center of the lane while I'm looking out from Model X's higher vantage point and its panoramic front window, listening to the (incredible) sound system, or chatting with Grok.

On city streets, the car casually handled a number of tricky scenarios that I remember losing sleep over just a few years ago. It negotiated oncoming cars in tight lanes, it gracefully went around construction and temporarily in-lane stationary cars, it correctly timed tricky left turns with oncoming traffic from both sides, it gracefully gave way to the car that went out of order at the 4-way stop, it found a way to squeeze into bumper-to-bumper traffic to make its turn, it overtook the bus that was loading passengers but still stopped for the stop sign that was blocked by the bus, and at the end of the route it circled around a parking lot, found a spot and... parked. Basically a flawless drive.

For context, I'm used to going out for a brief test drive around the neighborhood and returning with 20 clips of things that could be improved. It's new for me to do exactly that, just like I used to, but come back with nothing. Perfect drive, no notes.
I expect there's still more work for the team in the long march of 9s, but it's just so cool to see that we're beyond finding issues on any individual ~1 hour drive around the neighborhood; you actually have to go to the fleet and mine them.

Back then, I processed the incredible promise of vehicle autonomy at scale (in the fully scalable, vision-only, end-to-end Tesla way) only intellectually, but now it is possible to feel it intuitively too if you just go out for a drive. Wait, of course surround video stream at 60Hz processed by a fully dedicated "driving brain" neural net will work, and it will be so much better and safer than a human driver. Did anyone else think otherwise?

I also watched @aelluswamy's new ICCV25 talk last week (x.com/aelluswamy/sta…) that hints at some of the recent under-the-hood technical components driving this progress. Sensor streams (videos, maps, kinematics, audio, ...) over long contexts (e.g. ~30 seconds) go into a big neural net, steering/acceleration comes out, optionally with visualization auxiliary data. This is the dream of the complete Software 1.0 -> Software 2.0 re-write that scales fully with data streaming from millions of cars in the fleet and the compute capacity of your chip, not some engineer's clever new DoubleParkedCarHandler C++ abstraction with undefined test-time characteristics of memory and runtime.

There are a lot more hints in the video on where things are going with the emerging "robotics+AI at scale stack". World reconstructors, world simulators "dreaming" dynamics, RL, all of these components general, foundational, neural net based, how the car is really just one kind of robot... are people getting this yet?

Huge congrats to the team - you're building magic objects of the future, you rock! And I love my car <3.
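Combining two numbers from the post, a ~30-second context of a 60 Hz video stream is 60 × 30 = 1,800 frames per camera. A minimal Python sketch of keeping such a rolling window, assuming one frame per tick and no subsampling (the post does not say how the real system buffers or subsamples its sensor streams); `RollingContext` is a hypothetical name.

```python
from collections import deque

FRAME_RATE_HZ = 60    # "surround video stream at 60Hz" (from the post)
CONTEXT_SECONDS = 30  # "long contexts (e.g. ~30 seconds)" (from the post)


class RollingContext:
    """Keep only the most recent CONTEXT_SECONDS worth of frames as model input."""

    def __init__(self, rate_hz: int = FRAME_RATE_HZ, seconds: int = CONTEXT_SECONDS):
        # A bounded deque silently drops the oldest frame once it is full.
        self.frames = deque(maxlen=rate_hz * seconds)

    def push(self, frame) -> None:
        self.frames.append(frame)

    def window(self) -> list:
        return list(self.frames)


ctx = RollingContext()
for t in range(2000):  # simulate 2000 frames (~33 s at 60 Hz); frame t is just its index
    ctx.push(t)
w = ctx.window()
print(len(w), w[0], w[-1])
```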